Hi everyone,
I am looking for a way to create a (persistent) volume on each node in the cluster that multiple pods can write to and a DaemonSet can read from.
My first idea was to use a hostPath volume, but the documentation lists some downsides, in particular that containers usually have to run as root to write to it. Next I looked at the local volume type, but with local persistent volumes the scheduler ensures that a pod using the volume is always scheduled to the same node. This pod-to-node pinning is not desired in my case.
Is there a way to disable the pod-to-node pinning while using local volumes? Or is there an even better way to achieve file-based data exchange between pods and a DaemonSet?
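For context, here’s roughly the writing side of what I have in mind with hostPath; the image, paths, and names are just placeholders:

```yaml
# Sketch only: an app pod writing into a hostPath directory.
# The catch: the container typically has to run as root (or the
# directory permissions must be fixed up on the host) to write here.
apiVersion: v1
kind: Pod
metadata:
  name: example-writer               # placeholder name
spec:
  containers:
    - name: app
      image: busybox                 # placeholder image
      command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
      volumeMounts:
        - name: exchange
          mountPath: /data
  volumes:
    - name: exchange
      hostPath:
        path: /var/data/exchange     # placeholder host directory
        type: DirectoryOrCreate
```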
Thx!
Cluster information:
Kubernetes version: 1.20
Cloud being used: Azure
Installation method: AKS
Sounds like you want a ReadWriteMany volume. Check out the Access Modes section here.
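A claim for such a volume is just a standard PVC with that access mode; the class name below is a placeholder for whatever RWX-capable provisioner you use:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                # placeholder name
spec:
  accessModes:
    - ReadWriteMany                # multiple nodes may mount it read/write
  resources:
    requests:
      storage: 10Gi
  storageClassName: my-rwx-class   # placeholder: must map to RWX-capable storage
```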
There is rook.io, which can create Ceph clusters. It’s complex, has a ton of options, and as such is not for the faint of heart.
You can use longhorn.io for cloud-native storage. It’s very easy to deploy and can provide ReadWriteMany volumes, but they’re served over NFS, so you still have the shortcomings that come with that.
There’s also the option to deploy an NFS server yourself. Here’s an example of how to do that. The problem is that it doesn’t scale out: you can only run one instance, and if it goes down, your app will be unresponsive until Kubernetes reschedules it.
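For what it’s worth, consuming such an NFS export from Kubernetes is just a static PersistentVolume; server and path below are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared                     # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                    # NFS supports many concurrent writers
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal       # placeholder: your NFS service/host
    path: /exports/shared              # placeholder: exported directory
```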
I’m working on something related to this problem myself: a StatefulSet for Gluster that provides high availability and access via the Gluster client and the NFS client. I’ll be pushing updates to this repo by the end of this week with a working StatefulSet; right now I’m just using the repo to build the container I need for testing.
While exploring stateful applications, something I did to orchestrate CockroachDB was to build a sidecar that syncs keys between nodes. Depending on what you’re doing, this could be a stopgap for you.
If the data you want synced doesn’t actively mutate, there’s also the git-sync sidecar.
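Roughly, that sidecar shares an emptyDir with the main container; the flags below are git-sync v4 style from memory, so double-check them against the kubernetes/git-sync README:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: git-sync-example                 # placeholder name
spec:
  securityContext:
    fsGroup: 65533                       # git-sync runs as non-root; see its README
  containers:
    - name: app
      image: busybox                     # placeholder image
      command: ["sh", "-c", "sleep infinity"]
      volumeMounts:
        - name: content
          mountPath: /data
          readOnly: true
    - name: git-sync
      image: registry.k8s.io/git-sync/git-sync:v4.1.0    # tag is illustrative
      args:
        - --repo=https://github.com/example/config-repo  # placeholder repo
        - --root=/data
        - --period=60s                   # re-sync interval
      volumeMounts:
        - name: content
          mountPath: /data
  volumes:
    - name: content
      emptyDir: {}
```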
Hi,
thx for the reply. If possible, though, the data should stay on the local node.
The basic use case is somewhat similar to collecting log data: apps/pods write files to a directory on the local node, and a DaemonSet reads and processes those files. If I used a persistent volume backed by remote storage, it would probably lead to performance issues once many pods write to the same persistent volume.
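Concretely, the reading side would be something like this DaemonSet; the directory path is a placeholder shared with the writing pods:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: exchange-reader                # placeholder name
spec:
  selector:
    matchLabels:
      app: exchange-reader
  template:
    metadata:
      labels:
        app: exchange-reader
    spec:
      containers:
        - name: reader
          image: busybox               # placeholder image
          command: ["sh", "-c", "tail -F /data/*.log"]
          volumeMounts:
            - name: exchange
              mountPath: /data
              readOnly: true           # the DaemonSet only reads
      volumes:
        - name: exchange
          hostPath:
            path: /var/data/exchange   # placeholder: same directory the apps write to
            type: DirectoryOrCreate
```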
br
I see.
If you’re trying to work with persistent data and it has to stay on one node, you’re pretty much stuck with hostPath or local volumes. The pod will be limited by that node’s affinity; Kubernetes doesn’t have a way to move the data around.
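To see why: a local PersistentVolume must carry a required nodeAffinity term, so every pod that claims it is scheduled onto that node. A minimal example, with the path and node name as placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example           # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage  # placeholder class
  local:
    path: /mnt/disks/vol1          # placeholder directory on the node
  nodeAffinity:                    # required for local volumes; this is the pinning
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1           # placeholder node name
```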
If you’re on Google Cloud, AWS, or Azure, you probably just need to provision the right storage class to get the IOPS you want.
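On AKS, for example, a premium-SSD class looks roughly like this (AKS also ships a managed-premium class out of the box; the parameter names come from the Azure disk CSI driver and are worth double-checking for your cluster version):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-disks                  # placeholder name
provisioner: disk.csi.azure.com     # older clusters use the in-tree kubernetes.io/azure-disk
parameters:
  skuName: Premium_LRS              # premium SSD; IOPS scales with disk size
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```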