I’m trying to offer an all-in-one solution to provide an on-premises Kubernetes cluster for my application. The goal is that customers just provide a few Linux virtual machines, with no other external resources required. So I opted for microk8s, and it works well and allows for multi-node HA. Using the NFS addon, it creates a storageClass that is accessible from all nodes and stores the data on the nodes themselves, with no outside storage service required.
I then created a PVC resource and managed to mount it in my different pods as ReadWriteMany so all my services can share the data if needed, and this too seems to work well.
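For reference, the claim itself is nothing fancy; roughly the sketch below, where the class name is a placeholder for whatever the microk8s NFS addon actually registered (check kubectl get storageclass) and the size is just an example:

```yaml
# Sketch of the shared claim; "nfs" class name and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany          # pods on different nodes can all mount it read/write
  storageClassName: nfs
  resources:
    requests:
      storage: 20Gi
```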
Now, this is fine for data that is fully managed by the services themselves, but I need to offer a way for my application administrators to modify the data that is served by the services, and I wouldn’t want them to have to SSH into the nodes’ data-storage folders directly.
My first idea was to host a web service to serve the files. Using a busybox Docker image and its httpd, I managed to offer a web service that administrators can use to browse and download them. That’s great, but it’s read-only.
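That read-only browser boils down to something like the sketch below (deployment name, labels, and paths are simplified for the example; the busybox httpd flags are foreground, port, and document root):

```yaml
# Rough sketch of the read-only file browser; names and paths are examples only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-browser
spec:
  replicas: 1
  selector:
    matchLabels:
      app: file-browser
  template:
    metadata:
      labels:
        app: file-browser
    spec:
      containers:
        - name: httpd
          image: busybox
          # serve the shared volume in the foreground on port 8080
          command: ["httpd", "-f", "-p", "8080", "-h", "/data"]
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: shared
              mountPath: /data
              readOnly: true
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data
```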
So I tried the same approach by installing an SFTP service. I’m still unable to get it working properly because the volumeMount appears as owned by “root”, and the SFTP service requires a user to connect, so I get permission-denied errors when uploading.
I’ll probably manage to get it working by playing with security contexts, but doing all of this makes me believe I’m going completely out of bounds of normal, standardized practice, since I see nobody else online doing anything similar.
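What I’m planning to try is roughly the pod-level securityContext below (the UID/GID and image are placeholders, and from what I’ve read fsGroup may not even be honored on NFS-backed volumes, so this might not be enough on its own):

```yaml
# Hypothetical pod-level security context; UID/GID 1000 and the image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: sftp
spec:
  securityContext:
    runAsUser: 1000        # run the sftp process as this user...
    runAsGroup: 1000
    fsGroup: 1000          # ...and ask Kubernetes to make the volume group-writable for this GID
  containers:
    - name: sftp
      image: my-sftp-image   # placeholder image
      volumeMounts:
        - name: shared
          mountPath: /home/admin/data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data
```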
So, how are you usually managing the data for your services?
Nobody is really pursuing ReadWriteMany volumes these days. People have had tons of success adopting object storage using minio or cloud provider object storage solutions. Most commonly, people are just trying to solve for consistently maintaining their volumes and attaching them between nodes.
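If you do go the object route, minio itself is just another workload you can run in the cluster on an ordinary single-writer volume; a rough sketch (image tag, credentials, and claim name are placeholders):

```yaml
# Hypothetical single-node minio; credentials and claim name are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio
          args: ["server", "/data", "--console-address", ":9001"]
          env:
            - name: MINIO_ROOT_USER
              value: admin             # placeholder credentials
            - name: MINIO_ROOT_PASSWORD
              value: change-me
          ports:
            - containerPort: 9000      # S3 API
            - containerPort: 9001      # web console
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: minio-data
```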
The first option that I find easy is longhorn.io. It uses sparse files for replicated block storage. It also provides RWX via an NFS workload, but your IO performance suffers at scale. Also, that NFS workload is a single point of failure.
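With Longhorn that typically looks like a StorageClass on the driver.longhorn.io provisioner plus an RWX claim; Longhorn runs its NFS share-manager behind the scenes for the RWX part. The values below are only illustrative:

```yaml
# Illustrative only; class name, replica count, and size are examples.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-rwx
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"        # keep a replica of each volume on 3 nodes
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany            # served through Longhorn's NFS share-manager pod
  storageClassName: longhorn-rwx
  resources:
    requests:
      storage: 20Gi
```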
Another option might be openebs. This storage provider has a few different configuration modes, and it also provides RWX via an NFS workload, with the same issues as Longhorn’s.
A more robust solution might be rook.io. CephFS is reliable, and the Rook provider includes a CSI driver to support it. It doesn’t have the same IO profile as the prior two solutions, and it doesn’t have the single-point-of-failure problem. However, it does have a complexity problem and a high barrier to entry resource-wise. I recommend running a CephFS cluster outside of Kubernetes and just using rook.io to consume that cluster, instead of trying to leverage the operator. Managing a Ceph cluster yourself is, unfortunately, still less effort than managing the operator.
Kinda tells me I’m still far from understanding the technology stack; I unfortunately don’t even know what some of the words you used mean, or why I would need them. Like, I get that those are big systems (just installing Longhorn rolls out 23 pods?!?) that offer an abstraction layer over multiple storage technologies, and also offer some kind of replication/redundancy protection and management tools. That’s all great; I want an on-premises solution as well as an AWS one, so that will help. But reading the pages for these options, I can’t even mentally visualize, for my on-premises setup, where the data is physically stored, like where the disks are under all that. Can it just use the underlying filesystem the Kubernetes nodes run on, or do you need some special hardware?
At the end of the day, I just want the easiest solution to deploy using pure software, a way to view and edit the storage content, and storage that can be shared between my pods.
Anyway, I guess I’ll start installing Longhorn and see where that leads me.
When you’re running a cluster on a cloud provider, you normally want to just install their CSI driver. It will handle their volume solutions for you.
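On AWS, for example, once the EBS CSI driver add-on is installed, a StorageClass is about all you declare (note that EBS volumes are ReadWriteOnce; RWX on AWS usually means the EFS CSI driver instead). The class name and volume type below are just examples:

```yaml
# Example only; assumes the AWS EBS CSI driver add-on is installed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # create the volume in the pod's availability zone
```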
You might not need Kubernetes for what you want. This kind of setup is often fine and can scale reasonably for most situations:
cloud servers that run copies of the web application (the application might be easier to update/maintain as a container image running on the cloud servers)
a cloud load balancer to route traffic to the web application
2 cloud servers running the database software configured for passive failover
object storage for any non-ephemeral data
maybe a redis cluster if you want to share data between application nodes
Without getting into some of the architecture choices that can be made with it, Kubernetes is probably great if you need to orchestrate multiple workloads like those mentioned above, at scale.
Regarding this question:
Can it just use the underlying filesystem the Kubernetes nodes run on, or do you need some special hardware?
I don’t want to turn this discussion into specifics of my architecture. I’m already well advanced on it, and it’s too late to go back. Moving from native Windows processes to Linux containers running on Kubernetes is a big step, but a profitable one. Using Deployments for zero-downtime updates and Helm charts to manage values via ConfigMaps will make everything much more manageable. I already have pretty much everything working: 13 dotnet micro-services (only one is a web server), most of which scale to multiple instances, a rabbitmq cluster (for events), a redis cluster (for quick data sharing), and now a prometheus/grafana stack for metrics and alerts, and the ingress for route mapping is a godsend. I also have an SQL database, but it is external (AWS RDS, or on-prem MS SQL), so I don’t have to worry about it from my cluster’s point of view.
I need to be able to deploy it both offline, on basic VMs, for certain customers, and on a cloud provider for our main server. I wanted to get everything running on my on-site test/dev VM before moving on to the cloud provider version; hopefully most of the configuration will remain compatible.
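Since everything is already templated with Helm, my plan is to keep the PVC template identical and only swap the storage class per environment; roughly the sketch below, where the value and file names are just placeholders:

```yaml
# Hypothetical Helm fragments; value and file names are placeholders.
# --- values-onprem.yaml ---
storage:
  className: longhorn     # whichever RWX-capable class exists on-prem
  size: 20Gi
# (a values-cloud.yaml would differ only in className)
---
# --- templates/pvc.yaml ---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: {{ .Values.storage.className }}
  resources:
    requests:
      storage: {{ .Values.storage.size }}
```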
As you can see, the part I ignored until now was storage for files, i.e. more than the simple data that can be shared via redis. I need services to read and expose some device firmware binaries, some Python scripts that could be editable externally, and some mp3/gif/xlsx files that are created by the services. Some of these files need to be both writable by some services and readable by others; the best example is that the service generating reports (.xlsx) should store them somewhere, and then the web service should be capable of exposing them. That’s why I wanted to keep it simple and have one big read/write volume that all pods could access.
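Concretely, what I had in mind is the report service and the web service mounting the same claim, the web one read-only; a rough sketch (images, names, and paths below are placeholders):

```yaml
# Rough sketch of two deployments sharing one RWX claim; names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-service
spec:
  replicas: 1
  selector:
    matchLabels: {app: report-service}
  template:
    metadata:
      labels: {app: report-service}
    spec:
      containers:
        - name: reports
          image: myregistry/report-service   # placeholder image
          volumeMounts:
            - name: shared
              mountPath: /data/reports       # writes the generated .xlsx files here
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service
spec:
  replicas: 2
  selector:
    matchLabels: {app: web-service}
  template:
    metadata:
      labels: {app: web-service}
    spec:
      containers:
        - name: web
          image: myregistry/web-service      # placeholder image
          volumeMounts:
            - name: shared
              mountPath: /data
              readOnly: true                 # only exposes the files for download
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data
```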
microk8s’ basic addons (hostpath and nfs) don’t seem efficient enough and are error-prone. The NFS one actually stored everything on just one node and filled its disk while the others had lots of space available. So a solution like Longhorn/OpenEBS that spreads the data around looks great.
I installed longhorn and… everything crashed. Nodes went offline.
I believe that was because the previously mentioned node was already full, and just pulling the new Docker images filled the disk (I could see events about it). After about 30 min, it actually managed to repair itself. Then I had the longhorn-driver-deployer pod crashing, but I managed to fix it by adding a values file setting the csi.kubeletRootDir variable to point to /var/snap/microk8s/common/var/lib/kubelet, as explained here.
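For anyone else on microk8s, the override boils down to this:

```yaml
# values-longhorn.yaml — microk8s keeps the kubelet dir under the snap path
csi:
  kubeletRootDir: /var/snap/microk8s/common/var/lib/kubelet
```

Applied with something like helm upgrade --install longhorn longhorn/longhorn -n longhorn-system -f values-longhorn.yaml (release and namespace names as in the standard chart install).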
I’ll try to create a volume via the longhorn web interface and point my service to use it and then delete the microk8s plugin. I’ll see how it goes.
Yes, I have 3 nodes; rabbitmq and redis also require this for quorum if you want HA.
The part that confuses me is that I don’t actually quite grasp the difference between object storage and block storage. From my simple software-developer point of view, I only know about files and drives. My services use file open and file write operations.
I’m not sure if there is some other meaning for the phrase “object storage”, but my take is that it refers to an API that can store and serve arbitrary data given a key path. Minio and Riak are self-hosted services capable of providing S3-compatible object storage. The data you put over HTTP is arbitrary; the API doesn’t care whether it conforms to any file format. It’s also not a file system.
Block storage is the raw block device that a file system lives on top of. Most CSI drivers will provide you with a file system that they have already allocated for you; sometimes they do so on top of a block device that they automatically manage for you. There are a few cloud provider CSI drivers that won’t present raw block storage at all.
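To make that concrete on the Kubernetes side: a PVC can ask for either mode, and most workloads (including yours, doing file open/write) want the Filesystem mode; object storage never appears as a volume at all, you talk to it over HTTP. A minimal illustration, with placeholder names and sizes:

```yaml
# The usual case: the CSI driver hands the pod a ready-to-use file system.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-fs
spec:
  accessModes: [ReadWriteOnce]
  volumeMode: Filesystem      # default; mounted at a path inside the container
  resources:
    requests: {storage: 10Gi}
---
# The raw-device case: the container sees an unformatted block device.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-block
spec:
  accessModes: [ReadWriteOnce]
  volumeMode: Block           # exposed via volumeDevices, not volumeMounts
  resources:
    requests: {storage: 10Gi}
```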