Understanding CSI architecture and communication

I’m trying to understand CSI architecture, using kubernetes-csi/csi-driver-host-path as a reference. In particular, I’m looking to see what would be required to implement something like asteven/local-zfs-provisioner with CSI (it currently uses external volume provisioning).

My question centres around how the daemons on each node would best communicate with CSI.

I can see that in the csi-driver-host-path deployment, there are separate statefulsets for csi-hostpath-provisioner, csi-hostpath-resizer etc, and these pods have an affinity to run where the csi-hostpath-plugin is running. I can also see they communicate using a unix domain socket stored in a hostPath volume. The socket to use is passed as the -csi-address flag to kubernetes-csi/external-provisioner etc, which in turn uses connection.Connect from kubernetes-csi/csi-lib-utils, which supports various URL schemes as per grpc/grpc/blob/master/doc/naming.md.
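
(For concreteness: as far as I can tell, the connection itself is just an ordinary gRPC dial against a unix:// target. A rough sketch of that mechanism, with an illustrative socket path, and not the exact csi-lib-utils API:)

```go
// Minimal sketch: dialling a CSI endpoint over a unix domain socket, the same
// mechanism the sidecars use via csi-lib-utils. The socket path is illustrative.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "unix://" is one of the gRPC target schemes from grpc/grpc/doc/naming.md;
	// the default resolver dials the socket file directly.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx,
		"unix:///csi/csi.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()), // local socket, no TLS
		grpc.WithBlock(), // wait until the socket is actually reachable
	)
	if err != nil {
		log.Fatalf("failed to connect to CSI endpoint: %v", err)
	}
	defer conn.Close()

	log.Printf("connected to %s", conn.Target())
}
```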

(Aside: why use a hostPath volume for the socket, rather than putting all these components in a single pod which communicate via an emptyDir? This isn’t really important though)

My main question is this. How would you go about changing this so that it could provision volumes on multiple nodes?

Clearly, the lowest level hostpath-plugin can run as a DaemonSet across all the nodes, but I don’t know what the recommended way of deploying the CSI components would be.

One approach would be to replicate the CSI components on every node too: every node gets a csi-hostpath-provisioner, a csi-hostpath-resizer etc. They can all communicate with the hostpath-plugin using the Unix domain socket on that node easily. However, they would all be watching the same PVCs in the API, so would have to race against each other to decide which one picks up a particular PVC. That doesn’t seem right.

The other way would be to have a single, cluster-wide instance of the CSI components. This seems to make more sense. But then, how would those best communicate with the hostpath-plugin on each node? Does the k8s API provide some channel for this? Should the hostpath-plugin on each node expose its grpc endpoint as a “service”? If so, is it responsible for securing/authenticating connections over that service? Can RBAC be used to lock down access to these services?

Since the existing containers like kubernetes-csi/external-provisioner can only talk to a single fixed endpoint, it seems to me that in any case there would need to be some “middleware” container which knows where all the DaemonSet pods/containers are and can talk to them for provisioning. It could, I guess, even just ‘exec’ commands directly inside them. But what’s the standard way of doing this?

I apologise if the answer is obvious to someone with a better overview of k8s architecture than me!

Thanks in advance,

Brian.

Hi candlerb,

(Aside: why use a hostPath volume for the socket, rather than putting all these components in a single pod which communicate via an emptyDir? This isn’t really important though)

Sorry, I don’t know the reason for that.

My main question is this. How would you go about changing this so that it could provision volumes on multiple nodes?

I don’t think this is possible.
kubernetes-csi/csi-driver-host-path is a sample (non-production) CSI driver that creates a local directory as a volume on a single node; it is not intended to operate on multiple nodes.

I recommend reading the code of other CSI drivers (e.g. kubernetes-sigs/aws-ebs-csi-driver, the CSI driver for Amazon EBS).


You should also read the Kubernetes CSI design proposal:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md

A CSI plugin works as below (a minimal sketch of the controller side follows the list):

  • A DaemonSet runs the driver image on every node.
  • A StatefulSet/Deployment runs the driver image as the Kubernetes controller part.
  • In the DaemonSet pod, the kubelet communicates with the CSI driver over gRPC and a unix socket.
  • The node-driver-registrar sidecar registers the CSI driver with the kubelet as a plugin.
  • In the StatefulSet/Deployment pod, the CSI sidecar applications communicate with the CSI driver over gRPC and a unix socket.
  • The CSI sidecar applications act as Kubernetes controllers.
  • For example, when a PVC resource is created, external-provisioner watches that event and calls the CreateVolume RPC on the CSI driver.
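
For example, the controller part of a CSI driver is just a gRPC server implementing the ControllerServer interface from the CSI spec’s Go bindings. A minimal, hypothetical sketch (most required RPCs omitted, and assuming the Unimplemented stubs available in recent versions of the bindings):

```go
// Hypothetical controller-side sketch: a gRPC server on a unix socket that
// handles CreateVolume. A real driver must implement many more RPCs.
package main

import (
	"context"
	"log"
	"net"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

type controllerServer struct {
	csi.UnimplementedControllerServer // stubs for the RPCs not shown here
}

// CreateVolume is what external-provisioner calls after it sees a new PVC
// for this driver's StorageClass.
func (s *controllerServer) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	capacity := req.GetCapacityRange().GetRequiredBytes()
	log.Printf("CreateVolume %q (%d bytes)", req.GetName(), capacity)

	// A real driver would create the backing storage here.
	return &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:      "vol-" + req.GetName(), // illustrative ID scheme
			CapacityBytes: capacity,
		},
	}, nil
}

func main() {
	sock := "/csi/csi.sock" // the same socket the sidecar's -csi-address points at
	_ = os.Remove(sock)
	lis, err := net.Listen("unix", sock)
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	csi.RegisterControllerServer(srv, &controllerServer{})
	log.Fatal(srv.Serve(lis))
}
```

external-provisioner connects to this socket via its -csi-address flag and calls CreateVolume/DeleteVolume; the other sidecars work the same way for their own RPCs.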

So

Should the hostpath-plugin on each node expose its grpc endpoint as a “service”?

No, because the CSI driver communicates over gRPC on a local unix socket.

If so, is it responsible for securing/authenticating connections over that service?

No.

Can RBAC be used to lock down access to these services?

No, but you do need to set up RBAC for the CSI sidecar applications so that they can communicate with the api-server.
(e.g. external-provisioner: https://github.com/kubernetes-csi/external-provisioner/blob/6019c43382549945cf7b80f1bffa826c6d14392c/deploy/kubernetes/rbac.yaml)

Thank you very much for your detailed reply. https://speakerdeck.com/bells17/kubernetes-and-csi?slide=48 was particularly useful. Things are slowly starting to fall into place, although it’s unfortunate I don’t speak Japanese!

The spec talks about a “CSI volume driver” as a single entity, but it’s clear now that it acts in both “controller” and “node” roles - as far as I can see, in both cases it’s just a gRPC server that listens for incoming requests.

If I understand correctly: the “controller” part responds to volume create/deletion requests (from the provisioning/resizing sidecar which monitors PVCs); and the “node” part responds to attach/detach requests (from the kubelet).
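
To check my understanding, here is how I imagine that “one gRPC server, two roles” shape, assuming the Unimplemented stubs from recent versions of the CSI spec’s Go bindings; the --mode flag is my own invention for illustration, not an actual csi-driver-host-path option:

```go
// Illustrative sketch: the same binary serves the Identity service plus either
// the Controller or the Node service, depending on how it is deployed.
package main

import (
	"flag"
	"log"
	"net"
	"os"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// Embedding the Unimplemented* stubs satisfies the interfaces;
// a real driver overrides the RPCs it supports.
type identityService struct{ csi.UnimplementedIdentityServer }
type controllerService struct{ csi.UnimplementedControllerServer }
type nodeService struct{ csi.UnimplementedNodeServer }

func main() {
	mode := flag.String("mode", "node", "run as 'controller' or 'node'")
	endpoint := flag.String("endpoint", "/csi/csi.sock", "unix socket to listen on")
	flag.Parse()

	_ = os.Remove(*endpoint)
	lis, err := net.Listen("unix", *endpoint)
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer()
	csi.RegisterIdentityServer(srv, &identityService{}) // GetPluginInfo, Probe, GetPluginCapabilities

	switch *mode {
	case "controller":
		csi.RegisterControllerServer(srv, &controllerService{}) // CreateVolume, ControllerPublishVolume, ...
	case "node":
		csi.RegisterNodeServer(srv, &nodeService{}) // NodeStageVolume, NodePublishVolume, ...
	}

	log.Fatal(srv.Serve(lis))
}
```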

The fact that the provisioning and resizing takes place on a central controller reflects a SAN-like view of the world. Volume creation is a global operation, because a SAN volume can be attached from anywhere. Of course, this makes sense for things like EBS.

I wonder then what would be the correct way to write something like csi-driver-host-path but which works across multiple nodes. At volume creation time, it needs to choose a node to create the volume on (unless the PVC requested a specific node); and it would need to communicate with the node to create the volume, unless it deferred volume creation to attachment time.

I had a look at what local-zfs-provisioner does (which is an external provisioner[^1] rather than a CSI volume driver). As far as I can see, it runs a pod on the target node to perform the provisioning - then waits for it to finish, much like a “job”. If it were changed to listen for CSI requests, I think it could still work this way.

I recommend reading the code of other CSI drivers (e.g. kubernetes-sigs/aws-ebs-csi-driver, the CSI driver for Amazon EBS).

Thanks. That code makes the logical separation clearer (pkg/driver/controller.go, pkg/driver/node.go). In fact controllerService and nodeService are two different structs.

[^1] The whole reason I started down this path is because local-zfs-provisioner doesn’t support volume resize operations, and in turn this is because it uses sig-storage-lib-external-provisioner which doesn’t support resize.

It’s not that I need this functionality - it’s more that I want to understand how this all hangs together, and storage provisioning in k8s in general. So thank you for helping my understanding :slight_smile:

Things are slowly starting to fall into place, although it’s unfortunate I don’t speak Japanese!

Sorry, I’ll answer your questions as well as I can in English :sweat:

and the “node” part responds to attach/detach requests (from the kubelet).

To be exact, the “controller” part responds to attach/detach requests too (from kubernetes-csi/external-attacher, the sidecar container that watches Kubernetes VolumeAttachment objects and triggers ControllerPublish/Unpublish against a CSI endpoint).
The main role of the “node” part is responding to mount/umount requests (from the kubelet).
(mount/umount request = Node(Un)stageVolume / Node(Un)publishVolume)
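
For illustration, a hypothetical node-side handler might look like this (the directory layout and the bind mount are assumptions loosely modelled on csi-driver-host-path, and the gRPC server wiring is the same as in the earlier sketch):

```go
// Hypothetical node-side fragment: the kubelet calls these RPCs over the
// node socket; the driver makes the volume appear at the target path.
package driver

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

type nodeService struct {
	csi.UnimplementedNodeServer
}

// NodePublishVolume is roughly the "mount" request.
func (s *nodeService) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	source := filepath.Join("/var/lib/example-driver/volumes", req.GetVolumeId()) // assumed per-volume directory
	target := req.GetTargetPath()                                                 // provided by the kubelet

	if err := os.MkdirAll(target, 0o750); err != nil {
		return nil, err
	}
	// Bind-mount the per-volume directory into the kubelet-provided target path.
	if out, err := exec.CommandContext(ctx, "mount", "--bind", source, target).CombinedOutput(); err != nil {
		return nil, fmt.Errorf("bind mount failed: %v: %s", err, out)
	}
	return &csi.NodePublishVolumeResponse{}, nil
}

// NodeUnpublishVolume is roughly the "umount" request.
func (s *nodeService) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
	if out, err := exec.CommandContext(ctx, "umount", req.GetTargetPath()).CombinedOutput(); err != nil {
		return nil, fmt.Errorf("umount failed: %v: %s", err, out)
	}
	return &csi.NodeUnpublishVolumeResponse{}, nil
}
```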

I wonder then what would be the correct way to write something like csi-driver-host-path but which works across multiple nodes. At volume creation time, it needs to choose a node to create the volume on (unless the PVC requested a specific node);

Kubernetes CSI supports the Topology feature.

If you want to select a node, you need to use the Topology feature.
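
For example, a hypothetical driver could report its location from NodeGetInfo and then pin each volume to a node in CreateVolume. The topology key and types below are assumptions for illustration (real drivers define their own key), and the server wiring is the same as in the earlier sketches:

```go
// Hypothetical fragment showing how the Topology feature ties volumes to nodes.
package driver

import (
	"context"
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

const topologyKey = "example.com/hostname" // assumed driver-specific key

type nodeService struct {
	csi.UnimplementedNodeServer
	nodeName string // typically injected via the downward API (spec.nodeName)
}

type controllerService struct {
	csi.UnimplementedControllerServer
}

// Node side: report where this plugin instance lives, so topology
// requirements can be passed back to the controller later.
func (s *nodeService) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	return &csi.NodeGetInfoResponse{
		NodeId: s.nodeName,
		AccessibleTopology: &csi.Topology{
			Segments: map[string]string{topologyKey: s.nodeName},
		},
	}, nil
}

// Controller side: read the accessibility requirements populated by
// external-provisioner and pin the new volume to a node.
func (s *controllerService) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	reqs := req.GetAccessibilityRequirements()
	if reqs == nil || len(reqs.GetPreferred()) == 0 {
		return nil, fmt.Errorf("no topology requirement; cannot choose a node")
	}
	node := reqs.GetPreferred()[0].GetSegments()[topologyKey]

	// ...create the volume on `node` here, e.g. by asking the plugin on that node...

	return &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:      req.GetName(),
			CapacityBytes: req.GetCapacityRange().GetRequiredBytes(),
			// Record where the volume lives so the scheduler only places pods there.
			AccessibleTopology: []*csi.Topology{
				{Segments: map[string]string{topologyKey: node}},
			},
		},
	}, nil
}
```

With volumeBindingMode: WaitForFirstConsumer in the StorageClass, the preferred topology reflects where the consuming pod was scheduled, so the controller does not have to pick a node on its own.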

I don’t know much about local-zfs-provisioner, but topolvm/topolvm (a capacity-aware CSI plugin for Kubernetes) may be close to what you want.

The whole reason I started down this path is because local-zfs-provisioner doesn’t support volume resize operations, and in turn this is because it uses sig-storage-lib-external-provisioner which doesn’t support resize.

A CSI driver can advertise which features it supports (including whether it supports resize) through capability RPCs to the sidecar apps and the kubelet.

The application that sends resize requests to the CSI driver is kubernetes-csi/external-resizer, the sidecar container that watches Kubernetes PersistentVolumeClaim objects and triggers controller-side expansion operations against a CSI endpoint.
As far as I know, sig-storage-lib-external-provisioner is a library that only handles provisioning a volume (create/delete), and it is used by external-provisioner.
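
For illustration, here is a hypothetical sketch of how a driver advertises resize support and handles the expansion request from external-resizer (server wiring omitted, same as the earlier sketches):

```go
// Hypothetical fragment: advertise EXPAND_VOLUME so external-resizer will
// call ControllerExpandVolume when a PVC's requested size grows.
package driver

import (
	"context"
	"log"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

type controllerService struct {
	csi.UnimplementedControllerServer
}

func rpcCap(t csi.ControllerServiceCapability_RPC_Type) *csi.ControllerServiceCapability {
	return &csi.ControllerServiceCapability{
		Type: &csi.ControllerServiceCapability_Rpc{
			Rpc: &csi.ControllerServiceCapability_RPC{Type: t},
		},
	}
}

// The sidecars check these capabilities before issuing the corresponding RPCs.
func (s *controllerService) ControllerGetCapabilities(ctx context.Context, req *csi.ControllerGetCapabilitiesRequest) (*csi.ControllerGetCapabilitiesResponse, error) {
	return &csi.ControllerGetCapabilitiesResponse{
		Capabilities: []*csi.ControllerServiceCapability{
			rpcCap(csi.ControllerServiceCapability_RPC_CREATE_DELETE_VOLUME),
			rpcCap(csi.ControllerServiceCapability_RPC_EXPAND_VOLUME), // enables resize via external-resizer
		},
	}, nil
}

// Called by external-resizer when a PVC's requested size grows.
func (s *controllerService) ControllerExpandVolume(ctx context.Context, req *csi.ControllerExpandVolumeRequest) (*csi.ControllerExpandVolumeResponse, error) {
	newSize := req.GetCapacityRange().GetRequiredBytes()
	log.Printf("expand %s to %d bytes", req.GetVolumeId(), newSize)
	// ...grow the backing storage here...
	return &csi.ControllerExpandVolumeResponse{
		CapacityBytes:         newSize,
		NodeExpansionRequired: true, // kubelet will also call NodeExpandVolume (e.g. to grow the filesystem)
	}, nil
}
```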

It’s not that I need this functionality - it’s more that I want to understand how this all hangs together, and storage provisioning in k8s in general. So thank you for helping my understanding :slight_smile:

:smiley_cat: