User namespaces and persistent volumes

:wave: I have some questions around stateful pods and user namespaces.

The blog post announcing support for stateful pods says that stateful volumes are now unblocked, but I don't fully understand how persistent volumes will work in practice.

Digging through the KEP history, I found this comment, which comes closest to my concern:

  • You create a pod that uses mapping A. This pod uses a volume with this PVC mode and writes some files.
  • This pod is destroyed, and a new one is created and attached to that very same volume. The new pod uses mapping B, which does not overlap with A.
  • The new pod can't read the files, because they were created with a different effective UID/GID.
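To make the failure mode in that comment concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not kernel or kubelet code: each mapping is a single range (like one line of `uid_map`), the volume is a plain mount with no idmapping, and the two mappings (host UIDs starting at 100000 and 200000) are hypothetical values chosen only so they don't overlap.

```python
from dataclasses import dataclass
from typing import Optional

OVERFLOW_UID = 65534  # what the kernel reports for an ID with no mapping ("nobody")


@dataclass(frozen=True)
class IdMap:
    """One range of a userns mapping, like a line of uid_map:
    container IDs [inside, inside + length) <-> host IDs [outside, outside + length)."""
    inside: int
    outside: int
    length: int

    def to_outside(self, uid: int) -> Optional[int]:
        if self.inside <= uid < self.inside + self.length:
            return self.outside + (uid - self.inside)
        return None

    def to_inside(self, uid: int) -> Optional[int]:
        if self.outside <= uid < self.outside + self.length:
            return self.inside + (uid - self.outside)
        return None


def write_plain(pod: IdMap, container_uid: int) -> int:
    """On a plain (non-idmapped) mount, a new file is stored with the host-side UID."""
    host_uid = pod.to_outside(container_uid)
    if host_uid is None:
        raise ValueError(f"container uid {container_uid} is not mapped")
    return host_uid


def stat_plain(pod: IdMap, on_disk_uid: int) -> int:
    """The owner a pod sees: the on-disk UID translated back into its userns."""
    uid = pod.to_inside(on_disk_uid)
    return OVERFLOW_UID if uid is None else uid


# Hypothetical, non-overlapping mappings for two pods that use the same volume.
mapping_a = IdMap(inside=0, outside=100_000, length=65_536)
mapping_b = IdMap(inside=0, outside=200_000, length=65_536)

on_disk = write_plain(mapping_a, 1000)  # pod A writes as UID 1000 in its container
print(on_disk)                          # 101000 is stored on disk
print(stat_plain(mapping_a, on_disk))   # 1000: pod A still owns the file
print(stat_plain(mapping_b, on_disk))   # 65534: pod B sees "nobody"
```

The UID stored on disk depends on the writing pod's mapping, so any later pod with a different mapping sees the file as owned by the overflow UID.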

KEP-127: add support for stateful pods (kubernetes/enhancements PR #4084, by giuseppe) removed the notes distinguishing stateless from stateful pods, but I don't understand how idmapped mounts solve the problem of retained PVCs that have been written to. The kubelet can choose different host-side mappings for the userns and the volume across the different pods that attach the volume. The files in the volume will then be owned by nobody inside the container, and I/O on the PVC is not possible.

It’s possible I’m missing something crucial about the implementation here, happy to hear it if so! Otherwise: are there any plans for how to deal with persistent volumes?

Hey, I'm the author of user namespace support in Kubernetes.

The trick is to use idmapped mounts so that a write to a file stores the same UID it would store if you were not using a userns. Each pod gets a mount that "reverts" its mapping, so every pod simply writes files as the UID it has inside the container. That's why pods can share volumes.
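The "revert" can be modelled with a toy Python sketch. Assumptions, loudly: single-range mappings, hypothetical mapping values, and an idmapped mount whose idmapping is exactly the pod's own userns mapping; this illustrates the arithmetic only and is not how the kernel or kubelet is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdMap:
    """One range of an ID mapping: IDs [inside, inside + length) <-> [outside, outside + length)."""
    inside: int
    outside: int
    length: int

    def to_outside(self, uid: int) -> int:
        if not self.inside <= uid < self.inside + self.length:
            raise ValueError(f"uid {uid} is not mapped")  # simplification: no overflow UID
        return self.outside + (uid - self.inside)

    def to_inside(self, uid: int) -> int:
        if not self.outside <= uid < self.outside + self.length:
            raise ValueError(f"uid {uid} is not mapped")
        return self.inside + (uid - self.outside)


def write_idmapped(pod: IdMap, container_uid: int) -> int:
    """Write through a mount idmapped with the pod's own mapping."""
    host_uid = pod.to_outside(container_uid)  # userns: container UID -> host UID
    return pod.to_inside(host_uid)            # mount reverts it: host UID -> on-disk UID


def stat_idmapped(pod: IdMap, on_disk_uid: int) -> int:
    """Read the owner through the same kind of mount."""
    host_uid = pod.to_outside(on_disk_uid)    # mount: on-disk UID -> host UID
    return pod.to_inside(host_uid)            # userns: host UID -> container UID


# Hypothetical, non-overlapping mappings for two pods that use the same volume.
mapping_a = IdMap(inside=0, outside=100_000, length=65_536)
mapping_b = IdMap(inside=0, outside=200_000, length=65_536)

on_disk = write_idmapped(mapping_a, 1000)  # pod A writes as UID 1000
print(on_disk)                             # 1000: same UID as without a userns
print(stat_idmapped(mapping_b, on_disk))   # 1000: pod B sees the same owner
```

Because each pod's mount undoes that pod's own mapping, the on-disk UID no longer depends on which host range the kubelet picked, which is why the mappings can differ between pods.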

I explain exactly this in detail in a blog post I wrote: "User Namespaces in Kubernetes, Part II: Mappings and File Ownership" (Rat against the machine).

It includes examples of creating a userns and idmapped mounts from the terminal, using just unshare and mount (no containers).

Feel free to ask any questions!