Persisting a (database) volume in GCP/GKE

Hello,

Not sure if this is off-topic because it depends on the underlying cloud provider, but basically I want to come up with the YAML manifest file(s) to:

  1. Create a new volume to attach to a database container
  2. Use a pre-existing volume from 1) to attach to the database container

The idea is to run the first file the first time I create a cluster, and then to have the option to run the second one once the cluster exists and there’s data I want to keep in that db volume, so that the data survives the db node going down. I want to be able to go all-out chaos monkey on my nodes and not lose any data.

The first part is easy enough. In GKE I did:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
 name: postgres-volume-claim
spec:
 accessModes:
   - ReadWriteOnce
 resources:
   requests:
     storage: 200Gi

and added that persistentVolumeClaim to my db container.

GKE creates the PersistentVolume for you.
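For illustration, attaching the claim to the database pod might look roughly like the sketch below. Only the claim name comes from the manifest above; the Deployment name, labels, image tag, volume name, and mount path are assumptions, and credentials and other settings are left out.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres                      # hypothetical name
spec:
  replicas: 1
  strategy:
    type: Recreate                    # stop the old pod before starting a new one, since the disk is ReadWriteOnce
  selector:
    matchLabels:
      app: postgres                   # hypothetical label
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13          # assumed image and tag
          env:
            - name: PGDATA            # keep the data in a subdirectory of the mount
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-data         # hypothetical volume name
          persistentVolumeClaim:
            claimName: postgres-volume-claim   # the claim defined above
```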

Now, following https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/preexisting-pd, to write the second manifest I need to add a PersistentVolume with the ID of the existing disk:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-volume
spec:
  storageClassName: ""
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: gke-my-cluster-id_string_here
    fsType: ext4

and in my PersistentVolumeClaim I need to add:

  storageClassName: ""
  volumeName: postgres-volume
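Put together, the claim for the second manifest would presumably look like this. It is just the original claim with the two extra fields merged in, nothing new:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres-volume-claim
spec:
  storageClassName: ""          # empty string: do not dynamically provision a new disk
  volumeName: postgres-volume   # bind to the pre-created PersistentVolume by name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
```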

Now, once I have my cluster and my data volume, how do I apply this new file?

It cannot be applied directly because a PersistentVolumeClaim’s spec cannot be changed after creation (immutable error). It also cannot be deleted until the pod using it is gone (the kubernetes.io/pvc-protection finalizer), so I would have to delete the db pod first. But that pod belongs to a Deployment, so its ReplicaSet immediately recreates it.

I was able to make this work manually after a lot of trial and error, but the question is, what’s the “standard” / best practice way of doing this?

I’ve looked at KubeDB, but it still seems immature and I want to avoid adding more layers of magic.

You should be able to do part 1 with initContainers.

So, there are a few ways to do this, but why don’t you just create the PV/PVC as part of your database deployment? As long as you have your reclaim policy set correctly, you will never lose your disk. Just do a helm install stable/mysql to look at an example of how this works.
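As a rough sketch of what "reclaim policy set correctly" could mean for dynamically provisioned disks: a StorageClass with reclaimPolicy: Retain keeps the underlying disk around when the claim is deleted. The class name and disk type here are assumptions, and the provisioner shown is the in-tree GCE PD one; a claim would opt in by setting storageClassName to this class.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-pd                 # hypothetical class name
provisioner: kubernetes.io/gce-pd   # in-tree GCE persistent disk provisioner
parameters:
  type: pd-standard                 # assumed disk type
reclaimPolicy: Retain               # keep the disk when the PVC is deleted (the default is Delete)
```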

That said, you should always have a backup solution that moves your data out of the PV (the GCE persistent disk, really) so that accidental or malicious data loss doesn’t happen.