I need to do leader election in my application. I am aware that there is a leader election utility already implemented in client-go.
I am thinking of implementing the following algorithm, which I believe would be simpler, wouldn’t depend on clocks on the nodes running the pods, and would detect leader failure very quickly without waiting for a “leaseExpiry”. I have to admit that I am not an expert on Kubernetes’ internal management of resources, so my understanding could be wrong. So I would like to know if the following algorithm makes sense.
Let’s say there are 3 pods P1, P2 and P3 among which the leader is to be elected. The user provides a “key” (e.g. “test-leader-election”) that all pods know about upfront. (For simplicity, assume there is only one app process/container in each pod.)
The high-level idea is that each pod will try to create a configMap named “test-leader-election”. The pod will put its own identity in a known annotation on the configMap. The pod will also set the metadata.ownerReferences field of the configMap so that GC always deletes the configMap if the owner pod is deleted or disappears.
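To make this concrete, here is a rough (untested) Go sketch of the create step. The annotation key `example.com/leader-identity` and the `POD_NAME`/`POD_NAMESPACE` env vars (which I’d inject via the downward API) are just placeholders I made up:

```go
package main

import (
	"context"
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch" // used by the watch loop in the next snippet
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// tryBecomeLeader attempts to create the election ConfigMap with this pod
// set as its owner. It returns true if the create succeeded, i.e. this pod
// won the election.
func tryBecomeLeader(ctx context.Context, client kubernetes.Interface, ns, podName string) (bool, error) {
	// Fetch our own Pod to get its UID for the ownerReference.
	pod, err := client.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "test-leader-election",
			Namespace: ns,
			Annotations: map[string]string{
				// Placeholder annotation key holding the leader's identity.
				"example.com/leader-identity": podName,
			},
			// GC deletes the ConfigMap once the owning Pod is gone.
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Pod",
				Name:       pod.Name,
				UID:        pod.UID,
			}},
		},
	}
	_, err = client.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return false, nil // another pod is already leader
	}
	return err == nil, err
}
```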
Whichever candidate pod manages to create the configMap becomes the leader, and the others place a watch on that configMap (using a field selector on its name). If a non-leader pod notices the disappearance of the configMap, it tries to create the configMap itself to become leader.
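And the watch/retry part, continuing the same file as above (again just a sketch: a real version would need to handle the watch channel closing and re-establish the watch, e.g. with a RetryWatcher or a SharedInformer):

```go
// watchForLeaderLoss blocks until the election ConfigMap is deleted, then
// returns so the caller can try to take over.
func watchForLeaderLoss(ctx context.Context, client kubernetes.Interface, ns string) error {
	w, err := client.CoreV1().ConfigMaps(ns).Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=test-leader-election",
	})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		if ev.Type == watch.Deleted {
			return nil // leader's ConfigMap is gone; election is open again
		}
	}
	return ctx.Err() // channel closed; sketch punts on re-watching
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()
	// Both env vars assumed injected via the downward API.
	ns, podName := os.Getenv("POD_NAMESPACE"), os.Getenv("POD_NAME")

	for {
		leader, err := tryBecomeLeader(ctx, client, ns, podName)
		if err != nil {
			log.Fatal(err)
		}
		if leader {
			log.Println("became leader")
			select {} // do leader work here
		}
		if err := watchForLeaderLoss(ctx, client, ns); err != nil {
			log.Fatal(err)
		}
		// ConfigMap disappeared; loop and race to create it again.
	}
}
```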
I “think” the above algorithm would provide “fencing” as well. (This assumes k8s can guarantee that the configMap can only disappear if the pod that created it disappears. But I have “heard” about corner cases where pods might become zombies: the process keeps running while k8s no longer knows about it.)
What do you guys think?