Leader election utility

I need to do leader election in my application. I am aware that there is a leader election utility already implemented in client-go.

I am thinking of implementing the following algorithm, which I believe would be simpler, wouldn’t depend on the clocks of the nodes running the pods, and would detect leader failure very quickly without waiting for a “leaseExpiry”. I have to admit that I am not an expert on Kubernetes’ internal management of resources, so my understanding could be wrong. So, I would like to know if the following algorithm makes sense.

Let’s say there are 3 pods P1, P2 and P3 among which the leader is to be elected. The user provides a “key” (e.g. “test-leader-election”) that all pods know about upfront. (For simplicity, assume there is only one app process/container in the pod.)
The high-level idea is that each pod will try to create a configMap named “test-leader-election”. The pod will put its own identity in a known annotation in the configMap. It will also set the metadata.ownerReferences field in the configMap so that GC always deletes the configMap if the owner pod is deleted/disappears.
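If I understand it correctly, the configMap each pod races to create would look roughly like this (the annotation key, pod name and uid below are made up for illustration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-leader-election
  annotations:
    # hypothetical annotation key holding the leader's identity
    example.com/leader-identity: P1
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: P1                                   # the candidate pod that created this configMap
    uid: 0d5e9b7a-0000-0000-0000-000000000000  # must be the pod's actual metadata.uid
```

Note that the uid has to match the creating pod’s real metadata.uid, and the owner pod and the configMap have to live in the same namespace, otherwise GC treats the owner as missing.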
Whichever candidate pod manages to create the configMap becomes the leader, and the others place a watch on that configMap (using a field selector on the name). If a non-leader pod notices the configMap disappear, it tries to create the configMap itself to become the leader.
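To make sure I’m describing the race correctly, here is a minimal sketch of the create-or-watch loop, with the API server’s create-if-absent semantics faked by an in-memory store (all names and types here are mine for illustration, not client-go’s):

```go
package main

import (
	"fmt"
	"sync"
)

// fakeAPIServer stands in for the API server's create-if-absent semantics:
// Create succeeds only for the first caller, like POSTing a configMap,
// where everyone else gets an AlreadyExists error.
type fakeAPIServer struct {
	mu     sync.Mutex
	exists bool
	leader string // identity stored in the configMap's annotation
}

// Create tries to create the "configMap" naming owner as leader.
// It returns true only for the caller that actually created it.
func (s *fakeAPIServer) Create(owner string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.exists {
		return false
	}
	s.exists = true
	s.leader = owner
	return true
}

// Delete simulates GC removing the configMap after its owner pod dies.
func (s *fakeAPIServer) Delete() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.exists = false
	s.leader = ""
}

// Leader reports the current leader identity, if any.
func (s *fakeAPIServer) Leader() (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.leader, s.exists
}

func main() {
	api := &fakeAPIServer{}

	// All three pods race to create the configMap; exactly one wins.
	wins := 0
	for _, pod := range []string{"P1", "P2", "P3"} {
		if api.Create(pod) {
			wins++
		}
	}
	leader, _ := api.Leader()
	fmt.Println("wins:", wins, "leader:", leader) // wins: 1 leader: P1

	// The leader pod dies; GC deletes the configMap, and a watching
	// non-leader pod retries the create to take over.
	api.Delete()
	if api.Create("P2") {
		leader, _ = api.Leader()
		fmt.Println("new leader:", leader) // new leader: P2
	}
}
```

The real version would replace Create/Delete/Leader with a configMap Create call, GC, and a watch with a field selector, but the shape of the loop is the same.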

I “think” the above algorithm would provide “fencing” as well. (This assumes k8s can guarantee that the configMap can only disappear if the pod that created it disappears. But I have “heard” about corner cases where pods might become zombies, still running the process with k8s not knowing about them.)

What do you guys think?

Instead of the leaseExpiry you have to wait for the GC to do its sweep, right?

Not sure how the GC is implemented, but wouldn’t you be exchanging a configuration that you can tune as you want (like the leaseExpiry) for the Kubernetes GC’s behavior? Changes to the GC behavior are probably global, so you may lose flexibility (if that matters to your use case).

Am I missing something?

Sorry if I’m saying something obvious or, even worse, obviously wrong; I have not used this before and don’t know the client-go implementation :slight_smile:

The current leaseExpiry-based implementation will notice a leader going away only after the whole leaseExpiry has passed, which introduces a delay before the next leader can be elected. You can’t configure leaseExpiry to be too low, or else you risk “thinking” that the leader is down when it actually isn’t. Also, this approach depends on the clocks of pretty much all the nodes, because pods could be running anywhere.
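For reference, these are the knobs in the client-go implementation being compared against; the sketch below assumes a resourcelock `lock` has already been built, and the durations are just commonly seen example values, not requirements:

```go
// Sketch of client-go's leaderelection tuning knobs, for comparison.
leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
	Lock:          lock,
	LeaseDuration: 15 * time.Second, // how long a leader is presumed valid; failover waits on this
	RenewDeadline: 10 * time.Second, // the leader steps down if it cannot renew within this
	RetryPeriod:   2 * time.Second,  // how often candidates retry acquiring the lock
	Callbacks: leaderelection.LeaderCallbacks{
		OnStartedLeading: func(ctx context.Context) { /* do leader work */ },
		OnStoppedLeading: func() { /* stop leader work */ },
	},
})
```

The LeaseDuration here is the “leaseExpiry” being discussed: it is the lower bound on how long the other candidates wait before one of them can take over.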

AFAIK, GC of the configMap on deletion of the pod would be faster than any leaseExpiry you can afford to set.