Many restarts of controller-manager & scheduler

Hi all,

I’ve installed a high-availability Kubernetes cluster and verified the configuration successfully with Sonobuoy:

sonobuoy results $results

Plugin: e2e
**Status: passed**
Total: 5232
Passed: 303
Failed: 0
Skipped: 4929

Plugin: systemd-logs
**Status: passed**
Total: 6
Passed: 6
Failed: 0
Skipped: 0

But my cluster shows many restarts of all 3 controller-managers and all 3 schedulers (about 50 in 2 days), and I don’t know why.

I didn’t find anything in the logs.

Could you please help me?

Many thanks for your feedback.

Regards,
Marco

The last message I got from kube-controller-manager and kube-scheduler before they died:

leaderelection lost

After adding the following two lines to the kube-controller-manager and kube-scheduler YAML manifests (and rebooting the master nodes), I had no more restarts of these two components:

--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s
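For reference, on a kubeadm-based cluster these flags go into the static pod manifests on each control-plane node; a minimal sketch for kube-scheduler, assuming the standard kubeadm layout (the path and the surrounding flags are assumptions, not taken from my cluster):

```yaml
# /etc/kubernetes/manifests/kube-scheduler.yaml (kubeadm default path; an assumption)
spec:
  containers:
  - command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    # Relaxed leader-election timing; --leader-elect-renew-deadline
    # must stay below --leader-elect-lease-duration.
    - --leader-elect-lease-duration=60s
    - --leader-elect-renew-deadline=40s
```

The kubelet watches this directory and recreates the static pod when the manifest changes, so a full node reboot should not strictly be necessary.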

Update: it didn’t work after all; I had another restart of kube-scheduler and controller-manager.
The last logs before restarting:

E1101 05:23:37.499241       1 leaderelection.go:357] Failed to update lock: etcdserver: request timed out
E1101 05:24:16.459824       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E1101 05:24:28.545087       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E1101 05:25:52.746054       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E1101 05:26:38.562530       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E1101 05:27:36.189180       1 leaderelection.go:357] Failed to update lock: etcdserver: request timed out
E1101 05:27:52.873997       1 leaderelection.go:357] Failed to update lock: resource name may not be empty
E1101 05:28:00.415547       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E1101 05:28:09.177483       1 leaderelection.go:357] Failed to update lock: resource name may not be empty
I1101 05:28:09.177583       1 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F1101 05:28:09.177622       1 server.go:199] leaderelection lost
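Since these errors point at etcd itself (“request timed out”, “leader changed”) rather than the scheduler, the usual next step is to check etcd health and disk latency; a sketch of the diagnostic commands, assuming a kubeadm layout with the etcd certs under /etc/kubernetes/pki/etcd (run on a control-plane node):

```shell
# Show health, leader, and DB size for every etcd member.
export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster -w table

# WAL fsync latency is the usual culprit behind "request timed out"
# and flapping leaders; etcd expects the 99th percentile of
# etcd_disk_wal_fsync_duration_seconds to stay under roughly 10ms.
curl -s \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  https://127.0.0.1:2379/metrics | grep etcd_disk_wal_fsync_duration
```

If the fsync percentiles are high, the etcd disks are too slow (or contended), and leader elections will keep failing no matter how much the lease duration is raised.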

Any suggestions?