Hi all,
I’ve installed a high-availability Kubernetes cluster and successfully verified the configuration with Sonobuoy:
```
sonobuoy results $results
Plugin: e2e
Status: passed
Total: 5232
Passed: 303
Failed: 0
Skipped: 4929

Plugin: systemd-logs
Status: passed
Total: 6
Passed: 6
Failed: 0
Skipped: 0
```
However, all 3 controller-managers and all 3 schedulers in my cluster restart frequently (about 50 restarts in 2 days), and I don’t know why.
I didn’t find anything useful in the logs.
Could you please help me?
Many thanks for your feedback.
Regards,
Marco
The last message I got from kube-controller-manager and kube-scheduler before they died:
```
leaderelection lost
```
After adding the following two flags to the kube-controller-manager and kube-scheduler YAML manifests (and rebooting the master nodes), I have had no more restarts of these two components:
```
--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s
```
It didn’t work: I had another restart of kube-scheduler and kube-controller-manager.
The last log lines before the restart:
```
E1101 05:23:37.499241 1 leaderelection.go:357] Failed to update lock: etcdserver: request timed out
E1101 05:24:16.459824 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E1101 05:24:28.545087 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out
E1101 05:25:52.746054 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E1101 05:26:38.562530 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E1101 05:27:36.189180 1 leaderelection.go:357] Failed to update lock: etcdserver: request timed out
E1101 05:27:52.873997 1 leaderelection.go:357] Failed to update lock: resource name may not be empty
E1101 05:28:00.415547 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed
E1101 05:28:09.177483 1 leaderelection.go:357] Failed to update lock: resource name may not be empty
I1101 05:28:09.177583 1 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F1101 05:28:09.177622 1 server.go:199] leaderelection lost
```
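To see at a glance which errors preceded the exit, a small triage sketch can tally the causes in the log lines above. `errorCause` and `tally` are hypothetical helpers, not Kubernetes tooling; the code assumes the klog layout shown (severity letter first, message after `] `) and counts the text after the first `: ` of each `E` line, skipping the `I` and `F` lines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// logLines are the lines quoted above, copied verbatim.
var logLines = []string{
	"E1101 05:23:37.499241 1 leaderelection.go:357] Failed to update lock: etcdserver: request timed out",
	"E1101 05:24:16.459824 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out",
	"E1101 05:24:28.545087 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: request timed out",
	"E1101 05:25:52.746054 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed",
	"E1101 05:26:38.562530 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed",
	"E1101 05:27:36.189180 1 leaderelection.go:357] Failed to update lock: etcdserver: request timed out",
	"E1101 05:27:52.873997 1 leaderelection.go:357] Failed to update lock: resource name may not be empty",
	"E1101 05:28:00.415547 1 leaderelection.go:321] error retrieving resource lock kube-system/kube-scheduler: etcdserver: leader changed",
	"E1101 05:28:09.177483 1 leaderelection.go:357] Failed to update lock: resource name may not be empty",
	"I1101 05:28:09.177583 1 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition",
	"F1101 05:28:09.177622 1 server.go:199] leaderelection lost",
}

// errorCause returns the underlying error of an E-level klog line, i.e. the
// message with its "Failed to update lock" / "error retrieving ..." prefix removed.
func errorCause(line string) (string, bool) {
	if !strings.HasPrefix(line, "E") {
		return "", false
	}
	// The message starts after the "file.go:NNN] " source location.
	_, msg, ok := strings.Cut(line, "] ")
	if !ok {
		return "", false
	}
	_, cause, ok := strings.Cut(msg, ": ")
	if !ok {
		return msg, true
	}
	return cause, true
}

// tally counts how often each error cause occurs.
func tally(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		if c, ok := errorCause(l); ok {
			counts[c]++
		}
	}
	return counts
}

func main() {
	counts := tally(logLines)
	causes := make([]string, 0, len(counts))
	for c := range counts {
		causes = append(causes, c)
	}
	sort.Strings(causes)
	for _, c := range causes {
		fmt.Printf("%d\t%s\n", counts[c], c)
	}
}
```

On these lines it reports 4 occurrences of `etcdserver: request timed out`, 3 of `etcdserver: leader changed` and 2 of `resource name may not be empty`, so most of the failed lock operations were errors returned by etcd, which points toward checking etcd responsiveness.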
Any suggestions?