I created a 4-node cluster that I regularly scale down and scale up. Recently I found that the kube-dns and metrics-server pods went into CrashLoopBackOff state, and although the cluster still has 4 nodes, only 3 kube-proxy pods are available.
Kindly help: what could be the issue?
Without any other info it is very hard to make any suggestions. What OS are you using? What Kubernetes version are you running? How was the cluster provisioned? In these scenarios, the more info you can provide, the easier it will be to troubleshoot and figure out a possible solution.
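For example, the output of a few standard kubectl commands would cover most of that (nothing cluster-specific assumed here):

```
# Client and server versions.
kubectl version --short

# Node status plus OS image, kernel, and container runtime per node.
kubectl get nodes -o wide

# State of the system pods, including kube-dns, metrics-server, and kube-proxy.
kubectl get pods -n kube-system -o wide
```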
Container-Optimized OS (cos), Kubernetes version 1.11.6-gke.2, and n1-standard-1 nodes (1 vCPU, 3.75 GB memory) with a 100 GB boot disk.
Maybe you can do a kubectl describe pod to see what is wrong?
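Something like this, for instance (the pod name below is a placeholder; the system pods live in the kube-system namespace):

```
# List the system pods to find the exact names, then describe the failing one.
kubectl get pods -n kube-system
kubectl describe pod <kube-dns-pod-name> -n kube-system
```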
But GKE is managed, so I think you don’t have much control (I’ve never used it, so I might be wrong). That shouldn’t happen, and I think Google (who manages it) should fix it. Have you tried opening a support ticket?
Here are the events for the pods:
kube-dns
```
Events:
  Type     Reason     Age                    From                                                       Message
  ----     ------     ----                   ----                                                       -------
  Warning  Unhealthy  55m (x1220 over 39h)   kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    45m (x9671 over 39h)   kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Back-off restarting failed container
  Normal   Killing    25m (x490 over 39h)    kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Killing container with id docker://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff    10m (x6711 over 39h)   kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Back-off restarting failed container
  Warning  BackOff    5m3s (x6286 over 39h)  kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Back-off restarting failed container
```
metrics-server
```
Events:
  Type     Reason   Age                   From                                                       Message
  ----     ------   ----                  ----                                                       -------
  Normal   Pulled   26m (x413 over 40h)   kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Container image "k8s.gcr.io/metrics-server-amd64:v0.2.1" already present on machine
  Warning  BackOff  97s (x9485 over 40h)  kubelet, gke-clustertest-mo-mi-default-pool-9cc9e0c8-9rk1  Back-off restarting failed container
```
Unless those containers are yours rather than system ones, I think you should open a ticket with GKE. I really doubt there is much you can do about it yourself.
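If you do open a ticket, attaching container logs will probably speed things up. Something like this should pull them (the pod names are placeholders; get the real ones from kubectl get pods -n kube-system):

```
# Logs from the dnsmasq container that keeps failing its liveness probe;
# --previous shows the last crashed instance.
kubectl logs <kube-dns-pod> -n kube-system -c dnsmasq --previous

# Logs from the crashing metrics-server container (-c may be needed if the
# pod runs more than one container, as GKE's metrics-server typically does).
kubectl logs <metrics-server-pod> -n kube-system -c metrics-server --previous

# kube-proxy runs as one pod per node; compare this list against your 4 nodes
# to see which node is missing it.
kubectl get pods -n kube-system -o wide | grep kube-proxy
```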