I’ve just had a problem with a Kubernetes node on GKE. The pods that were scheduled on the node were killed and failed to start. When I checked, I saw the errors below:
Normal Killing <invalid> (x130 over <invalid>) kubelet, gke-pool-01-91602ba2-rdpc Killing container with id docker://portus-postgresql:Container failed liveness probe.. Container will be killed and recreated.
Warning BackOff <invalid> (x1532 over <invalid>) kubelet, gke-pool-01-91602ba2-rdpc Back-off restarting failed container
Warning Unhealthy <invalid> (x2330 over <invalid>) kubelet, gke-pool-01-91602ba2-rdpc (combined from similar events): Readiness probe failed: OCI runtime exec failed: write /tmp/runc-process647055078: no space left on device: unknown
The error clearly was: OCI runtime exec failed: write /tmp/runc-process647055078: no space left on device: unknown.
This looks like an issue with the node rather than with a Docker container or Docker volume. I then tried to SSH to the K8s node that was hosting the pod, but the server wouldn’t respond, so I couldn’t log in.
Interestingly, K8s still considered the node “Ready”:
gke-pool-01-91602ba2 Ready <none> 13d v1.13.5-gke.10
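Since the STATUS column only reflects the Ready condition, one thing that might be worth checking next time is whether the node reported any other condition (DiskPressure, MemoryPressure, and so on) as True. Below is a minimal Python sketch of that check. It assumes `kubectl` is on the PATH and configured for the cluster; the function names are my own, not part of any tool. I also can't say whether DiskPressure would have caught this case, since it may not cover whatever filesystem backs /tmp on the node.

```python
import json
import subprocess


def active_conditions(node_list):
    """Map node name -> non-Ready condition types whose status is "True".

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    """
    flagged = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        active = [
            c["type"]
            for c in conditions
            if c["type"] != "Ready" and c.get("status") == "True"
        ]
        if active:
            flagged[name] = active
    return flagged


def fetch_nodes():
    """Fetch the node list as a dict (assumes kubectl is installed and configured)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Usage (against a live cluster):
#   for name, conds in active_conditions(fetch_nodes()).items():
#       print(name, ", ".join(conds))
```

A node that shows up here while still being listed as Ready would at least hint that the kubelet noticed resource pressure before the pods started failing.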
I had to cordon the node and drain it so that the pods were rescheduled onto other nodes.
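For reference, the cordon-and-drain step can be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, the node name is a placeholder, and `--ignore-daemonsets` is an assumption on my part (it is usually needed when DaemonSet pods run on the node), not necessarily the exact flags used here.

```python
import subprocess


def cordon_and_drain_commands(node_name):
    """Build the kubectl commands that cordon a node and then drain it."""
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node_name],
        # Evict the existing pods; DaemonSet-managed pods are skipped (assumed flag choice).
        ["kubectl", "drain", node_name, "--ignore-daemonsets"],
    ]


def cordon_and_drain(node_name):
    """Run the commands in order, stopping at the first failure."""
    for cmd in cordon_and_drain_commands(node_name):
        subprocess.run(cmd, check=True)


# Usage (placeholder node name, requires a configured kubectl):
#   cordon_and_drain("<node-name>")
```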
But I’m afraid this can happen again, as there was no way to figure out the cause.
Is anybody else seeing the same problem?
Cluster information:
Kubernetes version: 1.13.5-gke.10
Cloud being used: GKE
Installation method:
Host OS: Container-Optimized OS (cos)
CNI and version: Not sure
CRI and version: Not sure