Why do my pods keep crashing?

I’ve checked the logs and described the pods, and the only thing of note is:

Pod sandbox changed, it will be killed and re-created.
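
For reference, that message comes from the pod events; the same events can be listed cluster-wide with something like:

kubectl get events -A --field-selector reason=SandboxChanged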

I suspect a misconfiguration somewhere. I’ve listed the versions first so someone can hopefully confirm they’re OK together.

Cluster information:

Kubernetes version: 1.29
Cloud being used: AWS
Installation method: Terraform
Host OS: RHEL9
CNI and version: flannel v0.24.0
CRI and version: containerd 1.7.11

[ec2-user@vagrant-tf-master01 ~]$ rpm -qa | grep kube
kubernetes-cni-1.3.0-150500.1.1.x86_64
kubelet-1.29.0-150500.1.1.x86_64
kubectl-1.29.0-150500.1.1.x86_64
kubeadm-1.29.0-150500.1.1.x86_64

[ec2-user@vagrant-tf-master01 ~]$ containerd --version
containerd github.com/containerd/containerd v1.7.11 64b8a811b07ba6288238eefc14d898ee0b5b99ba

[ec2-user@vagrant-tf-master01 ~]$ runc --version
runc version 1.1.10
commit: v1.1.10-0-g18a0cb0f
spec: 1.0.2-dev
go: go1.20.10
libseccomp: 2.5.4

[ec2-user@vagrant-tf-master01 ~]$ sudo crictl version
Version: 0.1.0
RuntimeName: containerd
RuntimeVersion: v1.7.11
RuntimeApiVersion: v1

I am limited as to what I can add here. In summary, everything is fine immediately after installation completes, i.e. the nodes show “Ready”, CONTAINER-RUNTIME is ‘containerd’, and all pods are Running. But pods start failing shortly after.

[ec2-user@vagrant-tf-master01 ~]$ kubectl get pods --all-namespaces -o wide
NAMESPACE      NAME                                           READY   STATUS             RESTARTS        AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-mns4f                          1/1     Running            5 (108s ago)    7m10s   172.31.23.45    vagrant-tf-worker01   <none>           <none>
kube-flannel   kube-flannel-ds-sj557                          0/1     CrashLoopBackOff   5 (47s ago)     7m13s   172.31.24.102   vagrant-tf-master01   <none>           <none>
kube-system    coredns-76f75df574-49j9q                       1/1     Running            1 (65s ago)     7m13s   10.244.0.4      vagrant-tf-master01   <none>           <none>
kube-system    coredns-76f75df574-j88n2                       1/1     Running            1 (14s ago)     7m13s   10.244.0.5      vagrant-tf-master01   <none>           <none>
kube-system    etcd-vagrant-tf-master01                       1/1     Running            3 (4m2s ago)    6m37s   172.31.24.102   vagrant-tf-master01   <none>           <none>
kube-system    kube-apiserver-vagrant-tf-master01             1/1     Running            2 (5m30s ago)   7m45s   172.31.24.102   vagrant-tf-master01   <none>           <none>
kube-system    kube-controller-manager-vagrant-tf-master01    0/1     CrashLoopBackOff   4 (18s ago)     6m39s   172.31.24.102   vagrant-tf-master01   <none>           <none>
kube-system    kube-proxy-gssp7                               0/1     CrashLoopBackOff   4 (28s ago)     7m10s   172.31.23.45    vagrant-tf-worker01   <none>           <none>
kube-system    kube-proxy-sw58j                               0/1     CrashLoopBackOff   4 (49s ago)     7m13s   172.31.24.102   vagrant-tf-master01   <none>           <none>
kube-system    kube-scheduler-vagrant-tf-master01             0/1     CrashLoopBackOff   4 (3s ago)      6m35s   172.31.24.102   vagrant-tf-master01   <none>           <none>
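
About the only other thing I can reliably collect while the API server is bouncing is the node-level logs, e.g.:

sudo journalctl -u kubelet --no-pager | tail -n 50
sudo journalctl -u containerd --no-pager | tail -n 50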

Hi mickt,
I think you should also pay attention to the following.

Check liveness and readiness, the images in use, the node architectures, and pod resource usage:

kubectl describe pod | grep -e 'Image:' -e 'Container ID:' -e 'Image ID:'
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
kubectl top pods
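
For the probes specifically, something along these lines (with a real pod name) should show how they are configured:

kubectl describe pod <pod-name> | grep -e 'Liveness:' -e 'Readiness:'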

Hi jamallmahmoudi,

Thank you for your suggestion.

Unfortunately, those commands only work while the cluster is responsive, and pods are crashing before I can run some of them. The metrics-server doesn’t progress beyond the Pending state.
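
For reference, this is how I’ve been checking it (the label selector assumes the standard metrics-server manifest):

kubectl -n kube-system describe pod -l k8s-app=metrics-server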

[ec2-user@vagrant-tf-master01 ~]$ kubectl get componentstatuses
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE   ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   ok

I can’t seem to add anything else as I hit the following error:

An error occurred: Sorry, new users can only put 5 links in a post.

You can check

kubectl -n kube-flannel describe pod/kube-flannel-ds-sj557

and

kubectl -n kube-flannel logs pod/kube-flannel-ds-sj557

with the correct namespace and name of a failed pod, of course.
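
If the container has already restarted, the previous instance’s logs are usually more telling:

kubectl -n kube-flannel logs pod/kube-flannel-ds-sj557 --previous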

Describe and logs show little that assists. The only anomaly is ‘Pod sandbox changed, it will be killed and re-created.’

According to Stack Overflow, it may be memory and/or CPU related.
You can try to increase the limits and requests.
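
For flannel, something like this should do it (kube-flannel-ds is the daemonset behind the pods in your list; the values are just an example):

kubectl -n kube-flannel set resources daemonset kube-flannel-ds --requests=cpu=100m,memory=128Mi --limits=cpu=200m,memory=256Mi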

I think they should be OK as I’m using the t3.large instance type (2 vCPUs, 8 GiB of memory).

Hi,
Can you please share your containerd config file?

If you mean ‘/etc/containerd/config.toml’, then it does not exist; there is no ‘/etc/containerd’ directory.
Here is the containerd installation:

wget https://github.com/containerd/containerd/releases/download/v1.7.11/containerd-1.7.11-linux-amd64.tar.gz
sudo tar xzvf containerd-1.7.11-linux-amd64.tar.gz -C /usr/local/
rm -f containerd-1.7.11-linux-amd64.tar.gz
sudo wget https://raw.githubusercontent.com/containerd/containerd/main/containerd.service -P /usr/lib/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now containerd

I see. With that install there is no config file at all, so containerd runs on its built-in defaults, and its default cgroup driver (cgroupfs) won’t match the systemd driver that kubeadm configures for the kubelet; that mismatch is a classic cause of exactly this sandbox churn. Try the configuration advice from the official guide:

Thank you for that! I don’t know how I missed it. (◔_◔) I’ll add it and see how it goes.

Sorry for the delay in updating; I’ve been working on something else. This was resolved by executing the following:

sudo mkdir /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

The following then needs to be run so that containerd uses the systemd cgroup driver, matching the kubelet (Ref. Installing and Configuring containerd as a Kubernetes Container Runtime - Anthony Nocentino's Blog):

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
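
containerd then needs a restart to pick up the new config:

sudo systemctl restart containerd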

[ec2-user@k8s-tf-master01 ~]$ kubectl get pods --all-namespaces -o wide

NAMESPACE      NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE              NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-cz2gh                     1/1     Running   0          3m22s   172.31.37.161   k8s-tf-worker01   <none>           <none>
kube-flannel   kube-flannel-ds-zdmkf                     1/1     Running   0          3m27s   172.31.39.87    k8s-tf-master01   <none>           <none>
kube-system    coredns-76f75df574-dj8hm                  1/1     Running   0          3m27s   10.244.0.3      k8s-tf-master01   <none>           <none>
kube-system    coredns-76f75df574-tclrx                  1/1     Running   0          3m27s   10.244.0.2      k8s-tf-master01   <none>           <none>
kube-system    etcd-k8s-tf-master01                      1/1     Running   0          3m39s   172.31.39.87    k8s-tf-master01   <none>           <none>
kube-system    kube-apiserver-k8s-tf-master01            1/1     Running   0          3m39s   172.31.39.87    k8s-tf-master01   <none>           <none>
kube-system    kube-controller-manager-k8s-tf-master01   1/1     Running   0          3m43s   172.31.39.87    k8s-tf-master01   <none>           <none>
kube-system    kube-proxy-cs26g                          1/1     Running   0          3m22s   172.31.37.161   k8s-tf-worker01   <none>           <none>
kube-system    kube-proxy-nr5xb                          1/1     Running   0          3m27s   172.31.39.87    k8s-tf-master01   <none>           <none>
kube-system    kube-scheduler-k8s-tf-master01            1/1     Running   0          3m43s   172.31.39.87    k8s-tf-master01   <none>           <none>