mickt
December 28, 2023, 2:57pm
1
I’ve checked the logs and described the pods, and the only thing of note is:
Pod sandbox changed, it will be killed and re-created.
I’m thinking there may be a misconfiguration. I’ve listed the versions first so that someone can hopefully confirm they are OK.
Cluster information:
Kubernetes version: 1.29
Cloud being used: AWS
Installation method: terraform
Host OS: RHEL9
CNI and version: flannel v0.24.0
CRI and version: containerd 1.7.11
[ec2-user@vagrant-tf-master01 ~]$ rpm -qa | grep kube
kubernetes-cni-1.3.0-150500.1.1.x86_64
kubelet-1.29.0-150500.1.1.x86_64
kubectl-1.29.0-150500.1.1.x86_64
kubeadm-1.29.0-150500.1.1.x86_64
[ec2-user@vagrant-tf-master01 ~]$ containerd --version
containerd github.com/containerd/containerd v1.7.11 64b8a811b07ba6288238eefc14d898ee0b5b99ba
[ec2-user@vagrant-tf-master01 ~]$ runc --version
runc version 1.1.10
commit: v1.1.10-0-g18a0cb0f
spec: 1.0.2-dev
go: go1.20.10
libseccomp: 2.5.4
[ec2-user@vagrant-tf-master01 ~]$ sudo crictl version
Version: 0.1.0
RuntimeName: containerd
RuntimeVersion: v1.7.11
RuntimeApiVersion: v1
I am limited in what I can add here. In summary, all is good immediately upon completion, i.e. nodes show “Ready”, CONTAINER-RUNTIME is ‘containerd’, and all pods are running. But pods start failing shortly after.
[ec2-user@vagrant-tf-master01 ~]$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel kube-flannel-ds-mns4f 1/1 Running 5 (108s ago) 7m10s 172.31.23.45 vagrant-tf-worker01
kube-flannel kube-flannel-ds-sj557 0/1 CrashLoopBackOff 5 (47s ago) 7m13s 172.31.24.102 vagrant-tf-master01
kube-system coredns-76f75df574-49j9q 1/1 Running 1 (65s ago) 7m13s 10.244.0.4 vagrant-tf-master01
kube-system coredns-76f75df574-j88n2 1/1 Running 1 (14s ago) 7m13s 10.244.0.5 vagrant-tf-master01
kube-system etcd-vagrant-tf-master01 1/1 Running 3 (4m2s ago) 6m37s 172.31.24.102 vagrant-tf-master01
kube-system kube-apiserver-vagrant-tf-master01 1/1 Running 2 (5m30s ago) 7m45s 172.31.24.102 vagrant-tf-master01
kube-system kube-controller-manager-vagrant-tf-master01 0/1 CrashLoopBackOff 4 (18s ago) 6m39s 172.31.24.102 vagrant-tf-master01
kube-system kube-proxy-gssp7 0/1 CrashLoopBackOff 4 (28s ago) 7m10s 172.31.23.45 vagrant-tf-worker01
kube-system kube-proxy-sw58j 0/1 CrashLoopBackOff 4 (49s ago) 7m13s 172.31.24.102 vagrant-tf-master01
kube-system kube-scheduler-vagrant-tf-master01 0/1 CrashLoopBackOff 4 (3s ago) 6m35s 172.31.24.102 vagrant-tf-master01
jamallmahmoudi
Hi, mickt
I think you should also pay attention to the following:
check the Liveness and Readiness probes
kubectl describe pod | grep -e 'Image:' -e 'Container ID:' -e 'Image ID:'
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
kubectl top pods
mickt
January 2, 2024, 8:56pm
3
Hi jamallmahmoudi
Thank you for your suggestion.
Unfortunately, the commands only run for me when the system is available; pods are crashing before I can run some of them. The metrics-server doesn’t progress beyond the Pending state.
[ec2-user@vagrant-tf-master01 ~]$ kubectl get componentstatuses
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy ok
I can’t seem to add anything else as I hit the following error.
An error occurred: Sorry, new users can only put 5 links in a post.
You can check
kubectl describe pod/kube-flannel-ds-sj557
and
kubectl logs pod/kube-flannel-ds-sj557
with the correct name of a failed pod, of course.
mickt
January 7, 2024, 6:34pm
5
Describe and logs show little that assists. The only anomaly is ‘Pod sandbox changed, it will be killed and re-created.’
According to Stack Overflow it may be memory- and/or CPU-related.
You can try to increase the limits and requests.
mickt
January 7, 2024, 8:41pm
7
I think they should be OK, as I’m using the t3.large instance type.
fox-md
January 10, 2024, 1:07pm
8
Hi,
Can you please share your containerd config file?
mickt
January 10, 2024, 2:11pm
9
If you mean ‘/etc/containerd/config.toml’ then it does not exist; there is no ‘/etc/containerd’ directory.
Here is containerd installation.
wget https://github.com/containerd/containerd/releases/download/v1.7.11/containerd-1.7.11-linux-amd64.tar.gz
sudo tar xzvf containerd-1.7.11-linux-amd64.tar.gz -C /usr/local/
rm -f containerd-1.7.11-linux-amd64.tar.gz
sudo wget https://raw.githubusercontent.com/containerd/containerd/main/containerd.service -P /usr/lib/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
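One thing worth noting about this install path (a hedged aside, not from the original posts): unlike the distro packages, the release tarball never creates /etc/containerd/config.toml, so containerd silently runs on its built-in defaults. A small sketch of a check, with the path parameterised so it is safe to try anywhere (the function name is my own invention):

```shell
# Report which containerd config is in effect for a given path.
# Defaults to the real location; pass another path to experiment safely.
check_containerd_config() {
  local path="${1:-/etc/containerd/config.toml}"
  if [ -f "$path" ]; then
    echo "using $path"
  else
    echo "no config file - containerd built-in defaults apply"
  fi
}

check_containerd_config
```

If containerd is installed, `containerd config dump` also prints the effective (merged) configuration, which shows exactly what the daemon is running with.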
fox-md
January 11, 2024, 1:59pm
10
I see. Try using the configuration advice from the official guide:
mickt
January 11, 2024, 8:42pm
11
Thank you for that! Don’t know how I missed it. (◔_◔) I’ll add and see how it goes.
mickt
February 13, 2024, 2:25pm
12
Sorry for the delay in updating; I’ve been working on something else. Resolved by executing the following.
sudo mkdir /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
It then requires that the following be run (ref: “Installing and Configuring containerd as a Kubernetes Container Runtime”, Anthony Nocentino’s blog).
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
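For context (my hedged note, not from the thread itself): kubeadm-provisioned kubelets default to the systemd cgroup driver, while `containerd config default` emits `SystemdCgroup = false`, and that mismatch is a classic cause of the “Pod sandbox changed, it will be killed and re-created” churn. A minimal, safe-to-run sketch of what the sed above changes, applied to a throwaway sample snippet rather than the live config:

```shell
# Demonstrate the SystemdCgroup flip on a disposable copy of the relevant
# config.toml stanza (hypothetical temp file, not the real config).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF

# Same substitution as the fix above, scoped to the temp file.
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$cfg"

grep SystemdCgroup "$cfg"   # the line now reads SystemdCgroup = true
rm -f "$cfg"
```

After editing the real /etc/containerd/config.toml, restart the daemon so the new cgroup driver takes effect: `sudo systemctl restart containerd`.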
[ec2-user@k8s-tf-master01 ~]$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel kube-flannel-ds-cz2gh 1/1 Running 0 3m22s 172.31.37.161 k8s-tf-worker01 <none> <none>
kube-flannel kube-flannel-ds-zdmkf 1/1 Running 0 3m27s 172.31.39.87 k8s-tf-master01 <none> <none>
kube-system coredns-76f75df574-dj8hm 1/1 Running 0 3m27s 10.244.0.3 k8s-tf-master01 <none> <none>
kube-system coredns-76f75df574-tclrx 1/1 Running 0 3m27s 10.244.0.2 k8s-tf-master01 <none> <none>
kube-system etcd-k8s-tf-master01 1/1 Running 0 3m39s 172.31.39.87 k8s-tf-master01 <none> <none>
kube-system kube-apiserver-k8s-tf-master01 1/1 Running 0 3m39s 172.31.39.87 k8s-tf-master01 <none> <none>
kube-system kube-controller-manager-k8s-tf-master01 1/1 Running 0 3m43s 172.31.39.87 k8s-tf-master01 <none> <none>
kube-system kube-proxy-cs26g 1/1 Running 0 3m22s 172.31.37.161 k8s-tf-worker01 <none> <none>
kube-system kube-proxy-nr5xb 1/1 Running 0 3m27s 172.31.39.87 k8s-tf-master01 <none> <none>
kube-system kube-scheduler-k8s-tf-master01 1/1 Running 0 3m43s 172.31.39.87 k8s-tf-master01 <none> <none>