Cluster information:
Kubernetes version: 1.28.1 / 1.28.8
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: OpenEuler
CNI and version: none (not yet installed)
CRI and version: containerd 1.7.13
CPU architecture: ARM64
Linux Kernel: 5.10.0-193.0.0.106.oe2203sp3.aarch64
Issue
I initialized a cluster with kubeadm on an ARM64 machine running OpenEuler, using the following command:
kubeadm init --apiserver-advertise-address=10.254.178.108 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.28.1 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all
I received the following output:
................................
..............................
.............................
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
I checked the container status in containerd with crictl:
# crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
CONTAINER       IMAGE           CREATED         STATE     NAME                      ATTEMPT   POD ID          POD
af85b843ace3e   8b6e1980b7584   3 seconds ago   Exited    kube-controller-manager   14        ec2c4fca93244   kube-controller-manager-host-10-254-178-108
eded802f3b70e   9cdd6470f48c8   3 seconds ago   Exited    etcd                      14        f6a60416f180b   etcd-host-10-254-178-108
2e53086877372   b4a5a57e99492   5 seconds ago   Running   kube-scheduler            10        332339c763d9d   kube-scheduler-host-10-254-178-108
83647ecacf27a   b29fb62480892   5 seconds ago   Running   kube-apiserver            10        c94f26e928f7d   kube-apiserver-host-10-254-178-108
After multiple attempts, the etcd and kube-controller-manager containers kept exiting quickly. I checked the container logs but didn’t find any meaningful information; they seemed to be killed instantly.
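For reference, I looked at the failed containers with commands along these lines (the container ID here is the etcd one from the crictl listing above; yours will differ):
# crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs eded802f3b70e
# crictl --runtime-endpoint unix:///run/containerd/containerd.sock inspect eded802f3b70e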
I checked the kubelet and containerd entries in /var/log/messages:
kubelet:
Apr 3 22:05:16 host-10-254-178-108 kubelet[81121]: I0403 22:05:16.547278 81121 scope.go:117] "RemoveContainer" containerID="68386037caf9e9dc50fcc1abbce30053cfdfb8c43458ce83e429b488d6a2c580"
Apr 3 22:05:16 host-10-254-178-108 kubelet[81121]: E0403 22:05:16.778118 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-host-10-254-178-108_kube-system(e816a5233e465844be907fe05cd58ca4)\"" pod="kube-system/etcd-host-10-254-178-108" podUID="e816a5233e465844be907fe05cd58ca4"
Apr 3 22:05:16 host-10-254-178-108 kubelet[81121]: E0403 22:05:16.778968 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr 3 22:05:17 host-10-254-178-108 kubelet[81121]: I0403 22:05:17.567705 81121 scope.go:117] "RemoveContainer" containerID="eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77"
Apr 3 22:05:17 host-10-254-178-108 kubelet[81121]: E0403 22:05:17.568346 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-host-10-254-178-108_kube-system(e816a5233e465844be907fe05cd58ca4)\"" pod="kube-system/etcd-host-10-254-178-108" podUID="e816a5233e465844be907fe05cd58ca4"
Apr 3 22:05:17 host-10-254-178-108 kubelet[81121]: I0403 22:05:17.582318 81121 scope.go:117] "RemoveContainer" containerID="af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74"
Apr 3 22:05:17 host-10-254-178-108 kubelet[81121]: E0403 22:05:17.582986 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr 3 22:05:18 host-10-254-178-108 kubelet[81121]: I0403 22:05:18.559826 81121 scope.go:117] "RemoveContainer" containerID="af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74"
Apr 3 22:05:18 host-10-254-178-108 kubelet[81121]: E0403 22:05:18.560392 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr 3 22:05:19 host-10-254-178-108 kubelet[81121]: I0403 22:05:19.564507 81121 scope.go:117] "RemoveContainer" containerID="af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74"
Apr 3 22:05:19 host-10-254-178-108 kubelet[81121]: E0403 22:05:19.565047 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr 3 22:05:20 host-10-254-178-108 kubelet[81121]: I0403 22:05:20.937014 81121 scope.go:117] "RemoveContainer" containerID="eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77"
Apr 3 22:05:20 host-10-254-178-108 kubelet[81121]: E0403 22:05:20.937669 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-host-10-254-178-108_kube-system(e816a5233e465844be907fe05cd58ca4)\"" pod="kube-system/etcd-host-10-254-178-108" podUID="e816a5233e465844be907fe05cd58ca4"
Apr 3 22:05:21 host-10-254-178-108 kubelet[81121]: E0403 22:05:21.248518 81121 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"host-10-254-178-108\" not found"
Apr 3 22:05:22 host-10-254-178-108 kubelet[81121]: I0403 22:05:22.005814 81121 scope.go:117] "RemoveContainer" containerID="eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77"
Apr 3 22:05:22 host-10-254-178-108 kubelet[81121]: E0403 22:05:22.006310 81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd
containerd:
Apr 3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.103786303+08:00" level=info msg="StartContainer for \"eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77\" returns successfully"
Apr 3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.179529389+08:00" level=info msg="StartContainer for \"af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74\" returns successfully"
Apr 3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.507279279+08:00" level=info msg="StopContainer for \"af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74\" with timeout
30 (s)"
Apr 3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.507939608+08:00" level=info msg="Stop container \"af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74\" with signal terminated"
Apr 3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.522220106+08:00" level=info msg="StopContainer for \"eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77\" with timeout
30 (s)"
The etcd and kube-controller-manager containers are being stopped only about 500 milliseconds after they start, yet strangely the kube-apiserver container seems to be running fine.
I suspected it might be an issue with the images, but they work fine when I run them with Docker.
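To be concrete about the Docker test, I ran the images directly and they start and report their versions without problems. The exact tags below are the ones on my machine and may differ on yours (kubeadm config images list shows what kubeadm expects):
# docker run --rm registry.aliyuncs.com/google_containers/etcd:3.5.9-0 etcd --version
# docker run --rm registry.aliyuncs.com/google_containers/kube-controller-manager:v1.28.1 kube-controller-manager --version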
I tried the same procedure with both Kubernetes 1.28.1 and 1.28.8 and got the same result.
I couldn’t find anything useful in the logs: neither kubeadm, kubelet, nor containerd gives any indication of why the containers exit. Am I missing something? How should I troubleshoot this issue?