kubeadm init fails: etcd and kube-controller-manager containers keep exiting

Cluster information:

Kubernetes version: 1.28.1 / 1.28.8
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: OpenEuler
CNI and version: none (not installed yet)
CRI and version: containerd 1.7.13
CPU architecture: ARM64
Linux Kernel: 5.10.0-193.0.0.106.oe2203sp3.aarch64

Issue

I initialized a cluster with kubeadm on an ARM64 machine running OpenEuler, using the following command:

kubeadm init --apiserver-advertise-address=10.254.178.108 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.28.1 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all

I received the following output:

[... output truncated ...]
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

I checked the container information in containerd:

# crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
af85b843ace3e       8b6e1980b7584       3 seconds ago       Exited              kube-controller-manager   14                  ec2c4fca93244       kube-controller-manager-host-10-254-178-108
eded802f3b70e       9cdd6470f48c8       3 seconds ago       Exited              etcd                      14                  f6a60416f180b       etcd-host-10-254-178-108
2e53086877372       b4a5a57e99492       5 seconds ago       Running             kube-scheduler            10                  332339c763d9d       kube-scheduler-host-10-254-178-108
83647ecacf27a       b29fb62480892       5 seconds ago       Running             kube-apiserver            10                  c94f26e928f7d       kube-apiserver-host-10-254-178-108

Across multiple restart attempts, the etcd and kube-controller-manager containers kept exiting within seconds. Their logs contained nothing meaningful; the processes appeared to be killed almost immediately after starting.
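One more check that can be run here is to ask crictl for the recorded exit code and reason (a sketch; the container ID is the etcd container from the listing above, and the jq filter is optional, just for readability):

# show the exit code/reason containerd recorded for the exited etcd container
crictl --runtime-endpoint unix:///run/containerd/containerd.sock inspect eded802f3b70e | jq '.status | {exitCode, reason, message}'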

I checked the containerd and kubelet entries in /var/log/messages:

kubelet:

Apr  3 22:05:16 host-10-254-178-108 kubelet[81121]: I0403 22:05:16.547278   81121 scope.go:117] "RemoveContainer" containerID="68386037caf9e9dc50fcc1abbce30053cfdfb8c43458ce83e429b488d6a2c580"
Apr  3 22:05:16 host-10-254-178-108 kubelet[81121]: E0403 22:05:16.778118   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-host-10-254-178-108_kube-system(e816a5233e465844be907fe05cd58ca4)\"" pod="kube-system/etcd-host-10-254-178-108" podUID="e816a5233e465844be907fe05cd58ca4"
Apr  3 22:05:16 host-10-254-178-108 kubelet[81121]: E0403 22:05:16.778968   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr  3 22:05:17 host-10-254-178-108 kubelet[81121]: I0403 22:05:17.567705   81121 scope.go:117] "RemoveContainer" containerID="eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77"
Apr  3 22:05:17 host-10-254-178-108 kubelet[81121]: E0403 22:05:17.568346   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-host-10-254-178-108_kube-system(e816a5233e465844be907fe05cd58ca4)\"" pod="kube-system/etcd-host-10-254-178-108" podUID="e816a5233e465844be907fe05cd58ca4"
Apr  3 22:05:17 host-10-254-178-108 kubelet[81121]: I0403 22:05:17.582318   81121 scope.go:117] "RemoveContainer" containerID="af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74"
Apr  3 22:05:17 host-10-254-178-108 kubelet[81121]: E0403 22:05:17.582986   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr  3 22:05:18 host-10-254-178-108 kubelet[81121]: I0403 22:05:18.559826   81121 scope.go:117] "RemoveContainer" containerID="af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74"
Apr  3 22:05:18 host-10-254-178-108 kubelet[81121]: E0403 22:05:18.560392   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr  3 22:05:19 host-10-254-178-108 kubelet[81121]: I0403 22:05:19.564507   81121 scope.go:117] "RemoveContainer" containerID="af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74"
Apr  3 22:05:19 host-10-254-178-108 kubelet[81121]: E0403 22:05:19.565047   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-host-10-254-178-108_kube-system(0df828bd49881f9bd47d6aaeebf77078)\"" pod="kube-system/kube-controller-manager-host-10-254-178-108" podUID="0df828bd49881f9bd47d6aaeebf77078"
Apr  3 22:05:20 host-10-254-178-108 kubelet[81121]: I0403 22:05:20.937014   81121 scope.go:117] "RemoveContainer" containerID="eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77"
Apr  3 22:05:20 host-10-254-178-108 kubelet[81121]: E0403 22:05:20.937669   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-host-10-254-178-108_kube-system(e816a5233e465844be907fe05cd58ca4)\"" pod="kube-system/etcd-host-10-254-178-108" podUID="e816a5233e465844be907fe05cd58ca4"
Apr  3 22:05:21 host-10-254-178-108 kubelet[81121]: E0403 22:05:21.248518   81121 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"host-10-254-178-108\" not found"
Apr  3 22:05:22 host-10-254-178-108 kubelet[81121]: I0403 22:05:22.005814   81121 scope.go:117] "RemoveContainer" containerID="eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77"
Apr  3 22:05:22 host-10-254-178-108 kubelet[81121]: E0403 22:05:22.006310   81121 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd 

containerd:

Apr  3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.103786303+08:00" level=info msg="StartContainer for \"eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77\" returns successfully"
Apr  3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.179529389+08:00" level=info msg="StartContainer for \"af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74\" returns successfully"
Apr  3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.507279279+08:00" level=info msg="StopContainer for \"af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74\" with timeout 30 (s)"
Apr  3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.507939608+08:00" level=info msg="Stop container \"af85b843ace3eaab834bea10789f03e53926c4d3fb13d2cc0fcb5318bb2deb74\" with signal terminated"
Apr  3 22:05:15 host-10-254-178-108 containerd[78498]: time="2024-04-03T22:05:15.522220106+08:00" level=info msg="StopContainer for \"eded802f3b70ebe3f17040f4e382ec2f55bbf533b0d3d304f9ba15c4875dfc77\" with timeout 30 (s)"

The containers are stopped only a few hundred milliseconds after they start: the containerd log shows a StopContainer call with signal terminated right after StartContainer returns. Strangely, the kube-apiserver and kube-scheduler containers keep running fine.
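Since containerd reports a StopContainer call with signal terminated (rather than the processes exiting on their own), it looks like the kubelet, as the CRI client, is asking containerd to stop them. A sketch of how to narrow that down from the kubelet journal (the time window matches the excerpt above and is only an example):

# kubelet messages in the window when the containers were stopped, minus the noisy back-off lines
journalctl -u kubelet --since "2024-04-03 22:05:00" --until "2024-04-03 22:06:00" --no-pager | grep -vi crashloopbackoff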

I suspected a problem with the images themselves, but they run fine when I start them manually with Docker.
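For reference, this is roughly the kind of manual check I mean (a sketch; the exact image tag comes from kubeadm rather than being hard-coded here):

# list the images kubeadm expects for this version and repository
kubeadm config images list --kubernetes-version v1.28.1 --image-repository registry.aliyuncs.com/google_containers
# then start the listed etcd image by hand, for example:
docker run --rm <etcd-image-from-the-list-above> etcd --version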

I tried the same steps with both v1.28.1 and v1.28.8 and got the same result.

I couldn't find anything useful in the logs: neither kubeadm, kubelet, nor containerd explains why the containers exit. Am I missing something? How should I troubleshoot this?

Update: I have resolved this issue. The root cause was a cgroup driver mismatch between containerd and the kubelet.
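For anyone who hits the same symptom, this is roughly the fix, sketched from my setup (the paths are the defaults kubeadm and containerd use; adjust them to yours). Both sides must use the same cgroup driver: kubeadm's KubeletConfiguration has defaulted to systemd since v1.22, while a stock containerd config may still be using cgroupfs.

# /etc/containerd/config.toml -- make the runc runtime use the systemd cgroup driver
# (if the file does not exist yet: containerd config default > /etc/containerd/config.toml)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

# /var/lib/kubelet/config.yaml -- the kubelet must be set to the same driver
cgroupDriver: systemd

# apply the change and re-run the init
systemctl restart containerd
kubeadm reset -f
kubeadm init ...   # same flags as before

After aligning the two drivers, kubeadm init completed normally and the etcd and kube-controller-manager containers stayed up.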