Cluster information:
Kubernetes version: 1.16.13
Cloud being used: bare metal
Installation method: kubeadm
Host OS: Ubuntu 18.04
CNI and version: canal: calico v3.8.4, flannel v0.11.0
CRI and version: containerd.io 1.2.13-2
I am trying to remove docker from a cluster, so that it runs with pure containerd via CRI.
It looked to be straightforward at first. What I did was to reconfigure containerd to enable its CRI interface:
mv /etc/containerd/config.toml{,.old}
containerd config default > /etc/containerd/config.toml
vi /etc/containerd/config.toml # set systemd_cgroup = true
systemctl restart containerd
Rejoin the node: kubeadm reset followed by kubeadm join ... --cri-socket /run/containerd/containerd.sock. Aside: the explicit --cri-socket is needed because, if both containerd and docker are detected, docker takes precedence.
Disable docker: systemctl stop docker; systemctl disable docker
The node starts its system pods happily:
root@dar7:~# crictl pods
POD ID          CREATED       STATE   NAME               NAMESPACE     ATTEMPT
bfddb2d712e7b   13 days ago   Ready   kube-proxy-7ftdm   kube-system   0
71ec1a994af1d   13 days ago   Ready   canal-r76bs        kube-system   0
root@dar7:~# crictl ps
CONTAINER ID    IMAGE           CREATED       STATE     NAME           ATTEMPT   POD ID
b3f2b9362b693   83b416d242055   13 days ago   Running   calico-node    6         71ec1a994af1d
3172760b7ea99   9b65a0f78b091   13 days ago   Running   kube-proxy     3         bfddb2d712e7b
12aba76a0797c   8a9c4ced3ff92   13 days ago   Running   kube-flannel   0         71ec1a994af1d
root@dar7:~#
I have not noticed anything out-of-the-ordinary in those pod logs.
However, what I find is that the node stays in a NotReady state, saying that cni plugin is not initialized:
Ready False Fri, 31 Jul 2020 16:39:34 +0100 Fri, 31 Jul 2020 13:55:32 +0100 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
I am stuck now trying to work out how to fix this.
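One check that might narrow this down is whether the directories containerd reads its CNI setup from are actually populated on this node. The following is only a rough sketch: check_cni is a hypothetical helper, and /etc/cni/net.d and /opt/cni/bin are assumed to be the conf_dir and bin_dir in /etc/containerd/config.toml (they are the usual defaults, but the config on this node may differ).

```shell
# check_cni is a hypothetical helper, not part of containerd or Kubernetes.
# $1: CNI config directory, $2: CNI plugin binary directory.
check_cni() {
    conf_dir=$1
    bin_dir=$2
    rc=0
    # Network configs are looked up by extension: .conf, .conflist or .json.
    if find "$conf_dir" -maxdepth 1 -type f \
        \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) \
        2>/dev/null | grep -q .; then
        echo "OK: network config found in $conf_dir"
    else
        echo "MISSING: no network config in $conf_dir"
        rc=1
    fi
    # The plugin binaries referenced by the config must exist as files here.
    if find "$bin_dir" -maxdepth 1 -type f 2>/dev/null | grep -q .; then
        echo "OK: plugin binaries found in $bin_dir"
    else
        echo "MISSING: no plugin binaries in $bin_dir"
        rc=1
    fi
    return $rc
}

# Assumed default paths; adjust to match conf_dir and bin_dir in config.toml.
check_cni /etc/cni/net.d /opt/cni/bin || true
```

A MISSING line would only indicate where to look next; it does not by itself confirm the cause of the NotReady state.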
If I compare this node with an old node that still has docker:
# Node which has been switched from docker to containerd
root@dar7:~# grep KUBELET_KUBEADM_ARGS /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --resolv-conf=/run/systemd/resolve/resolv.conf"
# Node where docker still being used
root@dar25:~# grep KUBELET_KUBEADM_ARGS /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf"
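The difference between the two flag sets can also be listed mechanically rather than by eye. This is a minimal sketch: flags_only_in is a hypothetical helper, and the two strings are copied from the kubeadm-flags.env lines quoted above.

```shell
# Flag strings copied from the two nodes' kubeadm-flags.env files.
containerd_args="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --resolv-conf=/run/systemd/resolve/resolv.conf"
docker_args="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf"

# flags_only_in is a hypothetical helper.
# Prints each flag from $1 that does not appear in $2, one per line.
flags_only_in() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                 # present on both nodes
            *) printf '%s\n' "$flag" ;;     # present only in $1
        esac
    done
}

echo "Only on the docker node (dar25):"
flags_only_in "$docker_args" "$containerd_args"
echo "Only on the containerd node (dar7):"
flags_only_in "$containerd_args" "$docker_args"
```

For the two lines quoted above, the docker-only flags are --cgroup-driver=systemd, --network-plugin=cni and --pod-infra-container-image=k8s.gcr.io/pause:3.1.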
I can see that the new node doesn’t have --network-plugin=cni. I found this in the documentation:
Depending on the CRI runtime your cluster uses, you may need to specify different flags to the kubelet. For instance, when using Docker, you need to specify flags such as --network-plugin=cni, but if you are using an external runtime, you need to specify --container-runtime=remote and specify the CRI endpoint using the --container-runtime-endpoint=<path>.
The way I read this is that --network-plugin=cni is not required when not using Docker. Or is it saying that CNI only works when Docker is present?? That would be very surprising.
This cluster was set up by someone else around Nov 2019, so I don’t know exactly how networking was set up. I can see there’s a canal daemonset:
$ kubectl get daemonset --all-namespaces
NAMESPACE     NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   canal        28        28        27      28           27          beta.kubernetes.io/os=linux   253d
kube-system   kube-proxy   28        28        27      28           27          beta.kubernetes.io/os=linux   253d
The daemonset YAML references images calico/node:v3.8.4, quay.io/coreos/flannel:v0.11.0, calico/cni:v3.8.4 and calico/pod2daemon-flexvol:v3.8.4.
I have tried googling various combinations of “calico without docker” or “calico with containerd” but have not had any useful results.
Any clues please?
Thanks,
Brian.