A while ago I tried to upgrade my system from Debian buster to Debian bullseye, but etcd stopped working. So I downgraded the kernel back to the buster version, and that fixed the problem.
I posted my experience on Stack Overflow, which appeared to be the correct place to get support for Kubernetes, but it was closed with “We don’t allow questions about general computing hardware and software on Stack Overflow”, which doesn’t make a lot of sense to me. This isn’t a general question, IMHO. In hindsight maybe I should have emphasized that this is a Kubernetes system I was trying to upgrade, but I’m not sure whether that was actually the problem. The post was clearly tagged with kubernetes.
Since then I have upgraded Kubernetes to 1.23.5 but haven’t tried the new kernel again. I don’t think anything has changed that would explain this (correct me if I am wrong).
Here is what I posted to Stack Overflow. Any ideas?
After I upgrade the kernel from Linux 4.19.0-18-amd64 (Debian/buster) to Linux 5.10.0-9-amd64, etcd initially looks like it is running fine, but after several minutes it gets killed, and I can’t work out why.
If I downgrade the kernel it works fine.
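For anyone wanting to reproduce this, checking which kernel is booted and which kernel images are available is straightforward (the version strings are the Debian ones mentioned above; the `dpkg` pattern assumes a Debian system):

```shell
# Show the currently booted kernel release
# (4.19.0-18-amd64 works for me, 5.10.0-9-amd64 does not).
uname -r

# List installed Debian kernel images, so the older one can be
# selected from the GRUB menu if needed. Harmless no-op elsewhere.
dpkg --list 'linux-image-*' 2>/dev/null | grep '^ii' || true
```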
$ kubectl describe pod -n kube-system etcd-kube-master-3
Name: etcd-kube-master-3
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: kube-master-3/192.168.3.40
Start Time: Thu, 25 Nov 2021 19:08:44 +1100
Labels: component=etcd
tier=control-plane
Annotations: kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.3.40:2379
kubernetes.io/config.hash: ed77bf25802a86b137c96f3aede996ff
kubernetes.io/config.mirror: ed77bf25802a86b137c96f3aede996ff
kubernetes.io/config.seen: 2021-11-25T19:08:43.683581482+11:00
kubernetes.io/config.source: file
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: 192.168.3.40
IPs:
IP: 192.168.3.40
Controlled By: Node/kube-master-3
Containers:
etcd:
Container ID: containerd://d4f0a6714fbf6dfabe23e3164b192d4aad24a883ce009f5052f552ed244928ab
Image: k8s.gcr.io/etcd:3.5.1-0
Image ID: k8s.gcr.io/etcd@sha256:64b9ea357325d5db9f8a723dcf503b5a449177b17ac87d69481e126bb724c263
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://192.168.3.40:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://192.168.3.40:2380
--initial-cluster=kube-master-3=https://192.168.3.40:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://192.168.3.40:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://192.168.3.40:2380
--name=kube-master-3
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 26 Nov 2021 10:10:31 +1100
Finished: Fri, 26 Nov 2021 10:12:11 +1100
Ready: False
Restart Count: 11
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=8
Startup: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=24
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
etcd-certs:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/pki/etcd
HostPathType: DirectoryOrCreate
etcd-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/etcd
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning DNSConfigForming 37m (x77 over 127m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 192.168.3.38 2001:44b8:4112:8a03::26 2001:44b8:4112:8a03::26
Warning DNSConfigForming 30m (x5 over 31m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 192.168.3.38 2001:44b8:4112:8a03::26 2001:44b8:4112:8a03::26
Warning DNSConfigForming 4m8s (x25 over 30m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 192.168.3.38 2001:44b8:4112:8a03::26 2001:44b8:4112:8a03::26
Normal Killing 2m24s kubelet Stopping container etcd
Normal SandboxChanged 2m22s (x2 over 2m26s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning BackOff 2m18s (x5 over 2m22s) kubelet Back-off restarting failed container
Normal Pulled 2m2s (x2 over 2m25s) kubelet Container image "k8s.gcr.io/etcd:3.5.1-0" already present on machine
Normal Created 2m2s (x2 over 2m25s) kubelet Created container etcd
Warning DNSConfigForming 2m1s (x11 over 2m26s) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 192.168.3.38 2001:44b8:4112:8a03::26 2001:44b8:4112:8a03::26
Normal Started 2m1s (x2 over 2m25s) kubelet Started container etcd
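Note that the Liveness and Startup lines in the describe output above are plain HTTP GETs against http://127.0.0.1:2381/health; if the liveness probe fails 8 times in a row, the kubelet stops the container. As an illustration of that kind of check (the Python stub and port 12381 are stand-ins so this runs anywhere; on the affected node you would curl etcd’s real endpoint on port 2381 instead):

```shell
# Stand-in HTTP server, purely to demonstrate the probe mechanics.
python3 -m http.server 12381 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1

# A 200 status means the probe passes; repeated non-200s or timeouts
# trip the failure threshold and the kubelet stops the container.
curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:12381/"

kill "$srv" 2>/dev/null
```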
$ kubectl logs -n kube-system etcd-kube-master-3 -f
[...]
{"level":"info","ts":"2021-11-25T23:12:10.807Z","caller":"osutil/interrupt_unix.go:64","msg":"received signal; shutting down","signal":"terminated"}
[... lots of verbose shutdown message omitted ...]
I suspect the significant message is “Pod sandbox changed, it will be killed and re-created.”, but I have no idea what it means.
(The “Nameserver limits were exceeded” messages, while curious, seem to be caused by the same nameserver being listed multiple times; they are not related, and occur with the older kernel too.)
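To sketch what that duplication looks like (the sample file below is hypothetical; the real file is the node’s /etc/resolv.conf, and the kubelet only keeps the first three nameserver lines it finds there):

```shell
# Recreate the duplicated nameserver list from the warning above.
cat > /tmp/resolv.conf.sample <<'EOF'
nameserver 192.168.3.38
nameserver 2001:44b8:4112:8a03::26
nameserver 2001:44b8:4112:8a03::26
EOF

# Keep only the first occurrence of each line; applying the same idea
# to the real resolv.conf would silence the kubelet warning.
awk '!seen[$0]++' /tmp/resolv.conf.sample
```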
Kubernetes 1.22.4 is running on a LattePanda V1 (4G/64GB, DFR0419 from DFRobot).
Just wondering: are there any known issues with Kubernetes and recent kernels?