Version 1.24.0 cluster built with kubeadm
Thanks for your time. My cluster seems to be fine, except that I have to constantly restart kubelet on the controller: after an irregular length of time, kubectl starts returning "connection refused". Once kubelet is restarted I can use kubectl again. Sometimes it happens within a minute and other times it’s good for a few minutes.
I need to figure out how to permanently fix it though please.
I just noticed I still get connection refused even when kubelet is disabled.
Your timeouts do not come directly from kubelet as kubectl is communicating with the apiserver pod(s).
Can you share some more information?
- How loaded is the machine? (CPU usage and load average would help.)
- Can you check that the apiserver pod is running? (kubectl get pods -n kube-system)
- Same for the etcd pod(s).
Those are the initial checks I would do.
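For anyone following along, the load question can be answered with a small helper like the one below. It is only a sketch: the function name load_per_cpu is made up for illustration, it assumes a Linux node with /proc/loadavg, and it assumes nproc from GNU coreutils is installed. The kubectl line in the comments is the same command suggested in the list.

```shell
#!/usr/bin/env bash
# Sketch: print the 1-minute load average divided by the CPU count.
# A value that stays well above 1.0 suggests the node is overloaded.
# load_per_cpu is a hypothetical helper name, not part of any tool.
load_per_cpu() {
  local loadavg_file="${1:-/proc/loadavg}"   # Linux-specific path (assumption)
  local cpus="${2:-$(nproc)}"                # nproc comes from GNU coreutils
  awk -v cpus="$cpus" '{ printf "%.2f\n", $1 / cpus }' "$loadavg_file"
}

# On the control-plane node:
#   load_per_cpu
#   kubectl get pods -n kube-system | grep -E 'kube-apiserver|etcd'
```

The two optional arguments (file path and CPU count) are there only so the calculation can be tried against saved data from another machine.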
Thanks for your help Theog75. I just built this cluster as my first dabble with Kubernetes so I don’t know a whole lot about it at this point. It’s brand new and in fact I only have 1 test pod installed on it. Here’s the output from kubectl get pods -n kube-system:
NAME                                       READY   STATUS             RESTARTS          AGE
calico-kube-controllers-6799f5f4b4-5ptwp   0/1     CrashLoopBackOff   32 (2m9s ago)     3h9m
calico-node-9q9z7                          0/1     CrashLoopBackOff   23 (26s ago)      3h2m
calico-node-bspps                          0/1     Running            32 (62s ago)      3h9m
calico-node-l2zcf                          0/1     Completed          27 (5m52s ago)    161m
coredns-6d4b75cb6d-6hpcw                   1/1     Running            32 (45s ago)      3h36m
coredns-6d4b75cb6d-xwmxs                   0/1     CrashLoopBackOff   28 (49s ago)      3h36m
etcd-k8c1.acme.lan                         1/1     Running            82 (2m25s ago)    3h37m
kube-apiserver-k8c1.acme.lan               1/1     Running            85 (105s ago)     3h36m
kube-controller-manager-k8c1.acme.lan      1/1     Running            100 (2m11s ago)   3h35m
kube-proxy-mg6nm                           0/1     CrashLoopBackOff   31 (3m11s ago)    161m
kube-proxy-vtqb6                           0/1     CrashLoopBackOff   33 (44s ago)      3h2m
kube-proxy-xgfmm                           1/1     Running            80 (66s ago)      3h36m
kube-scheduler-k8c1.acme.lan               0/1     CrashLoopBackOff   97 (16s ago)      3h37m
BTW, the connection refused problems have been going on since before I installed the Calico network and test pod.
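A listing like the one above is easier to read if it is narrowed to the pods with the most restarts. The filter below is a rough sketch: restarting_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods, where the restart count is the fourth whitespace-separated field.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get pods` output on stdin and print
# NAME, STATUS and RESTARTS for pods at or above a restart threshold.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
restarting_pods() {
  local threshold="${1:-5}"
  # NR > 1 skips the header row; $4 + 0 forces a numeric comparison.
  awk -v t="$threshold" 'NR > 1 && $4 + 0 >= t { print $1, $3, $4 }'
}

# Usage:
#   kubectl get pods -n kube-system | restarting_pods 50
```

With a threshold of 50, the listing above reduces to the pods on k8c1.acme.lan: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, and kube-proxy-xgfmm.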
Your system seems highly unstable: most of the key pods are restarting continuously, as I am sure you can see.
There can be many reasons, from resource constraints to faulty configuration.
If your aim is to learn how to use Kubernetes (as opposed to installing and maintaining it), I would suggest starting with MicroK8s.
Otherwise, you can investigate pod by pod, starting with kube-system, to understand why they all restart so frequently.
Did you deploy Calico after installation?
I did deploy Calico after installing the cluster.
I just did a kubectl get events --all-namespaces | grep -i $podname to look at the controller and I see a ton of killing and restarting events but they don’t give you any idea of the root cause.
I’m going to destroy and rebuild the cluster rather than waste a ton of time on this stuff right now.
As your environment is non-critical, I would do the same (rebuild).
Just FYI, you can see the container logs for a pod by running:
kubectl logs <podname> -n <namespace>
This would give you a better understanding of why a pod is restarting or crashing.
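For a pod that is in CrashLoopBackOff, the logs of the container instance that just died are often more useful than those of the one currently starting. A minimal sketch, assuming kubectl can reach the API server at that moment; crash_info is a hypothetical wrapper name and the kube-system default is only a convenience:

```shell
#!/usr/bin/env bash
# Sketch: show why a container last died.
# crash_info is a made-up helper name; the kubectl flags are standard.
crash_info() {
  local pod="$1" ns="${2:-kube-system}"
  # --previous prints logs from the prior (terminated) container instance.
  kubectl logs "$pod" -n "$ns" --previous --tail=50
  # describe includes the last state, exit code and recent events.
  kubectl describe pod "$pod" -n "$ns"
}

# Usage:
#   crash_info kube-scheduler-k8c1.acme.lan
```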
Tried rebuilding from scratch and got the same problem. I’m wondering if the instructions I am following are too old to be good any more. I also tried the same instructions using version 1.24 but again the same problem persists.
Here’s the instructions I used.
I’m a knucklehead. I thought my Ubuntu template was 20.04, but it was 22.04. Once I built the cluster using 20.04, my problems went away.
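A quick way to avoid the same mix-up is to check the release on the node before running kubeadm. The sketch below only reads VERSION_ID from /etc/os-release. The function names are hypothetical, and it says nothing about why 22.04 misbehaved here, which the thread does not establish.

```shell
#!/usr/bin/env bash
# Sketch: confirm the OS release before building a cluster.
# os_version_id and require_version are hypothetical helper names.
os_version_id() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak out.
  ( . "$file" && printf '%s\n' "$VERSION_ID" )
}

require_version() {
  local want="$1" file="${2:-/etc/os-release}" have
  have="$(os_version_id "$file")"
  if [ "$have" != "$want" ]; then
    echo "expected release $want, found $have" >&2
    return 1
  fi
}

# Usage on the node:
#   require_version 20.04 && echo "release matches the instructions"
```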