Hi,
I’m trying to deploy a Kubernetes control plane on a c5.xlarge instance in AWS with Ubuntu 20.04, and I’m failing miserably at it.
The cluster comes up and works for a while, but then the API server dies.
I also see containers in the kube-system namespace going into CrashLoopBackOff:
ubuntu@ip-10-10-100-179:~$ kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS             RESTARTS         AGE
kube-system   cilium-5tlch                                0/1     CrashLoopBackOff   14 (35s ago)     35m
kube-system   cilium-operator-65496b9554-792xm            1/1     Running            22 (86s ago)     35m
kube-system   coredns-7db6d8ff4d-5d6ls                    0/1     Pending            0                39m
kube-system   coredns-7db6d8ff4d-ztrt5                    0/1     Pending            0                39m
kube-system   etcd-ip-10-10-100-179                       1/1     Running            20 (2m53s ago)   35m
kube-system   kube-apiserver-ip-10-10-100-179             1/1     Running            27 (119s ago)    39m
kube-system   kube-controller-manager-ip-10-10-100-179    0/1     CrashLoopBackOff   30 (36s ago)     36m
kube-system   kube-proxy-snzr2                            1/1     Running            20 (63s ago)     39m
kube-system   kube-scheduler-ip-10-10-100-179             1/1     Running            30 (2m53s ago)   40m
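For what it’s worth, this is roughly how I’ve been inspecting the crashing pods during the windows when the API server is up (pod names taken from the output above):

kubectl -n kube-system logs cilium-5tlch --previous    # logs from the last crashed run
kubectl -n kube-system describe pod kube-controller-manager-ip-10-10-100-179    # events and last container state

The --previous flag is there because the current container is usually mid-restart.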
Swap is off (AWS instances come with it disabled).
The disk is not full.
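For completeness, both are easy to double-check:

swapon --show    # prints nothing when no swap is configured
free -h          # Swap row shows 0B
df -h            # no filesystem anywhere near 100%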
The API server itself keeps dying and being restarted; while it’s down, kubectl can’t connect at all:
ubuntu@ip-10-10-100-179:~$ kubectl get pods -A
Get "https://10.10.100.179:6443/api/v1/pods?limit=500": dial tcp 10.10.100.179:6443: connect: connection refused - error from a previous attempt: read tcp 10.10.100.179:55826->10.10.100.179:6443: read: connection reset by peer
I’m not sure anymore what to try or which log to look in. The most relevant syslog entries I can find are these:
syslog:2024-07-22T09:22:00.492140+00:00 ip-10-10-100-179 kubelet[20988]: E0722 09:22:00.492086 20988 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 20s restarting failed container=kube-apiserver pod=kube-apiserver-ip-10-10-100-179_kube-system(1b3b3892f733afc426e9fd57ef4aa6b2)\"" pod="kube-system/kube-apiserver-ip-10-10-100-179" podUID="1b3b3892f733afc426e9fd57ef4aa6b2"
syslog:2024-07-22T09:19:43.388388+00:00 ip-10-10-100-179 kubelet[14601]: E0722 09:19:43.387968 14601 kuberuntime_container.go:784] "Container termination failed with gracePeriod" err="rpc error: code = Unavailable desc = error reading from server: EOF" pod="kube-system/kube-apiserver-ip-10-10-100-179" podUID="1b3b3892f733afc426e9fd57ef4aa6b2" containerName="kube-apiserver" containerID="containerd://20e419f91fcefe27317feba35fd7f83d8fdf783d6e242af075c44f917cce1866" gracePeriod=30
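Since kubectl itself is down half the time, the only other places I know to look are the kubelet journal and the container runtime directly (assuming containerd with crictl installed; <container-id> is a placeholder):

journalctl -u kubelet --since "10 minutes ago" --no-pager
sudo crictl ps -a | grep kube-apiserver    # find the ID of the dead container
sudo crictl logs <container-id>            # dump its last output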
Anyone hit this before?
I’m a bit confused about what to do next.
I also tried reinstalling with the k8scp.sh script, but I hit the same problem as always:
Error: Unable to install Cilium: Kubernetes cluster unreachable: Get "https://10.10.100.172:6443/version": dial tcp 10.10.100.172:6443: connect: connection refused
Cilium install finished. Continuing with script.
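Between attempts I wipe the node before re-running the script, roughly like this (assuming containerd as the runtime; I may well be missing a cleanup step):

sudo kubeadm reset -f                          # tears down /etc/kubernetes and /var/lib/etcd
sudo rm -rf /etc/cni/net.d $HOME/.kube         # CNI config and old kubeconfig, which reset leaves behind
sudo systemctl restart containerd kubelet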
The API server is already dying right from the start.
Digging deeper into the kubeadm init output from the script:
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
error execution phase addon/coredns: unable to create a new DNS service: rpc error: code = Unknown desc = malformed header: missing HTTP content-type
To see the stack trace of this error execute with --v=5 or higher
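The message suggests --v=5 for a stack trace; if I understand kubeadm’s phases right, the failing step can also be re-run on its own once the API server is briefly reachable (assuming the default config locations):

sudo kubeadm init phase addon coredns --v=5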