Hi all,
I followed the instructions at https://computingforgeeks.com/deploy-kubernetes-cluster-on-ubuntu-with-kubeadm/ to install an on-premises k8s cluster on Ubuntu 20.04 with kubeadm.
Cluster information:
kubectl version --short
Client Version: v1.26.2
Kustomize Version: v4.5.7
Server Version: v1.26.2
kubectl cluster-info
Kubernetes control plane is running at https://k8s.mydomain.com:6443
CoreDNS is running at https://k8s.mydomain.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Cloud being used: bare-metal
Installation method:
kubeadm init --v=5 --pod-network-cidr=10.244.0.0/16 --upload-certs --control-plane-endpoint=k8s.mydomain.com
Host OS: Ubuntu 22.04
CNI and version: Flannel latest
CRI and version: CRI-O latest
Problem 1: CoreDNS pods cannot start
kubectl get pod -n kube-system
NAME                                       READY   STATUS                 RESTARTS   AGE
coredns-787d4945fb-g8hhj                   0/1     CreateContainerError   0          7m1s
coredns-787d4945fb-nkssp                   0/1     CreateContainerError   0          6m29s
etcd-k8s-52ts-master1                      1/1     Running                1          12h
kube-apiserver-k8s-52ts-master1            1/1     Running                1          12h
kube-controller-manager-k8s-52ts-master1   1/1     Running                1          12h
kube-proxy-m4mhc                           1/1     Running                1          12h
kube-proxy-rrdpb                           1/1     Running                0          12h
kube-scheduler-k8s-52ts-master1            1/1     Running                1          12h
kubectl describe pod coredns-787d4945fb-g8hhj -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 65s default-scheduler Successfully assigned kube-system/coredns-787d4945fb-g8hhj to k8s-52ts-worker1
Warning Failed 64s kubelet Error: container create failed: time="2023-03-17T08:26:01+07:00" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd45fe767_098b_401f_8da9_1a1760d804f6.slice/crio-105d630363803602c6bfe9c1516b128192b95c519ff321ed1177fe0cacdc9b42.scope/memory.events: no such file or directory"
time="2023-03-17T08:26:01+07:00" level=error msg="container_linux.go:380: starting container process caused: exec: \"/coredns\": stat /coredns: no such file or directory"
Warning Failed 63s kubelet Error: container create failed: time="2023-03-17T08:26:02+07:00" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd45fe767_098b_401f_8da9_1a1760d804f6.slice/crio-2bbebc0f6e74a3c5a8f5e460e758e322a94ffe97964f34e357d831ce3fced345.scope/memory.events: no such file or directory"
time="2023-03-17T08:26:02+07:00" level=error msg="container_linux.go:380: starting container process caused: exec: \"/coredns\": stat /coredns: no such file or directory"
Warning Failed 47s kubelet Error: container create failed: time="2023-03-17T08:26:18+07:00" level=error msg="container_linux.go:380: starting container process caused: exec: \"/coredns\": stat /coredns: no such file or directory"
Warning Failed 34s kubelet Error: container create failed: time="2023-03-17T08:26:31+07:00" level=error msg="container_linux.go:380: starting container process caused: exec: \"/coredns\": stat /coredns: no such file or directory"
Warning Failed 19s kubelet Error: container create failed: time="2023-03-17T08:26:46+07:00" level=error msg="container_linux.go:380: starting container process caused: exec: \"/coredns\": stat /coredns: no such file or directory"
Normal Pulled 9s (x6 over 64s) kubelet Container image "registry.k8s.io/coredns/coredns:v1.9.3" already present on machine
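The events above say the image is "already present on machine" while the runtime cannot find /coredns inside it, so one thing that might be worth checking is whether the local copy of the image on k8s-52ts-worker1 is incomplete. A minimal sketch of that check, assuming crictl is installed on the worker and configured for the CRI-O socket; the two function names are made up for illustration, and re-pulling the image is only a guess at a fix, not a confirmed one:

```shell
# Image name taken from the pod events above.
IMAGE="registry.k8s.io/coredns/coredns:v1.9.3"

check_coredns_image() {
  # List local images and keep only the CoreDNS entries.
  crictl images | grep coredns
  # Dump the image metadata (digest, size) for comparison with another node.
  crictl inspecti "$IMAGE"
}

repull_coredns_image() {
  # Remove the possibly incomplete local copy, then pull it again.
  crictl rmi "$IMAGE"
  crictl pull "$IMAGE"
}

# Usage on k8s-52ts-worker1 (as root):
#   check_coredns_image
#   repull_coredns_image
# Then let the Deployment recreate the pods:
#   kubectl -n kube-system delete pod -l k8s-app=kube-dns
```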
I installed MetalLB successfully
kubectl get all -n metallb-system
NAME                              READY   STATUS    RESTARTS   AGE
pod/controller-68bf958bf9-crsws   1/1     Running   0          105s
pod/speaker-n5fl7                 1/1     Running   0          105s
pod/speaker-qj552                 1/1     Running   0          105s

NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/webhook-service   ClusterIP   10.100.0.100   <none>        443/TCP   106s

NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/speaker   2         2         2       2            2           kubernetes.io/os=linux   105s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/controller   1/1     1            1           106s
kubectl get ipaddresspools.metallb.io -n metallb-system
NAME   AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
k8s    true          false             ["192.168.7.190-192.168.7.195"]
kubectl get l2advertisements.metallb.io -n metallb-system
NAME        IPADDRESSPOOLS   IPADDRESSPOOL SELECTORS   INTERFACES
l2-advert
kubectl describe ipaddresspools.metallb.io k8s -n metallb-system
Name: k8s
Namespace: metallb-system
Labels: <none>
Annotations: <none>
API Version: metallb.io/v1beta1
Kind: IPAddressPool
Metadata:
  Creation Timestamp: 2023-03-16T13:27:25Z
  Generation: 1
  Managed Fields:
    API Version: metallb.io/v1beta1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:addresses:
        f:autoAssign:
        f:avoidBuggyIPs:
    Manager: kubectl-client-side-apply
    Operation: Update
    Time: 2023-03-16T13:27:25Z
  Resource Version: 3744
  UID: 3e7ebe10-1073-4faf-9da0-3cdc98e659ad
Spec:
  Addresses:
    192.168.7.190-192.168.7.195
  Auto Assign: true
  Avoid Buggy I Ps: false
Events: <none>
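For reference, the two MetalLB objects above correspond roughly to the manifests below. This is a sketch reconstructed from the kubectl output, not the exact file that was applied. The missing spec on the L2Advertisement is an assumption based on its empty IPADDRESSPOOLS column (with no pool selected, MetalLB advertises all pools), and metallb_manifests is only an illustrative helper name.

```shell
# Print manifests matching the IPAddressPool and L2Advertisement shown above.
metallb_manifests() {
  cat <<'EOF'
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: k8s
  namespace: metallb-system
spec:
  addresses:
    - 192.168.7.190-192.168.7.195
  autoAssign: true
  avoidBuggyIPs: false
---
# Assumption: no spec, so the advertisement applies to every pool.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advert
  namespace: metallb-system
EOF
}

# Usage: metallb_manifests | kubectl apply -f -
```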
I installed ingress-nginx successfully
kubectl get all -n ingress-nginx
NAME                                           READY   STATUS    RESTARTS   AGE
pod/ingress-nginx-controller-c69664497-qpdct   1/1     Running   0          2m16s

NAME                                         TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
service/ingress-nginx-controller             LoadBalancer   10.103.198.89   192.168.7.190   80:32500/TCP,443:32175/TCP   2m16s
service/ingress-nginx-controller-admission   ClusterIP      10.102.57.244   <none>          443/TCP                      2m16s

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-controller   1/1     1            1           2m16s

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-controller-c69664497   1         1         1       2m16s
I installed a demo web app and created an Ingress rule to publish it. I can successfully access the web app from outside the k8s cluster through "192.168.7.190".
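The Ingress rule itself is not shown here. As an illustration only, a rule of this kind could look like the sketch below. Every name in it (demo-web, demo.mydomain.com, port 80, the helper function) is hypothetical and not taken from the actual setup; only the nginx ingress class and the 192.168.7.190 address come from the output above.

```shell
# Print a minimal Ingress manifest for a hypothetical demo Service.
demo_ingress_manifest() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-web                 # hypothetical name
spec:
  ingressClassName: nginx        # class served by ingress-nginx
  rules:
    - host: demo.mydomain.com    # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo-web   # hypothetical Service
                port:
                  number: 80     # hypothetical port
EOF
}

# Usage:
#   demo_ingress_manifest | kubectl apply -f -
#   curl -H 'Host: demo.mydomain.com' http://192.168.7.190/
```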
I installed cert-manager successfully
kubectl get all -n cert-manager
NAME                                          READY   STATUS    RESTARTS   AGE
pod/cert-manager-6ffb79dfdb-9sq7g             1/1     Running   0          100s
pod/cert-manager-cainjector-5fcd49c96-kcl7b   1/1     Running   0          101s
pod/cert-manager-webhook-796ff7697b-4cxkm     1/1     Running   0          100s

NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.105.156.184   <none>        9402/TCP   101s
service/cert-manager-webhook   ClusterIP   10.106.211.161   <none>        443/TCP    101s

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           101s
deployment.apps/cert-manager-cainjector   1/1     1            1           101s
deployment.apps/cert-manager-webhook      1/1     1            1           100s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-6ffb79dfdb             1         1         1       101s
replicaset.apps/cert-manager-cainjector-5fcd49c96   1         1         1       101s
replicaset.apps/cert-manager-webhook-796ff7697b     1         1         1       100s
Problem 2: ClusterIssuer is not ready
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: myemail@mydomain.com
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
    - http01:
        ingress:
          class: nginx
kubectl apply -f issuer-letsencrypt-production.yaml
clusterissuer.cert-manager.io/letsencrypt-production created
However, the ClusterIssuer never becomes READY:
kubectl get clusterissuer -n cert-manager
NAME                     READY   AGE
letsencrypt-production   False   33s
kubectl describe clusterissuer/letsencrypt-production -n cert-manager
Name: letsencrypt-production
Namespace:
Labels: <none>
Annotations: <none>
API Version: cert-manager.io/v1
Kind: ClusterIssuer
Metadata:
  Creation Timestamp: 2023-03-16T13:50:31Z
  Generation: 1
  Managed Fields:
    API Version: cert-manager.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:acme:
          .:
          f:email:
          f:privateKeySecretRef:
            .:
            f:name:
          f:server:
          f:solvers:
    Manager: kubectl-client-side-apply
    Operation: Update
    Time: 2023-03-16T13:50:31Z
    API Version: cert-manager.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:status:
        .:
        f:acme:
        f:conditions:
          .:
          k:{"type":"Ready"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:observedGeneration:
            f:reason:
            f:status:
            f:type:
    Manager: cert-manager-clusterissuers
    Operation: Update
    Subresource: status
    Time: 2023-03-16T13:55:48Z
  Resource Version: 6957
  UID: a39ecd1d-2a74-453c-89bb-548db12ed901
Spec:
  Acme:
    Email: myemail@mydomain.com
    Preferred Chain:
    Private Key Secret Ref:
      Name: letsencrypt-production
    Server: https://acme-v02.api.letsencrypt.org/directory
    Solvers:
      http01:
        Ingress:
          Class: nginx
Status:
  Acme:
  Conditions:
    Last Transition Time: 2023-03-16T13:50:41Z
    Message: Failed to register ACME account: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:50165->10.96.0.10:53: read: connection refused
    Observed Generation: 1
    Reason: ErrRegisterACMEAccount
    Status: False
    Type: Ready
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrInitIssuer 5m12s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:37136->10.96.0.10:53: i/o timeout
Warning ErrInitIssuer 5m2s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:52820->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 4m52s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:58875->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 4m37s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:46895->10.96.0.10:53: i/o timeout
Warning ErrInitIssuer 4m27s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:43995->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 4m17s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:44989->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 4m1s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:45239->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 3m51s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:43369->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 3m41s cert-manager-clusterissuers Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:46797->10.96.0.10:53: read: connection refused
Warning ErrInitIssuer 5s (x18 over 3m26s) cert-manager-clusterissuers (combined from similar events): Error initializing issuer: Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:50165->10.96.0.10:53: read: connection refused
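Every one of these errors is a DNS lookup against 10.96.0.10:53, which is the default ClusterIP of the kube-dns Service in a kubeadm cluster, so Problem 2 may simply be a consequence of the CoreDNS pods not running (Problem 1). A small sketch for pulling the resolver address out of such a message so it can be compared with the Service; the function name is made up for illustration:

```shell
# Extract the "IP:port" resolver from a Go-style "lookup HOST on IP:port: ..." error.
resolver_from_error() {
  printf '%s\n' "$1" | sed -n 's/.*lookup [^ ]* on \([0-9.]*:[0-9]*\):.*/\1/p'
}

# Message copied from the ClusterIssuer events above.
msg='dial tcp: lookup acme-v02.api.letsencrypt.org on 10.96.0.10:53: read udp 10.244.1.8:50165->10.96.0.10:53: read: connection refused'
resolver_from_error "$msg"   # prints 10.96.0.10:53

# Compare with the cluster DNS Service:
#   kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

If the two addresses match, the ClusterIssuer should be re-checked once the CoreDNS pods are Running before looking for a separate cert-manager problem.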
How can I troubleshoot and fix these two problems? Any advice would be appreciated. Thank you very much.