[solved] API-Server not starting: Error: context deadline exceeded

Cluster information:

Kubernetes version: v1.22.2
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: debian bullseye
CNI and version: calico 3.24.5
CRI and version: cri-o 1.24

My 2-node cluster was working, but after rebooting one node I lost my API server and it is not coming up again. On both master nodes the API server logs the following.
I can't see any issue here, and I didn't find anything based on this error message. Any help with further troubleshooting is appreciated.

2022-12-12T23:28:37.274818402+00:00 stderr F I1212 23:28:37.274707       1 server.go:553] external host was not specified, using 192.168.2.171
2022-12-12T23:28:37.275051965+00:00 stderr F I1212 23:28:37.275025       1 server.go:161] Version: v1.22.2
2022-12-12T23:28:37.360433172+00:00 stderr F I1212 23:28:37.360411       1 shared_informer.go:240] Waiting for caches to sync for node_authorizer
2022-12-12T23:28:37.361096291+00:00 stderr F I1212 23:28:37.361078       1 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
2022-12-12T23:28:37.361096291+00:00 stderr F I1212 23:28:37.361090       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
2022-12-12T23:28:37.365964771+00:00 stderr F I1212 23:28:37.365945       1 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
2022-12-12T23:28:37.365964771+00:00 stderr F I1212 23:28:37.365956       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
2022-12-12T23:28:57.369231344+00:00 stderr F Error: context deadline exceeded
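
One detail in the log above is easy to miss: the error does not follow the last informational line directly, there is a long silent gap before it. A minimal Python sketch to measure that gap is below. It assumes the CRI log format shown above (an RFC 3339 timestamp with nanoseconds at the start of each line); the helper name `parse_ts` is made up for illustration and the first log line is shortened.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a CRI-style log line (hypothetical helper)."""
    stamp = line.split(" ", 1)[0]          # e.g. 2022-12-12T23:28:37.365964771+00:00
    date_part, rest = stamp.split(".", 1)  # split off the fractional seconds
    nanos, offset = rest[:9], rest[9:]     # nine nanosecond digits, then the UTC offset
    # datetime only keeps microseconds, so the last three digits are dropped.
    return datetime.fromisoformat(f"{date_part}.{nanos[:6]}{offset}")


# Last informational line (shortened) and the error line from the log above.
last_info = "2022-12-12T23:28:37.365964771+00:00 stderr F I1212 23:28:37.365956       1 plugins.go:161]"
error = "2022-12-12T23:28:57.369231344+00:00 stderr F Error: context deadline exceeded"

gap = (parse_ts(error) - parse_ts(last_info)).total_seconds()
print(f"silent for {gap:.3f} s before the error")
```

A gap of almost exactly 20 seconds suggests the process was waiting on a dependency until a timeout expired, rather than failing on its own configuration.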

Thanks
Thomas

Here is the relevant part of the kube-apiserver.yaml:

spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.2.171
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12,fdd7:c7e6:fad4:4e81::/112
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: k8s.gcr.io/kube-apiserver:v1.22.2
    imagePullPolicy: IfNotPresent

Found the issue.
It was an unhealthy etcd which brought the api server down.
A more useful error message would be really helpful.
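
For anyone landing here with the same symptom, a quick first check is whether the etcd endpoint from the manifest is reachable at all. Below is a rough Python sketch, not an official tool: it reads the `--etcd-servers` flag from a command list like the one in kube-apiserver.yaml and tries a plain TCP connection. The function names are made up for illustration. An open port only shows that something is listening, not that etcd is healthy, so the etcd logs still need a look.

```python
import socket
from urllib.parse import urlsplit


def etcd_endpoints(command):
    """Return (host, port) pairs from the --etcd-servers flag of a command list."""
    for arg in command:
        if arg.startswith("--etcd-servers="):
            urls = arg.split("=", 1)[1].split(",")
            return [(urlsplit(u).hostname, urlsplit(u).port) for u in urls]
    return []


def tcp_reachable(host, port, timeout=2.0):
    """True if a TCP connection can be opened; says nothing about etcd health."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Two of the flags from the manifest above; the rest are omitted for brevity.
command = [
    "kube-apiserver",
    "--advertise-address=192.168.2.171",
    "--etcd-servers=https://127.0.0.1:2379",
]

for host, port in etcd_endpoints(command):
    state = "accepts connections" if tcp_reachable(host, port) else "is NOT reachable"
    print(f"etcd endpoint {host}:{port} {state}")
```

Run it on the control-plane node itself, since the endpoint here is 127.0.0.1.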

Bye
Thomas

Hi Thomas,
I am seeing the same issue after installing Istio v1.17.1.
I am using the same stack as you, with Kubernetes v1.24.6.

Could you please elaborate a bit on how exactly you overcame this issue?

    - lastTransitionTime: '2023-03-15T07:53:46Z'
      lastUpdateTime: '2023-03-15T07:53:46Z'
      message: >-
        Internal error occurred: failed calling webhook
        "namespace.sidecar-injector.istio.io": failed to call webhook: Post
        "https://istiod.istio-system.svc:443/inject?timeout=10s": context
        deadline exceeded
      reason: FailedCreate
      status: 'True'
      type: ReplicaFailure

As written, my etcd was not healthy. It was a config issue with missing and wrong etcd peers in the config.
Simply check whether your etcd is running properly by taking a look at its logs. Depending on what your deployment looks like, Istio may be interfering with the communication between the etcd instances.
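
To illustrate the kind of peer problem I mean, here is a small Python sketch that compares the peers in an etcd `--initial-cluster` value against the members you expect. It is only an illustration: the member names, the second IP address and the helper names are made up, and it assumes one peer URL per member.

```python
def parse_initial_cluster(value):
    """Parse an etcd --initial-cluster value ("name=url,name=url") into a dict."""
    members = {}
    for entry in value.split(","):
        name, _, url = entry.strip().partition("=")
        if name and url:
            members[name] = url
    return members


def diff_peers(configured, expected):
    """Return (missing member names, member names whose peer URL differs)."""
    missing = sorted(set(expected) - set(configured))
    wrong = sorted(
        name for name in expected
        if name in configured and configured[name] != expected[name]
    )
    return missing, wrong


# Hypothetical values: names and the .172 address are assumptions for illustration.
expected = {
    "master1": "https://192.168.2.171:2380",
    "master2": "https://192.168.2.172:2380",
}
configured = parse_initial_cluster("master1=https://192.168.2.171:2380")

missing, wrong = diff_peers(configured, expected)
print("missing peers:", missing)
print("wrong peer URLs:", wrong)
```

In a kubeadm setup the value to compare would typically come from the etcd static pod manifest on each control-plane node.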

Hi Thomas, how did you fix your etcd?

My etcd became corrupted and we restored it with etcd-manager-ctl. After the restore my cluster recovered, but one of my control-plane nodes is not able to rejoin the cluster.

Marco.

Hi Marco,
it looks like you have a different issue, as you have a running etcd now.
I am by far not an etcd/k8s troubleshooting expert.

You may need to share more info here, such as logs. But as your etcd is running, it may be a good idea to open another topic.

Bye
Thomas