Description:
Following a clean installation using “kubeadm init”, the Calico node pods “calico-node-fjzj7” and “calico-node-v9d2h” in the “calico-system” namespace are failing their readiness probes. My troubleshooting so far has not resolved the issue, so any suggestions or ideas are welcome. The relevant events are listed below, and the installation steps above outline the process I followed.
pod: calico-node-fjzj7 in namespace: calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 7m32s kubelet Readiness probe failed: calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp [::1]:9099: connect: connection refused
Warning Unhealthy 7m31s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning Unhealthy 7m30s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
pod: calico-node-v9d2h in namespace: calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 10m kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning Unhealthy 10m kubelet Readiness probe failed: calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp [::1]:9099: connect: connection refused
pod: calico-apiserver-54b4cd957d-cdbvv in namespace: calico-apiserver
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 9m34s kubelet MountVolume.SetUp failed for volume "calico-apiserver-certs" : failed to sync secret cache: timed out waiting for the condition
pod: calico-apiserver-54b4cd957d-dc8bt in namespace: calico-apiserver
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 9m35s kubelet MountVolume.SetUp failed for volume "calico-apiserver-certs" : failed to sync secret cache: timed out waiting for the condition
pod: calico-kube-controllers-6ddb6ddd65-qrrlg in namespace: calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 10m default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9c4119ee8f5f88b2020633e5f5177f5e18654b1011d979ba24bd8c7c56d7b723": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
pod: calico-typha-8684598794-g6sj2 in namespace: calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 11m default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
pod: csi-node-driver-8jkgq in namespace: calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning NetworkNotReady 7m51s (x20 over 8m29s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Warning FailedCreatePodSandBox 7m49s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "2e2456fe7c48e00174f02ad2b17d1a25c556712a9f85c5f4642fdc70da8fcb45": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
pod: csi-node-driver-mmgz6 in namespace: calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning NetworkNotReady 10m (x20 over 11m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bbd297dedf3011c2dc00f572974dd8d5b435a676024c4e1e341739d535bfab75": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
pod: coredns-76f75df574-mxsn4 in namespace: kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b540694b788130a411f2359c77690ffb47a533a8d666673d0d6d7602fed7ce55": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
pod: coredns-76f75df574-sdsnz in namespace: kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "863e1a88b024e4dcf774900d7a57d3e2edf55f3483697fe4164ff4e070c9e90d": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
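With this many pods reporting events, it can help to count warnings per Reason before reading individual messages. The following is a minimal sketch, not part of the original report: the regular expression and the `tally_reasons` helper are illustrative assumptions about the event row layout (Type, Reason, Age, From, Message), and the sample rows are copied from the events above.

```python
import re
from collections import Counter

# One event row of `kubectl describe pod`: Type, Reason, Age, From, Message.
# Age may carry a repeat suffix such as "7m51s (x20 over 8m29s)".
EVENT_RE = re.compile(
    r"^(?P<type>Normal|Warning)\s+(?P<reason>\S+)\s+"
    r"(?P<age>\S+(?: \(x\d+ over \S+\))?)\s+"
    r"(?P<source>\S+)\s+(?P<message>.*)$"
)


def tally_reasons(lines):
    """Count Warning events per Reason; header and non-event lines are skipped."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group("type") == "Warning":
            counts[match.group("reason")] += 1
    return counts


# Rows copied from the events above.
SAMPLE_EVENTS = """
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 7m32s kubelet Readiness probe failed: calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp [::1]:9099: connect: connection refused
Warning Unhealthy 7m31s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning FailedMount 9m34s kubelet MountVolume.SetUp failed for volume "calico-apiserver-certs" : failed to sync secret cache: timed out waiting for the condition
Warning FailedScheduling 11m default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
Warning NetworkNotReady 7m51s (x20 over 8m29s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
""".strip().splitlines()

if __name__ == "__main__":
    for reason, count in tally_reasons(SAMPLE_EVENTS).most_common():
        print(count, reason)
```

In a real session the lines would come from saved `kubectl describe` output rather than a hard-coded sample.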
@fox-md The following logs were captured right after a kubeadm init. I didn’t notice anything out of the ordinary in them. The pods show warnings, but they are up :s. I reached the body limit, so I uploaded the logs here.
$ kubectl describe pod calico-node-rbrt7 -n calico-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned calico-system/calico-node-rbrt7 to k8s-control-plane-1
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "tigera-ca-bundle" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "kube-api-access-9xhks" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulling 19m kubelet Pulling image "docker.io/calico/pod2daemon-flexvol:v3.27.0"
Normal Pulled 19m kubelet Successfully pulled image "docker.io/calico/pod2daemon-flexvol:v3.27.0" in 7.193s (15.68s including waiting)
Normal Created 19m kubelet Created container flexvol-driver
Normal Started 19m kubelet Started container flexvol-driver
Normal Pulling 19m kubelet Pulling image "docker.io/calico/cni:v3.27.0"
Normal Pulled 18m kubelet Successfully pulled image "docker.io/calico/cni:v3.27.0" in 14.159s (14.159s including waiting)
Normal Created 18m kubelet Created container install-cni
Normal Started 18m kubelet Started container install-cni
Normal Pulling 18m kubelet Pulling image "docker.io/calico/node:v3.27.0"
Normal Pulled 18m kubelet Successfully pulled image "docker.io/calico/node:v3.27.0" in 15.738s (15.738s including waiting)
Normal Created 18m kubelet Created container calico-node
Normal Started 18m kubelet Started container calico-node
Warning Unhealthy 18m (x2 over 18m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
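The readiness probe messages in this thread name both the component that is not ready (felix or BIRD) and the address the probe tried to reach. As a rough sketch, assuming the message layout seen in the events here, the hypothetical helper below pulls those parts out so that the different failure modes are easier to compare.

```python
import re

# Assumed layout, based only on the probe messages in this thread:
# "... calico/node is not ready: <component> is not ready: ... dial <net> <address>: connect: <error>"
PROBE_RE = re.compile(
    r"calico/node is not ready: (?P<component>\w+) is not ready: .*"
    r"dial (?P<network>\w+) (?P<address>\S+): connect: (?P<error>.+)$"
)


def parse_probe_failure(message):
    """Return (component, address, error), or None if the message has another shape."""
    match = PROBE_RE.search(message)
    if match is None:
        return None
    return match.group("component"), match.group("address"), match.group("error")


# Messages copied from the events in this thread.
MESSAGES = [
    'Readiness probe failed: calico/node is not ready: felix is not ready: Get "http://localhost:9099/readiness": dial tcp [::1]:9099: connect: connection refused',
    "Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory",
    "Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused",
]

if __name__ == "__main__":
    for message in MESSAGES:
        print(parse_probe_failure(message))
```

For a Unix socket, "no such file or directory" means the socket file does not exist at that path, while "connection refused" means the path exists but nothing is accepting connections on it. Both BIRD socket paths, /var/run/bird/bird.ctl and /var/run/calico/bird.ctl, appear in the events in this thread.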
After the first kubeadm init or a reboot of the system, I get the following:
Warning Unhealthy 42m kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning Unhealthy 42m kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 15m (x3 over 16m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
It seems to have something to do with Typha, but I'm not 100% sure. I increased the log level, and here is the full log.
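Verbose calico-node output like the log below is mostly INFO lines, so one way to skim it is to keep only ERROR and FATAL entries and group them by component. This is a hedged sketch rather than an official tool: the regular expression is an assumption based on the line format visible in this log, `severe_by_component` is a hypothetical helper, and the sample lines are copied from the log.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log in this thread:
# "<date> <time> [LEVEL][number] <component>/<file> <line>: <message>"
LOG_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} "
    r"\[(?P<level>[A-Z]+)\]\[\d+\] "
    r"(?P<component>[\w-]+)/\S+ \d+: (?P<message>.*)$"
)


def severe_by_component(lines, levels=("ERROR", "FATAL")):
    """Map component -> list of (level, message) for the selected levels."""
    found = defaultdict(list)
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match and match.group("level") in levels:
            found[match.group("component")].append(
                (match.group("level"), match.group("message"))
            )
    return dict(found)


# Lines copied from the log in this thread; the bare "bird:" line does not
# match the assumed format and is skipped.
SAMPLE_LOG = """
2024-01-12 12:35:18.250 [ERROR][890] tunnel-ip-allocator/discovery.go 235: Didn't find any ready Typha instances.
2024-01-12 12:35:18.250 [FATAL][890] tunnel-ip-allocator/startsyncerclient.go 49: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
2024-01-12 12:35:18.916 [ERROR][897] status-reporter/discovery.go 235: Didn't find any ready Typha instances.
2024-01-12 12:35:18.916 [FATAL][897] status-reporter/startsyncerclient.go 49: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
2024-01-12 12:35:18.924 [INFO][902] confd/run.go 18: Starting calico-confd
2024-01-12 12:35:18.931 [FATAL][902] confd/startsyncerclient.go 49: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
bird: Unable to open configuration file /etc/calico/confd/config/bird.cfg: No such file or directory
2024-01-12 12:35:19.216 [ERROR][721] felix/daemon.go 355: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
""".strip().splitlines()

if __name__ == "__main__":
    for component, entries in severe_by_component(SAMPLE_LOG).items():
        for level, message in entries:
            print(component, level, message)
```

Applied to the sample, every retained entry concerns Typha discovery, which may be worth comparing with the FailedScheduling event reported earlier for the calico-typha pod.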
2024-01-12 12:35:18.243 [INFO][890] tunnel-ip-allocator/config_params.go 657: Parsed value for TyphaK8sNamespace: calico-system (from environment variable)
2024-01-12 12:35:18.243 [INFO][890] tunnel-ip-allocator/config_params.go 621: Parsing value for HealthPort: 9099 (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 657: Parsed value for HealthPort: 9099 (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 621: Parsing value for TyphaK8sServiceName: calico-typha (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 657: Parsed value for TyphaK8sServiceName: calico-typha (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 621: Parsing value for TyphaCAFile: /etc/pki/tls/certs/tigera-ca-bundle.crt (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/param_types.go 312: Looking for required file path="/etc/pki/tls/certs/tigera-ca-bundle.crt"
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 657: Parsed value for TyphaCAFile: /etc/pki/tls/certs/tigera-ca-bundle.crt (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 621: Parsing value for TyphaKeyFile: /node-certs/tls.key (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/param_types.go 312: Looking for required file path="/node-certs/tls.key"
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config_params.go 657: Parsed value for TyphaKeyFile: /node-certs/tls.key (from environment variable)
2024-01-12 12:35:18.244 [INFO][890] tunnel-ip-allocator/config.go 63: Found FELIX_TYPHAK8SSERVICENAME=calico-typha
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/config.go 63: Found FELIX_TYPHAK8SNAMESPACE=calico-system
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/config.go 63: Found FELIX_TYPHAKEYFILE=/node-certs/tls.key
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/config.go 63: Found FELIX_TYPHACERTFILE=/node-certs/tls.crt
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/config.go 63: Found FELIX_TYPHACAFILE=/etc/pki/tls/certs/tigera-ca-bundle.crt
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/config.go 63: Found FELIX_TYPHACN=typha-server
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/discovery.go 179: Creating Kubernetes client for Typha discovery...
2024-01-12 12:35:18.245 [INFO][890] tunnel-ip-allocator/discovery.go 195: (Re)discovering Typha endpoints using the Kubernetes API...
2024-01-12 12:35:18.250 [ERROR][890] tunnel-ip-allocator/discovery.go 235: Didn't find any ready Typha instances.
2024-01-12 12:35:18.250 [FATAL][890] tunnel-ip-allocator/startsyncerclient.go 49: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
2024-01-12 12:35:18.911 [INFO][897] status-reporter/startup.go 445: Early log level set to info
2024-01-12 12:35:18.912 [INFO][897] status-reporter/config.go 63: Found FELIX_TYPHAK8SSERVICENAME=calico-typha
2024-01-12 12:35:18.912 [INFO][897] status-reporter/config.go 63: Found FELIX_TYPHAK8SNAMESPACE=calico-system
2024-01-12 12:35:18.912 [INFO][897] status-reporter/config.go 63: Found FELIX_TYPHAKEYFILE=/node-certs/tls.key
2024-01-12 12:35:18.912 [INFO][897] status-reporter/config.go 63: Found FELIX_TYPHACERTFILE=/node-certs/tls.crt
2024-01-12 12:35:18.912 [INFO][897] status-reporter/config.go 63: Found FELIX_TYPHACAFILE=/etc/pki/tls/certs/tigera-ca-bundle.crt
2024-01-12 12:35:18.912 [INFO][897] status-reporter/config.go 63: Found FELIX_TYPHACN=typha-server
2024-01-12 12:35:18.912 [INFO][897] status-reporter/discovery.go 179: Creating Kubernetes client for Typha discovery...
2024-01-12 12:35:18.912 [INFO][897] status-reporter/discovery.go 195: (Re)discovering Typha endpoints using the Kubernetes API...
2024-01-12 12:35:18.916 [ERROR][897] status-reporter/discovery.go 235: Didn't find any ready Typha instances.
2024-01-12 12:35:18.916 [FATAL][897] status-reporter/startsyncerclient.go 49: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 63: Found FELIX_TYPHAK8SSERVICENAME=calico-typha
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 63: Found FELIX_TYPHAK8SNAMESPACE=calico-system
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 63: Found FELIX_TYPHAKEYFILE=/node-certs/tls.key
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 63: Found FELIX_TYPHACERTFILE=/node-certs/tls.crt
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 63: Found FELIX_TYPHACAFILE=/etc/pki/tls/certs/tigera-ca-bundle.crt
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 63: Found FELIX_TYPHACN=typha-server
2024-01-12 12:35:18.924 [INFO][902] confd/config.go 81: Skipping confd config file.
2024-01-12 12:35:18.924 [INFO][902] confd/run.go 18: Starting calico-confd
2024-01-12 12:35:18.929 [WARNING][902] confd/winutils.go 141: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-01-12 12:35:18.929 [INFO][902] confd/client.go 1484: Advertise global service ranges from this node
2024-01-12 12:35:18.929 [INFO][902] confd/client.go 1415: Updated with new cluster IP CIDRs: []
2024-01-12 12:35:18.929 [INFO][902] confd/client.go 1484: Advertise global service ranges from this node
2024-01-12 12:35:18.929 [INFO][902] confd/client.go 1406: Updated with new external IP CIDRs: []
2024-01-12 12:35:18.929 [INFO][902] confd/client.go 1484: Advertise global service ranges from this node
2024-01-12 12:35:18.929 [INFO][902] confd/client.go 1439: Updated with new Loadbalancer IP CIDRs: []
2024-01-12 12:35:18.929 [INFO][902] confd/discovery.go 179: Creating Kubernetes client for Typha discovery...
2024-01-12 12:35:18.929 [INFO][902] confd/discovery.go 195: (Re)discovering Typha endpoints using the Kubernetes API...
2024-01-12 12:35:18.931 [ERROR][902] confd/discovery.go 235: Didn't find any ready Typha instances.
2024-01-12 12:35:18.931 [FATAL][902] confd/startsyncerclient.go 49: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
bird: Unable to open configuration file /etc/calico/confd/config/bird.cfg: No such file or directory
bird: Unable to open configuration file /etc/calico/confd/config/bird6.cfg: No such file or directory
2024-01-12 12:35:19.216 [ERROR][721] felix/discovery.go 235: Didn't find any ready Typha instances.
2024-01-12 12:35:19.216 [ERROR][721] felix/daemon.go 355: Typha discovery enabled but discovery failed. error=Kubernetes service missing IP or port
2024-01-12 12:35:19.293 [INFO][912] tunnel-ip-allocator/param_types.go 718: StringSliceParam StringSliceParam raw="docker+"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "typhak8sservicename"="calico-typha"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "typhacafile"="/etc/pki/tls/certs/tigera-ca-bundle.crt"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "healthenabled"="true"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "typhacertfile"="/node-certs/tls.crt"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "typhacn"="typha-server"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "typhakeyfile"="/node-certs/tls.key"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "defaultendpointtohostaction"="ACCEPT"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "typhak8snamespace"="calico-system"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "healthport"="9099"
2024-01-12 12:35:19.294 [INFO][912] tunnel-ip-allocator/env_var_loader.go 40: Found felix environment variable: "ipv6support"="false"
2024-01-12 12:35:19.295 [INFO][912] tunnel-ip-allocator/config_params.go 490: Merging in config from environment variable: map[defaultendpointtohostaction:ACCEPT healthenabled:true healthport:9099 ipv6support:false typhacafile:/etc/pki/tls/certs/tigera-ca-bundle.crt typhacertfile:/node-certs/tls.crt typhacn:typha-server typhak8snamespace:calico-system typhak8sservicename:calico-typha typhakeyfile:/node-certs/tls.key]
2024-01-12 12:35:19.295 [INFO][912] tunnel-ip-allocator/config_params.go 621: Parsing value for TyphaCertFile: /node-certs/tls.crt (from environment variable)
2024-01-12 12:35:19.295 [INFO][912] tunnel-ip-allocator/param_types.go 312: Looking for required file path="/node-certs/tls.crt"
2024-01-12 12:35:19.295 [INFO][912] tunnel-ip-allocator/config_params.go 657: Parsed value for TyphaCertFile: /node-certs/tls.crt (from environment variable)
2024-01-12 12:35:19.295 [INFO][912] tunnel-ip-allocator/config_params.go 621: Parsing value for TyphaKeyFile: /node-certs/tls.key (from environment variable)