Need assistance in resolving 'Connection Timed Out' error when executing inside running pod

Cluster information:
There are 3 nodes in my cluster.

Two nodes work well, but one node does not.

I tried to access a running pod with the exec command, like this: kubectl exec -n calico-system -it [pod_name] /bin/bash

However, on the one node that has the problem, the command does not work.

The same thing happens with kube-proxy: when I use the exec command, it just hangs and then reports a connection timeout on 127.0.0.1:xxxx.
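
For anyone trying to reproduce this, here is a minimal sketch of the commands involved. The helper names `exec_shell` and `pods_by_node` are hypothetical and only wrap standard kubectl calls; the `--` separator is the form kubectl recommends for exec, and `-o wide` adds the node name to the pod listing so the pods on the problem node can be picked out.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration, and the
# namespace and pod name are supplied by the caller.

# Open a shell in a pod, using "--" to separate kubectl's own flags
# from the command that runs inside the container.
exec_shell() {
  namespace="$1"
  pod="$2"
  kubectl exec -n "$namespace" -it "$pod" -- /bin/bash
}

# List pods together with the node each one runs on, to see which
# pods are scheduled on the problem node.
pods_by_node() {
  namespace="$1"
  kubectl get pods -n "$namespace" -o wide
}
```

With these defined, `pods_by_node calico-system` would show where each calico-node pod is running, and `exec_shell calico-system calico-node-qmhps` would attempt the exec against the pod from the events below.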

The kubectl describe output for this pod:

Events:
Type Reason Age From Message
Normal Scheduled 5m31s default-scheduler Successfully assigned calico-system/calico-node-qmhps to ubuntu
Normal Pulled 5m30s kubelet Container image "docker.io/calico/pod2daemon-flexvol:v3.26.1" already present on machine
Normal Created 5m30s kubelet Created container flexvol-driver
Normal Started 5m30s kubelet Started container flexvol-driver
Normal Pulled 5m29s kubelet Container image "docker.io/calico/cni:v3.26.1" already present on machine
Normal Created 5m29s kubelet Created container install-cni
Normal Started 5m29s kubelet Started container install-cni
Warning Unhealthy 5m27s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:14.483764 61 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
Warning Unhealthy 5m26s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:15.487428 371 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
Warning Unhealthy 4m55s (x3 over 5m15s) kubelet Readiness probe failed: command "/bin/calico-node -bird-ready -felix-ready" timed out
Normal Created 4m50s (x2 over 5m28s) kubelet Created container calico-node
Normal Started 4m50s (x2 over 5m28s) kubelet Started container calico-node
Normal Pulled 4m50s (x2 over 5m28s) kubelet Container image "docker.io/calico/node:v3.26.1" already present on machine
Warning Unhealthy 4m50s (x3 over 5m10s) kubelet Liveness probe failed: Get "http://localhost:9099/liveness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Killing 4m50s kubelet Container calico-node failed liveness probe, will be restarted
Warning Unhealthy 4m50s kubelet Readiness probe failed: 2023-09-08 13:07:51.192 [INFO][523] confd/health.go 180: Number of node(s) with BGP peering established = 2
W0908 13:07:51.187693 523 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
Warning Unhealthy 4m50s kubelet Readiness probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to create exec "10f418b975084fd2b587cab91d3df175b0e08003ab1d756cfc9a3645ddd3d804": task 5c55e7e8fa6a9d841758e8866f23382c50d290d1b651efd3a72618434e9d62a3 not found: not found
Warning Unhealthy 4m50s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:51.554603 23 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
Warning Unhealthy 4m49s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:52.589282 85 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
Warning BackOff 30s (x7 over 90s) kubelet Back-off restarting failed container calico-node in pod calico-node-qmhps_calico-system(eaa1e1ba-190f-4fe7-83e6-343dc8cd2dae)

The log of this pod does not show any Warning or Fatal messages. These are the last few lines of the log:

bird: Mesh_: Connected to table master
bird: Mesh_: State changed to wait
bird: Mesh_: Connected to table master
bird: Mesh_: State changed to wait
bird: Graceful restart done
bird: Mesh_: State changed to feed
bird: Mesh_: State changed to feed
bird: Mesh_: Invalid NEXT_HOP attribute in route 192.168.163.0/26
bird: Mesh_: State changed to up
bird: Mesh_: Invalid NEXT_HOP attribute in route 192.168.157.0/26
bird: Mesh_: State changed to up

One more strange thing is that calico-typha was created one more time.

Kubernetes version: v1.27.4
Cloud being used: On local server
Installation method: kubeadm

Host OS: Ubuntu 20.06.04, 20.06.06
CNI and version: Calico, v3.26.1
CRI and version: Containerd 1.6.22, 1.6.21
Containerd version by node: 1.6.22 (on the 2 working nodes), 1.6.21 (on the 1 problem node)