Need assistance in resolving 'Connection Timed Out' error when executing inside running pod

minsung4420 · September 9, 2023, 12:22pm


Cluster information:
My cluster has 3 nodes.

Two of the nodes work fine, but one does not.

I tried to access a running pod with the exec command, like this: kubectl exec -n calico-system -it [pod_name] /bin/bash

However, for the pod on the problematic node, the command does not work.

The same thing happens with kube-proxy: when I run exec, the command just hangs and then fails with a connection timeout to 127.0.0.1:xxxx.

The kubectl describe output for this pod:

```
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m31s                  default-scheduler  Successfully assigned calico-system/calico-node-qmhps to ubuntu
  Normal   Pulled     5m30s                  kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.26.1" already present on machine
  Normal   Created    5m30s                  kubelet            Created container flexvol-driver
  Normal   Started    5m30s                  kubelet            Started container flexvol-driver
  Normal   Pulled     5m29s                  kubelet            Container image "docker.io/calico/cni:v3.26.1" already present on machine
  Normal   Created    5m29s                  kubelet            Created container install-cni
  Normal   Started    5m29s                  kubelet            Started container install-cni
  Warning  Unhealthy  5m27s                  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:14.483764 61 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  5m26s                  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:15.487428 371 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  4m55s (x3 over 5m15s)  kubelet            Readiness probe failed: command "/bin/calico-node -bird-ready -felix-ready" timed out
  Normal   Created    4m50s (x2 over 5m28s)  kubelet            Created container calico-node
  Normal   Started    4m50s (x2 over 5m28s)  kubelet            Started container calico-node
  Normal   Pulled     4m50s (x2 over 5m28s)  kubelet            Container image "docker.io/calico/node:v3.26.1" already present on machine
  Warning  Unhealthy  4m50s (x3 over 5m10s)  kubelet            Liveness probe failed: Get "http://localhost:9099/liveness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    4m50s                  kubelet            Container calico-node failed liveness probe, will be restarted
  Warning  Unhealthy  4m50s                  kubelet            Readiness probe failed: 2023-09-08 13:07:51.192 [INFO][523] confd/health.go 180: Number of node(s) with BGP peering established = 2
W0908 13:07:51.187693 523 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  4m50s                  kubelet            Readiness probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to create exec "10f418b975084fd2b587cab91d3df175b0e08003ab1d756cfc9a3645ddd3d804": task 5c55e7e8fa6a9d841758e8866f23382c50d290d1b651efd3a72618434e9d62a3 not found: not found
  Warning  Unhealthy  4m50s                  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:51.554603 23 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  4m49s                  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0908 13:07:52.589282 85 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  BackOff    30s (x7 over 90s)      kubelet            Back-off restarting failed container calico-node in pod calico-node-qmhps_calico-system(eaa1e1ba-190f-4fe7-83e6-343dc8cd2dae)
```
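To see at a glance which failures dominate (per the events, the liveness probe on localhost:9099 timing out is what gets calico-node restarted and then put into BackOff), the Warning events can be tallied from the JSON form of the same events. This is a minimal Python sketch under assumptions: the script name, the file-based input and the (reason, probe kind) grouping are hypothetical choices, and it relies only on the standard core/v1 Event fields type, reason, message, count and involvedObject.name.

```python
import json
import sys
from collections import Counter


def summarize_warnings(events_json: str, pod_name: str) -> Counter:
    """Count Warning events for one pod, keyed by (reason, probe kind).

    Expects the JSON printed by `kubectl get events -n calico-system -o json`.
    """
    counts: Counter = Counter()
    for item in json.loads(events_json).get("items", []):
        if item.get("involvedObject", {}).get("name") != pod_name:
            continue  # event belongs to some other object
        if item.get("type") != "Warning":
            continue  # skip Normal events such as Pulled/Created/Started
        message = item.get("message", "")
        # kubelet starts probe messages with the probe type.
        if message.startswith("Liveness probe"):
            kind = "liveness"
        elif message.startswith("Readiness probe"):
            kind = "readiness"
        else:
            kind = "other"
        # `count` may be missing on some events; treat that as one occurrence.
        counts[(item.get("reason", ""), kind)] += item.get("count") or 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python summarize_events.py events.json calico-node-qmhps
    with open(sys.argv[1], encoding="utf-8") as f:
        summary = summarize_warnings(f.read(), sys.argv[2])
    for (reason, kind), n in summary.most_common():
        print(f"{n:4d}  {reason:<10} {kind}")
```

Usage sketch: save the events with kubectl get events -n calico-system -o json > events.json, then run python summarize_events.py events.json calico-node-qmhps.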

The pod's log does not show any Warning or Fatal entries. These are the last few lines of the log:

```
bird: Mesh_: Connected to table master
bird: Mesh_: State changed to wait
bird: Mesh_: Connected to table master
bird: Mesh_: State changed to wait
bird: Graceful restart done
bird: Mesh_: State changed to feed
bird: Mesh_: State changed to feed
bird: Mesh_: Invalid NEXT_HOP attribute in route 192.168.163.0/26
bird: Mesh_: State changed to up
bird: Mesh_: Invalid NEXT_HOP attribute in route 192.168.157.0/26
bird: Mesh_: State changed to up
```
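The only rejections in this excerpt are the two Invalid NEXT_HOP lines. Both prefixes are /26, which is Calico's default IPv4 IPAM block size, so they are plausibly the pod-address blocks advertised by the other two nodes (an assumption, not something the log states). A minimal Python sketch for pulling those prefixes out of a saved calico-node log and grouping them by BGP session name; the script and log file names are hypothetical, and in this excerpt every session name appears as Mesh_.

```python
import ipaddress
import re
import sys

# Matches BIRD lines such as
#   bird: Mesh_: Invalid NEXT_HOP attribute in route 192.168.163.0/26
# and captures the BGP protocol (session) name and the rejected prefix.
INVALID_NEXT_HOP = re.compile(
    r"bird: (?P<proto>\S+): Invalid NEXT_HOP attribute in route (?P<route>\S+)"
)


def rejected_routes(log_text: str) -> dict[str, list[str]]:
    """Group the prefixes BIRD rejected for an invalid NEXT_HOP by session."""
    result: dict[str, list[str]] = {}
    for line in log_text.splitlines():
        match = INVALID_NEXT_HOP.search(line)
        if match is None:
            continue
        # ip_network() raises ValueError on a malformed prefix and
        # normalises its notation.
        prefix = str(ipaddress.ip_network(match.group("route")))
        result.setdefault(match.group("proto"), []).append(prefix)
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python rejected_routes.py calico-node.log
    with open(sys.argv[1], encoding="utf-8") as f:
        for proto, prefixes in rejected_routes(f.read()).items():
            print(proto, ", ".join(prefixes))
```

The log can be saved with kubectl logs -n calico-system calico-node-qmhps -c calico-node > calico-node.log before running the sketch.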

Another strange thing is that calico-typha was created again.

Kubernetes version: v1.27.4
Cloud being used: On local server
Installation method: kubeadm

Host OS: Ubuntu 20.04.4, 20.04.6
CNI and version: Calico, v3.26.1
CRI and version: containerd 1.6.22 on the two working nodes, 1.6.21 on the problematic node
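Since the working nodes and the failing node run different containerd versions (and two Ubuntu versions are listed), one way to confirm which node is the odd one out is to compare status.nodeInfo across nodes; the failing pod calico-node-qmhps was scheduled to the node named ubuntu. A minimal Python sketch, assuming the output of kubectl get nodes -o json has been saved to a file; the script name, file name and choice of fields are hypothetical.

```python
import json
import sys
from collections import Counter

# nodeInfo fields reported for every node in `kubectl get nodes -o json`.
FIELDS = ("containerRuntimeVersion", "osImage", "kernelVersion", "kubeletVersion")


def nodeinfo_outliers(
    nodes_json: str, fields: tuple[str, ...] = FIELDS
) -> dict[str, dict[str, str]]:
    """For each field, list the nodes whose value differs from the most common one."""
    nodes = json.loads(nodes_json).get("items", [])
    outliers: dict[str, dict[str, str]] = {}
    for field in fields:
        values = {
            node["metadata"]["name"]: node["status"]["nodeInfo"].get(field, "")
            for node in nodes
        }
        if not values:
            continue
        # On a tie, most_common() returns the value that was seen first.
        majority, _ = Counter(values.values()).most_common(1)[0]
        odd = {name: value for name, value in values.items() if value != majority}
        if odd:
            outliers[field] = odd
    return outliers


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python nodeinfo_outliers.py nodes.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for field, odd in nodeinfo_outliers(f.read()).items():
            print(field, odd)
```

For a quick visual check, kubectl get nodes -o wide also prints OS-IMAGE, KERNEL-VERSION and CONTAINER-RUNTIME columns per node.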
