Mr_TK
December 20, 2022, 5:13pm
#1
Hello Community,
I'm new to this forum but not to Kubernetes, so please excuse any noise.
I was going through the Konnectivity server & agent docs to understand the communication between the control plane and the nodes. From a design perspective, the konnectivity agent has to run as a DaemonSet on all nodes, and based on the GitHub issue quoted below, I think that if the konnectivity agent on a node fails, we may lose kubectl exec/logs/cp and similar commands even though the kubelet itself is healthy. The failures I'm most concerned about here are the ones caused by volume mounts, CRI-O, image pulls, and so on, excluding network-related failures (mainly node-to-node communication).
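To make clear what I mean by the data path, here is a rough sketch (my own simplification in Go, not the project's actual code or protocol) of what an agent conceptually does when it gets a DIAL_REQ: it opens a connection to the requested target on its node and shuttles bytes between the tunnel and that target. If the agent is down, that endpoint simply doesn't exist, which is why exec/logs/cp break even with a healthy kubelet. The listener address and the kubelet port here are just placeholders.

```go
package main

import (
	"io"
	"log"
	"net"
)

// handleDial stands in for what a konnectivity agent conceptually does on a
// DIAL_REQ: dial the requested target (e.g. the kubelet) and pump bytes
// between the tunnel and that target. In the real project the tunnel side is
// a gRPC stream back to the konnectivity server; here it is just another
// net.Conn to keep the sketch small.
func handleDial(tunnel net.Conn, target string) {
	defer tunnel.Close()

	backend, err := net.Dial("tcp", target)
	if err != nil {
		// If the agent (or its dial) fails, the frontend request has no
		// data path at all -- the situation behind the DIAL_REQ warnings.
		log.Printf("dial %s failed: %v", target, err)
		return
	}
	defer backend.Close()

	// Copy in both directions until either side closes.
	go io.Copy(backend, tunnel)
	io.Copy(tunnel, backend)
}

func main() {
	// Hypothetical listener standing in for the tunnel from the konnectivity
	// server; every accepted connection is forwarded to the kubelet port on
	// this node (10250 is the default kubelet port).
	ln, err := net.Listen("tcp", "127.0.0.1:9090")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handleDial(conn, "127.0.0.1:10250")
	}
}
```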
GitHub issue (opened 15 Oct 2020, closed 4 Feb 2021, lifecycle/stale):
Hello,
I have a Kubernetes 1.18 cluster where Konnectivity v0.0.12 is running.
Most of the time, everything works perfectly, but after ~20 days of konnectivity-server uptime, some DIAL_REQ are getting nowhere.
For example, `kubectl logs -n kube-system konnectivity-agent-h6bt9 -f` only worked after the third execution.
Server log:
```
I1015 11:35:55.013608 1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:35:55.013887 1 server.go:257] start serving frontend stream
I1015 11:35:55.013926 1 server.go:268] >>> Received DIAL_REQ
I1015 11:35:55.013940 1 backend_manager.go:170] pick agentID=f42e628f-f3a5-42cc-bb75-8065768b1be1 as backend
I1015 11:35:55.014079 1 server.go:290] >>> DIAL_REQ sent to backend
I1015 11:36:10.378211 1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:36:10.378345 1 server.go:257] start serving frontend stream
I1015 11:36:10.378357 1 server.go:268] >>> Received DIAL_REQ
I1015 11:36:10.378363 1 backend_manager.go:170] pick agentID=f42e628f-f3a5-42cc-bb75-8065768b1be1 as backend
I1015 11:36:10.378418 1 server.go:290] >>> DIAL_REQ sent to backend
I1015 11:36:18.243790 1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:36:18.243873 1 server.go:257] start serving frontend stream
I1015 11:36:18.244049 1 server.go:268] >>> Received DIAL_REQ
I1015 11:36:18.244062 1 backend_manager.go:170] pick agentID=15fb1f4a-55a5-454d-9257-a717e70dc5dd as backend
I1015 11:36:18.244138 1 server.go:290] >>> DIAL_REQ sent to backend
I1015 11:36:18.247179 1 server.go:522] <<< Received DIAL_RSP(rand=2352227733928774978), agentID 15fb1f4a-55a5-454d-9257-a717e70dc5dd, connID 84)
I1015 11:36:18.247268 1 server.go:144] register frontend &{grpc 0xc020108df0 <nil> 0xc02cc82960 84 15fb1f4a-55a5-454d-9257-a717e70dc5dd {13824409201659358837 1806906010189152 0x227f060} 0xc02cd49da0} for agentID 15fb1f4a-55a5-454d-9257-a717e70dc5dd, connID 84
I1015 11:36:18.247867 1 server.go:308] >>> Received 239 bytes of DATA(id=84)
I1015 11:36:18.248012 1 server.go:324] >>> DATA sent to Backend
I1015 11:36:18.256637 1 server.go:551] <<< Received 2171 bytes of DATA from agentID 15fb1f4a-55a5-454d-9257-a717e70dc5dd, connID 84
...
```
Agent log:
```
Nothing before...
I1015 11:36:18.244396 1 client.go:271] [tracing] recv packet, type: DIAL_REQ
I1015 11:36:18.244482 1 client.go:280] received DIAL_REQ
I1015 11:36:18.247643 1 client.go:271] [tracing] recv packet, type: DATA
I1015 11:36:18.247680 1 client.go:339] received DATA(id=84)
I1015 11:36:18.247775 1 client.go:413] [connID: 84] write last 239 data to remote
I1015 11:36:18.255019 1 client.go:384] received 2171 bytes from remote for connID[84]
I1015 11:36:18.257829 1 client.go:271] [tracing] recv packet, type: DATA
I1015 11:36:18.257850 1 client.go:339] received DATA(id=84)
I1015 11:36:18.257908 1 client.go:413] [connID: 84] write last 64 data to remote
I1015 11:36:18.258552 1 client.go:271] [tracing] recv packet, type: DATA
I1015 11:36:18.258661 1 client.go:339] received DATA(id=84)
I1015 11:36:18.258777 1 client.go:413] [connID: 84] write last 203 data to remote
I1015 11:36:18.260344 1 client.go:384] received 106 bytes from remote for connID[84]
I1015 11:36:18.264715 1 client.go:384] received 183 bytes from remote for connID[84]
```
Sometimes, when I want to view logs from the second agent, I get these warnings:
Server log:
```
I1015 11:51:13.078407 1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:51:13.078709 1 server.go:257] start serving frontend stream
I1015 11:51:13.078763 1 server.go:268] >>> Received DIAL_REQ
I1015 11:51:13.078790 1 backend_manager.go:170] pick agentID=d700cd66-6ae4-416f-89be-4523adda93c3 as backend
W1015 11:51:13.078907 1 server.go:288] >>> DIAL_REQ to Backend failed: rpc error: code = Unavailable desc = transport is closing
I1015 11:51:13.078966 1 server.go:290] >>> DIAL_REQ sent to backend
I1015 11:51:15.008189 1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:51:15.008413 1 server.go:257] start serving frontend stream
I1015 11:51:15.008458 1 server.go:268] >>> Received DIAL_REQ
I1015 11:51:15.008485 1 backend_manager.go:170] pick agentID=d700cd66-6ae4-416f-89be-4523adda93c3 as backend
W1015 11:51:15.008574 1 server.go:288] >>> DIAL_REQ to Backend failed: rpc error: code = Unavailable desc = transport is closing
I1015 11:51:15.008627 1 server.go:290] >>> DIAL_REQ sent to backend
```
Agent logs nothing.
I believe that restarting the konnectivity-server (+ kube-apiserver) pod would help (because it helped in the past), but I don't want to do it yet, in case you need some more tests.
In such cases, the control plane components lose their connection to the node. Why isn't the connection established from the other nodes where the agent is healthy, given that those agents should still be able to route traffic to the affected node?
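For reference, my mental model of how the server chooses an agent is roughly the following (a simplified sketch of my own, not the actual apiserver-network-proxy code): the server keeps a list of registered agent backends and picks one for each DIAL_REQ, and if a backend whose transport has silently broken is never removed from that list, the server can keep picking it. The agent IDs below are just shortened versions of the ones in the logs above.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

// backend stands in for one agent's tunnel to the konnectivity server.
// "healthy" is a stand-in for whether the underlying stream is still usable.
type backend struct {
	agentID string
	healthy bool
}

// backendManager is a simplified sketch of the server-side bookkeeping:
// it only drops an agent when it is explicitly deregistered, so a backend
// whose transport broke silently can still be picked for new DIAL_REQs.
type backendManager struct {
	mu       sync.Mutex
	backends []*backend
}

func (m *backendManager) register(b *backend) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.backends = append(m.backends, b)
}

// pick chooses a random registered backend, mirroring the
// "pick agentID=... as backend" lines in the server log above.
func (m *backendManager) pick() (*backend, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if len(m.backends) == 0 {
		return nil, errors.New("no registered backends")
	}
	return m.backends[rand.Intn(len(m.backends))], nil
}

func main() {
	m := &backendManager{}
	// Stream broke but the backend was never deregistered.
	m.register(&backend{agentID: "d700cd66", healthy: false})
	m.register(&backend{agentID: "15fb1f4a", healthy: true})

	for i := 0; i < 5; i++ {
		b, _ := m.pick()
		if !b.healthy {
			// Corresponds to: DIAL_REQ to Backend failed: transport is closing
			fmt.Printf("picked %s: dial failed, transport is closing\n", b.agentID)
			continue
		}
		fmt.Printf("picked %s: DIAL_REQ sent to backend\n", b.agentID)
	}
}
```

If that picture is roughly right, it would explain why some requests keep landing on the broken agent instead of automatically going through a healthy one, but I'd appreciate confirmation from people who know the server's backend selection better.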