Who/where actually runs the liveness probe in Kubernetes?


#1

In my Kubernetes cluster, the HTTP liveness probe always fails with this message:

Liveness probe failed: Get http://10.233.90.72:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

As a result, the coredns and kubernetes-dashboard pods (and any other pod using an HTTP liveness probe) restart endlessly.

While a pod is running (between the start and restart events), I check its endpoint by running curl http://10.233.90.72:8080/health from a busyboxplus pod. The command works normally and I see OK returned, but the liveness probe still fails and the pod keeps restarting…
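One difference between a manual check like this and the kubelet's probe is the deadline: the kubelet aborts an HTTP probe after the probe's timeoutSeconds (1 s unless overridden), while a plain curl waits far longer. A hedged sketch of reproducing the probe with that deadline (10.233.90.72:8080/health is the failing endpoint from the error message above):

```shell
# Mimic the kubelet's HTTP probe deadline with curl's --max-time.
# A plain curl can succeed where the probe fails simply because it waits longer.
curl --max-time 1 http://10.233.90.72:8080/health && echo "probe ok" || echo "probe failed within 1s"
```

If this times out from the node that hosts the pod but a plain curl from another pod succeeds, the problem is more likely node-to-pod networking than the application itself.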

In this situation I want to debug the liveness probe, but I have no idea who/where actually runs the liveness probe in Kubernetes. Is it the pod? Or the node?

How can I debug the liveness probe? Has anyone had the same issue?

Please advise.

kubectl versions:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

version info:
 OS: Ubuntu 18.04
 Kubernetes: 1.13.3
 Docker: 18.09.2

I also asked this on Stack Overflow: https://stackoverflow.com/questions/54702668/who-where-actually-work-liveness-probe-in-kubernetes

Thanks in advance.


#2

Can you access the pod IPs from the nodes themselves? Kubelet doesn’t (generally) run in a Pod.


#3

I think the docs say the kubelet (i.e., the node) runs those probes.
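For reference, the probe is defined in the pod spec, and the kubelet on the node hosting the pod performs the GET and restarts the container on repeated failure. A sketch of an HTTP liveness probe with illustrative values (check the actual coredns Deployment for its real settings):

```yaml
# Executed by the kubelet on the pod's node, not inside the pod.
livenessProbe:
  httpGet:
    path: /health        # the endpoint being curled in the question
    port: 8080
  initialDelaySeconds: 60
  timeoutSeconds: 5      # the "Client.Timeout exceeded" error fires at this deadline
  failureThreshold: 5    # restart after this many consecutive failures
```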

The root problem may vary: the pods might have too little CPU, the node might be overloaded, etc. Do you have metrics to help understand what is happening in the cluster?


#4

Yes, from the node I can access the pod IP and the health-check URL:

curl http://10.233.123.147:8080/health
OK

#5

How can I see metrics…?
I have 3 machines (actually VMs) in the Kubernetes cluster. Each has 4 CPU cores, 4 GB of memory, and a 50 GB SSD.


#6

omg…!!
I have 3 nodes (node1, node2, node3) and 2 coredns pods, which run on node1 and node2.
When I test from node1, I can access the coredns pod on node2, but I cannot access the coredns pod on node1!!

For example:

on node1, coredns1 - 1.1.1.1
on node2, coredns2 - 2.2.2.2

From node1:
  access 1.1.1.1:8080/health -> timeout
  access 2.2.2.2:8080/health -> ok

From node2:
  access 1.1.1.1:8080/health -> ok
  access 2.2.2.2:8080/health -> timeout

A real example (traceroute from node1):

10.233.90.73 -> pod on node1
10.233.66.14 -> pod on node2

root@node1:  traceroute 10.233.66.14
traceroute to 10.233.66.14 (10.233.66.14), 64 hops max
  1   10.233.66.0  2.670ms  3.469ms  3.941ms
  2   10.233.66.14  1.403ms  0.345ms  0.236ms

root@node1: traceroute 10.233.90.73
traceroute to 10.233.90.73 (10.233.90.73), 64 hops max
  1   *  *  *
  2   *  * ^C

I use Calico as the CNI.

How can I fix it?


#7

Check with the Calico folks. I suspected this was your failure mode.
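Before filing with them, a few hedged first checks for this symptom (assumes kubectl access and, for the second check, calicoctl installed on a node; each step is skipped if the tool is not present):

```shell
# First-pass Calico checks when a node cannot reach its own local pods.
if command -v kubectl >/dev/null; then
  kubectl -n kube-system get pods -l k8s-app=calico-node -o wide  # all Running/Ready on every node?
fi
if command -v calicoctl >/dev/null; then
  calicoctl node status   # are BGP sessions to the other nodes Established?
fi
# Calico programs a per-pod route via a cali* interface for pods on this host:
ip route 2>/dev/null | grep cali || echo "no cali routes visible on this host"
```

Given that each node fails only against its own local pods, the local cali* routes and the calico-node pod on that host are the usual place to start looking.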


#8

OK, thanks!