Traffic to a Pod located in a Dead Node

3 nodes, each running as both master and worker, on CentOS 7.5

Kubernetes version: 1.14.4
Cloud being used: bare-metal
Installation method: Kubeadm
Host OS: CentOS 7.5
CNI and version: Flannel 0.11.0-amd64
CRI and version: Docker 18.06.1-ce, build e68fc7a

Hello, I have a 3-node HA Kubernetes cluster where every node acts as both master and worker. I am facing a behavior that I would like to avoid.

While doing an HA test, I removed the NIC interface from one of the nodes and saw that the whole cluster takes about 2 minutes to realize the node is down. I have also faced situations where it took much longer than that, around 18 minutes.

During this time the Service is still sending traffic to the Pod located on the dead host, and since the Pod is replicated on the other two nodes I receive 66% successful responses and 33% failures. Once the cluster realizes the node is down, after the pod eviction timeout it terminates the Pod running on the dead node and traffic to that Pod stops.

Is there a way to make Kubernetes stop sending requests to a Pod as soon as the node it is running on goes down, either through custom configuration or some other solution?

By the way, I am using the Istio Ingress Gateway and Envoy proxies to route my requests to the Pods, but I could not achieve this with circuit breaking.
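
For reference, the kind of circuit-breaking configuration I was experimenting with is outlier detection on a DestinationRule, roughly like this (the host name and thresholds below are placeholders, not my real values):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service-circuit-breaker        # placeholder name
spec:
  host: my-service.default.svc.cluster.local   # placeholder host
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 3        # eject an endpoint after 3 consecutive errors
      interval: 10s               # how often endpoints are scanned
      baseEjectionTime: 30s       # minimum time an endpoint stays ejected
      maxEjectionPercent: 100     # allow all unhealthy endpoints to be ejected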

18 minutes is really long, that is definitely weird.

There are several ways to tackle this problem that I can see.

  1. Tune the kube-controller-manager flags and lower the node monitor grace period and pod eviction timeout (but as a general recommendation, be careful and understand the trade-offs of those changes); see the sketch after this list

  2. If you are using an Ingress controller, some of them have health checks as well. For example, in Contour you can configure health checks, and in that case requests keep being routed to a failed endpoint only until the next health check runs (you can configure the threshold, period, etc.). I think ingress-nginx has health checks too

  3. Investigate how Istio can help. I have no experience with that, though :-/
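
For point 1, to give an idea of the flags involved: on a kubeadm cluster the kube-controller-manager runs as a static pod, so the flags would go into its manifest on each master. Something along these lines (the values are only illustrative, not a recommendation):

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --node-monitor-period=5s          # how often the node controller checks node status (default 5s)
    - --node-monitor-grace-period=20s   # time before an unresponsive node is marked NotReady (default 40s)
    - --pod-eviction-timeout=30s        # time before pods are evicted from a NotReady node (default 5m)
    # ...keep the existing flags...

Keep in mind that --node-monitor-grace-period must be several times larger than the kubelet's --node-status-update-frequency (default 10s), otherwise nodes may flap between Ready and NotReady.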

I was able to bypass part of the issue using Istio and retries; it seems Istio stops sending traffic to the node that is actually down even while Kubernetes still thinks it is Ready.
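
The retry part is a VirtualService along these lines (names and values below are placeholders rather than my exact config):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service        # placeholder name
spec:
  hosts:
  - my-service.default.svc.cluster.local   # placeholder host
  http:
  - route:
    - destination:
        host: my-service.default.svc.cluster.local
    retries:
      attempts: 3                                  # retry a failed request up to 3 times
      perTryTimeout: 2s                            # give up on a single attempt after 2s
      retryOn: 5xx,connect-failure,gateway-error   # retry on connection failures and 5xx responses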

My big problem is that I have an application that is stateful and I have just one instance of it. When Kubernetes gets into that situation where a node goes down and it takes almost 20 minutes to realize it, my app is unusable until Kubernetes finally marks the node NotReady and migrates the application to another node.
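
As far as I understand, once the node is finally marked NotReady, the eviction delay for this single pod could at least be shortened by lowering the tolerations for the unreachable/not-ready taints in the pod spec (a sketch with placeholder names; the defaults added by Kubernetes are 300s):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-stateful-app            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-stateful-app
  template:
    metadata:
      labels:
        app: my-stateful-app
    spec:
      tolerations:
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30      # evict 30s after the node is marked unreachable (default 300s)
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30
      containers:
      - name: app
        image: my-registry/my-app:1.0   # placeholder image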

I am not 100% sure, but I think this happens when I remove the NIC interface from the machine that is running the control plane leader, I mean, the machine that holds the lock on the kube-controller-manager endpoint:

[root@k8s127 ~]# kubectl -n kube-system get endpoints kube-controller-manager -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"k8s128.ept.lab_9d0437af-c59a-11e9-ae69-005056bba6c8","leaseDurationSeconds":15,"acquireTime":"2019-08-23T12:01:37Z","renewTime":"2019-08-23T12:56:00Z","leaderTransitions":226}'
  creationTimestamp: "2019-05-22T03:39:46Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "15722681"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
  uid: 3e6bf197-7c43-11e9-bd3f-005056bb3932
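
To watch which node holds the lock (and how long re-election takes when the leader's NIC is pulled), something like this can be used:

# re-run every 2 seconds and watch holderIdentity / renewTime change
watch -n 2 "kubectl -n kube-system get endpoints kube-controller-manager -o yaml | grep holderIdentity"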

Is this a known issue in Kubernetes?

Regards