Kubespray not working with LoadBalancer type and external IP on Ingress Controller on IPVS

Cluster information:

Kubernetes version: 1.19
Cloud being used: Bare Metal
Installation method: Kubespray
Host OS: CentOS 7, CentOS 8, Ubuntu 20.04
CNI and version: Kube-router 1.2
CRI and version: docker 19.03

Machines: 2

I’ve checked out the latest master of Kubespray (27/03/2021) and configured it for a single control plane node in the hosts file. I then modified the k8s-cluster config to install “kube-router” and use “ipvs”.
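For reference, the two settings involved are the Kubespray variables kube_network_plugin and kube_proxy_mode. The sketch below just reads them back from the vars file to confirm what was actually applied. The inventory path and the helper name get_var are assumptions for illustration; adjust the path to your own checkout.

```shell
# Print the value of a top-level "key: value" line from a YAML vars file.
# $1 = file, $2 = key
get_var() {
  awk -F': *' -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Hypothetical inventory path (assumption); override with VARS=... if yours differs.
VARS="${VARS:-inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml}"
if [ -f "$VARS" ]; then
  echo "kube_network_plugin=$(get_var "$VARS" kube_network_plugin)"
  echo "kube_proxy_mode=$(get_var "$VARS" kube_proxy_mode)"
fi
```

With the setup described here, this would be expected to report kube-router and ipvs respectively.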

After Kubespray provisioning completes, I run “kubectl get nodes”: the control plane and the worker node are both in Ready state and working with no errors. I also checked “journalctl -xe” and there were no errors.

If I run curl “masterip:6443” on both the control plane and the worker node, I get:

Client sent an HTTP request to an HTTPS server.

So far so good: everything is connected and working.

However, the BIG issue arises…

Next I deploy the NGINX ingress controller (I tried both the helm and the kubectl methods). Before deploying it, I put a watch on the curl command on the worker node, and then started the deployment:

curl masterip:6443

For a few seconds I still get:

Client sent an HTTP request to an HTTPS server.

Then, as the ingress controller deploys, I suddenly get the following on the worker node:

curl: (7) Failed connect to masterip:6443; Connection refused
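The watch described above can be sketched as a small loop that only logs when reachability changes, which makes it easier to line the failure up with the ingress controller rollout. This is an illustrative sketch, not the exact command used: the function name probe and the MASTER_ADDR variable are made up here, and any answer at all (including the "HTTP request to an HTTPS server" error page) is treated as reachable.

```shell
# Report whether a plain-HTTP request to host:port gets any answer at all.
# The API server's error page for plain HTTP still counts as "reachable";
# a refused connection or a timeout counts as "unreachable".
probe() {
  if curl -s -o /dev/null --max-time 2 "http://$1"; then
    echo reachable
  else
    echo unreachable
  fi
}

# MASTER_ADDR is a placeholder (assumption), e.g. MASTER_ADDR=masterip:6443
if [ -n "${MASTER_ADDR:-}" ]; then
  last=""
  while true; do
    now=$(probe "$MASTER_ADDR")
    # Log only state changes, each with a timestamp.
    if [ "$now" != "$last" ]; then
      echo "$(date '+%H:%M:%S') $MASTER_ADDR $now"
      last=$now
    fi
    sleep 1
  done
fi
```

Run on the worker node with MASTER_ADDR set before starting the deployment; stop it with Ctrl-C.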

If I now run “kubectl get nodes”, the worker node is in “NotReady” state. In “journalctl -xe” I see many lines ending in:

10s": dial tcp masterip:6443: connect: connection refused

To be noted:

  1. If I go back to the control plane, or to other machines on the same network, and run the curl there, I get:

curl masterip:6443

Client sent an HTTP request to an HTTPS server.

This shows that port 6443 is still reachable from elsewhere, but from inside the worker node it suddenly no longer is.

Why could the worker node reach it at first and then suddenly not? What is causing this, and how do I fix it?

  2. In iptables mode (not ipvs) everything works completely fine.
  3. The same behaviour occurs on CentOS 7 and CentOS 8.
  4. I’ve run the kube-proxy cleanup and it still occurs.
  5. I’ve run iptables -F as a test. Afterwards I still can’t curl port 6443 of masterip from inside the worker node, but I can reach it from any other machine.
  6. If I run “ipvsadm -ln” on the worker node, I see:
    → masterip:6443 Masq 1 0 2

The InActConn column shows 2. Why?
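To make that row easier to read: the real-server rows printed by ipvsadm -ln have the columns RemoteAddress:Port, Forward, Weight, ActiveConn and InActConn, so "Masq 1 0 2" means masquerade forwarding, weight 1, 0 active connections and 2 inactive ones. As I understand the LVS documentation, InActConn counts connections in any state other than ESTABLISHED (for example TIME_WAIT), so a small non-zero value is not necessarily a fault in itself. The helper name label_real_servers below is made up for illustration.

```shell
# Read `ipvsadm -ln` output on stdin and print each real-server ("->") row
# with its columns labelled. The column header row itself is skipped.
label_real_servers() {
  awk '$1 == "->" && $2 != "RemoteAddress:Port" {
    printf "%s forward=%s weight=%s active=%s inactive=%s\n", $2, $3, $4, $5, $6
  }'
}

# Usage on the worker node (needs root):
#   ipvsadm -ln | label_real_servers
```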

I’ve also tested with another CNI (flannel) and the same thing occurs.

Summary: Cluster provisioning works, but as soon as an ingress controller (e.g. nginx) is deployed, the worker node can no longer reach masterip:6443. Before the deployment it can.

Please let me know if you need further information, and how to resolve this.

To add to the above: I set up two fresh Ubuntu 20.04.2 LTS boxes (one control plane node and one worker node), did the same thing as above, and got the same result. As soon as I deploy the nginx ingress controller, the worker node can no longer connect to port 6443 on the master. Really baffled.

To be noted - I’m using the master’s (control plane) external IP to host the ingress controller - could this be causing it? Can a master no longer host the nginx ingress controller on ports 80 / 443 while also being the control plane? (I have this working on 1.18 on CentOS 8, but from 1.19 it no longer works, if my assumption about the issue is correct.)
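One way to test that suspicion, as a sketch rather than a confirmed diagnosis: in IPVS mode kube-proxy binds Service addresses to a dummy interface named kube-ipvs0 on every node, and as far as I understand this includes externalIPs and LoadBalancer IPs as well as cluster IPs. If the master's IP shows up on the worker's kube-ipvs0, the worker would treat masterip as a local address, which would fit the "connection refused" on 6443 from the worker only. The helper name check_bound and the MASTER_IP variable are made up for illustration.

```shell
# Read `ip -o -4 addr show` output on stdin and report whether the address
# given as $1 is assigned to one of the listed interfaces.
check_bound() {
  if awk -v ip="$1" '
       { split($4, a, "/"); if (a[1] == ip) found = 1 }
       END { exit(found ? 0 : 1) }'; then
    echo "bound"
  else
    echo "not bound"
  fi
}

# MASTER_IP is a placeholder (assumption): the bare master address without port.
if [ -n "${MASTER_IP:-}" ] && command -v ip >/dev/null 2>&1; then
  ip -o -4 addr show dev kube-ipvs0 2>/dev/null | check_bound "$MASTER_IP"
fi
```

If this prints "bound" on the worker node after the ingress controller is deployed, the external IP on the ingress Service is the likely trigger, and comparing the same check before the deployment would help confirm it.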