Major problems with Kubernetes cluster

Hello. I'm having a major problem running a multi-node Kubernetes cluster on Docker 20.10. The cluster runs on 4 vSphere Red Hat 8 hosts with the following base setup: Calico/Flannel (tried both), MetalLB, NGINX Ingress, and the cgroupfs Docker cgroup driver (I also tried systemd, as recommended, with the same problem).
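For reference, the systemd attempt was the standard daemon.json change recommended for kubeadm-style setups, applied on each node roughly like this (sketch only; the exact file may have contained other keys):

```
# /etc/docker/daemon.json - switch Docker to the systemd cgroup driver
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
docker info | grep -i cgroup   # should report: Cgroup Driver: systemd
# kubelet's cgroupDriver was set to systemd as well so the two match
```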

Everything goes fine and the cluster runs solidly (most recently with a Kubeapps installation) until, at a seemingly random point (sometimes while running a port-forward to access a service in the browser), the command freezes. When I try to log in to the server again, the login prompt no longer appears. Kubernetes keeps running in the background, but I can no longer reach the server over SSH.
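To make that concrete, the freeze tends to hit during a command along these lines (the Kubeapps namespace/service names here are just illustrative of what was being forwarded):

```
# Illustrative port-forward that ends up hanging (names may differ in my setup)
kubectl port-forward -n kubeapps svc/kubeapps 8080:80
# browse to http://127.0.0.1:8080 - at some point the terminal freezes,
# and new SSH sessions to that node stop getting a login prompt
```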

Usually this happens on the master node, but the second node has had a similar problem. Running traceroute, traffic destined for the first node appears to be routed through the fourth node (though that may be unrelated).

Are there any known issues with vSphere virtual machines, Calico/Flannel, MetalLB, or NGINX Ingress?

According to a colleague, the network service appears to be causing the problem: during reboot the service simply stops working. Has anybody heard of this problem, and is there a solution?
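In case it helps, this is roughly how the network service was checked on the affected node after a reboot (Red Hat 8 defaults to NetworkManager, so that is what is assumed here):

```
# State of the network stack on the affected RHEL 8 node (assumes NetworkManager)
systemctl status NetworkManager
journalctl -b -u NetworkManager --no-pager | tail -n 50
# interfaces, including the CNI ones (cali*/flannel.1), to see what is still up
ip -br addr
```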

Cluster information:

Kubernetes version: 1.17
Cloud being used: bare metal (vSphere VMs)
Installation method: manual
Host OS: Red Hat 8