Some kubectl commands are timing out

I have taken over two clusters (stage and prod) and am learning Kubernetes as I go. A problem has manifested on the stage cluster and I'm not sure how to even begin troubleshooting it; any pointers would be appreciated.

I can’t reliably communicate with the cluster.

Some commands work fine:

$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                                                       READY   STATUS      RESTARTS   AGE
kube-system     aws-spot-handler-k8s-spot-termination-handler-4njjl                        1/1     Running     0          48d
kube-system     aws-spot-handler-k8s-spot-termination-handler-5dmfq                        1/1     Running     0          49d
kube-system     aws-spot-handler-k8s-spot-termination-handler-5kpck                        1/1     Running     0          69d
...

While other commands always return an error:

$ kubectl get nodes
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
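
I'm guessing I could get more detail on where the request stalls by raising the client verbosity and the request timeout, something like:

$ kubectl get nodes -v=8 --request-timeout=60s

but I'm not sure what to look for in the output.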

From what I can tell, any command involving nodes fails or times out. Before this problem appeared, I had noticed a few Calico pods failing their readiness probes, so my vague assumption is that the timeouts are caused by broken cluster networking due to the failing Calico pods. Being new to Kubernetes, I'm not sure where to even start.
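
For what it's worth, this is roughly how I've been inspecting the Calico pods (the k8s-app=calico-node label selector is my guess at how the DaemonSet labels its pods):

$ kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
$ kubectl describe pod -n kube-system <calico-pod-name>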

Since it's the stage cluster, I was happy to drain the nodes with unhealthy Calico pods and let replacement nodes spin up, but the drain command keeps timing out as well.
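
For reference, the drain command I've been running looks something like this (the node name is a placeholder):

$ kubectl drain <node-name> --ignore-daemonsets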

As a side note, I've also noticed that some of the Calico pods are using a lot of memory (over 4 GB each), and I'm wondering whether that's related to the failing readiness probes.
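
In case it matters, here's the sort of command I'm using to check pod memory usage (this assumes metrics-server is installed and working):

$ kubectl top pod -n kube-system --sort-by=memory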

Any suggestions on how to troubleshoot this?