I am having an issue with my microk8s cluster. API calls are incredibly slow, and I can't figure out the root cause. The system had been running fine for almost a year until this issue started.
Calls sometimes complete, but take a long time; other times they time out entirely. journalctl on all of the machines also shows lots of timeouts (the commands I'm watching with are listed after the example below). I also get random "couldn't get resource list for…" errors; whether they appear, and which API group they are for, changes a lot from run to run.
For example:
jarrod@storage01:~$ time microk8s kubectl get ns
E1219 06:38:25.397755 73512 memcache.go:255] couldn't get resource list for cert-manager.io/v1: the server could not find the requested resource
E1219 06:38:25.397889 73512 memcache.go:255] couldn't get resource list for acme.cert-manager.io/v1: the server could not find the requested resource
E1219 06:38:25.399638 73512 memcache.go:255] couldn't get resource list for traefik.containo.us/v1alpha1: the server could not find the requested resource
E1219 06:38:30.393967 73512 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1219 06:38:35.396371 73512 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1219 06:38:40.400431 73512 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1219 06:38:45.403532 73512 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAME                 STATUS   AGE
kube-system          Active   329d
kube-public          Active   329d
kube-node-lease      Active   329d
default              Active   329d
rook-ceph            Active   329d
monitoring           Active   329d
metallb-system       Active   324d
traefik              Active   324d
ssh                  Active   303d
samba                Active   316d
container-registry   Active   238d
cert-manager         Active   231d
hello-world          Active   231d
honeycomb            Active   136d
devenv               Active   2d3h
real 0m35.609s
user 0m0.136s
sys 0m0.051s
It doesn’t matter which host I run this on. I have not deployed a load balancer in front of the API yet.
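For reference, these are roughly the journalctl commands I've been using to watch for those timeouts (the unit names assume the stock MicroK8s snap services; adjust if yours differ):

# API server / kubelet logs (MicroK8s bundles them into the kubelite daemon)
sudo journalctl -f -u snap.microk8s.daemon-kubelite

# Datastore (dqlite) logs
sudo journalctl -f -u snap.microk8s.daemon-k8s-dqlite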
DNS is working fine; all of the host names resolve reliably.
I'm not sure what to look at next to figure this out; pointers would be appreciated.
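In case extra output is useful for pointers: the errors above mention the metrics.k8s.io aggregated API being unavailable, so the checks below are what I can run and share (I'm guessing the APIService name v1beta1.metrics.k8s.io from the error text):

# Show registered aggregated APIs and whether they report as Available
microk8s kubectl get apiservices

# Details for the metrics APIService from the errors above
microk8s kubectl describe apiservice v1beta1.metrics.k8s.io

# Verbose client output shows per-request latency
time microk8s kubectl get ns -v=6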
I believe I have the same issue, but much worse. It was fast, and then the next day everything crashed; nothing can be done on the cluster any more. I'm also getting a lot of these memcache.go "the server is currently unable to handle the request" errors.
I believe this is due to the k8s-dqlite process using 100% CPU, but why?
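For what it's worth, this is roughly how I'm looking at it (assuming the stock MicroK8s snap service names; the pgrep pattern is just my way of finding the process):

# Watch CPU usage of the dqlite process
top -p "$(pgrep -f k8s-dqlite | head -n1)"

# Recent dqlite logs
sudo journalctl -u snap.microk8s.daemon-k8s-dqlite --since "1 hour ago"

# Full diagnostics tarball from MicroK8s
microk8s inspect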