API server fails to start and restarts in a loop

Total newb, sorry…

I created a 3-node cluster without issue and was able to run a few “hello world” type pods. After a couple of days the cluster was no longer running and I could not get it to start. microk8s start took a long time to complete, and afterwards “microk8s status” reported “microk8s is not running. Use microk8s inspect for a deeper inspection.”

Inspect would intermittently report either that everything was running or that the api-server had failed. In either case, the api-server was always unreachable.

The journal for the api-server shows this continuous loop, which repeats about once every second:

snap.microk8s.daemon-apiserver.service: Main process exited, code=killed, status=11/SEGV
snap.microk8s.daemon-apiserver.service: Failed with result 'signal'.
snap.microk8s.daemon-apiserver.service: Scheduled restart job, restart counter is at 4.
Stopped Service for snap application microk8s.daemon-apiserver.
Started Service for snap application microk8s.daemon-apiserver.

snap.microk8s.daemon-apiserver.service: Main process exited, code=killed, status=11/SEGV
snap.microk8s.daemon-apiserver.service: Failed with result 'signal'.
snap.microk8s.daemon-apiserver.service: Scheduled restart job, restart counter is at 5.
Stopped Service for snap application microk8s.daemon-apiserver.

Where do I go from here to recover it?
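
For anyone comparing their own journal against the excerpt above, a short script can confirm whether every restart ends with the same signal. This is only a rough sketch in Python: it assumes the journal lines look like the ones quoted in this post, and summarize_crash_loop is a made-up helper name, not something shipped with MicroK8s.

```python
import re
from collections import Counter

# Patterns taken from the journal lines quoted in the post above.
EXIT_RE = re.compile(r"Main process exited, code=(\w+), status=(\S+)")
RESTART_RE = re.compile(r"restart counter is at (\d+)")


def summarize_crash_loop(journal_text):
    """Count how the main process exited and report the last restart counter.

    Returns (exits, last_counter), where exits maps (code, status) pairs to
    the number of times they appear, and last_counter is the most recent
    restart counter seen (None if the text contains no restart line).
    """
    exits = Counter()
    last_counter = None
    for line in journal_text.splitlines():
        exit_match = EXIT_RE.search(line)
        if exit_match:
            exits[(exit_match.group(1), exit_match.group(2))] += 1
        restart_match = RESTART_RE.search(line)
        if restart_match:
            last_counter = int(restart_match.group(1))
    return exits, last_counter
```

If the only entry in the result is ("killed", "11/SEGV") and the restart counter keeps climbing, the service is crashing the same way on every start rather than failing for varying reasons.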

There is this post on recovering failing nodes.

Alternatively, you can try to microk8s.remove-node --force the failing node and then add a new one.

Hi,

Today I have the same issue on a cluster with two nodes: the apiserver dies with status=11/SEGV. With only two nodes, HA is not enabled, so I could not get valid backend files from another node. How can I recover in this case?

backend folder:
/var/snap/microk8s/current/var/kubernetes/backend# ls -la
total 355788
drwxrwx--- 2 root microk8s 4096 Oct 22 08:28 ./
drwxr-xr-x 3 root root 4096 Oct 20 12:16 ../
-rw-rw---- 1 root microk8s 4099760 Oct 22 06:43 0000000000949134-0000000000949304
-rw-rw---- 1 root microk8s 8380624 Oct 22 06:44 0000000000949305-0000000000949879
-rw-rw---- 1 root microk8s 8381480 Oct 22 06:44 0000000000949880-0000000000950465
-rw-rw---- 1 root microk8s 8387024 Oct 22 06:45 0000000000950466-0000000000951071
-rw-rw---- 1 root microk8s 7933552 Oct 22 06:45 0000000000951072-0000000000951650
-rw-rw---- 1 root microk8s 8378272 Oct 22 06:48 0000000000951651-0000000000952024
-rw-rw---- 1 root microk8s 8378600 Oct 22 06:49 0000000000952025-0000000000952570
-rw-rw---- 1 root microk8s 8378528 Oct 22 06:49 0000000000952571-0000000000953172
-rw-rw---- 1 root microk8s 8387816 Oct 22 06:49 0000000000953173-0000000000953789
-rw-rw---- 1 root microk8s 8387096 Oct 22 06:50 0000000000953790-0000000000954396
-rw-rw---- 1 root microk8s 8362400 Oct 22 06:50 0000000000954397-0000000000955002
-rw-rw---- 1 root microk8s 3477568 Oct 22 06:50 0000000000955003-0000000000955252
-rw-rw---- 1 root microk8s 8361432 Oct 22 06:53 0000000000955253-0000000000955617
-rw-rw---- 1 root microk8s 8369672 Oct 22 06:53 0000000000955618-0000000000956153
-rw-rw---- 1 root microk8s 3927640 Oct 22 06:54 0000000000956154-0000000000956441
-rw-rw---- 1 root microk8s 8386632 Oct 22 06:56 0000000000956442-0000000000956814
-rw-rw---- 1 root microk8s 8385008 Oct 22 06:58 0000000000956815-0000000000957221
-rw-rw---- 1 root microk8s 8378672 Oct 22 06:59 0000000000957222-0000000000957825
-rw-rw---- 1 root microk8s 3780080 Oct 22 06:59 0000000000957826-0000000000958116
-rw-rw---- 1 root microk8s 8295576 Oct 22 07:01 0000000000958117-0000000000958480
-rw-rw---- 1 root microk8s 48 Oct 22 07:01 0000000000958481-0000000000958481
-rw-rw---- 1 root microk8s 48 Oct 22 07:02 0000000000958482-0000000000958482
-rw-rw---- 1 root microk8s 808 Oct 22 07:49 0000000000958483-0000000000958502
-rw-rw---- 1 root microk8s 48 Oct 22 07:49 0000000000958503-0000000000958503
-rw-rw---- 1 root microk8s 48 Oct 22 07:49 0000000000958504-0000000000958504
-rw-rw---- 1 root microk8s 48 Oct 22 08:28 0000000000958505-0000000000958505
-rw-rw---- 1 root microk8s 2216 Oct 20 12:16 cluster.crt
-rw-rw---- 1 root microk8s 3272 Oct 20 12:16 cluster.key
-rw-rw---- 1 root microk8s 67 Oct 22 08:28 cluster.yaml
-rw-rw---- 1 root microk8s 61 Oct 20 12:24 info.yaml
srw-rw---- 1 root microk8s 0 Oct 22 06:59 kine.sock=
-rw-rw---- 1 root microk8s 32 Oct 20 12:16 metadata1
-rw-rw---- 1 root microk8s 75700904 Oct 22 06:54 snapshot-1-956427-151041951
-rw-rw---- 1 root microk8s 72 Oct 22 06:54 snapshot-1-956427-151041951.meta
-rw-rw---- 1 root microk8s 75251824 Oct 22 06:58 snapshot-1-957451-151331063
-rw-rw---- 1 root microk8s 72 Oct 22 06:58 snapshot-1-957451-151331063.meta
-rw-rw---- 1 root microk8s 64442368 Oct 22 07:01 snapshot-1-958476-151508802
-rw-rw---- 1 root microk8s 72 Oct 22 07:01 snapshot-1-958476-151508802.meta

These specific files seem awfully small. You could try moving them out of the way, in case they are corrupted, and see whether the apiserver is able to start.
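
A rough Python sketch of that suggestion, in case it helps someone else. It assumes the suspect files are the very small numbered files in the backend folder listed above (for example, the 48-byte ones). The size threshold, the quarantine directory, and the helper names find_small_segments and move_aside are illustrative choices, not part of MicroK8s. Stop MicroK8s and copy the whole backend folder somewhere safe before moving anything.

```python
import re
import shutil
from pathlib import Path

# The numbered files in the listing are named <first>-<last>, each part
# zero-padded to 16 digits, e.g. 0000000000958481-0000000000958481.
SEGMENT_NAME = re.compile(r"^\d{16}-\d{16}$")


def find_small_segments(backend_dir, max_bytes=1024):
    """Return numbered files in backend_dir that are smaller than max_bytes.

    max_bytes is an assumed cut-off; pick it after looking at your own listing.
    Snapshots, certificates, YAML files and the socket are never matched.
    """
    backend = Path(backend_dir)
    return sorted(
        p for p in backend.iterdir()
        if p.is_file()
        and SEGMENT_NAME.match(p.name)
        and p.stat().st_size < max_bytes
    )


def move_aside(files, quarantine_dir):
    """Move files into quarantine_dir (created if missing); return new paths."""
    dest = Path(quarantine_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(f), str(dest / f.name))) for f in files]


# Example (run as root, with MicroK8s stopped):
#   backend = "/var/snap/microk8s/current/var/kubernetes/backend"
#   suspects = find_small_segments(backend)
#   print([p.name for p in suspects])            # review the list first
#   move_aside(suspects, "/root/backend-quarantine")
```

Nothing is deleted, so the files can be moved back if the apiserver still fails to start.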

Thanks @ec0, I killed that cluster and started a new one with 3 nodes.