Pi 4 MicroK8s 1.20 nodes keep failing after a few days on a fairly stock install

I have six Raspberry Pi 4 boards (8 GB RAM, 32 or 64 GB SD cards) running Ubuntu 20.04.2 and Kubernetes via MicroK8s 1.20.1, though I started on 1.19.x. I’ve noticed that after a few days, nodes start dropping off and becoming “NotReady”, and often they get so CPU-bound that they become unresponsive and I have to power-cycle them. I’ve torn down and rebuilt this cluster three or four times, and every time it ends the same way.

It’s a pretty basic installation, with the metallb, prometheus (only recently), metrics-server, and dns addons enabled. The nodes are a fairly stock install, following the “Install a local Kubernetes with MicroK8s | Ubuntu” tutorial to get them up and running so I could learn.

The only workload is a single pod I was playing with for a Mumble server, with an endpoint exposed on the node and a PVC backed by NFS for storage; nothing fancy, and nothing else besides the stock MicroK8s pods (it was my “hello world” deployment to play with). Ironically, this pod has been rock solid as it moves from node to node while they die.

I finally installed Filebeat on the nodes so I could see why they go off the rails, since sometimes I can no longer SSH into them because the CPU is pegged by kube-apiserver or kubelet, and of course the logs contain thousands of different errors. Sometimes power-cycling a node brings it back; other times I have to force-remove the node and re-add it after a snap purge, because MicroK8s refuses to start.

I saw this with just two nodes, then three, and so on when I first started playing, on a totally different network switch, with different power supplies, etc. It was initially 1.19 as mentioned earlier (I’d hoped rebuilding the cluster on 1.20 would “fix” it). So either I’m doing something wrong or something in my environment is causing this, but I’m not sure where to start debugging. Thanks for any advice you can give.
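For context, when a node is still reachable, this is about all I can capture with stock tools before it goes dark:

```shell
# Quick health snapshot of a node that is still responsive:
uptime                                     # load averages
free -m                                    # memory/swap headroom in MB
ps -eo pid,rss,comm --sort=-rss | head -5  # top memory consumers
```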

Here’s an example: this node was working fine and its logs were quiet, then at 13:44 it went off the rails, with around 800 messages in a second or so:

Feb 2, 2021 @ 13:44:14.000	pam_unix(cron:session): session closed for user root
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.333921422-05:00" level=error msg="post event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.507355667-05:00" level=error msg="forward event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.332491595-05:00" level=error msg="post event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.507375260-05:00" level=error msg="forward event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.508540129-05:00" level=error msg="post event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:13.509073339-05:00" level=error msg="forward event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.334818703-05:00" level=error msg="post event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.963950231-05:00" level=error msg="forward event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	time="2021-02-02T13:44:12.990810661-05:00" level=error msg="post event" error="context deadline exceeded"
Feb 2, 2021 @ 13:44:14.000	E0202 13:44:10.560084 1091668 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.ReplicaSet: failed to list *v1.ReplicaSet: Get "https://127.0.0.1:16443/apis/apps/v1/replicasets?limit=500&resourceVersion=0": dial tcp 127.0.0.1:16443: i/o timeout
Feb 2, 2021 @ 13:44:14.000	I0202 13:44:10.422833 1091668 request.go:655] Throttling request took 4h41m15.355483084s, request: GET:https://127.0.0.1:16443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0
Feb 2, 2021 @ 13:44:14.000	I0202 13:44:11.083255 1091668 trace.go:205] Trace[700283889]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Feb-2021 05:54:03.063) (total time: 114606046ms):
Feb 2, 2021 @ 13:44:14.000	Trace[700283889]: [31h50m6.046171838s] [31h50m6.046171838s] END
Feb 2, 2021 @ 13:44:14.000	E0202 13:44:11.083412 1091668 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: Get "https://127.0.0.1:16443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0": dial tcp 127.0.0.1:16443: i/o timeout
Feb 2, 2021 @ 13:44:14.000	I0202 13:44:11.083684 1091668 trace.go:205] Trace[1837340726]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Feb-2021 05:54:03.063) (total time: 114608019ms):
Feb 2, 2021 @ 13:44:14.000	Trace[1837340726]: [31h50m8.019757135s] [31h50m8.019757135s] END
Feb 2, 2021 @ 13:44:14.000	E0202 13:44:11.083741 1091668 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: Get "https://127.0.0.1:16443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0": dial tcp 127.0.0.1:16443: i/o timeout
Feb 2, 2021 @ 13:44:14.000	I0202 13:44:11.250892 1091668 trace.go:205] Trace[855746558]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Feb-2021 05:58:41.117) (total time: 114330133ms):
Feb 2, 2021 @ 13:44:14.000	Trace[855746558]: [31h45m30.133043891s] [31h45m30.133043891s] END
Feb 2, 2021 @ 13:44:14.000	E0201 07:37:20.160415    6014 cached_token_authenticator.go:170] runtime error: invalid memory address or nil pointer dereference
Feb 2, 2021 @ 13:44:14.000	goroutine 2353807 [running]:
Feb 2, 2021 @ 13:44:14.000	k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/authentication/token/cache.(*cachedTokenAuthenticator).doAuthenticateToken.func1.1(0x40039883f0, 0x400098ff48, 0x42303f0)
Feb 2, 2021 @ 13:44:14.000	#011/build/microk8s/parts/k8s-binaries/build/go/src/github.com/kubernetes/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/authentication/token/cache/cached_token_authenticator.go:169 +0xcc
Feb 2, 2021 @ 13:44:14.000	panic(0x37a53a0, 0x67bb280)
Feb 2, 2021 @ 13:44:14.000	#011/snap/go/6746/src/runtime/panic.go:969 +0x15c
Feb 2, 2021 @ 13:44:14.000	k8s.io/kubernetes/vendor/k8s.io/apiserver/plugin/pkg/authenticator/token/webhook.(*WebhookTokenAuthenticator).AuthenticateToken(0x4000cdf720, 0x473ef00, 0x4003988660, 0x4001d86c07, 0x3aa, 0x40030dc3c0, 0x473ef00, 0x4003988660, 0x4001106ef8)
Feb 2, 2021 @ 13:44:14.000	#011/build/microk8s/parts/k8s-binaries/build/go/src/github.com/kubernetes/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/plugin/pkg/authenticator/token/webhook/webhook.go:131 +0x1d8
Feb 2, 2021 @ 13:44:14.000	k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/authentication/token/cache.(*cachedTokenAuthenticator).doAuthenticateToken.func1(0x3390e20, 0x40039883f0, 0x0, 0x0)
Feb 2, 2021 @ 13:44:14.000	#011/build/microk8s/parts/k8s-binaries/build/go/src/github.com/kubernetes/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/authentication/token/cache/cached_token_authenticator.go:194 +0x1f4
Feb 2, 2021 @ 13:44:14.000	k8s.io/kubernetes/vendor/golang.org/x/sync/singleflight.(*Group).doCall(0x4000d081c8, 0x40027c4060, 0x4001e3e220, 0x20, 0x40022d03c0)
Feb 2, 2021 @ 13:44:14.000	#011/build/microk8s/parts/k8s-binaries/build/go/src/github.com/kubernetes/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/golang.org/x/sync/singleflight/singleflight.go:97 +0x28
Feb 2, 2021 @ 13:44:14.000	created by k8s.io/kubernetes/vendor/golang.org/x/sync/singleflight.(*Group).DoChan
Feb 2, 2021 @ 13:44:14.000	#011/build/microk8s/parts/k8s-binaries/build/go/src/github.com/kubernetes/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/golang.org/x/sync/singleflight/singleflight.go:90 +0x304
Feb 2, 2021 @ 13:44:14.000	I0202 13:44:13.010276    5957 trace.go:205] Trace[1988399392]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Feb-2021 18:39:38.889) (total time: 68673989ms):
Feb 2, 2021 @ 13:44:14.000	Trace[1988399392]: [19h4m33.989896115s] [19h4m33.989896115s] END
Feb 2, 2021 @ 13:44:14.000	E0202 13:44:13.010472    5957 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://127.0.0.1:16443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=6568266": dial tcp 127.0.0.1:16443: connect: connection refused
Feb 2, 2021 @ 13:44:14.000	I0202 13:44:13.673931    5957 trace.go:205] Trace[1936514998]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (01-Feb-2021 08:27:43.487) (total time: 105390186ms):
Feb 2, 2021 @ 13:44:14.000	Trace[1936514998]: [29h16m30.186730258s] [29h16m30.186730258s] END
Feb 2, 2021 @ 13:44:14.000	E0202 13:44:13.674032    5957 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: Get "https://127.0.0.1:16443/apis/discovery.k8s.io/v1beta1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=7171785": dial tcp 127.0.0.1:16443: connect: connection refused
Feb 2, 2021 @ 13:44:14.000	E0202 13:44:14.032097    5957 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://127.0.0.1:16443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=6568266": dial tcp 127.0.0.1:16443: connect: connection refused
Feb 2, 2021 @ 13:44:14.000	Failed to start Snap Daemon.

Figures: right after posting the above, I tweaked my search in Elastic and found the cause of this node failing:

Feb 2, 2021 @ 13:44:01.000	[425463.690361] [    939]     0   939     5039      768    57344        0         -1000 systemd-udevd
Feb 2, 2021 @ 13:44:01.000	[425463.690369] [   1546]     0  1546    70061     4154    90112        0         -1000 multipathd
Feb 2, 2021 @ 13:44:01.000	[425463.690375] [   1607]   113  1607     1783      431    49152        0             0 rpcbind
Feb 2, 2021 @ 13:44:01.000	[425463.690381] [   1691]   100  1691     6619      574    77824        0             0 systemd-network
Feb 2, 2021 @ 13:44:01.000	[425463.690386] [   1694]   101  1694     6085     1396    90112        0             0 systemd-resolve
Feb 2, 2021 @ 13:44:01.000	[425463.690391] [   1726]   103  1726     4623      632    57344        0          -900 dbus-daemon
Feb 2, 2021 @ 13:44:01.000	[425463.690396] [   1729]     0  1729    20236      457    57344        0             0 irqbalance
Feb 2, 2021 @ 13:44:01.000	[425463.690401] [   1730]     0  1730     7274     2305    90112        0             0 networkd-dispat
Feb 2, 2021 @ 13:44:01.000	[425463.690406] [   1732]   104  1732    57861      509    86016        0             0 rsyslogd
Feb 2, 2021 @ 13:44:01.000	[425463.690411] [   1735]     0  1735    11814      815   118784        0             0 sssd
Feb 2, 2021 @ 13:44:01.000	[425463.690416] [   1736]     0  1736     3094      557    57344        0             0 wpa_supplicant
Feb 2, 2021 @ 13:44:01.000	[425463.690420] [   1745]     0  1745   448262     5580   512000        0             0 filebeat
Feb 2, 2021 @ 13:44:01.000	[425463.690425] [   1752]     0  1752     2181      221    61440        0             0 apiservice-kick
Feb 2, 2021 @ 13:44:01.000	[425463.690430] [   1762]     0  1762     2082      433    53248        0             0 run-cluster-age
Feb 2, 2021 @ 13:44:01.000	[425463.690435] [   1765]   112  1765     3064      506    49152        0             0 chronyd
Feb 2, 2021 @ 13:44:01.000	[425463.690440] [   1781]     0  1781     2181      498    57344        0             0 bash
Feb 2, 2021 @ 13:44:01.000	[425463.690445] [   1782]   112  1782     1042      434    49152        0             0 chronyd
Feb 2, 2021 @ 13:44:01.000	[425463.690450] [   1952]     0  1952     3051      712    61440        0         -1000 sshd
Feb 2, 2021 @ 13:44:01.000	[425463.690455] [   2040]     0  2040    59400      587   102400        0             0 accounts-daemon
Feb 2, 2021 @ 13:44:01.000	[425463.690460] [   2042]     0  2042     2084      369    49152        0             0 cron
Feb 2, 2021 @ 13:44:01.000	[425463.690465] [   2043]     0  2043     4175      542    69632        0             0 systemd-logind
Feb 2, 2021 @ 13:44:01.000	[425463.690469] [   2048]     0  2048      898      409    49152        0             0 atd
Feb 2, 2021 @ 13:44:01.000	[425463.690474] [   2063]     0  2063     1709      357    53248        0             0 agetty
Feb 2, 2021 @ 13:44:01.000	[425463.690479] [   2068]     0  2068    26953     2470   106496        0             0 unattended-upgr
Feb 2, 2021 @ 13:44:01.000	[425463.690484] [   2069]     0  2069     1328      283    45056        0             0 agetty
Feb 2, 2021 @ 13:44:01.000	[425463.690489] [   2070]     0  2070    58234      482    94208        0             0 polkitd
Feb 2, 2021 @ 13:44:01.000	[425463.690494] [   2162]     0  2162     7710     4161    98304        0             0 python3
Feb 2, 2021 @ 13:44:01.000	[425463.690499] [   2163]     0  2163     5629     3084    81920        0             0 python3
Feb 2, 2021 @ 13:44:01.000	[425463.690504] [   2251]     0  2251      480      108    40960        0             0 none
Feb 2, 2021 @ 13:44:01.000	[425463.690509] [   5589]     0  5589   443080     8404   393216        0             0 containerd
Feb 2, 2021 @ 13:44:01.000	[425463.690514] [   5908]     0  5908  2209159  1726964 14487552        0             0 kube-apiserver
Feb 2, 2021 @ 13:44:01.000	[425463.690519] [   5957]     0  5957   185307     2803   196608        0          -999 kube-proxy
Feb 2, 2021 @ 13:44:01.000	[425463.690524] [   6014]     0  6014   498845    16744   585728        0          -999 kubelet
Feb 2, 2021 @ 13:44:01.000	[425463.690529] [   6628]     0  6628   177781      651    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690534] [   6643]     0  6643   177493      644    90112        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690539] [   6680]     0  6680   177845      617    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690544] [   6688] 65534  6688      200        1    36864        0          -998 pause
Feb 2, 2021 @ 13:44:01.000	[425463.690549] [   6700]     0  6700      200        1    36864        0          -998 pause
Feb 2, 2021 @ 13:44:01.000	[425463.690554] [   6716]     0  6716      200        1    36864        0          -998 pause
Feb 2, 2021 @ 13:44:01.000	[425463.690559] [   6806]     0  6806   177429      678    90112        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690563] [   6828]     0  6828      200        1    36864        0          -998 pause
Feb 2, 2021 @ 13:44:01.000	[425463.690568] [   6848]     0  6848   177909      868   102400        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690573] [   6867]     0  6867   177429      590    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690578] [   6900]     0  6900   177909      924   114688        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690583] [   6932] 65534  6932      200        1    36864        0          -998 pause
Feb 2, 2021 @ 13:44:01.000	[425463.690588] [   6955]     0  6955    32490     2229   135168        0          -997 speaker
Feb 2, 2021 @ 13:44:01.000	[425463.690593] [   7055]     0  7055   177845      941    90112        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690598] [   7304]     0  7304   177845      901   102400        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690603] [   7473]     0  7473   177973     1207   151552        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690608] [   7497]     0  7497      202        9    32768        0          -997 runsvdir
Feb 2, 2021 @ 13:44:01.000	[425463.690613] [   7587]     0  7587      197        8    32768        0          -997 runsv
Feb 2, 2021 @ 13:44:01.000	[425463.690617] [   7589]     0  7589    36447     3867   167936        0          -997 calico-node
Feb 2, 2021 @ 13:44:01.000	[425463.690622] [   7602]     0  7602   177845      936    86016        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690627] [   7705]     0  7705   177909     1005    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690632] [  33411]     0 33411   177781      644    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690637] [  33445]     0 33445      200        1    36864        0          -998 pause
Feb 2, 2021 @ 13:44:01.000	[425463.690642] [  33477]     0 33477   177845      953    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690646] [  33518]     0 33518   177909      968    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690651] [  33596]     0 33596   177845      966    94208        0             1 containerd-shim
Feb 2, 2021 @ 13:44:01.000	[425463.690656] [1091648]     0 1091648   201905     3689   307200        0             0 kube-controller
Feb 2, 2021 @ 13:44:01.000	[425463.690661] [1091668]     0 1091668   186071     2807   208896        0             0 kube-scheduler
Feb 2, 2021 @ 13:44:01.000	[425463.690666] [1140534]     0 1140534   185032      903   143360        0             0 kubectl
Feb 2, 2021 @ 13:44:01.000	[425463.690672] [1169628]     0 1169628      514      100    40960        0             0 apt.systemd.dai
Feb 2, 2021 @ 13:44:01.000	[425463.690677] [1169632]     0 1169632      514      321    45056        0             0 apt.systemd.dai
Feb 2, 2021 @ 13:44:01.000	[425463.690682] [1170502]     0 1170502     9034      627   114688        0             0 sssd_pam
Feb 2, 2021 @ 13:44:01.000	[425463.690687] [1171774]     0 1171774   186662     1024   114688        0             1 runc
Feb 2, 2021 @ 13:44:01.000	[425463.690692] [1172307]     0 1172307     4673     1490    77824        0             0 apt-get
Feb 2, 2021 @ 13:44:01.000	[425463.690697] [1172540]     0 1172540     4673     1033    73728        0             0 apt-get
Feb 2, 2021 @ 13:44:01.000	[425463.690702] [1172702]     0 1172702      514       92    40960        0             0 sh
Feb 2, 2021 @ 13:44:01.000	[425463.690707] [1172703]     0 1172703      514      296    45056        0             0 update-motd-upd
Feb 2, 2021 @ 13:44:01.000	[425463.690712] [1172717]     0 1172717    29671    15570   274432        0             0 apt-check
Feb 2, 2021 @ 13:44:01.000	[425463.690718] [1173211]     0 1173211      514      304    40960        0             0 50-motd-news
Feb 2, 2021 @ 13:44:01.000	[425463.690723] [1173245]     0 1173245     9859     5750   114688        0             0 cloud-id
Feb 2, 2021 @ 13:44:01.000	[425463.690728] [1177217]     0 1177217     8540      498    86016        0             0 cron
Feb 2, 2021 @ 13:44:01.000	[425463.690733] [1177265]     0 1177265      514      305    45056        0             0 sh
Feb 2, 2021 @ 13:44:01.000	[425463.690738] [1177267]     0 1177267      484      287    45056        0             0 run-parts
Feb 2, 2021 @ 13:44:01.000	[425463.690743] [1177404]     0 1177404      514      140    45056        0             0 update-notifier
Feb 2, 2021 @ 13:44:01.000	[425463.690748] [1177405]     0 1177405     3743     1025    65536        0             0 package-data-do
Feb 2, 2021 @ 13:44:01.000	[425463.690753] [1177426]     0 1177426     3667      944    61440        0             0 apport
Feb 2, 2021 @ 13:44:01.000	[425463.690758] [1177432]     0 1177432   142959     2547   163840        0          -900 snapd
Feb 2, 2021 @ 13:44:01.000	[425463.690762] [1177503]     0 1177503     1403       35    45056        0             0 grep
Feb 2, 2021 @ 13:44:01.000	[425463.690768] [1177574]     0 1177574     5524      263    77824        0             0 cron
Feb 2, 2021 @ 13:44:01.000	[425463.690773] [1177592]     0 1177592     7509      534    94208        0             0 sssd_nss
Feb 2, 2021 @ 13:44:01.000	[425463.690778] [1177605]     0 1177605     5342      422    77824        0             0 cron
Feb 2, 2021 @ 13:44:01.000	[425463.690783] [1177609]     0 1177609     4324      378    69632        0             0 sssd_be
Feb 2, 2021 @ 13:44:01.000	[425463.690789] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=6cb513af3d2b0f8a0330d755ee36f00e1487de82eeed449eae54d931c58c1a34,mems_allowed=0,global_oom,task_memcg=/system.slice/snap.microk8s.daemon-apiserver.service,task=kube-apiserver,pid=5908,uid=0
Feb 2, 2021 @ 13:44:01.000	[425463.690887] Out of memory: Killed process 5908 (kube-apiserver) total-vm:8836636kB, anon-rss:6907856kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:14148kB oom_score_adj:0
Feb 2, 2021 @ 13:44:01.000	[425445.160494] audit: type=1400 audit(1612291422.960:11406): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/locale/C.UTF-8/LC_NAME" pid=1177609 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Feb 2, 2021 @ 13:44:01.000	[425445.449481] audit: type=1400 audit(1612291423.252:11407): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/locale/C.UTF-8/LC_PAPER" pid=1177609 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Feb 2, 2021 @ 13:44:01.000	[425445.794360] audit: type=1400 audit(1612291423.596:11408): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/locale/C.UTF-8/LC_MESSAGES/" pid=1177609 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Feb 2, 2021 @ 13:44:01.000	[425447.270264] kauditd_printk_skb: 5 callbacks suppressed
Feb 2, 2021 @ 13:44:01.000	[425447.270269] audit: type=1400 audit(1612291425.072:11414): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/locale/C.UTF-8/LC_CTYPE" pid=1177609 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Feb 2, 2021 @ 13:44:01.000	[425463.690049] calico-node invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOMEMALLOC), order=2, oom_score_adj=-997
Feb 2, 2021 @ 13:44:01.000	[425463.690061] CPU: 0 PID: 7589 Comm: calico-node Tainted: G         C  E     5.4.0-1028-raspi #31-Ubuntu
Feb 2, 2021 @ 13:44:01.000	[425463.690064] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
Feb 2, 2021 @ 13:44:01.000	[425463.690066] Call trace:
Feb 2, 2021 @ 13:44:01.000	[425463.690077]  dump_backtrace+0x0/0x198
Feb 2, 2021 @ 13:44:01.000	[425463.690080]  show_stack+0x28/0x38
Feb 2, 2021 @ 13:44:01.000	[425463.690085]  dump_stack+0xd8/0x134
Feb 2, 2021 @ 13:44:01.000	[425463.690090]  dump_header+0x4c/0x204
Feb 2, 2021 @ 13:44:01.000	[425463.690092]  oom_kill_process+0x1d0/0x1d8
Feb 2, 2021 @ 13:44:01.000	[425463.690095]  out_of_memory+0xe4/0x2c8
Feb 2, 2021 @ 13:44:01.000	[425463.690100]  __alloc_pages_slowpath+0xa2c/0xc00
Feb 2, 2021 @ 13:44:01.000	[425463.690103]  __alloc_pages_nodemask+0x2b4/0x330
Feb 2, 2021 @ 13:44:01.000	[425463.690107]  kmalloc_order+0x34/0x80
Feb 2, 2021 @ 13:44:01.000	[425463.690110]  kmalloc_order_trace+0x40/0x120
Feb 2, 2021 @ 13:44:01.000	[425463.690113]  __kmalloc_track_caller+0x31c/0x358
Feb 2, 2021 @ 13:44:01.000	[425463.690118]  __kmalloc_reserve.isra.0+0x58/0xa8
Feb 2, 2021 @ 13:44:01.000	[425463.690121]  __alloc_skb+0x80/0x198
Feb 2, 2021 @ 13:44:01.000	[425463.690125]  netlink_dump+0x294/0x330
Feb 2, 2021 @ 13:44:01.000	[425463.690129]  __netlink_dump_start+0x144/0x1c0
Feb 2, 2021 @ 13:44:01.000	[425463.690133]  rtnetlink_rcv_msg+0x274/0x358
Feb 2, 2021 @ 13:44:01.000	[425463.690136]  netlink_rcv_skb+0x60/0x120
Feb 2, 2021 @ 13:44:01.000	[425463.690139]  rtnetlink_rcv+0x2c/0x38
Feb 2, 2021 @ 13:44:01.000	[425463.690142]  netlink_unicast+0x188/0x210
Feb 2, 2021 @ 13:44:01.000	[425463.690145]  netlink_sendmsg+0x1c0/0x368
Feb 2, 2021 @ 13:44:01.000	[425463.690150]  sock_sendmsg+0x58/0x68
Feb 2, 2021 @ 13:44:01.000	[425463.690153]  __sys_sendto+0xe8/0x158
Feb 2, 2021 @ 13:44:01.000	[425463.690155]  __arm64_sys_sendto+0x34/0x48
Feb 2, 2021 @ 13:44:01.000	[425463.690159]  el0_svc_common.constprop.0+0x84/0x218
Feb 2, 2021 @ 13:44:01.000	[425463.690162]  el0_svc_handler+0x38/0xa0
Feb 2, 2021 @ 13:44:01.000	[425463.690165]  el0_svc+0x10/0x2d4
Feb 2, 2021 @ 13:44:01.000	[425463.690168] Mem-Info:
Feb 2, 2021 @ 13:44:01.000	[425463.690178] active_anon:1835039 inactive_anon:51 isolated_anon:0
Feb 2, 2021 @ 13:44:01.000	[425463.690178]  active_file:409 inactive_file:1521 isolated_file:0
Feb 2, 2021 @ 13:44:01.000	[425463.690178]  unevictable:4284 dirty:3 writeback:0 unstable:0
Feb 2, 2021 @ 13:44:01.000	[425463.690178]  slab_reclaimable:19690 slab_unreclaimable:48240
Feb 2, 2021 @ 13:44:01.000	[425463.690178]  mapped:1885 shmem:1190 pagetables:5262 bounce:0
Feb 2, 2021 @ 13:44:01.000	[425463.690178]  free:13703 free_pcp:145 free_cma:4526
Feb 2, 2021 @ 13:44:01.000	[425463.690185] Node 0 active_anon:7340156kB inactive_anon:204kB active_file:1636kB inactive_file:6084kB unevictable:17136kB isolated(anon):0kB isolated(file):0kB mapped:7540kB dirty:12kB writeback:0kB shmem:4760kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Feb 2, 2021 @ 13:44:01.000	[425463.690195] DMA free:29356kB min:1204kB low:2056kB high:2908kB active_anon:850468kB inactive_anon:4kB active_file:284kB inactive_file:1328kB unevictable:0kB writepending:8kB present:970752kB managed:884912kB mlocked:0kB kernel_stack:0kB pagetables:1020kB bounce:0kB free_pcp:52kB local_pcp:0kB free_cma:18104kB
Feb 2, 2021 @ 13:44:01.000	[425463.690197] lowmem_reserve[]: 0 3008 6947 6947
Feb 2, 2021 @ 13:44:01.000	[425463.690210] DMA32 free:19208kB min:4364kB low:7444kB high:10524kB active_anon:3028772kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3080192kB managed:3080192kB mlocked:0kB kernel_stack:16kB pagetables:4172kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb 2, 2021 @ 13:44:01.000	[425463.690213] lowmem_reserve[]: 0 0 3939 3939
Feb 2, 2021 @ 13:44:01.000	[425463.690225] Normal free:6248kB min:5712kB low:9744kB high:13776kB active_anon:3460916kB inactive_anon:196kB active_file:1396kB inactive_file:4560kB unevictable:17136kB writepending:4kB present:4194304kB managed:4033656kB mlocked:17136kB kernel_stack:8576kB pagetables:15856kB bounce:0kB free_pcp:528kB local_pcp:248kB free_cma:0kB
Feb 2, 2021 @ 13:44:01.000	[425463.690228] lowmem_reserve[]: 0 0 0 0
Feb 2, 2021 @ 13:44:01.000	[425463.690235] DMA: 187*4kB (UMEC) 126*8kB (UMEC) 78*16kB (UMEC) 40*32kB (UME) 12*64kB (UE) 5*128kB (UEC) 1*256kB (E) 2*512kB (EC) 2*1024kB (EC) 2*2048kB (UE) 4*4096kB (C) = 29500kB
Feb 2, 2021 @ 13:44:01.000	[425463.690265] DMA32: 3200*4kB (UME) 421*8kB (UME) 190*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19208kB
Feb 2, 2021 @ 13:44:01.000	[425463.690286] Normal: 352*4kB (UME) 588*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6112kB
Feb 2, 2021 @ 13:44:01.000	[425463.690305] 4777 total pagecache pages
Feb 2, 2021 @ 13:44:01.000	[425463.690309] 0 pages in swap cache
Feb 2, 2021 @ 13:44:01.000	[425463.690313] Swap cache stats: add 0, delete 0, find 0/0
Feb 2, 2021 @ 13:44:01.000	[425463.690316] Free swap  = 0kB
Feb 2, 2021 @ 13:44:01.000	[425463.690319] Total swap = 0kB
Feb 2, 2021 @ 13:44:01.000	[425463.690322] 2061312 pages RAM
Feb 2, 2021 @ 13:44:01.000	[425463.690325] 0 pages HighMem/MovableOnly
Feb 2, 2021 @ 13:44:01.000	[425463.690327] 61622 pages reserved
Feb 2, 2021 @ 13:44:01.000	[425463.690331] 16384 pages cma reserved
Feb 2, 2021 @ 13:44:01.000	[425463.690334] Tasks state (memory values in pages):
Feb 2, 2021 @ 13:44:01.000	[425463.690337] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 2, 2021 @ 13:44:01.000	[425463.690355] [    911]     0   911    94436      642   774144        0          -250 systemd-journal
Feb 2, 2021 @ 13:44:02.000	[425464.526002] audit: type=1400 audit(1612291442.328:11415): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/aarch64-linux-gnu/ldb/modules/ldb/" pid=1177609 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Feb 2, 2021 @ 13:44:02.000	[425464.947104] oom_reaper: reaped process 5908 (kube-apiserver), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Feb 2, 2021 @ 13:44:02.000	[425464.989406] systemd[1]: snapd.service: start operation timed out. Terminating.

Sure enough, if I look across all my nodes, I see “calico-node invoked oom-killer” on most of them within the last 7 days, so I suspect this is the cause?
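In case it helps anyone else, a rough way to scan each node for OOM-killer hits (assuming the systemd journal retains a week of kernel messages) is:

```shell
# Scan the last week of kernel messages for OOM activity on a node:
#   journalctl -k --since "7 days ago" | grep -iE 'invoked oom-killer|Out of memory'
# The same pattern matched against the kernel line captured above:
echo '[425463.690887] Out of memory: Killed process 5908 (kube-apiserver)' |
  grep -iE 'invoked oom-killer|Out of memory'
```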

Sorry for all the self-replies, but I think I found the issue: https://github.com/ubuntu/microk8s/issues/1598
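In the meantime, an untested stopgap idea (a mitigation, not a fix) would be to cap the apiserver’s memory with a systemd drop-in, so the kernel kills and restarts just that service instead of taking the whole node down. The service name is the one from the task_memcg line in the OOM report above; the 2G cap is a guess on my part:

```shell
# Untested sketch: cap kube-apiserver memory via a systemd drop-in.
sudo mkdir -p /etc/systemd/system/snap.microk8s.daemon-apiserver.service.d
sudo tee /etc/systemd/system/snap.microk8s.daemon-apiserver.service.d/override.conf <<'EOF'
[Service]
# MemoryMax= is the cgroup v2 knob; on cgroup v1 use MemoryLimit= instead.
MemoryMax=2G
EOF
sudo systemctl daemon-reload
sudo systemctl restart snap.microk8s.daemon-apiserver.service
```

Whether restarting the apiserver every couple of days is tolerable obviously depends on the cluster; the real fix is whatever lands on that GitHub issue.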