It is important to recognise that things can go wrong, but MicroK8s gives you tools to help work out what has happened, as detailed below. Be sure to check out the common issues section for help resolving the most frequently encountered problems.
Checking logs
If a pod is not behaving as expected, the first port of call should be the logs.
Assuming you run a simple workload in a namespace called redis, first determine the resource identifier for the pod:
microk8s kubectl get pods -n redis
This will list the currently available pods, for example:
NAME READY STATUS RESTARTS AGE
redis 0/1 ErrImagePull 0 84m
You can then use kubectl to view the log. For example, for the simple redis pod above:
microk8s kubectl logs redis -n redis
Error from server (BadRequest): container "redis" in pod "redis" is waiting to start: image can't be pulled
If this information is not sufficient, you can look into the events to find out more:
microk8s kubectl get events -n redis
LAST SEEN TYPE REASON OBJECT MESSAGE
5m9s Warning Failed pod/redis Failed to pull image "redis:XXlatest": failed to pull and unpack image "docker.io/library/redis:XXlatest": failed to resolve reference "docker.io/library/redis:XXlatest": unexpected status from HEAD request to https://www.docker.com/: 403 Forbidden
5m9s Warning Failed pod/redis Error: ErrImagePull
3m59s Warning BackOff pod/redis Back-off restarting failed container redis in pod redis_redis(b4ec0dac-609d-48c2-955a-1d6abc1c42b0)
In this specific case there is no such image in the image registry, so the pod specification needs to be adjusted.
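As an illustration of such an adjustment (the redis:latest tag below is only an example of a valid image reference; a container's image is one of the few pod fields that can be changed in place):
microk8s kubectl -n redis set image pod/redis redis=redis:latest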
Examining the configuration
If the problem you are experiencing indicates a problem with the configuration of the Kubernetes components themselves, it could be helpful to examine the arguments used to run these components.
These are available in the directory ${SNAP_DATA}/args, where $SNAP_DATA on Ubuntu points to /var/snap/microk8s/current.
Note that the $SNAP_DATA environment variable itself is only available to the running snap. For more information on the snap environment, check the snap documentation.
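For example, to list the available argument files and see which flags a particular component is started with (the kube-apiserver file is shown here; the exact set of files depends on your MicroK8s version):
sudo ls /var/snap/microk8s/current/args
sudo cat /var/snap/microk8s/current/args/kube-apiserver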
Using the built-in inspection tool
MicroK8s ships with a script to compile a complete report on MicroK8s and the system which it is running on. This is essential for bug reports, but is also a useful way of confirming the system is (or isn’t) working and collecting all the relevant data in one place.
To run the inspection tool, enter the command (admin privilege is required to collect all the data):
sudo microk8s inspect
You should see output similar to the following:
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Building the report tarball
Report tarball is at /var/snap/microk8s/1031/inspection-report-20191104_153950.tar.gz
This confirms the services that are running, and the resultant report file can be viewed to get a detailed look at every aspect of the system.
Common issues
Node is not ready when RBAC is enabled...
Ensure the hostname of your machine does not contain capital letters or underscores. Kubernetes normalizes the machine name, causing its registration to fail.
To fix this you can change the hostname or use the --hostname-override argument in kubelet's configuration in /var/snap/microk8s/current/args/kubelet.
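As a minimal sketch of that change, assuming a placeholder node name such as my-node (lowercase, no underscores), append the flag to the kubelet arguments file and restart MicroK8s:
echo '--hostname-override=my-node' | sudo tee -a /var/snap/microk8s/current/args/kubelet
microk8s stop
microk8s start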
My dns and dashboard pods are CrashLooping...
The CNI network plugin used by MicroK8s creates a vxlan.calico interface (cbr0 on pre-v1.16 releases, or cni0 on pre-v1.19 releases and non-HA deployments) when the first pod is created.
If you have ufw enabled, you'll need to allow traffic on this interface:
sudo ufw allow in on vxlan.calico && sudo ufw allow out on vxlan.calico
sudo ufw allow in on cali+ && sudo ufw allow out on cali+
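To confirm the rules are in place (standard ufw usage, not specific to MicroK8s), check the firewall status:
sudo ufw status verbose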
My pods can't reach the internet or each other (but my MicroK8s host machine can)...
Make sure packets to/from the pod network interface can be forwarded to/from the default interface on the host via the iptables tool. Such changes can be made persistent by installing the iptables-persistent package:
sudo iptables -P FORWARD ACCEPT
sudo apt-get install iptables-persistent
or, if using ufw:
sudo ufw default allow routed
The MicroK8s inspect command can be used to check the firewall configuration:
microk8s inspect
A warning will be shown if the firewall is not forwarding traffic.
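You can also check the iptables FORWARD chain policy directly; the first line of the output below shows the chain's default policy:
sudo iptables -S FORWARD | head -n 1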
My log collector is not collecting any logs...
By default, container logs are located in /var/log/pods/{id}. You have to mount this location in your log collector for it to work. The following is an example diff for fluent-bit:
@@ -36,6 +36,9 @@
         - name: varlibdockercontainers
           mountPath: /var/lib/docker/containers
           readOnly: true
+        - name: varlibdockercontainers
+          mountPath: /var/snap/microk8s/common/var/lib/containerd/
+          readOnly: true
         - name: fluent-bit-config
           mountPath: /fluent-bit/etc/
       terminationGracePeriodSeconds: 10
@@ -45,7 +48,7 @@
           path: /var/log
       - name: varlibdockercontainers
         hostPath:
-          path: /var/lib/docker/containers
+          path: /var/snap/microk8s/common/var/lib/containerd/
       - name: fluent-bit-config
         configMap:
           name: fluent-bit-config
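Before adjusting the collector, you can confirm where the logs actually live on a MicroK8s node by listing the two locations referenced above:
sudo ls /var/log/pods/
sudo ls /var/snap/microk8s/common/var/lib/containerd/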
My pods are not starting and I use ZFS...
MicroK8s switched to containerd as its container runtime in release 492. When run on ZFS, containerd must be configured to use the ZFS snapshotter. At present neither MicroK8s nor containerd does this automatically, so you must update the configuration manually. Instructions on how to do this are documented here.
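As a rough sketch of what that change involves, assuming your revision keeps the containerd configuration template at /var/snap/microk8s/current/args/containerd-template.toml and that a ZFS dataset already backs containerd's state directory (the linked instructions remain the authoritative reference), point the CRI plugin at the zfs snapshotter:
# in /var/snap/microk8s/current/args/containerd-template.toml
# (containerd v2 config syntax; your template may already contain a snapshotter line to adjust)
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "zfs"
Then restart the services:
microk8s stop
microk8s start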
My home directory is not in /home or is on NFS and I can't get MicroK8s to work...
While not strictly a MicroK8s issue, snaps generally do not work out of the box if your home directory is mounted via NFS, or if it is not located directly under /home. See snapd bugs #1662552 and #1620771 for further information and possible workarounds.
I need to recover my HA cluster
In normal use, a MicroK8s HA cluster is self-healing. However, when testing edge versions or mixing releases, there may be occasions when the cluster needs to be recovered. This docs page details the procedure.
Raspberry Pi and systems with low disk performance
The symptoms you may observe vary: the API server may be slow, crash, or form an unstable multi-node cluster. Such problems are often traced to low-performing or misconfigured disks. In the logs of the API server you will notice the data store being slow to write to disk.
With journalctl -f -u snap.microk8s.daemon-kubelite or (prior to v1.21) journalctl -f -u snap.microk8s.daemon-apiserver you will see messages such as microk8s.daemon-kubelite[3802920]: Trace[736557743]: ---"Object stored in database" 7755ms.
To identify whether a slow disk is affecting you, you could benchmark reads with hdparm and try writing a large file with dd, for example:
hdparm -Tt /dev/sda
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
On systems such as the Raspberry Pi, the issue may be caused by devices not fully implementing the UAS specification.
In some cases, a way to mitigate the issue is to move the journald logs to volatile storage. This is done by editing /etc/systemd/journald.conf and setting Storage=volatile.
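For example (standard systemd configuration, not specific to MicroK8s), add or uncomment the setting under the [Journal] section of /etc/systemd/journald.conf:
[Journal]
Storage=volatile
and then restart the journald service for it to take effect:
sudo systemctl restart systemd-journald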
Of course, you can always consider upgrading the attached storage.
The GitHub issue "API Server hanging on raspberry pi" (canonical/microk8s #2280) documents a successful debugging of this issue.
"access denied" error on Debian 9
The snapctl that ships with Debian 9 is outdated. Here is how to replace it with a fresh one:
sudo snap install core
sudo mv /usr/bin/snapctl /usr/bin/snapctl.old
sudo ln -s /snap/core/current/usr/bin/snapctl /usr/bin/snapctl
I get "i/o timeouts" when calling "microk8s kubectl logs"
Make sure your hostname resolves correctly to the IP address of your host or localhost. The following error may indicate this misconfiguration:
microk8s kubectl logs
Error from server: Get "https://hostname:10250/containerLogs/default/...": dial tcp host-IP:10250: i/o timeout
One way to address this issue is to add the hostname and IP details of the host to /etc/hosts. In the case of a multi-node cluster, the /etc/hosts file on each machine has to be updated with the details of all cluster nodes.
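For example, entries along the following lines could be added on every node; the hostnames and addresses below are placeholders for illustration:
10.0.0.11 node-1
10.0.0.12 node-2
10.0.0.13 node-3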
I get "Unable to connect to the server: x509" on a multi-node cluster
This indicates that the certificates are not being regenerated correctly to reflect network changes. A workaround is to temporarily rename the file found at /var/snap/microk8s/current/var/lock/no-cert-reissue.
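A sketch of that workaround, assuming the certificates are reissued once the lock file is out of the way (the .bak name is just an example), is to move the file aside, restart MicroK8s so the services pick up freshly issued certificates, and then restore it:
sudo mv /var/snap/microk8s/current/var/lock/no-cert-reissue /var/snap/microk8s/current/var/lock/no-cert-reissue.bak
microk8s stop
microk8s start
sudo mv /var/snap/microk8s/current/var/lock/no-cert-reissue.bak /var/snap/microk8s/current/var/lock/no-cert-reissue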
Calico controller fails on Raspberry Pi with Ubuntu 21.10
Extra kernel modules are needed on the Raspberry Pi after upgrading to Ubuntu 21.10. Install them with sudo apt install linux-modules-extra-raspi. You may need to restart MicroK8s afterwards.
Pod communication problems when using firewall-cmd (Fedora etc)
On systems which use firewall-cmd, pods are unable to communicate with each other because the firewall drops the packets. To check if this is the case, do the following:
# get the subnet cidr the pods are using
SUBNET=`cat /var/snap/microk8s/current/args/cni-network/cni.yaml | grep CALICO_IPV4POOL_CIDR -a1 | tail -n1 | grep -oP '[\d\./]+'`
echo $SUBNET
# enable logging of denied packets
sudo firewall-cmd --set-log-denied=all
sudo firewall-cmd --reload
# e.g. restart a pod and check for denied packets with dmesg
# (look for packets having an IP from the SUBNET above as SRC)
dmesg | grep -i REJECT
Solution: Create a dedicated zone for the microk8s subnet to avoid packets being dropped:
# if you see packets being rejected create a dedicated zone for microk8s:
sudo firewall-cmd --permanent --new-zone=microk8s-cluster
sudo firewall-cmd --permanent --zone=microk8s-cluster --set-target=ACCEPT
sudo firewall-cmd --permanent --zone=microk8s-cluster --add-source=$SUBNET
sudo firewall-cmd --reload
# finally reset the logging
sudo firewall-cmd --set-log-denied=off
sudo firewall-cmd --reload
I get "This node does not have enough RAM to host the Kubernetes control plane services"
MicroK8s will refuse to start on machines with less than 512MB available RAM, in order to prevent the system from running out of memory. It is suggested that these nodes are added as worker-only nodes to an existing cluster.
If you still wish to start the control plane services, you can do:
microk8s start --disable-low-memory-guard
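If you instead add the machine as a worker-only node to an existing cluster, the flow looks roughly like this; the address and token below are placeholders printed by add-node on an existing control-plane node:
# on an existing control-plane node
microk8s add-node
# on the low-memory machine, run the printed join command with --worker, e.g.:
microk8s join 10.0.0.11:25000/abcdef0123456789abcdef0123456789 --worker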
Reporting a bug
If you cannot solve your issue and believe the fault may lie in MicroK8s, please file an issue on the project repository.
To help us deal effectively with issues, it is incredibly useful to include the report obtained from microk8s inspect, as well as any additional logs and a summary of the issue.