Dear,
I installed MicroK8s on a supercomputer and I keep getting memory errors.
The use case for this Kubernetes cluster is a JupyterHub server.
The main node issues warnings like this:
Warning ImageGCFailed 6m30s kubelet failed to garbage collect required amount of images. Wanted to free 255164889497 bytes, but freed 317164 bytes
until we reach the eviction threshold:
Warning EvictionThresholdMet 19m (x5 over 20m) kubelet Attempting to reclaim ephemeral-storage
Restarting MicroK8s and running docker system prune help eventually, but I am not sure how to fix this in the long term.
Thanks for your help.
(complete beginner here)
Marine
This particular error seems to point to a storage issue, not a memory one: the kubelet has to clean up images (garbage collection) to make room.
MicroK8s keeps most of its data in /var/snap/microk8s/...
Thanks for your answer. I am not sure I see the difference between a storage issue and a memory issue.
I thought the storage issue was due to a lack of memory.
Do you mean that a lack of permissions could make it impossible to clean up the data in /var/snap/microk8s?
Thanks
I would first check whether there is enough space on /var.
Apologies if I explain something you already know, but since you say you’re a beginner, I’ll try to explain thoroughly.
First, the difference between memory and storage issues:
- a memory issue is related to RAM and how much of it the processes in containers try to use. For example, you would hit one if you set a memory limit of 1 GiB for a container and a process running in it grew much larger than that.
- a storage issue is about disk space, and the ephemeral-storage message makes it clear that is what is happening here.
A container can access on-disk storage in different ways. Common ones are to mount an existing host directory within the container or to request a persistent volume (see Storage | Kubernetes). If you create a container that hasn’t been configured with space in one of those specific ways, the disk space used in the container is called ephemeral storage. (It’s called ephemeral because it makes no guarantees about durability or persistence.) That space has to be backed by storage somewhere on the host, and that is frequently in /var (/var/lib/docker in the configurations I’ve used).
When /var fills up, I’ve seen errors about ephemeral storage. The kubelet says it is trying to free more than 255 GB. Do you expect that much disk space to be used in your containers? Perhaps you need to set up a specific storage configuration for that capacity?
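If it helps, here is a minimal shell sketch for checking how full the filesystem behind a given path is. The function names usage_pct and check_disk are made up for illustration. The default limit of 85 is an assumption: it matches the kubelet's documented default for the image garbage-collection high threshold, but your MicroK8s may be configured differently.

```shell
#!/bin/sh
# Print the use% (as a bare integer) of the filesystem that backs a path.
usage_pct() {
    # df -P prints one POSIX-format line per filesystem;
    # column 5 is the capacity, e.g. "97%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether a path's filesystem is at or above a limit.
# The default of 85 is an assumed value (see the note above).
check_disk() {
    path=$1
    limit=${2:-85}
    pct=$(usage_pct "$path")
    # An empty result means df could not inspect the path.
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$limit" ]; then
        echo "$path: ${pct}% used, at or above ${limit}%"
        return 1
    fi
    echo "$path: ${pct}% used, below ${limit}%"
}
```

You could then run check_disk /var, or point it at whichever directory your container runtime actually stores its data in.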
Thank you! That’s very clear.
Do you have an idea how I could find which containers are using this amount and type of disk space?
My first guess is that my Docker images themselves are huge, so it could come from there.
Also, I have two pods mounting volumes with microk8s-hostpath; does that count as ephemeral storage, since it is also saved in /var?
Thanks, Have a good day.
Marine
Thanks to @hopedata for the explanation.
@marinechaput most if not all data used by MicroK8s is stored in `/var/snap/microk8s`.
It's good to see how much storage you still have left. The `df -h` command can show you that.
OK, indeed I see the problem.
df -h /var/snap/microk8s
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 59G 54G 2,2G 97% /
What can I do to fix it? Some time ago I already relocated the storage location to a very large NFS mount point (following this: ubuntu 18.04 - How to change microk8s kubernetes storage location - Stack Overflow)
df -h /mnt/data6/run/containerd/
Filesystem Size Used Avail Use% Mounted on
/dev/sdg 2,7T 2,3T 321G 88% /mnt/data6
Thanks !
I don't know what workload you are running. But in terms of addons, are you using the `registry` and `storage` addons?
If you are using the `storage` addon, that can be a source of what is in `/var/snap/microk8s`.
I would first check the contents of this directory and see what is taking up the space. The `du` command can help you with that.
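For example, something along these lines lists the largest subdirectories first. This is only a sketch: the function name biggest_dirs is made up for illustration, and it assumes a du that supports the -d option (GNU coreutils, BSD and BusyBox all do).

```shell
#!/bin/sh
# List the N largest entries directly under a directory, biggest first.
# Sizes are in KiB; the first line is the total for the directory itself.
biggest_dirs() {
    dir=$1
    count=${2:-10}
    # -x   stay on one filesystem (skips other mounts such as NFS)
    # -k   report sizes in 1 KiB blocks so they sort numerically
    # -d 1 report only the directory and its immediate subdirectories
    du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Run biggest_dirs /var/snap/microk8s first, then repeat it on whichever subdirectory turns out to be the largest, for example /var/snap/microk8s/common. Use a root shell if some directories are not readable by your user.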
Yes, I am using the registry and storage addons.
It is for running a JupyterHub. Each JupyterLab instance spawned for a user comes with dynamic persistent storage (NFS), which creates a persistent folder in /var/snap/microk8s/common/default-storage/; that directory is now around 15G.
So if I relocate this dynamic storage to one of my mounted disks, where I have a lot of space, that should fix my problem, right?
Thanks.
Yes, I think so. Just to clarify: is your NFS server using the `storage` addon's persistent volumes?
Yes, indeed I am using the storage addon for persistent volumes.
Thanks for your help. I will try to do that.