My box runs Ubuntu 18.04 with microk8s.
I enabled the Fluentd addon. It works correctly, but the in-memory log buffer grows indefinitely and the machine eventually dies from lack of memory. How can I make fluentd rotate logs in memory, or otherwise remedy this issue?
Thanks for your advice.
I am not sure it is fluentd consuming the memory. It could very well be Elasticsearch, which comes with the addon.
Elasticsearch does use a lot of memory, but I'm curious to know how much memory is allocated to your system.
The machine is an Azure VM with 8 cores and 32 GB of RAM. I was also coming to the conclusion that it is an Elasticsearch issue. fluentd collects all kube-system logs and also some application logs. The consumption / leakage is approximately 100 MiB / hour. With 50 pods running (low workload, however), the cluster dies in a few days. I read several mailing lists on this topic, but found no actual clue on how to fix the issue. Any idea is welcome.
BTW, Elasticsearch consumes 1500 MiB just after startup.
This is strange. Both pods have memory resource limits defined in their config.
If a pod uses more than its limit, it should be OOMKilled.
top may show some sign. Enabling metrics-server may also help track which pod is misbehaving.
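Once metrics-server is running, the text output of `kubectl top pod --all-namespaces` can be ranked by memory to spot the heaviest pod. The following is only a minimal Python sketch: the helper names `parse_memory` and `rank_pods_by_memory` are hypothetical, and the sample rows are invented for illustration (only the 1500 MiB Elasticsearch figure comes from this thread).

```python
def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '1500Mi' to bytes."""
    units = {
        "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
        "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    }
    # Try the two-letter binary suffixes before the one-letter decimal ones.
    for suffix in sorted(units, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * units[suffix])
    return int(quantity)  # no suffix: plain bytes


def rank_pods_by_memory(top_output: str):
    """Return (namespace, pod, bytes) tuples, largest memory consumer first."""
    rows = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        namespace, name, _cpu, memory = line.split()
        rows.append((namespace, name, parse_memory(memory)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented sample in the column layout of `kubectl top pod --all-namespaces`.
SAMPLE = """\
NAMESPACE     NAME                      CPU(cores)   MEMORY(bytes)
default       my-app-xyz                5m           64Mi
kube-system   elasticsearch-logging-0   250m         1500Mi
kube-system   fluentd-es-abcde          40m          210Mi
"""

if __name__ == "__main__":
    for namespace, name, mem in rank_pods_by_memory(SAMPLE):
        print(f"{namespace}/{name}: {mem / 1024**2:.0f} MiB")
```

Running this against snapshots taken an hour apart would show which pod's figure keeps climbing.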
We’ve run the same fluentd image for several years and it never brought down a server.
Thanks, I looked at the config, and yes, it looks good!
But I see that RBAC is invoked in this config. RBAC is not enabled on the cluster because it is my playground. Maybe I have to enable it before enabling the fluentd addon? Could it be the root cause of the issue?
Additionally, I added the two lines to the kubelet file to rotate the Docker log files.
Otherwise, I use microk8s right out of the box.
I installed the metrics server and, looking at the config, I discovered this error. It looks like this problem is linked to RBAC: https://github.com/ubuntu/microk8s/issues/729
Enabling RBAC will probably resolve the error you are seeing in the dashboard.
When you say the machine fails, do you mean it becomes unresponsive? What does the top command show as the process using the most memory?
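Since an unresponsive machine makes interactive top hard to use, one option is to record a process listing periodically and rank it afterwards. The sketch below is illustrative only: it assumes text in the layout produced by `ps -eo pid,rss,comm` (RSS in KiB), the function name `top_processes` is hypothetical, and the sample rows are invented.

```python
def top_processes(ps_output: str, n: int = 3):
    """Return the n processes with the largest resident memory.

    Each result is (pid, rss_mib, command); ps reports RSS in KiB.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, rss_kib, command = line.split(None, 2)
        rows.append((int(pid), int(rss_kib) / 1024, command.strip()))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


# Invented sample in the layout of `ps -eo pid,rss,comm`.
SAMPLE = """\
  PID     RSS COMMAND
  812   20480 kubelet
 1234 1536000 java
 2001  215040 ruby
 2500    4096 bash
"""

if __name__ == "__main__":
    for pid, rss_mib, command in top_processes(SAMPLE):
        print(f"{pid:>6} {rss_mib:>8.1f} MiB  {command}")
```

On a live machine the input could come from `subprocess.run(["ps", "-eo", "pid,rss,comm"], capture_output=True, text=True).stdout`, saved at intervals so the last snapshot before the slowdown is still available.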
I have not recorded the top output. The machine becomes extremely slow to respond: for instance, moving the cursor in the console takes 20 or 30 seconds, the CPU is at 400 %, and total free RAM is a few MiB.
I have posted on GitHub, as you have seen. I redid an install on a VM, but it had insufficient memory, so I am doing a new one with 6 GiB on my local workstation. I will keep you informed on whether I can reproduce the problem.