Fluentd logs memory overflow

Hi,
My box runs Ubuntu 18.04 with microk8s.
I enabled the Fluentd addon. It works correctly, but the in-memory log buffer grows indefinitely and the machine eventually runs out of memory. How can I make Fluentd rotate its logs in memory, or otherwise remedy this issue?
Thanks for your advice.
GB

I am not sure it is Fluentd consuming the memory. It could very well be Elasticsearch, which comes with the Fluentd addon.
Elasticsearch does use a lot of memory, but I’m curious to know how much memory your system has.

The machine is an Azure VM with 8 cores and 32 GB of RAM. I was also coming to the conclusion that it is an Elasticsearch issue. Fluentd collects all kube-system logs and also some application logs. Memory consumption (or leakage) grows by approximately 100 MiB per hour. With 50 pods running (under a low workload, however), the cluster dies within a few days. I have read several mailing lists on this topic but found no actual clue about how to fix the issue. Any idea is welcome.
GB
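As a rough sanity check on the figures above, here is a minimal sketch of how long a constant 100 MiB/hour growth would take to fill the machine. It assumes linear growth and treats the 32 GB as 32 GiB that are entirely free at the start, which they are not, so the result is only an upper bound. The function name is made up for illustration.

```python
def hours_until_exhausted(available_mib: float, growth_mib_per_hour: float) -> float:
    """Hours until `available_mib` is used up at a constant growth rate."""
    if growth_mib_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return available_mib / growth_mib_per_hour


# Figures from the post: 32 GB of RAM (assumed to mean 32 GiB) and ~100 MiB/hour.
total_mib = 32 * 1024
hours = hours_until_exhausted(total_mib, 100)
print(f"{hours:.0f} h, about {hours / 24:.1f} days")
```

Since Elasticsearch and the 50 pods already occupy part of the RAM, the real headroom is smaller than this upper bound.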

BTW, Elasticsearch consumes 1500 MiB just after startup.

This is strange. Both pods have memory resource limits defined. Here.

If a pod uses more memory than its limit, it should be OOMKilled.
Running top may show some sign of the culprit. Enabling metrics-server may also help track down which pod is misbehaving.
We’ve run the same Fluentd image for several years and it has never brought down a server.
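To make the metrics-server suggestion concrete, here is a small sketch that ranks pods by memory from the text printed by `kubectl top pods --all-namespaces`. The sample output, pod names, and values are invented for illustration, and the function names are hypothetical helpers, not part of any tool.

```python
_UNITS = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def memory_to_mib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as '1500Mi' to MiB."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity) / (1024 * 1024)  # plain number: bytes


def rank_pods_by_memory(top_output: str):
    """Parse `kubectl top pods --all-namespaces` text, highest memory first."""
    rows = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        namespace, name, _cpu, memory = fields[:4]
        rows.append((namespace, name, memory_to_mib(memory)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sample output; names and numbers are made up.
SAMPLE = """\
NAMESPACE     NAME                        CPU(cores)   MEMORY(bytes)
default       my-app-12345                5m           64Mi
kube-system   elasticsearch-logging-0     120m         1500Mi
kube-system   fluentd-es-abcde            15m          210Mi
"""

for namespace, pod, mib in rank_pods_by_memory(SAMPLE):
    print(f"{namespace}/{pod}: {mib:.0f} MiB")

# On a live cluster (assumes metrics-server is enabled and microk8s.kubectl is on PATH):
# import subprocess
# live = subprocess.run(["microk8s.kubectl", "top", "pods", "--all-namespaces"],
#                       capture_output=True, text=True, check=True).stdout
```

Running this at intervals and comparing the rankings should show which pod's memory keeps growing.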

Thanks, I looked at the config, and yes, it looks good!
But I see that this config relies on RBAC. RBAC is not enabled on the cluster because it is my playground. Maybe I have to enable it before enabling the Fluentd addon? Could that be the root cause of the issue?
Additionally, I added these two lines to the kubelet file to rotate the Docker log files:
--container-log-max-files=7
--container-log-max-size=10Mi
Otherwise, I use microk8s right out of the box.
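For reference, the two kubelet flags above bound the size of container log files on disk, per container. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the 50-container figure assumes one container per pod, which the thread does not state.

```python
def size_to_mib(quantity: str) -> float:
    """Convert a quantity such as '10Mi' to MiB (only Ki, Mi, Gi are handled here)."""
    units = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def max_log_mib(max_files: int, max_size: str, containers: int = 1) -> float:
    """Upper bound on rotated log files kept on disk, in MiB."""
    return max_files * size_to_mib(max_size) * containers


per_container = max_log_mib(7, "10Mi")
whole_node = max_log_mib(7, "10Mi", containers=50)  # assumption: one container per pod
print(per_container, whole_node)
```

These limits apply to log files on disk, so on their own they would not be expected to cap the RAM used by Fluentd or Elasticsearch.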

I installed the metrics server and, looking at the config, I discovered this error. It looks like this problem is linked to RBAC: https://github.com/ubuntu/microk8s/issues/729

Enabling RBAC will probably resolve the error you are seeing in the dashboard.
When you say the machine fails, do you mean it becomes unresponsive? What does the top command show as the process using the most memory?
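If watching top interactively is impractical, one option is to capture a process snapshot periodically and keep the largest consumers. The sketch below parses the text of `ps -eo pid,rss,comm --sort=-rss` (Linux procps, RSS in KiB). The sample lines are invented and the function name is a hypothetical helper.

```python
def top_memory_processes(ps_output: str, limit: int = 5):
    """Parse `ps -eo pid,rss,comm` text and return (pid, rss_mib, command) rows."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        pid, rss_kib, command = parts
        rows.append((int(pid), int(rss_kib) / 1024, command.strip()))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Hypothetical snapshot; PIDs and sizes are made up.
SAMPLE_PS = """\
  PID   RSS COMMAND
 2101 1536000 java
 1822 215040 ruby
  950  51200 kubelet
"""

for pid, rss_mib, command in top_memory_processes(SAMPLE_PS, limit=2):
    print(f"{pid} {command}: {rss_mib:.0f} MiB")

# Live use on the node (assumes a procps `ps`):
# import subprocess
# snapshot = subprocess.run(["ps", "-eo", "pid,rss,comm", "--sort=-rss"],
#                           capture_output=True, text=True, check=True).stdout
```

Appending each snapshot to a file with a timestamp would leave a record to inspect even after the machine becomes unresponsive.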

I did not record the output of top. The machine becomes extremely slow to respond: for instance, moving the cursor in the console takes 20 or 30 seconds, CPU usage is at 400 %, and total free RAM is down to a few MiB.
I have posted on GitHub, as you have seen. I redid the install on a VM, but with insufficient memory :frowning: So I am now doing a new one with 6 GiB on my local workstation. I will keep you informed of whether I can reproduce the problem.
Best regards
GB