Fluentd logs memory overflow

geekbot · July 29, 2020, 9:06pm

Hi,
My box runs Ubuntu 18.04 with microk8s.
I enabled the Fluentd addon. It works correctly, but the log buffer in memory grows indefinitely and eventually the machine dies from lack of memory. How can I make fluentd rotate its logs in memory, or otherwise remedy this issue?
Thanks for your advice.
GB

balchua1 · July 29, 2020, 9:40pm

I am not sure it is fluentd that is consuming the memory. It could very well be Elasticsearch, which comes with the fluentd addon.
Elasticsearch does use a lot of memory, but I'm curious: how much memory does your system have?

geekbot · July 29, 2020, 10:29pm

The machine is an Azure VM with 8 cores and 32 GB RAM. I was also coming to the conclusion that it's an Elasticsearch issue. fluentd collects all kube-system logs and also some application logs. The consumption / leakage is approximately 100 MiB per hour. With 50 pods running (albeit with a low workload), the cluster dies within a few days. I have read several mailing-list threads on this topic, but found no real clue on how to fix the issue. Any idea is welcome.
GB

geekbot · July 29, 2020, 10:32pm

BTW, Elasticsearch consumes 1500 MiB just after startup.

balchua1 · July 29, 2020, 11:26pm

This is strange. Both pods have memory resource limits defined in the addon manifests.

If a pod uses more than its limit, it should be OOMKilled.
Running top may give some hints, and enabling metrics-server can also help track down which pod is misbehaving.
We've run the same fluentd image for several years and it never brought down a server.

geekbot · July 30, 2020, 5:54am

Thanks, I looked at the config, and yes, it looks good!
But I see that RBAC is used in this config. RBAC is not enabled on the cluster because it is my playground. Maybe I have to enable it before enabling the fluentd addon? Could it be the root cause of the issue?
Additionally, I added these two lines to the kubelet file to rotate the docker log files:
--container-log-max-files=7
--container-log-max-size=10Mi
Otherwise, I use microk8s right out of the box.
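Those two kubelet flags cap on-disk container logs rather than memory: the kubelet keeps at most --container-log-max-files files per container, rotating a file once it exceeds --container-log-max-size. A minimal Python sketch of the resulting ceiling, assuming binary suffixes such as Mi and one container per pod; the helper names are hypothetical, and the bound is approximate because a file can grow slightly past the limit before rotation.

```python
# Binary suffixes used in Kubernetes quantities such as "10Mi".
# Decimal suffixes ("10M") are not handled in this sketch.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity like '10Mi' (or a plain integer string) to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def max_log_bytes(max_files: int, max_size: str, containers: int = 1) -> int:
    """Approximate on-disk ceiling for container logs kept by the kubelet."""
    return containers * max_files * quantity_to_bytes(max_size)


per_container = max_log_bytes(7, "10Mi")
cluster_wide = max_log_bytes(7, "10Mi", containers=50)  # assumes one container per pod
print(per_container // 1024**2, "MiB per container")
print(round(cluster_wide / 1024**3, 2), "GiB for 50 containers")
```

With the values above this works out to 70 MiB per container and roughly 3.4 GiB for 50 containers, all on disk, so these flags do not bound RAM usage.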

I installed the metrics server and, looking at the config, I discovered this error. It looks like this problem is linked to RBAC: https://github.com/ubuntu/microk8s/issues/729

balchua1 · July 30, 2020, 11:34am

Enabling RBAC will probably resolve the error you are seeing in the dashboard.
When you say the machine fails, do you mean it becomes unresponsive? Which process does top show as using the most memory?

geekbot · July 30, 2020, 1:51pm

I did not record the top output. The machine becomes extremely slow to respond: for instance, moving the cursor in the console takes 20 or 30 seconds, CPU usage is at 400%, and free RAM is down to a few MiB.
As you have seen, I have posted on GitHub. I redid the install on a VM, but it had insufficient memory, so I am doing a new one with 6 GiB on my local workstation. I will let you know whether I can reproduce the problem.
Best regards
GB

geekbot · August 26, 2020, 9:16am

It comes from the chatter of system logs, which fills the Elasticsearch index at high speed. I can't manage to stop this source of noise.
It adds 1.5 GiB per day with no activity.
Here is what Elasticsearch reports for yesterday's index:
yellow open logstash-2020.08.25 1aIWCccbTmSK-t2vqbUwXA 5 1 244222 0 1.5gb 1.5gb
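To track how fast the daily index grows, a row like the one above can be parsed. A minimal Python sketch, assuming the default _cat/indices column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size) and Elasticsearch's 1024-based size units; the function names are illustrative. store.size is rounded in the output, so the per-document average is only a rough figure.

```python
# Size units printed by Elasticsearch's _cat APIs use 1024-based multiples.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def size_to_bytes(size: str) -> float:
    """Convert a _cat size string such as '1.5gb' to bytes."""
    # Check longer suffixes first, since every unit ends with 'b'.
    for unit in ("tb", "gb", "mb", "kb", "b"):
        if size.endswith(unit):
            return float(size[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognized size: {size!r}")


def parse_cat_indices_row(row: str) -> dict:
    """Parse one row of the default _cat/indices output."""
    # Default columns: health status index uuid pri rep
    # docs.count docs.deleted store.size pri.store.size
    (health, status, index, uuid, pri, rep,
     docs_count, docs_deleted, store_size, pri_store_size) = row.split()
    return {
        "index": index,
        "health": health,
        "docs": int(docs_count),
        "store_bytes": size_to_bytes(store_size),
    }


row = "yellow open logstash-2020.08.25 1aIWCccbTmSK-t2vqbUwXA 5 1 244222 0 1.5gb 1.5gb"
info = parse_cat_indices_row(row)
avg_kib = info["store_bytes"] / info["docs"] / 1024
print(f"{info['index']}: {info['docs']} docs, ~{avg_kib:.1f} KiB per document")
```

For the row above this gives about 6.4 KiB per stored log document.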

The repeated log line is:

ERROR: logging before flag.Parse: E0826 09:12:22.951717 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: v1.NodeList: Items: v1.Node: v1.Node: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 1550 …:{},"k:{"… at {"kind":"NodeList","apiVersion":"v1","metadata":{"selfLink":"/api/v1/nodes","resourceVersion":"22659126"},"items":[{"metadata":{"name":"copservices","selfLink":"/api/v1/nod

If somebody knows how to stop this noisy log, I would be grateful.
Best regards
GB

balchua1 · August 26, 2020, 11:46am

I couldn't determine where this is coming from. It may come from Kubernetes control plane components such as the apiserver, kubelet, controller-manager, etc.

If it is from the control plane, you can opt to exclude these logs from being fed to Elasticsearch.

In the kube-system namespace you will find the fluentd config. The ConfigMap name is something like fluentd-es-config-v0.2.0. Do a kubectl edit of that ConfigMap.

You will find the system.input.conf: section. From there you can delete the sections for kubelet, etcd, kube-proxy, …

After saving it, you will have to bounce (restart) the fluentd DaemonSet. It should reduce your Elasticsearch index sizes.

Just a reminder that doing this will force you to check the system journal for logs related to the control plane components.
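Deleting the sections by hand in kubectl edit is enough. If you prefer to script the change, here is a minimal Python sketch, assuming each input in system.input.conf is a <source> ... </source> block carrying a single tag line. The function name and the sample config are hypothetical, not copied from the addon, so check the real tags in your ConfigMap first. After writing the result back, the fluentd DaemonSet still needs a restart.

```python
import re
from typing import Iterable

# One <source> ... </source> block: DOTALL lets ".*?" span lines,
# MULTILINE lets "^" match at the start of each line.
SOURCE_BLOCK = re.compile(
    r"^[ \t]*<source>.*?^[ \t]*</source>[ \t]*\n?", re.DOTALL | re.MULTILINE
)
TAG_LINE = re.compile(r"^[ \t]*tag[ \t]+(\S+)", re.MULTILINE)


def drop_sources(conf: str, tags: Iterable[str]) -> str:
    """Remove every <source> block whose tag is in `tags`; keep everything else."""
    unwanted = set(tags)

    def keep_or_drop(match):
        block = match.group(0)
        tag = TAG_LINE.search(block)
        return "" if tag and tag.group(1) in unwanted else block

    return SOURCE_BLOCK.sub(keep_or_drop, conf)


# Illustrative config only, not the addon's actual system.input.conf.
example = """\
<source>
  @type tail
  tag kubelet
</source>
<source>
  @type tail
  tag kube-apiserver
</source>
"""
print(drop_sources(example, {"kubelet"}))
```

Blocks without a tag line, and comments between blocks, are left untouched.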

geekbot · August 27, 2020, 12:40pm

Thank you @balchua1, I will look at the config and let you know. I saw on some mailing lists that our cluster is not the only one suffering from this issue. I will keep you informed.
Best regards
GB

GustafHultgren · October 12, 2020, 8:05am

I'm trying to get around this problem by using Elasticsearch's built-in lifecycle management, with automatic snapshots and log rotation, following the docs.

I'm currently stuck on appending to the elasticsearch.yml config file at startup with something like:

         command:
         - /bin/bash
         - -c
         - |
           echo 'path.repo: ["/var/backups"]' >> /usr/share/elasticsearch/config/elasticsearch.yml
           # Overriding command replaces the image's entrypoint, so the image's
           # normal Elasticsearch start command has to follow here (not shown).

Has anyone else gone this route?

It seems snapshot lifecycle management is not available in the open-source version, so it was a dead end.

geekbot · October 15, 2020, 2:42pm

Hello, yes, I think it is not in the OSS version. I intend to write a Kubernetes CronJob that calls the ES REST API to delete old indices. It should be possible with a very short script.
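Such a CronJob script could look roughly like this: a minimal Python sketch (standard library only) that lists index names through the _cat/indices API and removes logstash-YYYY.MM.DD indices older than a retention window, using an HTTP DELETE on the index. The service URL, the 7-day retention and the function names are assumptions. Adjust them for your cluster before wrapping the script in a CronJob.

```python
import datetime as dt
import urllib.request

# Assumptions: in-cluster Elasticsearch service URL and retention period.
ES_URL = "http://elasticsearch-logging.kube-system:9200"  # hypothetical; adjust
KEEP_DAYS = 7                                               # hypothetical retention
PREFIX = "logstash-"  # daily indices look like logstash-2020.08.25


def expired_indices(names, today, keep_days=KEEP_DAYS, prefix=PREFIX):
    """Return daily indices whose date falls before the retention window."""
    cutoff = today - dt.timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # leave .kibana and other indices alone
        try:
            day = dt.datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index name
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


def list_indices(base_url=ES_URL):
    # h=index restricts _cat/indices output to one index name per line
    with urllib.request.urlopen(f"{base_url}/_cat/indices?h=index") as resp:
        return resp.read().decode().split()


def delete_index(name, base_url=ES_URL):
    # DELETE /<index> removes the index and all of its documents
    req = urllib.request.Request(f"{base_url}/{name}", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def main():
    for name in expired_indices(list_indices(), dt.date.today()):
        print("deleting", name, "->", delete_index(name))
```

The CronJob's container would call main(). The date logic lives in expired_indices, separate from the HTTP calls, so it can be checked without a cluster.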
GB
