Exit code 137 - Pods terminated

Yashwanth_Yellapraga · January 14, 2020, 8:15am

Hello Team! I am relatively new to k8s and am hoping I can learn a lot from you all!

I need some advice on an issue I am facing with k8s 1.14 while running GitLab pipelines on it. Many jobs are failing with exit code 137, which I found means that the container is being terminated abruptly.

Cluster information:

Kubernetes version: 1.14
Cloud being used: AWS EKS
Installation method: EKS
Host OS: Amazon Linux
Node: c5.4xlarge

After digging in, I found the below logs:

kubelet: I0114 03:37:08.639450 4721 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 95% which is over the high threshold (85%). Trying to free 3022784921 bytes down to the low threshold (80%).
kubelet: E0114 03:37:08.653132 4721 kubelet.go:1282] Image garbage collection failed once. Stats initialization may not have completed yet: failed to garbage collect required amount of images. Wanted to free 3022784921 bytes, but freed 0 bytes
kubelet: W0114 03:37:23.240990 4721 eviction_manager.go:397] eviction manager: timed out waiting for pods runner-u4zrz1by-project-12123209-concurrent-4zz892_gitlab-managed-apps(d9331870-367e-11ea-b638-0673fa95f662) to be cleaned up
kubelet: W0114 00:15:51.106881 4781 eviction_manager.go:333] eviction manager: attempting to reclaim ephemeral-storage
kubelet: I0114 00:15:51.106907 4781 container_gc.go:85] attempting to delete unused containers
kubelet: I0114 00:15:51.116286 4781 image_gc_manager.go:317] attempting to delete unused images
kubelet: I0114 00:15:51.130499 4781 eviction_manager.go:344] eviction manager: must evict pod(s) to reclaim ephemeral-storage
kubelet: I0114 00:15:51.130648 4781 eviction_manager.go:362] eviction manager: pods ranked for eviction:

runner-u4zrz1by-project-10310692-concurrent-1mqrmt_gitlab-managed-apps(d16238f0-3661-11ea-b638-0673fa95f662)
runner-u4zrz1by-project-10310692-concurrent-0hnnlm_gitlab-managed-apps(d1017c51-3661-11ea-b638-0673fa95f662)
runner-u4zrz1by-project-13074486-concurrent-0dlcxb_gitlab-managed-apps(63d78af9-3662-11ea-b638-0673fa95f662)
prometheus-deployment-66885d86f-6j9vt_prometheus(da2788bb-3651-11ea-b638-0673fa95f662)
nginx-ingress-controller-7dcc95dfbf-ld67q_ingress-nginx(6bf8d8e0-35ca-11ea-b638-0673fa95f662)
alertmanager-768d89dcc8-4hxj6_prometheus(d4e6f161-3651-11ea-b638-0673fa95f662)
kube-proxy-bpqm7_kube-system(4e307fee-35c6-11ea-b638-0673fa95f662)
aws-node-rc8rw_kube-system(4e30a734-35c6-11ea-b638-0673fa95f662)

And then the pods get terminated, resulting in the exit code 137s. Can anyone help me understand the reason and a possible solution to overcome this?
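For context on the number itself: exit codes above 128 conventionally mean "128 plus the signal number", so 137 corresponds to signal 9 (SIGKILL), which fits a container being killed rather than exiting on its own. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` is made up for illustration, and the signal names assume a Linux host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description.

    Assumes the common convention that codes above 128 mean
    "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

The code alone does not say who sent the SIGKILL; the kubelet logs above are what point to eviction.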

tomasz.prus · January 14, 2020, 9:25pm

Hi.

It seems that your applications (started by the GitLab runner) write a lot of data (logs, artifacts, cache?) and the node can’t hold it all, so the eviction manager evicts some of the pods … “must evict pod(s) to reclaim ephemeral-storage”.
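The image GC line in the kubelet log also gives a rough idea of how small the node's image filesystem is. A sketch of the arithmetic, assuming the reported percentage is accurate to within one point either way (how kubelet rounds it is not shown in the log) and that the bytes to free equal the gap between current usage and the low threshold; the function name `estimate_capacity_range` is hypothetical:

```python
import re

# Message text copied from the kubelet log in the first post.
LOG_LINE = (
    "Disk usage on image filesystem is at 95% which is over the high "
    "threshold (85%). Trying to free 3022784921 bytes down to the low "
    "threshold (80%)."
)

PATTERN = re.compile(
    r"is at (?P<usage>\d+)% .*?Trying to free (?P<bytes>\d+) bytes "
    r"down to the low threshold \((?P<low>\d+)%\)"
)


def estimate_capacity_range(line: str) -> tuple[float, float]:
    """Return a (lower, upper) estimate of filesystem capacity in bytes."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an image GC disk usage line")
    usage = int(match["usage"])
    to_free = int(match["bytes"])
    low = int(match["low"])
    # bytes_to_free ~= (usage% - low%) * capacity, so divide to get capacity.
    # Allow one percentage point of rounding error in either direction.
    lower = to_free / ((usage + 1 - low) / 100)
    upper = to_free / ((usage - 1 - low) / 100)
    return lower, upper


if __name__ == "__main__":
    lower, upper = estimate_capacity_range(LOG_LINE)
    print(f"capacity is roughly {lower / 1e9:.1f} to {upper / 1e9:.1f} GB")
```

Under those assumptions the estimate comes out on the order of 20 GB, which is not much room for several concurrent CI jobs plus their images.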

As a solution you can try to use a bigger disk for the nodes, attach an additional volume to the pods (see “Kubernetes executor” in the GitLab docs), or reduce the number of parallel runners.

Yashwanth_Yellapraga · January 16, 2020, 6:09am

Hello Tomasz,

The nodes were initially c5.4xlarge instances with a 20G EBS volume. I increased it to 50G and then 100G, but that did not help, so I was not sure whether a bigger disk was supposed to solve the problem. After you advised the same, I changed the instance type to c5d.4xlarge, which has 400GB of cache storage, and gave it 300GB of EBS. This solved the error.

Thanks for confirming the solution.

Best,
Yash.
