How To Restart Pods That Run Out Of Memory With More Memory

Howard_Roark · July 27, 2020, 9:18pm

Some of my workflows have short-lived pods that occasionally run out of memory. When this happens, I would like them to be automatically restarted with double their initial memory (request = limit). For example, if podA has a request and limit of 32Gi of memory and exceeds it, I would like a new pod with a request and limit of 64Gi to replace it. I was thinking of experimenting with the Vertical Pod Autoscaler for this purpose, but I don't think it will have the desired effect, because there will be many runs where a given workflow uses less than 32Gi of memory and only a couple where it exceeds 32Gi. So I'm not sure the Recommender would make good decisions based on the historical data, and I'm not sure it works well with Kubernetes Jobs. I could be wrong, though, and will definitely try it out.

That being said, I am wondering what the community has done to solve this sort of problem: important production jobs sometimes use too much memory, and teams need them to be restarted immediately and automatically with more memory. Some teams don't put enough thought into the memory usage of a given job, and in the middle of the night they need their evicted pods to be automatically restarted with more memory.

I know that I can also handle this at the job scheduler level (e.g., Jenkins, TeamCity), but it would be cool if people have ideas for how to handle it within Kubernetes itself. I'm sorry if this is over-explained; I just wanted to be sure to cover the problem I'm solving.
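
To make the doubling behavior I described concrete, here is a minimal Python sketch of the decision only. It is not a working controller: it assumes the pod's status has already been fetched as a dict in the shape returned by `kubectl get pod -o json` (under `.status`), that memory is expressed as a whole number with a binary suffix such as `32Gi`, and all function names are hypothetical ones I made up for illustration.

```python
import re

# Binary suffixes used by Kubernetes memory quantities (subset, for illustration).
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '32Gi' to bytes.

    Assumption: only whole numbers with an optional Ki/Mi/Gi/Ti suffix are
    handled; the full Kubernetes quantity grammar is wider than this.
    """
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    value, suffix = match.groups()
    return int(value) * _BINARY_SUFFIXES.get(suffix, 1)


def format_memory(num_bytes: int) -> str:
    """Render bytes using the largest binary suffix that divides evenly."""
    for suffix in ("Ti", "Gi", "Mi", "Ki"):
        unit = _BINARY_SUFFIXES[suffix]
        if num_bytes >= unit and num_bytes % unit == 0:
            return f"{num_bytes // unit}{suffix}"
    return str(num_bytes)


def was_oom_killed(pod_status: dict) -> bool:
    """Return True if any container in the pod terminated with reason OOMKilled."""
    for container in pod_status.get("containerStatuses", []):
        terminated = container.get("state", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            return True
    return False


def next_memory(current: str, pod_status: dict, factor: int = 2) -> str:
    """Return the request=limit value to use for the replacement pod."""
    if was_oom_killed(pod_status):
        return format_memory(parse_memory(current) * factor)
    return current


# Example: podA ran with 32Gi and one container was OOM-killed.
status = {
    "containerStatuses": [
        {"name": "main", "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
    ]
}
print(next_memory("32Gi", status))
```

Whatever creates the replacement pod (a custom controller, an operator, or the job scheduler) would then set both the memory request and limit to the returned value. In practice I would probably also want a cap on how many times the memory can double.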

Thanks,

Mike
