Automatically killing pods with abnormal (high) CPU usage

kasvith · October 1, 2023, 7:40am

Hi, for one of our applications we need to monitor pods for high CPU usage (for example, caused by an infinite loop). Once we detect this abnormal CPU usage, we want to terminate the pod.

Under normal circumstances, however, we still need HPA to auto-scale the pods (based on number of requests, CPU, and memory).

I was thinking about the following strategies:

Kill pods smartly: monitor for abnormal CPU activity over a prolonged period (1 min)

We watch the CPU activity of specific pods, selected by a specific label.
If we find abnormal behavior (max CPU usage for 1 min), we terminate the ill pod and allow HPA to create a new one.
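The detection step of this first strategy could be sketched roughly as below. This is only an illustration, not an existing tool: the class name `SustainedCpuDetector`, the 95% threshold, and the idea of feeding it samples as a fraction of the pod's CPU limit are all assumptions. In practice the samples might come from something like `kubectl top pod -l <label>` or the metrics API, and a flagged pod would then be deleted by a separate step.

```python
from typing import Dict

# Hypothetical defaults: "max CPU" is taken to mean >= 95% of the pod's
# CPU limit, and "prolonged" to mean 60 seconds, as in the post.
CPU_THRESHOLD = 0.95
WINDOW_SECONDS = 60.0


class SustainedCpuDetector:
    """Flags pods whose CPU usage stays above a threshold for a full window."""

    def __init__(self, threshold: float = CPU_THRESHOLD,
                 window: float = WINDOW_SECONDS) -> None:
        self.threshold = threshold
        self.window = window
        # pod name -> timestamp at which the current high-CPU streak began
        self._streak_start: Dict[str, float] = {}

    def observe(self, pod: str, timestamp: float, cpu_fraction: float) -> bool:
        """Record one sample for a pod.

        cpu_fraction is usage divided by the pod's CPU limit (0.0 to 1.0+).
        Returns True once the pod has been at or above the threshold
        continuously for at least `window` seconds.
        """
        if cpu_fraction < self.threshold:
            # Any dip below the threshold resets the streak.
            self._streak_start.pop(pod, None)
            return False
        start = self._streak_start.setdefault(pod, timestamp)
        return timestamp - start >= self.window

    def forget(self, pod: str) -> None:
        """Drop state for a pod that was terminated or no longer exists."""
        self._streak_start.pop(pod, None)
```

Resetting the streak on any dip keeps short legitimate bursts from triggering a kill; a looser variant could instead compare the average over the window against the threshold.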

Kill specific pods randomly

Assuming all pods under the same label are stateless, we can schedule killing a random 25% of the running pods each minute, allowing fresh ones to start if needed.
We can use a label to select the target pod group.
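The selection step of this second strategy might look like the sketch below. The function name `pick_pods_to_kill` and the choice to round down are assumptions for illustration; the list of pod names is assumed to have been fetched already using the label selector, and the actual deletion is left out.

```python
import math
import random
from typing import List, Optional, Sequence


def pick_pods_to_kill(pods: Sequence[str], fraction: float = 0.25,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Return a random subset of pod names covering `fraction` of the group.

    The count is rounded down, so a group of fewer than four pods
    yields no victims at the default 25%.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = rng or random.Random()
    count = math.floor(len(pods) * fraction)
    # sample() picks without replacement, so no pod is chosen twice.
    return rng.sample(list(pods), count)
```

Rounding down is a deliberately conservative choice here: it avoids taking out the only pod of a small group every minute, at the cost of never recycling groups smaller than four.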

What is the best solution here? And are there existing tools to accomplish this task?

Thank you.
