GKE scaled-down nodes won't terminate

fduran · February 13, 2019, 4:33pm

Hello,

(This is a GKE-specific question; if there’s a better forum, please let me know.)

I have a cluster that autoscales horizontally. I’ve used both regular and preemptible node pools, and in both cases, after the cluster scales up properly due to CPU workload and then scales down once the workload is done, there are always 1-3 nodes that stay alive “forever” (several days) with none of my pods on them, only system resources, specifically these:

kube-proxy, metrics-server, metadata-agent, fluentd, using a total of 0.3 CPU and 400MB

Is there a way to tune k8s so these nodes will be terminated?

Thanks!

matti · February 13, 2019, 6:40pm

GKE manages the number of replicas very slowly… If you try to set replicas, the master will just set them back.

I run three or more node pools in GKE for this reason:

GKE base pool: 1-3 (zonal/regional) nodes (f1-micro/g1-small)
GKE autoscaling pool: 0…n nodes (f1-micro/g1-small)
My own pools, with a NO_EXECUTE taint that my workload tolerates but GKE system pods don’t.

This ensures that the GKE workloads can still scale, but don’t leak onto my nodes and prevent scale-down. Removing the autoscaling GKE pool might be okay if your cluster does not get too large.
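To make the taint idea above concrete, here is a minimal Python sketch of how Kubernetes matches tolerations against node taints. It is a simplified model for illustration, not scheduler code, and the `dedicated=workload` taint key and value are hypothetical names, not something from this thread. It also shows why a pod carrying a wildcard toleration (operator `Exists` with no key) is not kept out by such a taint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key is only valid with operator "Exists"
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified version of the Kubernetes toleration matching rules."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    # "Equal" needs a key and a matching value.
    return bool(toleration.key) and toleration.value == taint.value


def can_run_on(taints, tolerations) -> bool:
    """A pod fits if every hard taint is tolerated (PreferNoSchedule is soft)."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect != "PreferNoSchedule"
    )


# Hypothetical taint on the dedicated workload pool.
dedicated_pool = [Taint("dedicated", "workload", "NoExecute")]

my_workload = [Toleration(key="dedicated", value="workload", effect="NoExecute")]
plain_system_pod = []                           # no tolerations at all
wildcard_pod = [Toleration(operator="Exists")]  # tolerates every taint

print(can_run_on(dedicated_pool, my_workload))       # True
print(can_run_on(dedicated_pool, plain_system_pod))  # False
print(can_run_on(dedicated_pool, wildcard_pod))      # True
```

The last case is the limitation of this approach: a taint only keeps out pods that do not tolerate it.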

This setup also allows you to manually scale down the autoscaling node pool. You can even scale the base pool to zero, as long as you manually scale it back up when you need the cluster again.

There is a GitHub issue open about this, but I can’t find it.

matti · February 15, 2019, 5:51am

Vitaly_il · April 15, 2019, 1:05pm

Do you have any idea whether this bug is still relevant in GKE v1.12.6-gke.7?
It seems that the answer is ‘yes’, because I encountered the same behaviour: no downscale.
@matti - thank you for the suggested workaround, but is there a chance that a simpler solution exists?
For example, can’t we just reconfigure the ‘kube-system’ pods so they do not use the ‘critical-pod’ annotation?

Thanks,
Vitaly

matti · April 16, 2019, 5:36am

Nope, welcome to Managed Kubernetes by Google.

Chris_Furlong · September 21, 2019, 10:40pm

Is there a solution to this problem? The pod disruption budget workaround no longer seems to be effective, and the taint/toleration solution doesn’t work either now that a few of the system Pods ignore all taints.
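For context on the PodDisruptionBudget workaround mentioned above: the upstream Cluster Autoscaler FAQ lists several kinds of pods that stop a node from being removed, including kube-system pods that have no PDB, pods not backed by a controller, pods with local storage, and pods annotated `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`. DaemonSet and mirror pods are ignored. The Python sketch below is a rough, simplified model of those rules, not the autoscaler's actual code. The `Pod` class is hypothetical, and the owner kinds assigned to the pods from the original question are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


@dataclass
class Pod:
    name: str
    namespace: str
    owner_kind: str = ""      # "DaemonSet", "ReplicaSet", "Node" (mirror pod), "" = bare pod
    has_pdb: bool = False     # covered by a PDB that allows a disruption
    uses_local_storage: bool = False
    annotations: dict = field(default_factory=dict)


def scale_down_blocker(pod: Pod) -> Optional[str]:
    """Return why this pod would keep its node alive, or None if it would not."""
    if pod.owner_kind in ("DaemonSet", "Node"):
        return None  # DaemonSet and mirror pods are ignored
    annotation = pod.annotations.get(SAFE_TO_EVICT)
    if annotation == "false":
        return "annotated safe-to-evict=false"
    if annotation == "true":
        return None
    if not pod.owner_kind:
        return "not backed by a controller"
    if pod.uses_local_storage:
        return "uses local storage"
    if pod.namespace == "kube-system" and not pod.has_pdb:
        return "kube-system pod without a PodDisruptionBudget"
    return None


# Owner kinds below are assumptions for illustration, not taken from the thread.
pods = [
    Pod("kube-proxy", "kube-system", owner_kind="Node"),
    Pod("fluentd", "kube-system", owner_kind="DaemonSet"),
    Pod("metrics-server", "kube-system", owner_kind="ReplicaSet"),
]

blockers = {p.name: scale_down_blocker(p) for p in pods if scale_down_blocker(p)}
print(blockers)
```

Under these assumptions, only the Deployment-managed kube-system pod is reported as a blocker, which is the case a PDB is meant to address.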

ragyibrahim · March 18, 2021, 8:10am

I’m having the same issue here, and it’s actually costing our business a fair bit of unnecessary money. Does anyone have any updates on this? I’d appreciate the help.
