On GKE, I have noticed that there is something like a management user that logs into the node once.
I assume this is so Google can perform rolling updates and other maintenance.
However, if the node is under pressure (i.e. a deployment has no CPU limit and uses all of the host's CPU), the management user logs in again and keeps the old SSH connection open.
This has become an issue for me because it stops the node from being able to fork processes, so no new deployments can be spun up.
My current workaround is to kill sshd and all SSH sessions on the node, then restart the node from the GCP Console. This feels like a bug on the GKE side, though, and I wonder who the best people to contact about it are.
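Independently of the SSH behaviour, the trigger described here is a deployment with no CPU limit. One way to catch that before rollout is to check manifests for containers that do not set `resources.limits.cpu`. The sketch below is a minimal illustration under some assumptions: the Deployment manifest has already been parsed into a Python dict (for example by a YAML parser), the function name `containers_without_cpu_limit` is hypothetical, and it only inspects the regular `containers` list, not `initContainers`.

```python
from typing import Any


def containers_without_cpu_limit(manifest: dict[str, Any]) -> list[str]:
    """Return the names of containers in a Deployment manifest that set no CPU limit.

    Hypothetical helper: walks spec.template.spec.containers and reports any
    container whose resources.limits has no "cpu" key.
    """
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        # "resources" and "limits" may be absent or null, so fall back to {}.
        limits = (container.get("resources") or {}).get("limits") or {}
        if "cpu" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Example manifest (hypothetical): "web" sets a CPU limit, "worker" does not.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "resources": {"limits": {"cpu": "500m"}}},
                    {"name": "worker"},
                ]
            }
        }
    },
}

print(containers_without_cpu_limit(deployment))  # prints ['worker']
```

This only flags the problem. It does not change how GKE handles the management user's SSH sessions. A namespace-level LimitRange with a default CPU limit is another way to make sure containers that omit a limit still get one.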