How to deal with large number of Pods

kluvi · January 16, 2020, 7:11am

Hi.
First, let me tell you our “fuckup” story (feel free to skip it).
We have an app that consists of about 90 repositories. We use RabbitMQ heavily to queue jobs that are as small as possible (for better scalability). In the past, we didn't use any orchestrator. Each repository contains a docker-compose.yml, which our CI server uploads to the production server before running docker-compose down && docker-compose up. Our production server was a single bare-metal server with only Docker installed (everything ran in containers). We outsourced management of the server to our provider.
Then we decided to move everything to Kubernetes. Our provider also offers managed Kubernetes clusters, so we bought 3 bare-metal servers as worker nodes, each with 64 threads and 192 GB RAM. The master nodes are VMs, and everything is managed by our provider; we only have access to our Namespace. So we started moving our application to Kubernetes. Everything looked great, but a few days ago we reached a “magical point” and everything went totally shitty. The magical point was the number of Pods per node, a limit we didn't know about. After almost everything was moved to Kubernetes, there were about 1300 running Pods (≈433 Pods per Node), roughly 4x the recommended maximum of 110 Pods per node. We had to hotfix it by merging all Pods of each Deployment into a single Pod running supervisord (the same approach we had used earlier on the single-node server).
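A quick sanity check of the numbers above, as a minimal Python sketch. It assumes the kubelet's default `--max-pods` value of 110 (a managed provider may configure a different value) and a perfectly even spread of Pods across nodes. It also ignores system Pods run by DaemonSets (for example the network plugin), which take up slots on every node as well, so the real headroom is smaller.

```python
import math

# Assumed kubelet default (--max-pods); a managed cluster may set another value.
MAX_PODS_PER_NODE = 110


def pods_per_node(total_pods: int, nodes: int) -> float:
    """Average Pods per node, assuming the scheduler spreads them evenly."""
    return total_pods / nodes


def min_nodes(total_pods: int, max_pods: int = MAX_PODS_PER_NODE) -> int:
    """Smallest node count that keeps every node at or under max_pods."""
    return math.ceil(total_pods / max_pods)


if __name__ == "__main__":
    total = 1300
    avg = pods_per_node(total, 3)
    print(f"{avg:.0f} Pods per node on 3 nodes")
    print(f"{avg / MAX_PODS_PER_NODE:.1f}x the default limit")
    print(f"at least {min_nodes(total)} nodes needed")
```

For 1300 Pods on 3 nodes this gives roughly 433 Pods per node (about 3.9x the default limit) and a minimum of 12 nodes, the same node count as splitting the 3 servers into 12 VMs.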

My question is: How do you deal with a very large number of small Pods?

I have a few solutions in mind, but none of them is ideal for our use case:

1) Merge each Deployment into a single Pod with supervisord
   - numprocs (supervisord) = replicas (Deployment)
2) Merge each Deployment into 3 Pods with supervisord
   - numprocs (supervisord) = replicas (Deployment) / 3
   - this at least lets us benefit from high availability
3) Merge each Deployment into N Pods with supervisord
   - N = for example 5, so we have 5x fewer Pods
4) Split each Node into 4 (or more) VMs, so we have 12 Nodes instead of 3
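The supervisord options differ only in how a Deployment's replicas are divided among Pods. A minimal Python sketch, assuming each worker is a plain long-running command: `split_replicas` computes the `numprocs` value for each Pod (including replica counts that don't divide evenly, e.g. 10 replicas over 3 Pods), and `supervisord_program` renders a standard supervisord `[program:x]` section. Both function names, the program name `queue-worker` and the command `python worker.py` are hypothetical placeholders.

```python
def split_replicas(replicas: int, pods: int) -> list[int]:
    """Spread `replicas` worker processes over `pods` Pods as evenly as possible.

    Each entry is the supervisord numprocs value for one Pod.
    """
    if pods < 1:
        raise ValueError("need at least one Pod")
    base, extra = divmod(replicas, pods)
    # The first `extra` Pods get one more process so the total matches exactly.
    return [base + 1 if i < extra else base for i in range(pods)]


def supervisord_program(name: str, command: str, numprocs: int) -> str:
    """Render a supervisord [program:x] section that runs `numprocs` copies."""
    return (
        f"[program:{name}]\n"
        f"command={command}\n"
        # supervisord requires process_num in process_name when numprocs > 1.
        "process_name=%(program_name)s_%(process_num)02d\n"
        f"numprocs={numprocs}\n"
        "autorestart=true\n"
    )


if __name__ == "__main__":
    # Hypothetical Deployment with 10 replicas, packed into 1, 3 or 5 Pods.
    for n_pods in (1, 3, 5):
        print(n_pods, split_replicas(10, n_pods))
    print(supervisord_program("queue-worker", "python worker.py", 4))
```

With 10 replicas this yields [10] for a single Pod, [4, 3, 3] for three Pods and [2, 2, 2, 2, 2] for five. If there are more Pods than replicas, some entries are 0, and those Pods should simply not be created.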

Each solution has some disadvantages.
1-3) All the supervisord solutions completely kill the idea of liveness/readiness probes.
1-2) These rule out effective use of the Horizontal Pod Autoscaler. Another problem is that many of our Deployments have only 3 replicas (fast workers on small queues).
3) This is a little more scalable, but still a lot of hacking.
4) This is mostly a financial problem, because our provider charges us for:

- managing Kubernetes
- housing our bare-metal servers
- managing the virtualization
- managing each additional Kubernetes Node (the first 5 nodes are included in “basic” management)

If we add up the last 3 points, it's more expensive than the management of the whole Kubernetes cluster (which itself is not very cheap or cost-effective, but Kubernetes seemed to have so many technical advantages that we decided to pay for it).

tomasz.prus · January 16, 2020, 11:31pm

Hi, it’s very interesting.

What exactly have you observed? Instead of using supervisord, you can also create N containers within one Pod, but the disadvantages remain.
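A sketch of the multi-container variant, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). Unlike supervisord, each container can keep its own `livenessProbe`, so the kubelet restarts only the stuck worker. The function name, the image and the `/app/healthcheck.sh` script are hypothetical. Readiness and autoscaling still apply to the Pod as a whole, the kubelet still has to manage every container, and in practice the same `containers` list would go into a Deployment's Pod template rather than a bare Pod.

```python
import json


def multi_container_pod(name: str, image: str, workers: int) -> dict:
    """Build a Pod manifest with `workers` identical worker containers.

    Each container gets its own livenessProbe, so a failing probe restarts
    only that container, not its siblings.
    """
    containers = []
    for i in range(workers):
        containers.append({
            "name": f"worker-{i}",  # container names must be unique in a Pod
            "image": image,
            "livenessProbe": {
                # Hypothetical health-check script baked into the image.
                "exec": {"command": ["/app/healthcheck.sh"]},
                "periodSeconds": 10,
            },
        })
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {"containers": containers},
    }


if __name__ == "__main__":
    pod = multi_container_pod("queue-worker", "registry.example.com/queue-worker:latest", 3)
    print(json.dumps(pod, indent=2))
```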

kluvi · January 17, 2020, 6:45am

New Pods won't start. There were various errors, such as volumes not being mounted (from ConfigMaps, Secrets, PVs,…). There were also networking troubles: some Pods intermittently couldn't see some Services. I can't see much “under the hood” (because our cluster is managed by our provider), but the provider says that a lot of resources are consumed by the kubelet itself.

Theoi-Meteoroi · January 20, 2020, 7:03am

You can’t really run Kubernetes without specific monitoring; observability is a problem. From the sound of it, you overran the masters and things went pear-shaped, but you can’t see it. The kubelet is the canary. Who does the kubelet talk to?
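With only Namespace access, one rough way to see how close nodes are to the limit is to count your own Pods per node. A minimal Python sketch that parses the output of `kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers`; the function names and the sample data are illustrative. Keep in mind that this only sees Pods in your Namespace, while the per-node limit counts every Pod on the node (including system Pods), so the real numbers are higher.

```python
from collections import Counter


def count_pods_by_node(custom_columns_output: str) -> Counter:
    """Count Pods per node from `kubectl get pods -o custom-columns=...` output.

    Expects two whitespace-separated columns per line: Pod name, node name.
    Pods that are not scheduled yet show "<none>" as the node and are skipped.
    """
    counts: Counter = Counter()
    for line in custom_columns_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[1] == "<none>":
            continue
        counts[fields[1]] += 1
    return counts


def nodes_over_limit(counts: Counter, limit: int = 110) -> dict[str, int]:
    """Return only the nodes whose Pod count exceeds `limit`."""
    return {node: n for node, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Illustrative output; real input would come from the kubectl command above.
    sample = (
        "worker-a-1 node-1\n"
        "worker-a-2 node-1\n"
        "worker-b-1 node-2\n"
        "pending-1 <none>\n"
    )
    counts = count_pods_by_node(sample)
    print(counts)
    print(nodes_over_limit(counts, limit=1))
```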
