We want to use a Kubernetes cluster to process work items from a queue. Each work item takes anywhere from a few minutes to a few hours to complete (we cannot predict the duration upfront), and we want maximum isolation for each work item. Typical volume for the queue is about 2 million new work-items per day.
Our current plan is to have a queue store the work-items, with multiple pods running in parallel across many nodes. Each pod will take one work-item, process it (work-items are independent of each other) and exit.
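Roughly speaking, each worker container would run something like the sketch below, where `fetchWorkItem` and `process` are just placeholders for our actual queue client and processing logic:

```go
package main

import (
	"fmt"
	"os"
)

// fetchWorkItem is a placeholder for pulling exactly one message from the queue.
func fetchWorkItem() (string, error) {
	return "item-42", nil
}

// process is a placeholder for the actual work, which may run for minutes to hours.
func process(item string) error {
	return nil
}

func main() {
	// Each pod handles exactly one work item and then exits.
	// A non-zero exit code would let a controller retry the item.
	item, err := fetchWorkItem()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not fetch work item:", err)
		os.Exit(1)
	}
	if err := process(item); err != nil {
		fmt.Fprintln(os.Stderr, "processing failed:", err)
		os.Exit(1)
	}
	fmt.Println("finished:", item)
}
```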
Since this means the Kubernetes cluster needs to create millions of pods per day, we are concerned that this could pose an issue in the long run (e.g., with close to one billion pods created each year, could the Kubernetes database that tracks the pods run into performance issues)? We are wondering if there is a limit (or a recommended limit) on how many pods can be created per day.
You will have to delete pods as soon as they are terminated (Succeeded or Failed), because a node can only support a limited number of pods at the same time, around 110 by default (this number is configurable and depends on your Kubernetes flavour).
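As a rough sketch of what that cleanup could look like (the `workers` namespace is just an example, and this assumes the standard client-go library), something like this could run periodically to remove pods that have finished successfully:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig from its default location (~/.kube/config);
	// inside the cluster you would use rest.InClusterConfig() instead.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Delete every pod in the "workers" namespace that has completed successfully.
	// Failed pods are left in place here so they can be inspected before cleanup.
	err = clientset.CoreV1().Pods("workers").DeleteCollection(
		context.TODO(),
		metav1.DeleteOptions{},
		metav1.ListOptions{FieldSelector: "status.phase=Succeeded"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("deleted succeeded pods in namespace workers")
}
```

If you wrap each work item in a Job instead of a bare pod, the `ttlSecondsAfterFinished` field can take care of this cleanup automatically.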
Thanks, feloy! Yeah, that’s a good point. Since we write logs to volumes, there is no need to keep the completed pods around; we can delete them immediately after completion.
Our concern is mainly about the long-term impact at this throughput level: we are not sure whether Kubernetes keeps the completed pod info somewhere, so that over time the tracking database would grow very large. Or is there no concern because the pod info is completely gone once the pods are deleted?
I’m practically sure the pod data is deleted from the etcd database when you delete the pods. You can monitor the size of the etcd database to watch for growth (due to some sort of fragmentation, for example) by taking snapshots at regular intervals. More info on etcd is on this page (especially the built-in snapshot section): Operating etcd clusters for Kubernetes - Kubernetes
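As a rough sketch (the endpoint is a placeholder, and a real kube-apiserver etcd will normally also require TLS client certificates in the `clientv3.Config`), the etcd Go client can report the database size so you can track its growth over time:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Replace with your etcd endpoint; for a kubeadm cluster this is usually
	// https://127.0.0.1:2379 on a control-plane node, secured with client certs
	// that you would pass via the TLS field of clientv3.Config.
	endpoint := "http://127.0.0.1:2379"

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Status reports, among other things, the size of the backend database file.
	resp, err := cli.Status(ctx, endpoint)
	if err != nil {
		panic(err)
	}
	fmt.Printf("etcd %s, db size: %d bytes\n", resp.Version, resp.DbSize)
}
```

Logging this value over time (together with regular snapshots) will show whether the database keeps growing as pods are created and deleted.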