Processes with random resource needs

My organization is currently converting our product suite to run within Kubernetes. Within this suite we have a set of roughly 400 utility programs that do something and then terminate. The problem is that each of these utilities has radically different resource needs, and we have never had a need to calculate them. To compound this, individual utilities can also have radically different resource needs depending on their input parameters.

For the sake of this discussion, let's assume that any individual execution can take anywhere from 1 second to 1 week to run and use anywhere from 1 MB to 128 GB of RAM. We have never had a need to calculate solid numbers for any of these tools, so also for the sake of this discussion assume these numbers are random. The only saving grace is that all of these tools are single threaded, so a limit of one CPU is acceptable.

My question is: how can we run these workloads in k8s? Running without resource requests or limits on a dedicated node using affinities is my best answer so far. Unfortunately, we want to support multiple versions of k8s and various k8s-like environments (OpenShift, AWS, etc.). Some of those do not allow scheduling “best effort” pods like this. Also unfortunately, rewriting these utilities to be cloud friendly is not in the cards.
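For reference, the dedicated-node idea can be sketched as a Job manifest. This is a minimal sketch, not a drop-in config: the `workload=batch-utils` node label, the names, the placeholder image, and the token 64Mi memory request are all assumptions for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: utility-run            # hypothetical name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: workload        # assumed label on the dedicated node pool
                    operator: In
                    values: ["batch-utils"]
      containers:
        - name: utility
          image: registry.example.com/utility:latest   # placeholder image
          resources:
            requests:
              cpu: "1"         # the tools are single threaded
              memory: 64Mi     # token request; keeps the pod out of BestEffort
            limits:
              cpu: "1"
```

One design note: giving the container a small memory request but no memory limit puts the pod in the Burstable QoS class rather than BestEffort, so environments that refuse to schedule best-effort pods will generally still accept it, while memory consumption stays effectively uncapped.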

Does anyone have any suggestions, or can maybe point me to a feature I’m overlooking? Do these loads even belong in a containerized/k8s environment?


My approach would be to use Prometheus and Grafana. Start logging what’s using how much memory and CPU, then figure out sane baselines.

You can start in test environments and work your way up to production with sane baselines. You want to be logging metrics like this regardless. Just expect that production results may vary depending on exactly what you’re doing. This mindset will prepare you for tuning those resource request values over time.
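One way to turn those logged metrics into starting values is a small script over the observed per-run peaks. A minimal sketch in Python; the sample numbers, the median/max choice, and the 1.25 headroom factor are made-up illustrations, not recommendations from the original post:

```python
from math import ceil
from statistics import median

def suggest_memory_baseline(peak_samples_mib, headroom=1.25):
    """Suggest a (request, limit) pair in MiB from observed per-run peak memory.

    Request sits at the median observed peak; limit is the worst observed
    peak plus a headroom factor to absorb runs you haven't seen yet.
    """
    request = median(peak_samples_mib)
    limit = max(peak_samples_mib) * headroom
    return round(request), ceil(limit)

# Hypothetical peak-memory samples (MiB) for one utility across ten runs;
# note the two outlier runs that dwarf the typical case.
samples = [120, 95, 110, 4000, 130, 105, 98, 115, 3900, 125]
print(suggest_memory_baseline(samples))  # → (118, 5000)
```

A heavy-tailed utility like the one above shows why this is only a starting point: a median-based request keeps the scheduler honest for typical runs, but the limit has to cover the rare 4 GB run, so you would tune both per utility as more data accumulates.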