Let's assume that I run K8s in AWS on a node with 2 vCPUs. I would like to understand the best practices for the number of pods versus the CPU requested per pod.
For example, let's use these 2 scenarios:
I can set resources.requests.cpu = 1000m with maxReplicas = 2, and it will use all of the available CPU: 1000m * 2 = 2 vCPUs.
I can set resources.requests.cpu = 100m with maxReplicas = 20, and it will also use all of the available CPU: 100m * 20 = 2 vCPUs.
In which scenario will my system work faster? Is it better to plan for many pods with small CPU requests, or for a few pods with large CPU requests? Are there any recommendations or guidelines, or should performance tests be run to identify the optimal configuration?
The point of requests is to give the scheduler the information it needs to decide which node to place the pod on. Requests should be high enough for the application to run viably. Limits should be set to prevent an application from starving the other applications on the same node of resources.
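As a rough illustration of where requests and limits go, here is a minimal Deployment sketch. The name my-app, the image, and every CPU and memory value are assumptions chosen for the example, not recommendations; the right numbers depend on how your application actually behaves under load.

```yaml
# Illustrative only: name, image, and resource values are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0   # hypothetical image
          resources:
            requests:
              cpu: 250m       # what the scheduler reserves when placing the pod
              memory: 128Mi
            limits:
              cpu: 500m       # the container is throttled above this
              memory: 256Mi   # the container is killed if it exceeds this
```

Setting the request below the limit lets the pod burst when the node has spare CPU, while the request is what the scheduler counts against the node's capacity.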
In addition, you will want to scale your application with a Horizontal Pod Autoscaler. Out of the box, Kubernetes favors horizontally scalable applications.
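A Horizontal Pod Autoscaler could look something like the sketch below. It assumes a hypothetical Deployment named my-app and that a metrics source such as metrics-server is installed in the cluster; the replica bounds and the 70% target are example values only.

```yaml
# Illustrative only: target name, replica bounds, and threshold are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' requested CPU
```

Note that CPU utilization here is measured relative to the pods' CPU requests, so the request you choose also determines when the autoscaler adds or removes replicas.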