Even distribution of critical pods in K8S cluster

Hello to all!

Has anyone implemented a mechanism that distributes critical pods, such as DBs, evenly across workers for all namespaces?

ty,
Michalis

Can you be more specific about what you want to distribute? Like many different deployments, spread as much as they can be?

It depends on what you want to spread and how, but maybe using some labels and podAffinity can be enough. Have you checked it?

I’m not talking about replica sets. Let’s say I have a chart with apache-php (2 replicas) and a mysql.
I have many installations of this set, and on each deployment I want the mysql pods to spread evenly between workers.
e.g. my cluster consists of 4 workers and I have 8 installations of this set. I want each worker to eventually run 2 mysql pods.

That is a common heuristic of the default scheduler.

If you want to enforce something like that, you can check pod anti-affinity and try to achieve it using that plus some common labels on the deployments (like "try-to-spread: true")
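As a rough sketch of what I mean (the label name "try-to-spread" and all the values here are just examples, not anything standard), a preferred anti-affinity term in the pod template could look like this:

```yaml
# Hypothetical Deployment fragment: prefer not to co-locate pods
# that carry the example label "try-to-spread: true" on the same node.
spec:
  template:
    metadata:
      labels:
        try-to-spread: "true"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  try-to-spread: "true"
              topologyKey: "kubernetes.io/hostname"
```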

Wouldn’t that work?

I know this is a main feature of the default scheduler, but it doesn’t always meet our needs.
I have tried podAntiAffinity in the deployment but it didn’t do the trick; see my snippet below:

spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mysql
        appDB: critical
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: appDB
                operator: In
                values:
                - critical
            topologyKey: "kubernetes.io/hostname"

The result of:

kubectl get pod --all-namespaces -o wide | grep mysql

demo04 xxx04-mysql-5fff64fd5-22ktq 1/1 Running 0 1m test-worker0.lab
demo03 xxx03-mysql-585b665958-xvx4h 1/1 Running 0 2m test-worker3.lab
demo01 xxx01-mysql-5d9cf98cf6-sdhdn 1/1 Running 0 10m test-worker0.lab
demo02 xxx02-mysql-76c69b8c9-wfq2w 1/1 Running 0 4m test-worker0.lab

I repeat that I want to apply this scheduling across all namespaces.

I’m not sure what you mean by wanting it in all namespaces.

If you use the affinity in all the deployments that need it, it should work. I’m not using that myself, though, so I’m not sure whether a pod A in namespace A collides with a pod B in namespace B.

Another option (which I wouldn’t recommend, as it has tons of other issues) is using hostNetwork or something like that. That way no two pods would ever be scheduled on the same node, as they use the same port.

But, using that, it won’t be possible to schedule two on the same node even when you need it (like when some nodes crash and the survivors need to share nodes).

So, I’d try affinity and that stuff first, as the side effect during node crashes of using hostPort/hostNetwork can be exactly the opposite of what you want to achieve (more resiliency).

Please share what you find! :slight_smile:


The 2nd option is not acceptable for me either.
I managed to distribute the desired deployments (mysql) with podAntiAffinity across all nodes, but only if I specify, in the podAntiAffinity section, a list of namespaces which the labelSelector should match against. See below:

    affinity:
      podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: appDB
              operator: In
              values:
              - criticalPod
          topologyKey: "kubernetes.io/hostname"
          namespaces:
          - demo01-ns
          - demo02-ns
          - demo03-ns
          - demo04-ns
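To check how the mysql pods end up per node, a quick count like this helps (assuming the pod names contain "mysql"; with `-o wide --no-headers` the node name is the 8th column):

```shell
# Count mysql pods per node across all namespaces.
# NODE is column 8 in "kubectl get pod -o wide --no-headers" output.
kubectl get pod --all-namespaces -o wide --no-headers \
  | grep mysql \
  | awk '{print $8}' \
  | sort \
  | uniq -c
```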

Is there a way to match all namespaces without always specifying a list of namespaces, which can change over time?

I think these links are somehow my case:

Hello, any update on this?

I’m not sure if there is a way. I’d check the API reference in the Kubernetes docs to see all the available options if other docs are not clear.
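One thing worth checking (I haven’t verified it on this exact setup): newer Kubernetes versions support a `namespaceSelector` field in pod affinity terms (beta since v1.22), which matches namespaces by label instead of listing them by name, and an empty selector `{}` is documented to match all namespaces. Something like:

```yaml
# Sketch: replace the hard-coded "namespaces" list with namespaceSelector.
# An empty namespaceSelector matches every namespace (Kubernetes v1.22+).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: appDB
            operator: In
            values:
            - criticalPod
        topologyKey: "kubernetes.io/hostname"
        namespaceSelector: {}
```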