I’m trying to figure out how K8S handles node allocation and scheduling, and which component is responsible for ensuring the requested resources are actually available. I’m porting an existing MPI job submission system to K8S. With these MPI jobs, all the nodes have to be present before execution starts. One thing I tested is what K8S does when I request 8 but only 5 nodes are available. What I observed with my KIND setup is that, whether I used parallelism in a Job manifest or replicas in an mpi-operator manifest, it would run 5 and then run the remaining 3 sequentially. In these tests I didn’t call MPI; I just wanted to see how K8S would schedule them.
Am I wrong in expecting K8S to either hold the submission until the desired number of nodes is available or fail the submission? Is this outside the scope of K8S’s responsibility, so that a separate service needs to handle scheduling? Considering that MPI needs all nodes up at the same time, does this mean I found a bug in mpi-operator?
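
To make the two behaviours concrete, here is a toy Python model of what I observed versus what I expected. It is only a sketch and not Kubernetes code: the function names `schedule_per_pod` and `schedule_all_or_nothing` are made up for illustration, and it assumes one worker per node and that each batch of workers finishes before the next one starts.

```python
def schedule_per_pod(requested, free_nodes):
    """Observed behaviour: start as many workers as fit, run the rest later.

    Returns the size of each successive batch of workers.
    """
    if free_nodes <= 0:
        raise ValueError("free_nodes must be positive")
    batches = []
    remaining = requested
    while remaining > 0:
        batch = min(remaining, free_nodes)  # fill whatever capacity exists
        batches.append(batch)
        remaining -= batch
    return batches


def schedule_all_or_nothing(requested, free_nodes):
    """Expected behaviour: start every worker at once, or start none.

    An empty list means the submission is held (or rejected).
    """
    if requested <= free_nodes:
        return [requested]
    return []


if __name__ == "__main__":
    print(schedule_per_pod(8, 5))         # [5, 3]
    print(schedule_all_or_nothing(8, 5))  # []
```

With 8 requested and 5 available, the first function gives the 5-then-3 pattern I saw, while the second never starts anything, which is what an MPI job would need.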