MongoDB pods are not rescheduled when a worker node is down

I have deployed a MongoDB replica set using a StatefulSet in Kubernetes, with persistent volumes (PVs) backed by an NFS server. My cluster consists of 3 master nodes and 3 worker nodes. When I shut down worker1, the MongoDB pods running on that node remain stuck in the Terminating state and are not rescheduled to another available worker node. Can someone help identify the issue and suggest how to resolve it?
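In case it helps, the setup is roughly like the sketch below (names, image tag, and storage class are simplified placeholders, not the exact manifest):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mongo
    spec:
      serviceName: mongo
      replicas: 3
      selector:
        matchLabels:
          app: mongo
      template:
        metadata:
          labels:
            app: mongo
        spec:
          containers:
            - name: mongo
              image: mongo:6.0
              ports:
                - containerPort: 27017
              volumeMounts:
                - name: mongo-data
                  mountPath: /data/db
      volumeClaimTemplates:
        - metadata:
            name: mongo-data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: nfs-client   # placeholder for the NFS-backed storage class
            resources:
              requests:
                storage: 10Gi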

If you have not tried it already, can you try the steps below to gracefully drain worker1 and clean up the stuck pod, and see if that helps?

    # Gracefully drain worker1 (if it is still reachable) so its pods are evicted cleanly
    kubectl drain worker1 --ignore-daemonsets --delete-emptydir-data
    # Force-delete the pod that is stuck in Terminating so the StatefulSet controller can recreate it
    kubectl delete pod <pod-name> --grace-period=0 --force
    # Check the state of the PVCs used by the StatefulSet
    kubectl get pvc -n <namespace>
    # Check if the pod is trying to mount the volume and whether there are any related errors.
    # Look under the Events section for any messages related to mounting or attaching volumes.
    kubectl describe pod <pod-name> -n <namespace>
    # Check the container logs for MongoDB errors
    kubectl logs <pod-name> -n <namespace>
    # Check whether the volume is still attached to the downed node
    kubectl get volumeattachment

The volume attachment for the PV needs to be cleaned up before a new pod can be spun up. Also, unless your MongoDB deployment is specifically configured for it, the pod may or may not be scheduled on a different node; it might need the same node (or a replacement node) to come back instead. That will depend on your MongoDB installation's scheduling constraints.
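For example, if `kubectl get volumeattachment` still shows the volume attached to worker1 long after the node went down, you may have to remove the stale attachment by hand (the attachment name below is a placeholder; NFS volumes that are not managed by a CSI driver may not have a VolumeAttachment at all):

    # Find the attachment that is still bound to worker1
    kubectl get volumeattachment
    # Delete it so the volume can be attached from another node
    kubectl delete volumeattachment <volumeattachment-name>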

Could you please help me with the MongoDB installation constraints, so that the MongoDB pods will automatically be rescheduled on an available worker?

It will depend on your Helm chart or Kubernetes configuration: whether pod affinity/anti-affinity is defined and how node taints and tolerations are set up. You will have to look at that configuration; the Kubernetes documentation covers both topics well.
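On the taints side, one thing worth checking is how long the MongoDB pods tolerate the node.kubernetes.io/unreachable and node.kubernetes.io/not-ready taints. A rough sketch of shortening that window in the StatefulSet's pod spec (the 30-second value is just an example; the usual default is 300 seconds):

    spec:
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 30   # evict sooner than the 300s default when the node becomes unreachable
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 30

Note that even with shorter tolerations, a StatefulSet will not create a replacement pod until the old pod object is actually removed (for example with the force delete above), because StatefulSets guarantee at most one pod per ordinal.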

I would also suggest looking at your StatefulSet's affinity configuration, something like this:

    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - mongo
              topologyKey: "kubernetes.io/hostname"

If you want the pods to be scheduled on any available node rather than strictly requiring each one to land on a different node from the other pods, you can change the pod anti-affinity to a “soft” rule. Kubernetes will then prefer to spread the pods across nodes but will not block scheduling if the rule cannot be satisfied.

    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - mongo
                topologyKey: "kubernetes.io/hostname"