Static pod problem: mirror pod not created if imagePullSecrets set

Dear Colleagues,

I’ve created a static pod as described in Create static Pods | Kubernetes and I can see the mirror pod in kubectl get pods. However, if I add a list of imagePullSecrets to the static pod spec, the mirror pod is not created, though the pod itself seems to be running (I see it in docker container ls output on its node). As soon as I remove “imagePullSecrets” from the static pod manifest, I can see the mirror pod again.
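For reference, the manifest is essentially the following (the image and Secret names are placeholders for my real ones):

# placed in /etc/kubernetes/manifests/ on the node
apiVersion: v1
kind: Pod
metadata:
  name: my-static-pod
spec:
  imagePullSecrets:
  - name: my-registry-secret   # removing this list makes the mirror pod appear again
  containers:
  - name: app
    image: registry.example.com/app:latest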

What am I doing wrong and where do I look for debug info?

K8s v1.21.1 on Debian Buster amd64, non-cloud setup with kubeadm.

I could probably use a regular pod with a nodeSelector instead of a static pod. However, I cannot make a regular pod restart after its node has been down for a while. I think the control plane forgets about it (the pod gets stuck in the Terminating state) and it never comes back when its node is alive again. I have to reapply the pod definition to make it run again.

Static pods bypass admission control and don’t have access to Secrets; they are managed by the kubelet directly rather than by a higher-level controller. Their direct use is highly discouraged unless you have to control something completely out of band from Kubernetes.

Have you looked at DaemonSets? They are designed to run on a per-node basis and might fit your use case better.

Or a Deployment or StatefulSet – but don’t use a static Pod…

All right, I won’t use a static pod. But questions remain:

  1. Can a Deployment be bound to one particular node, like a pod with a nodeSelector?
  2. Is a regular pod not supposed to be restarted when its node is alive again after a downtime?

I thought DaemonSets were designed to run on every node of the cluster. My purpose is kind of the opposite: restrict a pod to a particular node and make it survive the node’s downtime.

  1. Yes, you can. However, that kind of defeats the purpose of Kubernetes (implicit container orchestration – a sort of built-in HA). If you have workers with different hardware and you want the Deployment to prefer a particular set of nodes, you can use nodeSelector or affinities:

Even though the example shows static pods, it’ll work with Deployments as well, and then you don’t have to manually do a bunch of things that Deployments handle for you automatically.

  2. If you specify it as a Deployment, think of it as a declaration to Kubernetes that you always want the pod(s) defined in the Deployment to be up. So let’s assume you have a Deployment specified for a single pod of some application. Once deployed successfully, Kubernetes will run that pod on one of your workers. If something happens to that worker, the Deployment will automatically spin up the pod on another available worker (constrained by your other question) – another available worker that meets the selector labels you apply. If there are none available, the pod will fail.

However, let’s assume you don’t apply labels and all of your workers are equal – if the node dies or the pod dies on a worker, the Deployment will automatically spin up the pod on the same worker (if it hasn’t died) or another worker (based on a bunch of different algorithms). In this way, your pod will only be offline for however long it takes your pod to spin up + a little time for Kubernetes to figure out it’s dead.

If the pod itself dies, the spin-up will be pretty much immediate. If the worker dies that the pod is running on, I’ve run into situations where it takes a few minutes for K8s to figure out it has died to spin it back up on another worker.


At what level do I place a nodeSelector in a Deployment declaration? In the Deployment spec or in the template spec?

If the worker that the pod is running on dies, I’ve run into situations where it takes a few minutes for K8s to figure out it has died

More than 5 minutes in my experience.

Here’s an example with affinities using Deployments: Implement Node and Pod Affinity/Anti-Affinity in Kubernetes: A Practical Example – The New Stack
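For reference, the node affinity stanza from that kind of example goes into the pod template spec of the Deployment and looks roughly like this (excerpt only; the hostname value is a placeholder):

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname   # built-in node label
                operator: In
                values:
                - my-node-name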

Yes, ours was about 5 minutes also. However, there are add-ons to address this unique situation (i.e. your workers shouldn’t be dying often). In the more common case where the pod dies, this is not a problem and the container spins back up immediately.

5 min is the default. You can tune the settings for how quickly a node is reported unhealthy and for pod eviction.
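On a kubeadm cluster those settings would be kube-controller-manager flags in /etc/kubernetes/manifests/kube-controller-manager.yaml, roughly along these lines (defaults shown, excerpt of the command only):

    - --node-monitor-grace-period=40s   # how long a node may be unresponsive before it is marked NotReady
    - --pod-eviction-timeout=5m0s       # how long to wait before evicting pods from a NotReady node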

Looks like it goes in the template spec.
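I.e. something like this (all names are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:
        kubernetes.io/hostname: my-node-name   # pin the pod to one particular node
      containers:
      - name: app
        image: registry.example.com/app:latest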

If you know a way to change the pod-eviction-timeout (are you talking about this parameter?), can you please share how to do it in a working cluster?

I’ve tried adding --pod-eviction-timeout to kube-controller-manager, but it did not affect anything: there’s the same 5-minute timeout after a node poweroff.

Okay, I had to do a little digging and the k8s docs should be updated. >_>

pod-eviction-timeout works IF the TaintBasedEvictions feature gate is set to false.

This is because the newer, preferred way to control these settings is to use taints and tolerations. These let you control it on a per-pod level, e.g.:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-eviction
spec:
  replicas: 2
  selector:
    matchLabels:
      eviction: "true"
  template:
    metadata:
      labels:
        eviction: "true"
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10   # evict this pod 10s after the node taint is applied (default is 300s)
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        name: busybox

Alternatively, you CAN set defaults at the cluster level, but this is controlled by the kube-apiserver since it’s a setting of the DefaultTolerationSeconds admission controller.

The two settings are:

--default-not-ready-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for notReady:NoExecute
that is added by default to every pod that does not already have such a
toleration.

--default-unreachable-toleration-seconds int     Default: 300
Indicates the tolerationSeconds of the toleration for unreachable:NoExecute
that is added by default to every pod that does not already have such a
toleration.
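On a kubeadm cluster these would typically be added to the kube-apiserver command in its static pod manifest, /etc/kubernetes/manifests/kube-apiserver.yaml, roughly like this (abridged):

spec:
  containers:
  - command:
    - kube-apiserver
    - --default-not-ready-toleration-seconds=60
    - --default-unreachable-toleration-seconds=60
    # ...keep the rest of the existing flags as they are...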

I have set kube-apiserver --default-not-ready-toleration-seconds=60 --default-unreachable-toleration-seconds=60 ... but the time before K8s begins to recreate pods is still over 5 minutes.

I had to delete all deployments and create them anew for this setting to work. Still, with --default-not-ready-toleration-seconds=60 --default-unreachable-toleration-seconds=60 it takes almost 2 minutes (instead of the expected 1 minute) for the services to come online again.