Cattle-node-agent and Canal system pods stuck in Updating status

Cluster information:

Kubernetes version: client 1.12, server 1.15
Rancher version: v2.3.1
Cloud being used: on premises
Installation method: Manual
Host OS: Red Hat Enterprise Linux 7.6
CNI and version: Canal
CRI and version: Docker 19.03.5

I wasn't able to copy and paste the YAML file.

We have 9 VM nodes and 6 physical nodes, all part of the same Kubernetes cluster. Sometimes the cattle-node-agent and Canal system pods become unavailable on one of the physical nodes, and they stay stuck until we delete the system pods so that they get recreated.

What could cause this behavior?

YAML file:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-03-31T07:00:42Z"
  generateName: justice-box2-668795c9d9-
  labels:
    app.kubernetes.io/instance: justice-box2
    app.kubernetes.io/name: justice-box2
    ddi-radius: justice-radius-svc
    pod-template-hash: 668795c9d9
  name: justice-box2-668795c9d9-nnwh9
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: justice-box2-668795c9d9
    uid: c44d7f10-e5da-40c7-9bb7-19fd4ffc6fbb
  resourceVersion: "82792333"
  selfLink: /api/v1/namespaces/default/pods/justice-box2-668795c9d9-nnwh9
  uid: a11daa16-813c-4500-b200-e790e401e495
spec:
  containers:
  - command:
    - sh
    - -c
    - cp /ddi/profiles/run-dynamic.sh .; chmod +x run-dynamic.sh; ./run-dynamic.sh
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine:2.1.202
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          - cp /ddi/profiles/poststart-dynamic.sh .; chmod +x poststart-dynamic.sh;
            ./poststart-dynamic.sh
      preStop:
        exec:
          command:
          - sh
          - -c
          - cp /ddi/profiles/prestop-dynamic.sh .; chmod +x prestop-dynamic.sh; ./prestop-dynamic.sh
    name: ddi-engine
    resources:
      limits:
        hugepages-1Gi: 320Gi
        memory: 183Gi
      requests:
        hugepages-1Gi: 320Gi
        memory: 183Gi
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: path-sockets
    - mountPath: /ddi/statsfiles
      name: path-statsfiles
    - mountPath: /var/log/ddi_engine
      name: path-logfiles
    - mountPath: /dev
      name: path-dev
    - mountPath: /sys
      name: path-sys
    - mountPath: /hugepages
      name: hugep
    - mountPath: /ddi/oamfiles
      name: path-engine-oam
    - mountPath: /ddi/pcapfiles
      name: path-sftp-pcapfiles
    - mountPath: /ddi/ce_upgrade_binaries
      name: path-ce-binaries-upgrade
    - mountPath: /ddi/profiles
      name: engine-profiles-conf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-527nt
      readOnly: true
  - args:
    - /usr/deploy/init_sftp sftp_usr:DDIalfa2019:1001; chown -R sftp_usr /home/sftp_usr;
      java -Dlogback.configurationFile=logback.xml -Dconfig.file=application.conf
      -Dddi.fake_LBS_assignments=false -Dddi.web.rest.url=https://ddi-central-web-app:9443/sharedApi/traffic-policies/site-box-policy
      -Dddi.web.rest.ssl=true -Dddi.control.traffic.pcap.retention.scheduler.interval.minutes=2
      -Dddi.control.traffic.pcap.scheduler.interval.minutes=2 -Dddi.control.traffic.pcap.retention.total.size.mb=500000
      -Dddi.control.traffic.pcap.retention.file.minutes=4320 -Dddi.sftp.pool.max.connections.per.host=10
      -Dddi.sftp.pool.use.scp=false -Dddi.control.traffic.pcap.folder=/home/sftp_usr/pcaps
      -Dddi.control.submap.collector.output.folder=/home/sftp_usr/histsubmap -Dddi.local.timezone=Asia/Beirut
      -Dddi.control.ce.binaries.folder.upgrade=/home/sftp_usr/ce/upgrade -DrootLogLevel=INFO
      -jar ddi-engine-control-sidecar.jar
    env:
    - name: NODE_HOSTNAME
      value: justice-box2
    - name: SITE_NAME
      value: justice
    - name: RADIUS_SECRET
      value: alfasecret
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine-sidecar:2.1.202
    imagePullPolicy: Always
    name: ddi-engine-sidecar
    ports:
    - containerPort: 5001
      hostPort: 5001
      name: grpc
      protocol: TCP
    - containerPort: 8813
      hostPort: 8813
      name: radius
      protocol: UDP
    - containerPort: 8022
      hostPort: 8022
      name: sftp
      protocol: TCP
    resources:
      limits:
        cpu: "7"
      requests:
        cpu: "5"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: path-sockets
    - mountPath: /var/log/ddi_engine
      name: path-logfiles
    - mountPath: /var/log/ddi_engine/metrics
      name: path-sidecar-oam
    - mountPath: /home/sftp_usr/statsfiles
      name: path-sftp-statsfiles
    - mountPath: /home/sftp_usr/histsubmap
      name: path-sftp-histsubmap
    - mountPath: /home/sftp_usr/pcaps
      name: path-sftp-pcapfiles
    - mountPath: /etc/ddi-ssl
      name: tls-mnt
      readOnly: true
    - mountPath: /home/sftp_usr/ce/upgrade
      name: path-ce-binaries-upgrade
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-527nt
      readOnly: true
  - args:
    - -c
    - printf '\n[ddiaof]\n path = /ddiaof\n read only = no\n' >> /etc/rsyncd.conf;
      rsync --daemon --no-detach --port=1873 --dparam=uid=root --dparam=gid=root
    command:
    - /bin/sh
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-rsync:2.1.202
    imagePullPolicy: Always
    name: ddi-engine-rsyncdaemonredis
    ports:
    - containerPort: 1873
      hostPort: 1873
      name: rsync
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /ddiaof
      name: path-redis-aof
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-527nt
      readOnly: true
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  nodeName: ddi-jus-2.corp.alfamobile.com.lb
  nodeSelector:
    kubernetes.io/hostname: ddi-jus-2.corp.alfamobile.com.lb
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 10
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: engine-justice-box2-deploy-config
    name: engine-profiles-conf
  - emptyDir: {}
    name: path-sockets
  - hostPath:
      path: /ddi_data/statsfiles
      type: DirectoryOrCreate
    name: path-statsfiles
  - hostPath:
      path: /dev
      type: Directory
    name: path-dev
  - hostPath:
      path: /sys
      type: Directory
    name: path-sys
  - emptyDir:
      medium: HugePages
    name: hugep
  - hostPath:
      path: /ddi_data/redisaof
      type: DirectoryOrCreate
    name: path-redis-aof
  - hostPath:
      path: /ddi_data/log
      type: DirectoryOrCreate
    name: path-logfiles
  - hostPath:
      path: /ddi_data/metrics/engine
      type: DirectoryOrCreate
    name: path-engine-oam
  - hostPath:
      path: /ddi_data/metrics/sidecar
      type: DirectoryOrCreate
    name: path-sidecar-oam
  - hostPath:
      path: /ddi_data/statsfiles
      type: DirectoryOrCreate
    name: path-sftp-statsfiles
  - hostPath:
      path: /ddi_data/histsubmap
      type: DirectoryOrCreate
    name: path-sftp-histsubmap
  - hostPath:
      path: /ddi_data/pcaps
      type: DirectoryOrCreate
    name: path-sftp-pcapfiles
  - name: tls-mnt
    secret:
      defaultMode: 420
      secretName: ddi-central-web-app
  - hostPath:
      path: /ddi_data/ceupgrade
      type: DirectoryOrCreate
    name: path-ce-binaries-upgrade
  - name: default-token-527nt
    secret:
      defaultMode: 420
      secretName: default-token-527nt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-03-31T06:55:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-04-10T09:09:03Z"
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-03-31T06:55:46Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-03-31T07:00:42Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7a5760de854307fd17ec2df3a60a25abd7b9d8a70d7525b5782ff1a88654bf9a
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine:2.1.202
    imageID: docker-pullable://ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine@sha256:7f8a93a1098c558d005a186ef73d5e63ff54a94854d01a08c1c2df67c30346e1
    lastState: {}
    name: ddi-engine
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2021-03-31T06:55:42Z"
  - containerID: docker://5c0a1063e69e614ad20cc9916931beef8d33ba829b1c5e938e9bf31c66d6dcfc
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-rsync:2.1.115
    imageID: docker-pullable://ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-rsync@sha256:1dcbcf998b164408a03845e93a2697dc8d4f9b405489c11681f1d9ac1f7a2f8b
    lastState: {}
    name: ddi-engine-rsyncdaemonredis
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2021-03-31T06:55:46Z"
  - containerID: docker://866adaeebfac271486baad0d7de3e319889313aed14fbd43cd3e8f63cfa6ee24
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine-sidecar:2.1.202
    imageID: docker-pullable://ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine-sidecar@sha256:3864a727e43fafbe8965cd52c9cdfe5c81e5aa48554ceff90f2083688f84ea59
    lastState: {}
    name: ddi-engine-sidecar
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2021-03-31T06:55:45Z"
  hostIP: 192.168.233.22
  phase: Running
  podIP: 192.168.233.22
  qosClass: Burstable
  startTime: "2021-03-31T06:55:27Z"
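
To make the symptom easier to spot, here is a minimal sketch that flags the pattern visible in the status above: the `Ready` condition is `False` while `ContainersReady` is `True`. It assumes the pod has been exported as a dictionary, for example by parsing the output of `kubectl get pod <name> -o json`. The function name `ready_mismatch` and the trimmed `sample_pod` are hypothetical and only for illustration; the check detects the pattern but does not explain its cause.

```python
def ready_mismatch(pod: dict) -> bool:
    """Return True when the pod reports Ready=False although ContainersReady=True."""
    # Map each condition type to its status string, e.g. {"Ready": "False", ...}.
    conditions = {
        c["type"]: c["status"]
        for c in pod.get("status", {}).get("conditions", [])
    }
    return (
        conditions.get("Ready") == "False"
        and conditions.get("ContainersReady") == "True"
    )


# Hypothetical sample, trimmed from the pod status shown in this post.
sample_pod = {
    "metadata": {"name": "justice-box2-668795c9d9-nnwh9"},
    "status": {
        "conditions": [
            {"type": "Initialized", "status": "True"},
            {"type": "Ready", "status": "False"},
            {"type": "ContainersReady", "status": "True"},
            {"type": "PodScheduled", "status": "True"},
        ]
    },
}

if __name__ == "__main__":
    print(ready_mismatch(sample_pod))
```

Running the same check over every pod on the affected physical node would show whether all pods there flip to this state at the same time, which is what we seem to observe.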