Cattle-node-agent and Canal system pods stuck in Updating status

HadyZIade · April 12, 2021, 8:03am


Cluster information:

Kubernetes version: client 1.12, server 1.15
Rancher version: v2.3.1
Cloud being used: bare-metal (on premises)
Installation method: Manual
Host OS: Red Hat Enterprise Linux 7.6
CNI and version: Canal
CRI and version: Docker 19.03.5

I wasn't able to copy and paste the YAML file.

We have 9 VM nodes and 6 physical nodes, all part of the same Kubernetes cluster. Sometimes the cattle-node-agent and Canal system pods on one of the physical nodes become unavailable, and they stay stuck until we delete them so that they get recreated.

What could cause this behavior?
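To make the symptom easier to report, here is a minimal sketch of the commands that could be used to capture the state of the system pods on the affected physical node before deleting them. It assumes that cattle-node-agent runs in the `cattle-system` namespace and Canal in `kube-system` (the usual layout for a Rancher-provisioned cluster, but not confirmed in this post). The function names and the node and pod names in the usage comments are hypothetical.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# List the pods of the assumed system namespaces that run on one node.
# $1: node name as printed by `kubectl get nodes`.
system_pods_on_node() {
    node="$1"
    for ns in cattle-system kube-system; do
        "$KUBECTL" get pods -n "$ns" -o wide --field-selector "spec.nodeName=$node"
    done
}

# Show the node's conditions and recent events.
# $1: node name.
describe_node() {
    "$KUBECTL" describe node "$1"
}

# Delete one stuck pod so that its controller recreates it
# (the workaround described in this post).
# $1: namespace, $2: pod name.
recreate_pod() {
    "$KUBECTL" delete pod -n "$1" "$2"
}

# Usage (hypothetical names):
#   system_pods_on_node my-physical-node
#   describe_node my-physical-node
#   recreate_pod kube-system canal-xxxxx
```

Running the first two functions while the pods are still stuck, and only then deleting them, keeps the evidence that disappears once the pods are recreated.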

HadyZIade · April 12, 2021, 8:11am

Yaml file
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-03-31T07:00:42Z"
  generateName: justice-box2-668795c9d9-
  labels:
    app.kubernetes.io/instance: justice-box2
    app.kubernetes.io/name: justice-box2
    ddi-radius: justice-radius-svc
    pod-template-hash: 668795c9d9
  name: justice-box2-668795c9d9-nnwh9
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: justice-box2-668795c9d9
    uid: c44d7f10-e5da-40c7-9bb7-19fd4ffc6fbb
  resourceVersion: "82792333"
  selfLink: /api/v1/namespaces/default/pods/justice-box2-668795c9d9-nnwh9
  uid: a11daa16-813c-4500-b200-e790e401e495
spec:
  containers:
  - command:
    - sh
    - -c
    - cp /ddi/profiles/run-dynamic.sh .; chmod +x run-dynamic.sh; ./run-dynamic.sh
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine:2.1.202
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          - cp /ddi/profiles/poststart-dynamic.sh .; chmod +x poststart-dynamic.sh;
            ./poststart-dynamic.sh
      preStop:
        exec:
          command:
          - sh
          - -c
          - cp /ddi/profiles/prestop-dynamic.sh .; chmod +x prestop-dynamic.sh; ./prestop-dynamic.sh
    name: ddi-engine
    resources:
      limits:
        hugepages-1Gi: 320Gi
        memory: 183Gi
      requests:
        hugepages-1Gi: 320Gi
        memory: 183Gi
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: path-sockets
    - mountPath: /ddi/statsfiles
      name: path-statsfiles
    - mountPath: /var/log/ddi_engine
      name: path-logfiles
    - mountPath: /dev
      name: path-dev
    - mountPath: /sys
      name: path-sys
    - mountPath: /hugepages
      name: hugep
    - mountPath: /ddi/oamfiles
      name: path-engine-oam
    - mountPath: /ddi/pcapfiles
      name: path-sftp-pcapfiles
    - mountPath: /ddi/ce_upgrade_binaries
      name: path-ce-binaries-upgrade
    - mountPath: /ddi/profiles
      name: engine-profiles-conf
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-527nt
      readOnly: true
  - args:
    - /usr/deploy/init_sftp sftp_usr:DDIalfa2019:1001; chown -R sftp_usr /home/sftp_usr;
      java -Dlogback.configurationFile=logback.xml -Dconfig.file=application.conf
      -Dddi.fake_LBS_assignments=false -Dddi.web.rest.url=https://ddi-central-web-app:9443/sharedApi/traffic-policies/site-box-policy
      -Dddi.web.rest.ssl=true -Dddi.control.traffic.pcap.retention.scheduler.interval.minutes=2
      -Dddi.control.traffic.pcap.scheduler.interval.minutes=2 -Dddi.control.traffic.pcap.retention.total.size.mb=500000
      -Dddi.control.traffic.pcap.retention.file.minutes=4320 -Dddi.sftp.pool.max.connections.per.host=10
      -Dddi.sftp.pool.use.scp=false -Dddi.control.traffic.pcap.folder=/home/sftp_usr/pcaps
      -Dddi.control.submap.collector.output.folder=/home/sftp_usr/histsubmap -Dddi.local.timezone=Asia/Beirut
      -Dddi.control.ce.binaries.folder.upgrade=/home/sftp_usr/ce/upgrade -DrootLogLevel=INFO
      -jar ddi-engine-control-sidecar.jar
    env:
    - name: NODE_HOSTNAME
      value: justice-box2
    - name: SITE_NAME
      value: justice
    - name: RADIUS_SECRET
      value: alfasecret
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine-sidecar:2.1.202
    imagePullPolicy: Always
    name: ddi-engine-sidecar
    ports:
    - containerPort: 5001
      hostPort: 5001
      name: grpc
      protocol: TCP
    - containerPort: 8813
      hostPort: 8813
      name: radius
      protocol: UDP
    - containerPort: 8022
      hostPort: 8022
      name: sftp
      protocol: TCP
    resources:
      limits:
        cpu: "7"
      requests:
        cpu: "5"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: path-sockets
    - mountPath: /var/log/ddi_engine
      name: path-logfiles
    - mountPath: /var/log/ddi_engine/metrics
      name: path-sidecar-oam
    - mountPath: /home/sftp_usr/statsfiles
      name: path-sftp-statsfiles
    - mountPath: /home/sftp_usr/histsubmap
      name: path-sftp-histsubmap
    - mountPath: /home/sftp_usr/pcaps
      name: path-sftp-pcapfiles
    - mountPath: /etc/ddi-ssl
      name: tls-mnt
      readOnly: true
    - mountPath: /home/sftp_usr/ce/upgrade
      name: path-ce-binaries-upgrade
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-527nt
      readOnly: true
  - args:
    - -c
    - printf '\n[ddiaof]\n path = /ddiaof\n read only = no\n' >> /etc/rsyncd.conf;
      rsync --daemon --no-detach --port=1873 --dparam=uid=root --dparam=gid=root
    command:
    - /bin/sh
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-rsync:2.1.202
    imagePullPolicy: Always
    name: ddi-engine-rsyncdaemonredis
    ports:
    - containerPort: 1873
      hostPort: 1873
      name: rsync
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /ddiaof
      name: path-redis-aof
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-527nt
      readOnly: true
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  nodeName: ddi-jus-2.corp.alfamobile.com.lb
  nodeSelector:
    kubernetes.io/hostname: ddi-jus-2.corp.alfamobile.com.lb
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 10
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: engine-justice-box2-deploy-config
    name: engine-profiles-conf
  - emptyDir: {}
    name: path-sockets
  - hostPath:
      path: /ddi_data/statsfiles
      type: DirectoryOrCreate
    name: path-statsfiles
  - hostPath:
      path: /dev
      type: Directory
    name: path-dev
  - hostPath:
      path: /sys
      type: Directory
    name: path-sys
  - emptyDir:
      medium: HugePages
    name: hugep
  - hostPath:
      path: /ddi_data/redisaof
      type: DirectoryOrCreate
    name: path-redis-aof
  - hostPath:
      path: /ddi_data/log
      type: DirectoryOrCreate
    name: path-logfiles
  - hostPath:
      path: /ddi_data/metrics/engine
      type: DirectoryOrCreate
    name: path-engine-oam
  - hostPath:
      path: /ddi_data/metrics/sidecar
      type: DirectoryOrCreate
    name: path-sidecar-oam
  - hostPath:
      path: /ddi_data/statsfiles
      type: DirectoryOrCreate
    name: path-sftp-statsfiles
  - hostPath:
      path: /ddi_data/histsubmap
      type: DirectoryOrCreate
    name: path-sftp-histsubmap
  - hostPath:
      path: /ddi_data/pcaps
      type: DirectoryOrCreate
    name: path-sftp-pcapfiles
  - name: tls-mnt
    secret:
      defaultMode: 420
      secretName: ddi-central-web-app
  - hostPath:
      path: /ddi_data/ceupgrade
      type: DirectoryOrCreate
    name: path-ce-binaries-upgrade
  - name: default-token-527nt
    secret:
      defaultMode: 420
      secretName: default-token-527nt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-03-31T06:55:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-04-10T09:09:03Z"
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-03-31T06:55:46Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-03-31T07:00:42Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7a5760de854307fd17ec2df3a60a25abd7b9d8a70d7525b5782ff1a88654bf9a
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine:2.1.202
    imageID: docker-pullable://ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine@sha256:7f8a93a1098c558d005a186ef73d5e63ff54a94854d01a08c1c2df67c30346e1
    lastState: {}
    name: ddi-engine
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2021-03-31T06:55:42Z"
  - containerID: docker://5c0a1063e69e614ad20cc9916931beef8d33ba829b1c5e938e9bf31c66d6dcfc
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-rsync:2.1.115
    imageID: docker-pullable://ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-rsync@sha256:1dcbcf998b164408a03845e93a2697dc8d4f9b405489c11681f1d9ac1f7a2f8b
    lastState: {}
    name: ddi-engine-rsyncdaemonredis
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2021-03-31T06:55:46Z"
  - containerID: docker://866adaeebfac271486baad0d7de3e319889313aed14fbd43cd3e8f63cfa6ee24
    image: ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine-sidecar:2.1.202
    imageID: docker-pullable://ddi-lib-3.corp.alfamobile.com.lb:5000/ddi-engine-sidecar@sha256:3864a727e43fafbe8965cd52c9cdfe5c81e5aa48554ceff90f2083688f84ea59
    lastState: {}
    name: ddi-engine-sidecar
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2021-03-31T06:55:45Z"
  hostIP: 192.168.233.22
  phase: Running
  podIP: 192.168.233.22
  qosClass: Burstable
  startTime: "2021-03-31T06:55:27Z"
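
In the dump above, the pod's `Ready` condition has been `False` since 2021-04-10 while `ContainersReady` is `True` and every container reports `ready: true`. A minimal sketch for printing just those conditions, so they can be compared with the node's own status at the same moment, might look like this. The function name `pod_conditions` is hypothetical; the namespace and pod name in the usage comment are the ones from the dump.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Print each pod condition as TYPE=STATUS, one per line.
# $1: namespace, $2: pod name.
pod_conditions() {
    "$KUBECTL" get pod -n "$1" "$2" \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Usage:
#   pod_conditions default justice-box2-668795c9d9-nnwh9
```

The same function can be pointed at the stuck cattle-node-agent or Canal pod to check whether it shows the same pattern.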
