Cluster information:
Kubernetes version: 1.11.5
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Ubuntu 18.04.1 LTS
CNI and version: flannel, version uncertain (probably 0.10.0)
CRI and version: docker 18.09.2
Hello All,
I've built a bare-metal K8S cluster on premises with three controller nodes (also running etcd) and five workers. Bear with me for asking novice questions; I'm relatively new to this.
Side note: I've reached out on the K8S Slack channel with this issue and had no luck. Either I'm not using Slack well, or the application isn't well suited to "forum"-style discussions. I mostly see a lot of other requests for help scrolling by, and it seems like a roll of the dice whether anyone will see my post, know the issue, and take the time to reply.
I have devs submitting Jobs to the cluster from an in-house workflow management stack, sometimes thousands at a time. Each Job launches a Pod with a single container, and most complete normally and get cleaned up by the stack. But an increasing number of Jobs now get stuck, with the Pod hung in CreateContainerError.
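A quick way to gauge the scale is to count pods in that state; I've just been grepping the STATUS column of kubectl get pods in our namespace:

# count pods currently stuck in CreateContainerError in the xapps namespace
kubectl get pods -n xapps --no-headers | grep -c CreateContainerError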
Specifically, the pod is complaining with:
state:
  waiting:
    message: 'Error response from daemon: Conflict. The container name
      "jobname-podstring_namespace_k8sCreatedUID_0" is already in use by container
      "big_old_docker_UID". You have to remove (or rename) that container to be
      able to reuse that name.'
    reason: CreateContainerError
I've found lots of references online describing similar situations where Docker users needed to delete or rename a container, restart Docker, or even reboot the node. But I don't see how that applies here: Kubernetes is the one creating these containers, so I have no control over the container name, and it isn't realistic to intervene by hand on every worker node.
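For reference, the manual fix those posts describe would look something like this on the affected node, using the container name and ID from the pod status below, which obviously doesn't scale to thousands of Jobs:

# on the affected worker node (kw-prod-e03), find the conflicting container by name
docker ps -a --filter "name=k8s_delphi-evaluator-dcostanz1-container-144438"
# then remove it so the kubelet can recreate a container under the same name
docker rm 371f7df847f3d2cd727399011a4cd9f90a474301c099979d63527dcb7d9eb52e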
From kubectl get job -o yaml:
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: 2019-05-01T19:07:55Z
  labels:
    application_instance: delphi-evaluator-dcostanz1
    queue_token: 428d08dc-58f3-40fb-ac23-2562ae8391ce
  name: delphi-evaluator-dcostanz1-job-144438
  namespace: xapps
  resourceVersion: "43194423"
  selfLink: /apis/batch/v1/namespaces/xapps/jobs/delphi-evaluator-dcostanz1-job-144438
  uid: 6cfa2af4-6c44-11e9-b73b-e2d6d3513984
spec:
  activeDeadlineSeconds: 3600
  backoffLimit: 0
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 6cfa2af4-6c44-11e9-b73b-e2d6d3513984
  template:
    metadata:
      creationTimestamp: null
      labels:
        application_instance: delphi-evaluator-dcostanz1
        controller-uid: 6cfa2af4-6c44-11e9-b73b-e2d6d3513984
        job-name: delphi-evaluator-dcostanz1-job-144438
        queue_token: 428d08dc-58f3-40fb-ac23-2562ae8391ce
      name: delphi-evaluator-dcostanz1-pod-144438
    spec:
      containers:
      - args:
        - -c
        - ./run.sh
        command:
        - /bin/sh
        image: dreg.scharp.org/xapps-delphi-transformation-type-configurable-r1
        imagePullPolicy: Always
        name: delphi-evaluator-dcostanz1-container-144438
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /dataset-directory
          name: dataset-directory
        workingDir: /dataset-directory/
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: regcred
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
        runAsUser: 1058
      terminationGracePeriodSeconds: 30
      volumes:
      - name: dataset-directory
        nfs:
          path: /scharp_delphi_evaluator/delphi_evaluator/dcostanz_1/428d08dc-58f3-40fb-ac23-2562ae8391ce
          server: scharpdata3.pc.scharp.org
status:
  conditions:
  - lastProbeTime: 2019-05-01T19:12:08Z
    lastTransitionTime: 2019-05-01T19:12:08Z
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 1
  startTime: 2019-05-01T19:07:55Z
From kubectl get pod -o yaml:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2019-05-01T19:07:55Z
  generateName: delphi-evaluator-dcostanz1-job-144438-
  labels:
    application_instance: delphi-evaluator-dcostanz1
    controller-uid: 6cfa2af4-6c44-11e9-b73b-e2d6d3513984
    job-name: delphi-evaluator-dcostanz1-job-144438
    queue_token: 428d08dc-58f3-40fb-ac23-2562ae8391ce
  name: delphi-evaluator-dcostanz1-job-144438-tmmrg
  namespace: xapps
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: delphi-evaluator-dcostanz1-job-144438
    uid: 6cfa2af4-6c44-11e9-b73b-e2d6d3513984
  resourceVersion: "43194421"
  selfLink: /api/v1/namespaces/xapps/pods/delphi-evaluator-dcostanz1-job-144438-tmmrg
  uid: 6cfae749-6c44-11e9-ab21-96ca041346e4
spec:
  containers:
  - args:
    - -c
    - ./run.sh
    command:
    - /bin/sh
    image: dreg.scharp.org/xapps-delphi-transformation-type-configurable-r1
    imagePullPolicy: Always
    name: delphi-evaluator-dcostanz1-container-144438
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dataset-directory
      name: dataset-directory
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-8jnkb
      readOnly: true
    workingDir: /dataset-directory/
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: regcred
  nodeName: kw-prod-e03
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
    runAsUser: 1058
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: dataset-directory
    nfs:
      path: /scharp_delphi_evaluator/delphi_evaluator/dcostanz_1/428d08dc-58f3-40fb-ac23-2562ae8391ce
      server: scharpdata3.pc.scharp.org
  - name: default-token-8jnkb
    secret:
      defaultMode: 420
      secretName: default-token-8jnkb
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2019-05-01T19:07:59Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2019-05-01T19:07:59Z
    message: 'containers with unready status: [delphi-evaluator-dcostanz1-container-144438]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [delphi-evaluator-dcostanz1-container-144438]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2019-05-01T19:07:56Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://371f7df847f3d2cd727399011a4cd9f90a474301c099979d63527dcb7d9eb52e
    image: dreg.scharp.org/xapps-delphi-transformation-type-configurable-r1:latest
    imageID: docker-pullable://dreg.scharp.org/xapps-delphi-transformation-type-configurable-r1@sha256:f521e98c550560f496e0cc21f5f7af3f5a62fc61dc9e29b33d750a29664abc24
    lastState:
      terminated:
        containerID: docker://371f7df847f3d2cd727399011a4cd9f90a474301c099979d63527dcb7d9eb52e
        exitCode: 0
        finishedAt: null
        startedAt: null
    name: delphi-evaluator-dcostanz1-container-144438
    ready: false
    restartCount: 0
    state:
      waiting:
        message: 'Error response from daemon: Conflict. The container name "/k8s_delphi-evaluator-dcostanz1-container-144438_delphi-evaluator-dcostanz1-job-144438-tmmrg_xapps_6cfae749-6c44-11e9-ab21-96ca041346e4_0"
          is already in use by container "371f7df847f3d2cd727399011a4cd9f90a474301c099979d63527dcb7d9eb52e".
          You have to remove (or rename) that container to be able to reuse that name.'
        reason: CreateContainerError
  hostIP: 140.107.117.52
  phase: Failed
  podIP: 10.244.5.228
  qosClass: BestEffort
  startTime: 2019-05-01T19:07:59Z
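One detail that stands out in the status above: lastState.terminated shows the same containerID with exitCode: 0 but null timestamps, as if the container actually ran and the kubelet is now trying to create it a second time under the same name. As a possible stopgap (untested, and I'm not sure it plays well with the kubelet's own container garbage collection), I could cron something on each worker to prune old exited containers:

# remove containers that exited more than an hour ago
# (the "until" filter needs Docker >= 17.06)
docker container prune --force --filter "until=1h"

Is that a sane workaround, or is there a proper fix for whatever is causing the name conflict in the first place?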
Thank you in advance for any help!