Kubespray: install hangs on kube-proxy restart task


#1

I'm trying to install Kubernetes using Kubespray on a small three-node lab environment. While running the playbook, it hangs on a task that tries to restart the kube-proxy pods. Does anybody know what the reason might be?

OS = Ubuntu 18.04
HW = 64 GB RAM, 6-core HP

kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

ansible-playbook -i hosts.ini --become --become-user=root cluster.yml -b -vvv


After starting the playbook, a lot of activity takes place, but it stops at this task:

TASK [kubernetes/kubeadm : Restart all kube-proxy pods to ensure that they load the new configmap] **********************************************************************

task path: /home/tom/Services/kubespray/roles/kubernetes/kubeadm/tasks/main.yml:135

Sunday 03 March 2019 01:33:42 +0000 (0:00:02.043) 0:12:02.490 **********

Using module file /usr/local/lib/python3.6/dist-packages/ansible/modules/commands/command.py

<10.0.1.11> ESTABLISH SSH CONNECTION FOR USER: tom

<10.0.1.11> SSH: EXEC sshpass -d12 ssh -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o User=tom -o ConnectTimeout=10 -o ControlPath=/home/tom/.ansible/cp/4c47906b36 10.0.1.11 '/bin/sh -c '"'"'sudo -H -S -p "[sudo via ansible, key=wpxabqqcijihhvhkhegxzjygqeebatpe] password: " -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-wpxabqqcijihhvhkhegxzjygqeebatpe; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''

Escalation succeeded


kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7dbc74fcf-c766q 0/1 Pending 0 86m
kube-system coredns-7dbc74fcf-xgj77 0/1 Pending 0 86m
kube-system kube-apiserver-server1 1/1 Running 5 86m
kube-system kube-controller-manager-server1 1/1 Running 5 86m
kube-system kube-proxy-bhz5c 1/1 Running 0 85m
kube-system kube-proxy-q7gmz 0/1 Terminating 0 86m
kube-system kube-scheduler-server1 1/1 Running 5 86m
kube-system nginx-proxy-server2 1/1 Running 0 86m

Docker instances :
root@server1:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4ddef9a27785 gcr.io/google_containers/pause-amd64:3.1 "/pause" About an hour ago Up About an hour k8s_POD_kube-proxy-q7gmz_kube-system_3c23dd55-3d54-11e9-87a8-18a90554101c_0
103749aa9db8 3a6f709e97a0 "kube-scheduler --ad…" About an hour ago Up About an hour k8s_kube-scheduler_kube-scheduler-server1_kube-system_0303621df0e68163d195543d161f2308_5
e5c8f60b250a 0482f6400933 "kube-controller-man…" About an hour ago Up About an hour k8s_kube-controller-manager_kube-controller-manager-server1_kube-system_41b7130ce6b5a26df5697ed775e36b9f_5
33761bb92463 fe242e556a99 "kube-apiserver --al…" About an hour ago Up About an hour k8s_kube-apiserver_kube-apiserver-server1_kube-system_ad9c231985127c13fd9fbc178e652357_5
aa50e3a4de73 gcr.io/google_containers/pause-amd64:3.1 "/pause" About an hour ago Up About an hour k8s_POD_kube-scheduler-server1_kube-system_0303621df0e68163d195543d161f2308_5
4399df06b627 gcr.io/google_containers/pause-amd64:3.1 "/pause" About an hour ago Up About an hour k8s_POD_kube-controller-manager-server1_kube-system_41b7130ce6b5a26df5697ed775e36b9f_5
c9ef8fa4db7d gcr.io/google_containers/pause-amd64:3.1 "/pause" About an hour ago Up About an hour k8s_POD_kube-apiserver-server1_kube-system_ad9c231985127c13fd9fbc178e652357_5
486a36c74895 quay.io/coreos/etcd:v3.2.24 “/usr/local/bin/etcd” About an hour ago Up About an hour etcd1


#2

That line looks very strange. Run `kubectl -n kube-system describe pod kube-proxy-q7gmz` to see what is going on; maybe that can help you figure out the problem. A Pod should not need 86+ minutes to terminate. :smiley:
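Concretely, the diagnostics suggested above (the pod name is taken from the output in this thread; a pod stuck in Terminating often lives on a node that is down, so checking node status is worth doing as well):

```shell
# Show the stuck pod's status, conditions and recent events
kubectl -n kube-system describe pod kube-proxy-q7gmz

# Check whether the node hosting the pod is NotReady/unreachable
kubectl get nodes -o wide
```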


#3

For anybody seeing this issue: please have a look at mattymo's comment in this thread: https://github.com/kubernetes-sigs/kubespray/issues/4314
Pods on down/unresponsive nodes can't be deleted without
`--force --grace-period=0`.

I will test the fix and comment back here.


#4

This problem was solved by adding `--force --grace-period=0` to one of the Ansible tasks, as described in https://github.com/kubernetes-sigs/kubespray/issues/4314.
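For anyone hitting this before the upstream patch lands, the same idea can be applied by hand: force-delete the stuck kube-proxy pods so an unresponsive node cannot block termination. A sketch (the `k8s-app=kube-proxy` label selector is an assumption about how the pods are labeled, not copied from the kubespray source; note that force deletion only removes the API object and does not guarantee the container is gone on the down node):

```shell
# Force-delete kube-proxy pods immediately, skipping the graceful termination wait
kubectl -n kube-system delete pod -l k8s-app=kube-proxy --force --grace-period=0
```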