How to shut down and restart Kubernetes clusters?


#1

I have used Kubespray to install Kubernetes on a three-node lab infrastructure. I need to power off these hosts at night.

I have searched and read the docs, and I can't figure out the proper way of shutting down the cluster on all nodes and then restarting it the next day. Can someone point me to the proper process?
It would be nice if we had an Ansible playbook to shut things down. (I do realize it's my responsibility to shut down what's running in the containers themselves.)

Thanks


#2

I have never tried this, but if I were about to do it, I would try it in this order:

  1. Take a backup of the cluster in case things go south when you try to bring it back online (I would use Heptio Velero for that).
  2. On the master node, stop the following services:
  • kube-apiserver
  • kube-scheduler
  • kube-controller-manager
  3. On the worker nodes, stop the following services:
  • kubelet
  • kube-proxy
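Assuming systemd-managed components, the stop order above could be sketched as a small helper. This is untested against a real cluster, and the unit names are assumptions that vary by installer (on a kubespray install the control-plane pieces may run as containers instead, see the later replies). The `RUN=echo` default is just a dry-run convention for illustration:

```shell
#!/usr/bin/env bash
# Untested sketch of the stop order above. Unit names are assumptions;
# check `systemctl list-units 'kube*'` on your own hosts first.
set -euo pipefail
RUN="${RUN:-echo}"   # RUN=echo prints commands; RUN="" runs them for real

stop_control_plane() {   # run on each master (step 2)
  for svc in kube-apiserver kube-scheduler kube-controller-manager; do
    $RUN systemctl stop "$svc"
  done
}

stop_node() {            # run on each worker (step 3)
  for svc in kubelet kube-proxy; do
    $RUN systemctl stop "$svc"
  done
}

stop_control_plane
stop_node
```

The next morning you would `systemctl start` the same units in reverse: control plane first, then kubelet/kube-proxy on the workers.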

good luck :see_no_evil:


#3

lalem :

  1. Which directories need to be backed up?
  2. How would you stop control-plane processes like kube-apiserver, the controller manager, etc.? I mean, do you just issue a kill PID, or is there a nicer way to stop them?

Thanks


#4

I have the same question. How is it possible that something as basic as stopping and starting the platform is not documented as one of the first things? People should not have to search forums for the answer to such a simple question; it is not acceptable.


#5
  1. I meant that you back up your entire cluster, not specific directories, as a first choice. I mentioned you can use Velero for that; it is a great open-source utility that can back up your K8s state in case of disaster. You can check it out yourself at https://github.com/heptio/velero/ . If you do not want to use Velero to back up your cluster state, then back up your etcd (https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md) and the root certificate files as well.
  2. Your master components are running inside pods, so you will have to stop them using “docker stop”. Then on the nodes just stop the services using “systemctl stop”.
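A rough sketch of those two points. Everything here is hedged: the etcd endpoint, certificate paths, snapshot path, and container-name filter are guesses for a typical kubespray/Docker setup, and must be checked against your own masters (`docker ps`, your etcd config) before use:

```shell
#!/usr/bin/env bash
# Untested sketch: snapshot etcd, then stop the containerised
# control-plane components. All paths and names here are assumptions.
set -euo pipefail
RUN="${RUN:-echo}"   # RUN=echo prints commands; RUN="" runs them for real

backup_etcd() {
  # etcd v3 snapshot; endpoint and cert paths depend on your install
  $RUN env ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snap.db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/ssl/etcd/ssl/ca.pem \
    --cert=/etc/ssl/etcd/ssl/node.pem \
    --key=/etc/ssl/etcd/ssl/node-key.pem
}

stop_control_plane_containers() {
  # kubelet-managed containers are usually named k8s_<component>_...;
  # verify the actual names with `docker ps` first
  for c in kube-apiserver kube-scheduler kube-controller-manager; do
    $RUN sh -c "docker ps -q --filter name=k8s_${c} | xargs -r docker stop"
  done
}
```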

Again, I have not tried the steps above, but here is what I do to terminate the cluster at the end of the day and bring it back up when I need it, and it works every time:

  1. Automate the shutdown to take a Velero backup of the entire cluster.
  2. Destroy the cluster fully by running the reset playbook.

Then, to bring it back up again, automate:

  1. Bring up the cluster using the same playbook.
  2. Restore from the backup taken prior to shutdown.
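Assuming the Velero CLI and a kubespray checkout with the usual inventory layout (the inventory path and backup names below are illustrative, not from the original posts), that cycle could be scripted roughly as:

```shell
#!/usr/bin/env bash
# Untested sketch of the nightly destroy / morning rebuild cycle above.
set -euo pipefail
RUN="${RUN:-echo}"   # RUN=echo prints commands; RUN="" runs them for real
INVENTORY="inventory/mycluster/hosts.yml"   # adjust to your checkout

evening_shutdown() {
  $RUN velero backup create "nightly-$(date +%F)" --wait
  $RUN ansible-playbook -i "$INVENTORY" reset.yml -e reset_confirmation=yes
}

morning_startup() {
  $RUN ansible-playbook -i "$INVENTORY" cluster.yml
  # restore from the previous evening's backup; pick the right name/date
  $RUN velero restore create --from-backup "nightly-$(date +%F)"
}
```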

#6

We have had hard DC crashes and when bringing things back up, we just made sure the control plane was up before our nodes and things have come back fine.

We have also powered off our cluster(s) before and for the most part did this:

  1. Scale all applications down to 0, excluding cluster services, e.g. CNI DaemonSets, DNS, etc.
  2. Drain all nodes excluding the control plane.
  3. Shut down the nodes.
  4. Shut down all components except kube-apiserver and etcd. If kubelet is managing the components (kubeadm), just move the manifests out of the /etc/kubernetes/manifests dir and kubelet will stop the containers gracefully.
  5. Shut down kube-apiserver.
  6. Stop kubelet on the control plane; just ensure the etcd leader is the last one to be stopped.
  7. Back up dirs/etcd if needed.

Bringing it back up is essentially the same steps in the opposite order.
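For a kubeadm-style control plane (static pods under /etc/kubernetes/manifests), steps 2, 4 and 5 of the list above could be sketched as below. The node names and the "parked" directory name are made up for illustration, and this has not been run against the poster's cluster:

```shell
#!/usr/bin/env bash
# Untested sketch of steps 2, 4 and 5 above for a kubeadm-style setup.
set -euo pipefail
RUN="${RUN:-echo}"   # RUN=echo prints commands; RUN="" runs them for real

drain_workers() {                    # step 2: pass worker names by hand
  for node in "$@"; do
    $RUN kubectl drain "$node" --ignore-daemonsets --delete-local-data
  done
}

park_control_plane_manifests() {     # step 4: kubelet stops static pods
  $RUN mkdir -p /etc/kubernetes/manifests.parked
  for m in kube-controller-manager kube-scheduler; do
    $RUN mv "/etc/kubernetes/manifests/$m.yaml" /etc/kubernetes/manifests.parked/
  done
  # step 5: move kube-apiserver.yaml last, once everything else is down
  $RUN mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests.parked/
}

drain_workers worker-1 worker-2      # example node names
park_control_plane_manifests
```

Bringing it back up would be the manifest moves in reverse, plus `kubectl uncordon` on each drained worker.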


#7

Thanks very much for all the responses. I am just surprised there is no shutdown playbook or similar. For sure the pattern differs for each person/shop, but there are common tasks that could be automated via a playbook we could run after the containers are drained.

I will try to write something and contribute it back to the repo to get some thoughts.

Thanks all