Why should I separate control plane instances from etcd instances?

Ok, here’s something I am unable to clearly understand:

This topology decouples the control plane and etcd member. It therefore provides an HA setup where losing a control plane instance or an etcd member has less impact and does not affect the cluster redundancy as much as the stacked HA topology.
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

They say the etcd cluster can be either separate from or stacked with the control plane machines, but that separate would be preferable.

Why is this? If you lose either etcd or the control plane, the cluster is (as I’ve understood it) unable to heal/make changes anyway, so why separate them? It just seems like a waste of resources to have separate machines for etcd and the control plane.

In a stacked scenario, with the same number of machines as in the separated one, we could have 6 control plane nodes and 6 etcd members (since they are stacked). Should this not be better (apart from the detail that the etcd member count should be an odd number)?

Going one step further: why don’t we just stack the control plane and etcd on top of every worker node as well? I can’t imagine the control plane and etcd are such intensive workloads, so stacking them on the workers should just ensure even better HA. That way all nodes are exactly the same, except for the master node (because of its master status), which should reduce the complexity of the system as well.

So the questions I hope to get answered are:

  1. What is the actual benefit of separating control plane instances from etcd instances?
  2. Can you provide a scenario where the separated topology would perform better than the stacked one?
  3. Why don’t we just stack the control plane and etcd on top of ALL nodes, making the system more homogenous?

Thanks for any answer. I am just getting into k8s and I’m loving it, but I am having trouble understanding the reason for separating these functions from each other. :slight_smile:

Hi hannesknutsson:

It is hard for me to give a definitive answer to all the questions you raise, but at least regarding etcd, the documentation answers the question “What is the maximum cluster size?” (for etcd):

Theoretically, there is no hard limit. However, an etcd cluster probably should have no more than seven nodes. Google Chubby lock service, similar to etcd and widely deployed within Google for many years, suggests running five nodes. A 5-member etcd cluster can tolerate two member failures, which is enough in most cases. Although larger clusters provide better fault tolerance, the write performance suffers because data must be replicated across more machines.
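
The numbers in that quote follow from the Raft quorum maths: a cluster of n members needs floor(n/2) + 1 members available to commit writes. This also touches on your 6-node example, since an even member count does not buy any extra fault tolerance. A quick back-of-the-envelope calculation (nothing specific to any particular setup):

```
quorum(n)    = floor(n/2) + 1
tolerated(n) = n - quorum(n)

n = 3  ->  quorum 2, tolerates 1 failure
n = 5  ->  quorum 3, tolerates 2 failures
n = 6  ->  quorum 4, tolerates 2 failures   (no gain over 5 members)
n = 7  ->  quorum 4, tolerates 3 failures
```

That is why odd member counts are recommended: the sixth member adds write-replication cost without improving how many failures you can survive.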

As Kubernetes supports clusters of up to 5000 nodes (see Best Practices for Large Clusters), this rules out the option of a “homogeneous” cluster where every node simultaneously has the control plane and worker roles and hosts an etcd member.

It is possible to do so for small clusters (3 to 5, or even 7, nodes) before the chattiness of the etcd quorum process becomes a problem.

etcd clusters also need highly performant storage (fast disks with low fsync latency), so it may be better to provide specialised resources to the etcd nodes while the regular control plane nodes use less optimised (and probably cheaper) resources.

From the operations point of view, removing the etcd component from the control plane nodes makes them (as far as I know) stateless, so they are easier to replace and maintain…
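
For what it’s worth, the external topology is mostly a matter of pointing kubeadm at etcd endpoints that live elsewhere. A minimal sketch of a `ClusterConfiguration` using external etcd could look something like this (the load balancer name, the `ETCD_*_IP` addresses and the certificate paths are placeholders you would replace with your own):

```yaml
# Sketch of a kubeadm ClusterConfiguration for the external etcd topology.
# LOAD_BALANCER_DNS and the ETCD_*_IP addresses are placeholders.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "LOAD_BALANCER_DNS:6443"
etcd:
  external:
    endpoints:
      - https://ETCD_0_IP:2379
      - https://ETCD_1_IP:2379
      - https://ETCD_2_IP:2379
    # client certificates used by the kube-apiserver to talk to etcd
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

With the stacked topology (the kubeadm default) you simply omit the `etcd.external` section and kubeadm runs a local etcd member on each control plane node.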

I hope this helps you understand a little better the reasons for separating the control plane and etcd “nodes”, if you are still thinking about it.

Best regards,

Xavi