I have an application which runs on 1-2 nodes with multiple pods and an ingress controller. I need to deploy this application to multiple (say, 10+) locations over the world. Should I create a k8s cluster on every location or just use a large cluster? Also is there any way for an ingress controller to connect to only pods that are located within the same DC?
I would suggest avoiding one large stretched cluster. There are all sorts of issues that can crop up, mostly around resource locality:
- Different networks at each site may require their own pod/service CIDR ranges so they don't clash.
- You want in-cluster services to route to the closest endpoint, not one that could potentially be in another DC.
- Issues can crop up if connectivity is lost to etcd or high latency is introduced.
- An outage at one location could bring down multiple locations (e.g. etcd goes down or a failed upgrade at 1 site)
There are a slew of other issues that can pop up.
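To make the CIDR-clash point concrete: if the sites are ever VPN-linked, each cluster needs non-overlapping pod and service ranges. A hedged sketch of a per-site kubeadm configuration (all subnets here are made-up examples):

```yaml
# Hypothetical kubeadm config for the "US" site. The EU site would pick
# non-overlapping ranges (e.g. 10.20.0.0/16 and 10.101.0.0/16) so routes
# between VPN-linked sites never clash.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.10.0.0/16
  serviceSubnet: 10.100.0.0/16
```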
What environment are you running in? That makes a big difference in the network capabilities.
I agree that stretching a cluster across DCs is a bad idea, unless ping time is single-digit ms (and even then, it introduces caveats).
Every node is a KVM virtual machine, and I have full control over the hypervisor and the network config. All nodes have public IPs and no firewall, and it is possible to set up VPNs across the sites if needed (though that would be a pain).
What UX do you want to expose to end-users? For example:

1. Each site gets a public IP and DNS name: “aaa.example.com”, “bbb.example.com”, “ccc.example.com”. Traffic to “aaa” always goes to the US installation, and traffic to “bbb” always goes to EU. End users have to be region-aware; EU users that access “aaa” will have higher latency.
2. The whole app gets a single DNS name, but each site gets its own IP. Users access “example.com” and reach the “best” site for them. Geo-aware DNS uses the client IP to guess which is “best”.
3. The whole app gets a single IP. Routing infrastructure figures out which is the “best” site for a given client. DNS always returns 1 IP.
Your network infrastructure determines whether (3) is viable or not.
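To make option (1) concrete, it is just ordinary per-site DNS records, one name per site (all names and addresses below are made-up examples):

```
; Sketch of option (1): one DNS name per site, users pick their region.
aaa.example.com.  300  IN  A  192.0.2.10     ; US site
bbb.example.com.  300  IN  A  198.51.100.10  ; EU site
ccc.example.com.  300  IN  A  203.0.113.10   ; APAC site
```

Option (2) would instead publish a single name and have a geo-aware DNS service return the “best” of those addresses per client; option (3) would advertise one anycast IP from every site.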
It is an anycast DNS appliance, so it is by nature (3).
So, can you configure the ingress in each site to only route to backends in that site? Maybe I am missing something?
If the ingress is deployed manually, it will be configured to route only to backends within the site; but for the automatic Kubernetes ingress, I’m not clear whether it can differentiate backends belonging to different sites.
I think I should set up one k8s cluster on every node and use some multi-cluster config sync method (like Rancher’s multi-cluster apps) to provision them.
Maybe I misunderstand completely. Why would an ingress impl know anything about other sites unless you tell it?
If you have one cluster per site, and one ingress impl per site, won’t that automatically do what you want?
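That is how a stock Ingress works: it references backends only by Service name, and a Service only ever resolves to endpoints in its own cluster. A minimal sketch (hypothetical names):

```yaml
# Deployed once per site; "app" is a Service in the same cluster, so its
# endpoints can only be that site's pods.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 80
```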
Yes you are right.
This sounds like a case where you should avoid k8s almost entirely, and just pre-plan your deployment needs.
How many different apps are you planning to put in each edge site? If the number is small (1-4) then k8s will mostly hurt you and add lots of overhead.
You can always revise the plan later
stated another way: Your ingress rules & health check results need to be communicated over BGP, not to your kube-proxy, so why bother having a kube-proxy?
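A sketch of what that can look like in practice, assuming an ExaBGP-style setup (this is an illustration, not the poster’s actual configuration; the VIP and health URL are made up). ExaBGP runs a child process and reads route commands from its stdout, so a health check becomes announce/withdraw of the site’s anycast prefix:

```python
# Hypothetical ExaBGP health-check process: announce the anycast VIP while
# the local service answers its health endpoint, withdraw it otherwise.
import time
import urllib.request

# Made-up anycast VIP for illustration.
ANYCAST_ROUTE = "route 192.0.2.53/32 next-hop self"

def healthy(url="http://127.0.0.1:8080/healthz", timeout=2.0):
    """Return True if the local service answers its health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def command(is_healthy):
    """Map a health state to the ExaBGP announce/withdraw command."""
    return ("announce " if is_healthy else "withdraw ") + ANYCAST_ROUTE

def run(poll_seconds=5):
    """Emit a command on stdout only when the health state changes."""
    last = None
    while True:
        state = healthy()
        if state != last:
            print(command(state), flush=True)
            last = state
        time.sleep(poll_seconds)
```

With this, traffic steering lives entirely in BGP: a failed site simply stops advertising the prefix and clients converge on the next-closest site.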
I think we currently have more than 4 applications to deploy to each site (iterator, client-facing caching resolver, DoT, DoH, BGP speaker, plus monitoring for everything), and non-disruptive rolling updates need to be handled correctly. Also, I’m the only one working on the deployment, and I’m tired of running scripts on every site. K8s + Helm is kind of a deployment planning tool for me now, and I think it does a good job of keeping every site identical.
This does come with a lot of overhead, but wherever there is abstraction there is overhead. I’m still looking for better ops solutions, but k8s somehow works for my use cases for now.
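For what it’s worth, the “keep every site identical” part can be approximated with one chart and one values file per site (site names, chart path, and kubeconfig context names below are made up):

```shell
# Assumes one kubeconfig context per site; only the values file differs.
for site in us-east eu-west ap-south; do
  helm upgrade --install app ./chart \
    --kube-context "$site" \
    -f "values-$site.yaml"
done
```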
In that case, a stretched cluster may be your best option? Or you could try out kubefed v2 and have 10 different single-master clusters.
I’m not very familiar with the solution space here, you might have to help build it!