We have two clusters per supported geo (for DR purposes), and all eng teams deploy to the same clusters, each in its own namespace.
Our SRE team is responsible for provisioning, configuring, monitoring and upgrading the clusters. They are also responsible for the layers on top of K8s, such as Prometheus, log collection, Guard, PSPs, RBAC, etc.
That said, we still keep the dev clusters separate, since it allows us to operate them differently in terms of:
- Response to crisis / SLA - if a dev cluster is down, it’s not as urgent as a production cluster being down
- Access control and security - Isolated env, JIT, etc.
- Stability - dev clusters tend to contain a lot of garbage and legacy stuff
- Billing - it’s easier to divide the two
- Upgrades - we can test K8s upgrades in a separate environment before Prod, since an upgrade applies to the whole cluster - you cannot upgrade a specific namespace in K8s!
Finally, if you are going to have several production clusters anyway (in our case we have 10), then from a COGS perspective it doesn’t really matter whether you have 10 or 11 clusters. I can understand why combining dev and prod is more appealing when you only have one cluster.
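
To put rough numbers on the 10-versus-11 point, here is a minimal sketch. It assumes, purely for illustration, that every cluster costs about the same, so that cost scales with cluster count; real COGS also depends on node sizes, utilization and so on. The function name `relative_overhead` is hypothetical.

```python
def relative_overhead(prod_clusters: int, extra_dev_clusters: int = 1) -> float:
    """Fractional increase in cluster count from adding dedicated dev clusters.

    Assumption (illustrative only): every cluster costs roughly the same.
    """
    if prod_clusters <= 0:
        raise ValueError("prod_clusters must be positive")
    return extra_dev_clusters / prod_clusters


if __name__ == "__main__":
    for prod in (1, 10):
        # One extra dev cluster on top of `prod` production clusters.
        print(f"{prod} prod cluster(s): +{relative_overhead(prod):.0%} clusters")
```

Under that assumption, a dedicated dev cluster doubles the footprint when you run a single production cluster, but adds only about a tenth when you already run 10.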