When you run a production cluster and new releases come out very often, upgrading can be a challenge: you already have lots of applications running, and you probably have some customized components installed.
I can see two approaches:
Stay two versions behind the latest release, but upgrade at the same frequency as releases come out. For example, I start with 1.7 when 1.9 is released, move to 1.8 when 1.10 is released, move to 1.9 when 1.11 is released, and so on. The advantage of this approach is that each upgrade is a single minor version up, which is usually well supported and therefore relatively easy and reliable. However, it means upgrading the clusters every 2-3 months, or even more frequently, which poses risks to SLAs, and if you have multiple clusters in multiple regions, you're pretty much constantly upgrading your clusters.
Skip a few releases and perform long jumps every six months to a year. For example, I start with 1.7 and now upgrade directly to 1.10. The advantage of this approach is that you have more time to plan upgrades and cause fewer disruptions to services. But with the long gap, a lot of things could have changed, so it's probably not safe to upgrade in place; you'll have to build a new cluster and migrate services over, which could be a big task that involves a lot of application testing.
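To make the difference between the two cadences concrete, here is a rough Python sketch that lists the upgrade windows each approach needs to get from 1.7 to 1.10. The function names (`stepwise_plan`, `long_jump_plan`, `describe`) are made up for illustration, and it only counts hops between minor versions; it says nothing about whether a given hop is actually supported.

```python
from typing import List, Tuple

Hop = Tuple[int, int]  # (from_minor, to_minor), e.g. (7, 8) means 1.7 -> 1.8


def stepwise_plan(current: int, target: int) -> List[Hop]:
    """Approach 1: one minor version per upgrade window."""
    return [(minor, minor + 1) for minor in range(current, target)]


def long_jump_plan(current: int, target: int) -> List[Hop]:
    """Approach 2: a single jump that skips the versions in between."""
    return [(current, target)] if target > current else []


def describe(plan: List[Hop]) -> List[str]:
    """Render each hop as a readable '1.x -> 1.y' string."""
    return [f"1.{src} -> 1.{dst}" for src, dst in plan]


if __name__ == "__main__":
    # Three separate upgrade windows, each a single minor version up.
    print(describe(stepwise_plan(7, 10)))
    # One upgrade window, skipping 1.8 and 1.9 entirely.
    print(describe(long_jump_plan(7, 10)))
```

With several clusters, the number of upgrade windows in the first plan multiplies by the number of clusters, which is where the "constantly upgrading" feeling comes from.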
Does anyone out there have experience with these approaches, or has anyone explored different methods for this?
Thank you in advance for any input!