Hi, I am currently running MicroK8s (mk8s) in a 12-node HA cluster (4 CPUs, 8 GB RAM per node) as our company's development cluster. I was wondering how mk8s handles snap auto-updates in such a scenario. Our software uses nearly all of the cluster's resources, so we can only spare one, at most two, nodes going down during an update; otherwise we would run out of memory. How does the snap update of mk8s know which nodes it can take down for an update, and on which nodes it has to wait until others are back? During the last update our whole cluster crashed and ran out of resources, because services went crazy during rescheduling and filled up the /var/log volume with crash logs, restarting several thousand times. Does snap respect the cluster state during an update, or does it just take every node down in parallel, update the snaps, and then try to start up again? That would be terrible behaviour, imo.
Since you have such requirements for the upgrade process, you need to take control of when the node updates take place and what you update to.
First, you need to make sure your nodes all follow the same channel, and that channel should probably not be latest/stable. On a channel pinned to a specific version track you will only get patch releases. The latest/stable channel will also upgrade to the next minor release, i.e. from 1.20 to 1.21, when that comes out. Read more on this at .
Second, you should schedule when each node upgrades so that no more than one node is upgrading at the same time. See  on how to do this.
Third, you may want to look at the snap-proxy so as to have full control over which updates land, and even block the ones that break your cluster.
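
As a rough illustration of the first two points, here is a minimal Python sketch that only generates the commands you would run on each node; it does not execute anything. It assumes the `snap refresh microk8s --channel=<track>/<risk>` form for pinning a channel and snapd's `refresh.timer` system option with a `day,HH:MM-HH:MM` window. The node names, the 1.20 track, the start hour, and the two-hour window are all made up for the example, so check the timer syntax against the snapd documentation before relying on it.

```python
import shlex
from typing import Dict, List

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def pin_channel_command(track: str, risk: str = "stable") -> str:
    """Build the command that moves MicroK8s onto a fixed track, e.g. 1.20/stable."""
    return shlex.join(
        ["sudo", "snap", "refresh", "microk8s", f"--channel={track}/{risk}"]
    )


def staggered_timers(
    nodes: List[str], first_hour: int = 1, window_hours: int = 2
) -> Dict[str, str]:
    """Give every node its own weekly refresh window so that no two windows overlap.

    Nodes are spread over the weekdays first; once all seven days are used,
    the next nodes get a later window on the same days.
    """
    timers: Dict[str, str] = {}
    for index, node in enumerate(nodes):
        day = DAYS[index % len(DAYS)]
        slot = index // len(DAYS)  # 0 for the first 7 nodes, 1 for the next 7, ...
        start = first_hour + slot * window_hours
        end = start + window_hours
        if end > 23:
            raise ValueError("too many nodes for non-overlapping windows in one day")
        timers[node] = f"{day},{start:02d}:00-{end:02d}:00"
    return timers


def timer_command(timer: str) -> str:
    """Build the command that sets the refresh window on one node."""
    return shlex.join(["sudo", "snap", "set", "system", f"refresh.timer={timer}"])


if __name__ == "__main__":
    # Hypothetical node names for a 12-node cluster.
    nodes = [f"node{i:02d}" for i in range(1, 13)]
    print("on every node:", pin_channel_command("1.20"))
    for node, timer in staggered_timers(nodes).items():
        print(f"on {node}: {timer_command(timer)}")
```

Note that this only spreads the refreshes out in time. snapd itself still knows nothing about the cluster state, so each window has to be long enough for a node to update and rejoin before the next node's window opens.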