Hi, I am currently running MicroK8s in a 12-node HA cluster (4 CPUs, 8 GB RAM per node) as our company development cluster. I was wondering how MicroK8s handles the snap auto-update in such a scenario. Our software uses nearly all of the cluster's resources, so we can only spare one, at most two, nodes going down during an update; otherwise we would run out of memory. How does the snap update of MicroK8s know which nodes it can take down for an update, and on which nodes it has to wait until the others are back? On the last update our whole cluster crashed and ran out of resources, because services went crazy during rescheduling and filled up the /var/log volume with crash logs while restarting several thousand times. Does snap respect the cluster state on update, or does it just take every node down in parallel, update the snaps and then try to start up again? That would be terrible behaviour, imo.
Since you have such requirements for the upgrade process, you need to take control of when the node updates take place and what you will be updating to.
First, you need to make sure all your nodes follow the same channel, and that channel should probably not be latest/stable. If you follow a channel for a specific minor version instead (for example 1.20/stable), you will only get patch releases. The latest/stable channel will also upgrade to the next minor release, i.e. from 1.20 to 1.21, when that comes out. Read more on this at [1].
Second, you should schedule when each node upgrades so that you do not get more than one node upgrading at the same time. See [2] on how to do this.
Third, you may want to look at the Snap Store Proxy [3] to get full control over which updates land, and even block the ones that break your cluster.
[1] MicroK8s - Selecting a snap channel
[2] Managing updates | Snapcraft documentation
[3] Introduction | Snap Store Proxy documentation
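To illustrate the second point, here is a minimal sketch (not an official tool) that assigns every node its own weekly refresh window, so that no two windows overlap. The node names, the starting hour and the two-hour window length are assumptions made up for the example. The script only prints the `snap set system refresh.timer=...` command you would run on each node; it changes nothing by itself.

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def staggered_timers(nodes, first_hour=1, window_hours=2):
    """Map each node to a distinct weekly refresh window.

    Nodes are spread over the weekdays first; once every day is taken,
    the next nodes get a later time slot on the same days.
    first_hour and window_hours are illustrative defaults, not recommendations.
    """
    timers = {}
    for i, node in enumerate(nodes):
        day = DAYS[i % len(DAYS)]
        slot = i // len(DAYS)
        start = first_hour + slot * window_hours
        end = start + window_hours
        if end >= 24:
            # Keep every window inside a single day to avoid overlaps.
            raise ValueError("too many nodes for one window per node per week")
        timers[node] = f"{day},{start:02d}:00-{end:02d}:00"
    return timers


if __name__ == "__main__":
    # Hypothetical node names for a 12-node cluster.
    nodes = [f"node{n:02d}" for n in range(1, 13)]
    for node, timer in staggered_timers(nodes).items():
        print(f"{node}: sudo snap set system refresh.timer={timer}")
```

With 12 nodes, the first seven get the early window on Monday to Sunday and the remaining five get the following window on Monday to Friday. Note that this only spreads the refreshes out in time; it does not drain a node before its snap is refreshed.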
Hi @kjackal,
Another problem with the auto-update of MicroK8s nodes is that they simply do not get drained properly before the update, as Kubernetes requires. So every update from snap is like an outage, which is somehow stupid. In addition, defining and distributing auto-update windows for a cluster of 12, 20 or 50 nodes is complete nonsense in my opinion. I don't want to monitor a big cluster updating over several days to see that everything is going right. And I cannot even stop it if something goes wrong, because there simply is no "don't update" option. I really don't get the stubbornness of snap in not allowing a simple DO NOT AUTO UPDATE! (I want to do this node by node myself, as defined by Kubernetes!) This makes the technology somehow useless for enterprise and professional use. Every time the auto-update kicks in, the cluster goes down, sometimes only partially; services become unavailable and are sometimes unrecoverable.
I will try to look at the proxy now to block this stupid auto-update, but this seems more like an expensive, complex, time-consuming workaround for a broken technology. I really like MicroK8s, but snap breaks it for everything that is not a local development installation where availability and stability are not required!
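For reference, the node-by-node procedure mentioned above (drain, refresh, wait, uncordon) can be sketched as a dry-run plan. This is only an illustration under assumptions: the node names and the channel are hypothetical, the script prints commands instead of running them, and it does not by itself stop snap from auto-refreshing.

```python
def upgrade_plan(nodes, channel):
    """Return (where, command) steps that upgrade one node at a time.

    Each node is drained, refreshed, checked and uncordoned before the
    next node is touched, so at most one node is out of service.
    """
    steps = []
    for node in nodes:
        # Evict workloads so they are rescheduled before the snap restarts.
        steps.append((
            "any cluster node",
            f"microk8s kubectl drain {node} --ignore-daemonsets --delete-emptydir-data",
        ))
        # Refresh the snap on the drained node itself.
        steps.append((node, f"sudo snap refresh microk8s --channel={channel}"))
        # Block until MicroK8s on that node reports ready again.
        steps.append((node, "microk8s status --wait-ready"))
        # Allow pods to be scheduled on the node again.
        steps.append(("any cluster node", f"microk8s kubectl uncordon {node}"))
    return steps


if __name__ == "__main__":
    # Hypothetical node names and target channel.
    for where, command in upgrade_plan(["node01", "node02"], "1.21/stable"):
        print(f"[{where}] {command}")
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; wiring it to SSH or a configuration management tool is left open.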