I'm having trouble understanding how this can be achieved, so first a little background.
Our application has the following requirements:
- The microservice runs in two different modes (achieved by injecting different ConfigMaps into the same built image); let's call them API (many instances) and WORKER (only one instance may be active at runtime).
- Because of problems with the in-application locking mechanism, the worker cannot guarantee that only one instance is actively processing work, and ordering is critical, so we cannot use the RollingUpdate strategy for this mode.
- Additionally, before each deployment we need to run a Job that updates some persistent storage.
- If that Job fails, the old WORKER instance should stay active; if the Job succeeds, the WORKER should be recreated.
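The ordering these requirements imply can be sketched as a small model. This is a minimal, hypothetical sketch with no Kubernetes API calls: `Worker`, `release` and the `run_job` callback are illustrative names, and "run the Job and wait for its result" is reduced to a function returning True or False (in practice that step might live in a CI/CD pipeline, which is an assumption here).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Worker:
    """Hypothetical model of the WORKER Deployment (at most one active version)."""
    active: Optional[str]
    history: List[Optional[str]] = field(default_factory=list)

    def set_active(self, version: Optional[str]) -> None:
        # Record every transition so we can check that two versions never overlap.
        self.active = version
        self.history.append(version)


def release(worker: Worker, new_version: str, run_job: Callable[[], bool]) -> bool:
    """Run the storage-update Job first and touch the WORKER only if it succeeds.

    run_job is a hypothetical stand-in for "create the Job and wait until it is
    Complete or Failed"; it returns True on success.
    """
    if not run_job():
        return False                  # Job failed: the old WORKER keeps running
    worker.set_active(None)           # Recreate, step 1: stop the old version
    worker.set_active(new_version)    # Recreate, step 2: start the new version
    return True
```

In this model the downtime window (the `None` entry in `history`) opens only after the Job has succeeded, so its length does not depend on how long the Job runs, and a failed Job leaves the old version untouched.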
So what we came up with is:
- For the WORKER role we use the Recreate strategy (a short downtime of about 10 s is acceptable in this scenario).
- For the API role we stay with RollingUpdate.
- We added an initContainers definition that waits for the Job to finish, so we can be sure the new code is not rolled out until the Job has completed.
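The check such an init container performs can be sketched as follows. This is an illustrative sketch, not the actual configuration: it reads a Job object in the JSON shape returned by `kubectl get job NAME -o json` and treats the Job as finished once it carries a `Complete` or `Failed` condition with status `"True"`. The names `job_phase`, `wait_for_job` and the `fetch_job` callback are hypothetical; `fetch_job` is injected so the polling loop can run without a cluster.

```python
import time
from typing import Any, Callable, Dict


def job_phase(job: Dict[str, Any]) -> str:
    """Classify a Job object as "succeeded", "failed" or "running"."""
    for cond in job.get("status", {}).get("conditions") or []:
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "succeeded"
        if cond.get("type") == "Failed":
            return "failed"
    return "running"  # no terminal condition yet


def wait_for_job(fetch_job: Callable[[], Dict[str, Any]],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the Job finishes; return True only if it completed successfully.

    fetch_job is a hypothetical callback returning the current Job JSON
    (for example by shelling out to kubectl).
    """
    waited = 0.0
    while waited <= timeout_seconds:
        phase = job_phase(fetch_job())
        if phase != "running":
            return phase == "succeeded"
        sleep(poll_seconds)
        waited += poll_seconds
    return False  # treat a timeout like a failure
```

An init container built on this would exit 0 on True and non-zero otherwise. Note that with a Deployment's `restartPolicy: Always`, a failing init container is restarted until it succeeds, so the new WORKER pod simply stays in the `Init` state; the check by itself cannot bring back an old pod that has already been terminated.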
On the happy path this works well. When the Job takes longer, however, a problem appears: the WORKER downtime is extended significantly, because the new pod waits in its init container for the whole duration of the Job.
Another, more critical problem is that with the Recreate strategy the old WORKER goes down first, without waiting for the Job result. If the Job fails, we end up with the WORKER down and someone has to manually restore it to the proper state, which is unacceptable.
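The failure mode above follows from the order in which Recreate works: all existing pods are killed before new ones are created, and the init container only runs inside the new pod. A simplified, hypothetical model (pod names are made up, no Kubernetes calls) makes the ordering explicit:

```python
from typing import List, Tuple


def recreate_rollout(old_pods: List[str], new_pod: str,
                     job_succeeded: bool) -> Tuple[List[str], List[str]]:
    """Return (terminated pods, pods running WORKER code afterwards)."""
    # Step 1: Recreate scales the old ReplicaSet to zero first. The Job result
    # plays no part in this step, so the old pods are always terminated.
    terminated = list(old_pods)
    running: List[str] = []
    # Step 2: the new pod is created, but its init container blocks on the Job.
    if job_succeeded:
        running.append(new_pod)
    # On failure the new pod stays in Init and nothing is left running.
    return terminated, running
```

This suggests that any gate living inside the new pod (an init container, k8s-wait-for, or similar) runs too late to protect the old WORKER; the Job result would have to be known before the Deployment itself is updated.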
We think a solution could involve something like https://github.com/groundnuty/k8s-wait-for/, but we are not sure whether that is the right path in this case or whether we are missing other options.
Does anyone have suggestions for us?
Cloud being used: AWS
Installation method: custom