Akka cluster on k8s. Scheduling of downed unreachable nodes

EoinSoftDev · November 14, 2019, 10:53am

I'm running an Akka cluster on k8s with a downing strategy (let's say auto-downing), so when an Akka node becomes unreachable, its container exits. The problem is that the node became unreachable because of a network issue or a problem with the platform underneath k8s, so the whole pod should be recreated and scheduled onto a new, healthy k8s node. Because scheduling can take some time, we only want to move the container to a new pod on a new node when unreachability is the cause of the failure; for any other failure, restarting the container in place is enough. Is there any way to propagate the failure reason to the parent in k8s, for example through an exit code, so that it can decide when to restart the container and when to delete the pod?
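To make the exit-code idea concrete, here is a minimal sketch in Python of the decision itself. It assumes the Akka process exits with a dedicated code when it is downed for unreachability; the value 42 and the names `UNREACHABLE_EXIT_CODE` and `action_for_pod` are hypothetical. The input is a pod object as a plain dict in the shape the Kubernetes API returns, where the exit code of a terminated container appears under `status.containerStatuses[].lastState.terminated.exitCode`. Something outside the pod would still have to read that status and act on it.

```python
# Hypothetical convention: the Akka process exits with this code when it
# was downed because it became unreachable. The value 42 is an assumption.
UNREACHABLE_EXIT_CODE = 42


def action_for_pod(pod: dict) -> str:
    """Decide what to do with a pod based on its containers' exit codes.

    Returns "delete-pod" if any container terminated with the agreed
    unreachability exit code, otherwise "restart-in-place".
    """
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for cs in statuses:
        # After a restart the previous termination is kept in lastState;
        # before the restart it is still visible in state.
        for key in ("lastState", "state"):
            terminated = (cs.get(key) or {}).get("terminated")
            if terminated and terminated.get("exitCode") == UNREACHABLE_EXIT_CODE:
                return "delete-pod"
    return "restart-in-place"


if __name__ == "__main__":
    downed = {"status": {"containerStatuses": [
        {"lastState": {"terminated": {"exitCode": 42}}}]}}
    crashed = {"status": {"containerStatuses": [
        {"lastState": {"terminated": {"exitCode": 1}}}]}}
    print(action_for_pod(downed))
    print(action_for_pod(crashed))
```

Deleting the pod only helps if it is owned by something that recreates it, such as a Deployment or StatefulSet, and it does not by itself guarantee that the replacement lands on a different node.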

Cluster information:

Kubernetes version: -
Cloud being used: AWS
Installation method:
Host OS: Alpine Linux
CNI and version: -
CRI and version: -

acim · November 15, 2019, 12:08pm

You can write a controller that watches events on the nodes and reacts when a node is declared down.
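As a rough sketch of the reaction logic such a controller might contain, assuming "declared down" means the node's `Ready` condition is no longer `"True"`: the functions below work on node and pod objects as plain dicts in the shape the Kubernetes API returns (`status.conditions` on a node, `spec.nodeName` on a pod). The names `node_is_down` and `pods_to_reschedule` are hypothetical, and the watch loop and the actual pod deletion are left out.

```python
def node_is_down(node: dict) -> bool:
    """Return True if the node's Ready condition is not "True".

    The Ready condition status is "True", "False", or "Unknown"; the
    latter two are treated as down here. A node that reports no Ready
    condition at all is also treated as down (an assumption).
    """
    conditions = (node.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Ready":
            return cond.get("status") != "True"
    return True


def pods_to_reschedule(node: dict, pods: list) -> list:
    """Return the names of the pods bound to `node` if the node is down."""
    if not node_is_down(node):
        return []
    node_name = node["metadata"]["name"]
    return [
        pod["metadata"]["name"]
        for pod in pods
        if (pod.get("spec") or {}).get("nodeName") == node_name
    ]


if __name__ == "__main__":
    node = {
        "metadata": {"name": "node-a"},
        "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
    }
    pods = [
        {"metadata": {"name": "akka-0"}, "spec": {"nodeName": "node-a"}},
        {"metadata": {"name": "akka-1"}, "spec": {"nodeName": "node-b"}},
    ]
    print(pods_to_reschedule(node, pods))
```

In a real controller these functions would be fed from a watch on nodes, and the returned pod names would be deleted so that their owning Deployment or StatefulSet recreates them on a healthy node.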
