Problem starting statefulset with activated readiness probe

Hi,

I am running an Elasticsearch cluster of 3 master-capable nodes in Kubernetes.
At startup, all 3 nodes need to come up; they then discover each other through the headless service of the statefulset (at least this is my understanding).

I am introducing readiness probes to achieve the following goal:

  • do not forward user traffic to a pod that is overloaded, crashed, or still starting up
  • my definition of readiness:
    • successful if the REST API responds correctly (health page)
    • unsuccessful if the REST API call does not respond within a given time
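
A readiness definition like this could be sketched as an HTTP probe on the Elasticsearch container. This is only an illustration: the port, the timing values, and the credentials placeholder are assumptions, and `/_cluster/health` stands in for the "health page". With security enabled, an unauthenticated request is rejected, which the probe counts as a failure, so a header is shown as well.

```yaml
# Fragment of the container spec inside the StatefulSet pod template.
readinessProbe:
  httpGet:
    path: /_cluster/health        # assumed health endpoint
    port: 9200                    # assumed REST port
    httpHeaders:
      - name: Authorization
        # Placeholder only: replace with the Base64 encoding of user:password.
        value: "Basic <base64 of user:password>"
  initialDelaySeconds: 30         # assumed startup grace period
  periodSeconds: 10               # how often the probe runs
  timeoutSeconds: 5               # "not responding in a given time"
  failureThreshold: 3             # consecutive failures before the pod is marked not ready
```

If the REST API is served over TLS, `scheme: HTTPS` would be needed under `httpGet` as well.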

With readiness probes activated, Kubernetes starts only the first pod of the statefulset when I do a full cluster (re)start of Elasticsearch.

That is documented here:
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset

My chicken-and-egg problem is:
The cluster can only initialize once all 3 master nodes are available to elect a master. Since I use the authentication / security module in Elasticsearch, I need to authenticate to query the cluster’s status.
But authentication fails until the cluster is formed and a master is elected.

I also noticed that my first node is not available via DNS, even though the statefulset has the same name as the headless service.

What paths can I follow to get out of this dilemma?

  • Are there any annotations / taints / whatever that tell Kubernetes to start all pods of the statefulset, regardless of readiness?
  • Is there a way for the headless service to ignore readiness?
  • Currently I am using the headless service of Elasticsearch for Kibana (client). Is it possible to define a headless service and a “normal” service for the same pods and ports?

Idea behind it:

  • all pods start, regardless of readiness
  • application cluster discovery is done via a headless service that ignores the readiness state
  • Kibana (client) accesses Elasticsearch via a different service that respects the readiness state
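
As a rough sketch, the idea above might translate into manifests like the following. `publishNotReadyAddresses` on a Service and `podManagementPolicy: Parallel` on a StatefulSet are existing Kubernetes fields; every name, label, and port here (`elasticsearch-discovery`, `elasticsearch-client`, `app: elasticsearch`, 9200, 9300) is a placeholder chosen for illustration.

```yaml
# Headless service used only for cluster discovery.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-discovery    # hypothetical name
spec:
  clusterIP: None                  # makes the service headless
  publishNotReadyAddresses: true   # publish DNS records even for pods that are not ready
  selector:
    app: elasticsearch             # hypothetical label
  ports:
    - name: transport
      port: 9300                   # assumed transport port
---
# Regular service for clients such as Kibana; only ready pods receive traffic.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-client       # hypothetical name
spec:
  selector:
    app: elasticsearch
  ports:
    - name: http
      port: 9200                   # assumed REST port
---
# StatefulSet fragment; selector, template and volumeClaimTemplates are omitted.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch              # hypothetical name
spec:
  serviceName: elasticsearch-discovery
  podManagementPolicy: Parallel    # start all pods at once instead of waiting for each to be ready
  replicas: 3
```

Both services select the same pods, so defining a headless and a "normal" service side by side is possible. As far as I know, `podManagementPolicy` cannot be changed on an existing StatefulSet, so the object may have to be recreated to try this.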

Any ideas for implementing this, or any other approach that solves the issue, are welcome.
But I have the feeling that checking only port availability for readiness is not enough.

Thanks, Andreas

Have you thought about setting up file-based authentication for Elasticsearch? That should work even when the cluster is not up.