Spark with Kubernetes connecting to pod ID, not address

pferrel · February 13, 2019, 1:53am

We have a Kubernetes (k8s) deployment on AWS (EKS) of several services, including Apache Spark. All services seem to be operational. Our application connects to the Spark master to submit a job. The cluster's k8s DNS service names the master spark-api, so we use master=spark://spark-api:7077 together with spark.submit.deployMode=cluster. We submit the job through the API, not with the spark-submit script.
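For anyone trying to reproduce this, the settings above can be written out as plain Spark properties. The snippet below is a minimal stdlib-only sketch, not the code that actually submits the job: `build_submit_conf` is a hypothetical helper, and it only assembles documented property names (`spark.master`, `spark.submit.deployMode`, `spark.driver.host`). `spark.driver.host` is included because Spark's configuration docs describe it as the driver's hostname or IP used for communicating with executors and the standalone Master, with "local hostname" as its default. A pod's hostname is, by default, its pod name. Falling back to `socket.gethostname()` is only an approximation of that default.

```python
import socket


def build_submit_conf(master_host="spark-api", master_port=7077,
                      deploy_mode="cluster", driver_host=None):
    """Assemble the Spark properties described in the post (hypothetical helper).

    spark.driver.host uses driver_host when one is given; otherwise it falls
    back to the local hostname, approximating Spark's documented default.
    Inside a Kubernetes pod that hostname is normally the pod name.
    """
    conf = {
        # Standalone master reached through the k8s DNS service name.
        "spark.master": f"spark://{master_host}:{master_port}",
        "spark.submit.deployMode": deploy_mode,
    }
    # Approximation only: Spark's own hostname lookup may differ in detail.
    conf["spark.driver.host"] = driver_host or socket.gethostname()
    return conf


if __name__ == "__main__":
    for key, value in build_submit_conf().items():
        print(f"{key}={value}")
```

When the script runs inside the application's pod, the printed `spark.driver.host` typically matches the pod name (for example harness-64d97d6d6-4r4d8) unless an explicit value is passed in.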

This will run the “driver” and all “executors” on the cluster, and that part seems to work. However, some Spark process makes a callback to the launching code in our app. For some reason it tries to connect to harness-64d97d6d6-4r4d8, which is the pod ID, not the k8s cluster IP or DNS name.

How could this pod ID be getting into the system? Spark seems to think it is the address of the service that called it. Needless to say, any connection to the pod ID fails, and so does the job.

Any idea how Spark could think the pod ID is an IP address or DNS name?

BTW, if we run a small sample job with master=local, all is well, but the same job executed with the above config tries to connect to the spurious pod ID.
