We have a k8s deployment on AWS (EKS) of several services, including Apache Spark. All services appear to be operational. Our application submits a job to the Spark master, which we reach through the cluster's k8s DNS service under the name
spark-api, so we use
master=spark://spark-api:7077 and
spark.submit.deployMode=cluster. We submit the job through the API, not with the spark-submit script.
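
For reference, the two settings above can be written as standard Spark property names. This is only a sketch of the configuration as described, not our actual application code: the dictionary and the helper `master_endpoint` are hypothetical names used for illustration, and the only values taken from our setup are the master URL and the deploy mode.

```python
# Settings as described above, expressed as Spark property key/value pairs.
# The variable and function names are illustrative assumptions.
spark_settings = {
    "spark.master": "spark://spark-api:7077",
    "spark.submit.deployMode": "cluster",
}


def master_endpoint(settings):
    """Split a standalone master URL (spark://host:port) into (host, port)."""
    url = settings["spark.master"]
    prefix = "spark://"
    if not url.startswith(prefix):
        raise ValueError("not a standalone master URL: " + url)
    host, port = url[len(prefix):].rsplit(":", 1)
    return host, int(port)


if __name__ == "__main__":
    # Shows which DNS name and port the application dials to reach the master.
    print(master_endpoint(spark_settings))
```

The point of splitting the URL is only to make explicit that the application reaches the master by the service name spark-api, which k8s DNS resolves, and not by any pod name.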
With these settings, the “driver” and all “executors” run on the cluster, and that part seems to work, but some Spark process then calls back to the launching code in our app. For some reason it tries to connect to
harness-64d97d6d6-4r4d8, which is the pod ID, not the k8s cluster IP or DNS name.
How could this pod ID be getting into the system? Spark seems to treat it as the address of the service that called it. Needless to say, any connection to the k8s pod ID fails, and so does the job.
Any idea how Spark could come to treat the pod ID as an IP address or DNS name?
BTW, if we run a small sample job with
master=local, all is well, but the same job executed with the above config tries to connect to the spurious pod ID.
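
One thing that may be worth checking: by default Kubernetes sets a pod's hostname to the pod's name, and Spark's spark.driver.host property defaults to the local hostname, so the pod ID could be reaching Spark that way. I have not confirmed that this is the cause. The sketch below, assuming Python is available inside the harness container, reports the hostname the process sees and whether it matches the name Spark tried to reach; `describe_local_identity` is a hypothetical helper name.

```python
import socket


def describe_local_identity(suspect_name):
    """Report the local hostname and whether it equals the suspect name."""
    hostname = socket.gethostname()
    try:
        # Resolution is attempted from inside this container only; it says
        # nothing about whether other pods can resolve the same name.
        resolved_ip = socket.gethostbyname(hostname)
    except OSError:
        resolved_ip = None
    return {
        "hostname": hostname,
        "matches_suspect": hostname == suspect_name,
        "resolved_ip": resolved_ip,
    }


if __name__ == "__main__":
    # The pod ID from the failed connection attempt described above.
    print(describe_local_identity("harness-64d97d6d6-4r4d8"))
```

If the reported hostname matches the pod ID, that would be consistent with Spark advertising the local hostname as the callback address, in which case explicitly setting spark.driver.host to a name or IP that the Spark pods can reach would be the next thing to try.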