Intermittent dropped connections to the Kubernetes API from within a pod (Dagster)

Cluster information:

Kubernetes version: 1.30.6
Cloud being used: Azure
Host OS: Linux (amd64)

I am running a Kubernetes deployment of Dagster. From within a running pod, Dagster periodically calls the Kubernetes API to create jobs, using the Kubernetes Python client library. The system works correctly most of the time, but intermittently the connection to the API drops during a call to batch_v1_api.create_namespaced_job, and the job fails to launch:

urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

To make matters more interesting, we had this problem several months ago and it resolved on its own, but it has cropped back up in the last couple of weeks.

I am just looking for help determining what could be causing this problem and what steps I can take to debug it.
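
One low-cost debugging step, offered as a sketch rather than a confirmed fix, is to turn on debug logging for urllib3 in the process that makes the API calls. This assumes urllib3 emits its connection pool messages under the `urllib3` logger namespace and that you can run a few lines at startup in that process; the format string is just an example.

```python
import logging

# Root logger stays at INFO so other libraries are not flooded with debug output.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Raise verbosity only for urllib3; child loggers such as
# "urllib3.connectionpool" inherit this level.
urllib3_logger = logging.getLogger("urllib3")
urllib3_logger.setLevel(logging.DEBUG)
```

With timestamps on every line, the log should make it easier to see whether a failure follows a long idle period on a reused connection or happens on a freshly opened one, which narrows down where to look next.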

Thanks very much.

We’re facing the same issues here with Dagster.

Kubernetes version: 1.29.13 (I know)
Azure AKS

We have not been able to reproduce this outside Dagster, and we haven’t identified a cause for the sudden surge of “intermittent” errors. We looked into the apiserver logs and found nothing relevant there.

Did you deploy with the helm chart or a custom deployment? What version of Dagster are you on?