Cluster information:
Kubernetes version: 1.30.6
Cloud being used: Azure
Host OS: linux (amd64)
I am running a Kubernetes deployment of Dagster. Dagster periodically calls the Kubernetes API from within a running pod to create jobs, using the Kubernetes Python client library. The system works correctly most of the time, but intermittently the connection to the API drops during a call to batch_v1_api.create_namespaced_job, and the job fails to launch:
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
To make matters more interesting, we had this problem several months ago and it resolved on its own, but it has cropped back up in the last couple of weeks.
I am just looking for help determining what could be causing this problem and what steps I can take to debug it.
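As a stopgap while debugging, I am considering wrapping the call in a small retry helper that logs every failed attempt, so I can at least see how often and when the drops happen. This is only a rough sketch: call_with_retry, its parameters, and the "dagster" namespace in the usage comment are names I made up for illustration, not part of Dagster or the Kubernetes client.

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_retry(fn, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); if it raises one of retry_on, log it and retry with exponential backoff.

    Hypothetical helper for illustration. The final failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            # Log every failure so the timing of dropped connections can be correlated later.
            logger.warning("attempt %d/%d failed: %r", attempt, attempts, exc)
            if attempt == attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Assumed usage inside the pod (needs the kubernetes and urllib3 packages;
# the namespace and job body here are placeholders):
#
#   from urllib3.exceptions import ProtocolError
#   from kubernetes import client, config
#
#   config.load_incluster_config()
#   batch_v1_api = client.BatchV1Api()
#   call_with_retry(
#       lambda: batch_v1_api.create_namespaced_job(namespace="dagster", body=job),
#       retry_on=(ProtocolError,),
#   )
```

One caveat I am aware of: if the connection drops after the API server has already accepted the request, a retry with the same job name would presumably come back as a 409 conflict, so that case would need to be treated as success rather than as a failure.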
Thanks very much.