K8s pod unable to connect to another pod

Cluster information:

Kubernetes version: 1.14.9-eks-c0eccc
Cloud being used: aws
Installation method: cloudformation
Host OS: 4.14.138-114.102.amzn2.x86_64
CNI and version:
CRI and version:

I am using the Airflow Helm chart to run Airflow on k8s. However, the web pod can’t seem to connect to PostgreSQL. The odd thing is that the other pods can.

I’ve cobbled together small scripts to check, and this is what I found:

[root@ip-10-56-173-248 bin]# cat checkpostgres.sh
docker exec -u root $1 /bin/nc -zvw2 airflow-postgresql 5432
[root@ip-10-56-173-248 bin]# docker ps --format '{{.Names}}\t{{.ID}}'|grep k8s_airflow|grep default|awk '{printf("%s ",$1); system("checkpostgres.sh " $2)}'
k8s_airflow-web_airflow-web-57c6dcd77b-dvjmv_default_67d74586-284b-11ea-8021-0249931777ef_74 airflow-postgresql.default.svc.cluster.local [] 5432 (postgresql) : Connection timed out
k8s_airflow-worker_airflow-worker-0_default_67e1703a-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [] 5432 (postgresql) open
k8s_airflow-scheduler_airflow-scheduler-5d9b688ccf-zdjdl_default_67d3fab4-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [] 5432 (postgresql) open
k8s_airflow-postgresql_airflow-postgresql-76c954bb7f-gwq68_default_67d1cf3d-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [] 5432 (postgresql) open
k8s_airflow-redis_airflow-redis-master-0_default_67d9aa36-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [] 5432 (?) open
k8s_airflow-flower_airflow-flower-79c999764d-d4q58_default_67d267e2-284b-11ea-8021-0249931777ef_0 airflow-postgresql.default.svc.cluster.local [] 5432 (postgresql) open
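
The same probe can also be run through kubectl rather than docker exec on the node. The following is only a sketch under assumptions: the container names follow the dockershim pattern k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt> (which matches the names listed above), kubectl is configured for this cluster, and nc exists at /bin/nc in every image. The function names parse_k8s_name and check_from_pod are made up for illustration.

```shell
#!/bin/sh
# Split a dockershim container name of the assumed form
#   k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>
# into "container pod namespace". Prints nothing for other names.
parse_k8s_name() {
    echo "$1" | awk -F_ '$1 == "k8s" && NF >= 6 { print $2, $3, $4 }'
}

# Run the same nc probe as checkpostgres.sh, but via kubectl exec.
# $1 = container, $2 = pod, $3 = namespace
# Assumption: /bin/nc is present in the target container image.
check_from_pod() {
    kubectl exec -n "$3" "$2" -c "$1" -- /bin/nc -zvw2 airflow-postgresql 5432
}

# Example loop (run on a node, commented out so the file can be sourced safely):
# docker ps --format '{{.Names}}' | grep '^k8s_airflow' | while read -r name; do
#     set -- $(parse_k8s_name "$name")
#     [ $# -eq 3 ] && check_from_pod "$@"
# done
```

Unlike docker exec -u root, kubectl exec runs as the container's default user, so results may differ if nc needs elevated permissions in a given image.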

When I run nslookup on the service name, it seems to resolve fine:

# nslookup airflow-postgresql

Non-authoritative answer:
Name:   airflow-postgresql.default.svc.cluster.local

What would cause this behavior, and what else could I check to narrow this down?

Strange. If possible, capture and analyze the network traffic on the host and within the pods. You can also try connecting in the opposite direction, from the postgresql container to the web container. If you use K8s Network Policies, verify those as well.
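
A rough sketch of those three checks, wrapped in small helper functions. Everything here is an assumption rather than something taken from this cluster: the helper names are hypothetical, tcpdump is assumed to be installed on the node, nc is assumed to be at /bin/nc in the postgresql image, and the web pod's IP and port have to be supplied by you (kubectl get pods -o wide shows the pod IPs). Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List NetworkPolicy objects in all namespaces.
# An empty result means no policies are defined.
list_network_policies() {
    run kubectl get networkpolicy --all-namespaces
}

# Capture PostgreSQL traffic on the node (needs root and tcpdump).
capture_postgres_traffic() {
    run tcpdump -ni any tcp port 5432
}

# Probe in the opposite direction: from the postgresql pod to the web pod.
# $1 = postgresql pod name, $2 = web pod IP, $3 = web pod port
probe_reverse() {
    run kubectl exec -n default "$1" -- /bin/nc -zvw2 "$2" "$3"
}
```

If the capture shows SYN packets leaving the web pod but no reply, the packets are being dropped somewhere between the pods rather than being refused by PostgreSQL.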

I don’t think I set anything like this up.

Also, I noticed I had a failing sidecar container. I’ve removed it, and it seems to be working now. I’m not sure how this could cause what I saw, but it seems plausible?

What sidecar container did you have?

It was a container that syncs files from git.

I have no idea; a failing sync container shouldn’t have that effect…

So actually, the problem has manifested with some other pods on the cluster as well. I have checked, and the two pods cannot connect to each other on various ports. They do seem to be resolving the correct IP address, though.