Cannot access pod services from another pod

I can't access a container's services from another container using the service ports (I tried both ClusterIP and NodePort).

The service is ok when I access it from a node in my network using the NodePort service.

[ root@curl-5858f4ff79-s86n7:/ ]$ nslookup example-svc
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: example-svc
Address 1: 10.103.131.13 example-svc.svc.cluster.local
[ root@curl-5858f4ff79-s86n7:/ ]$ telnet example-svc 5672
Connection closed by foreign host

kubectl get svc
example-svc ClusterIP 10.103.131.13 25672/TCP,4369/TCP,5671/TCP,5672/TCP,15671/TCP,15672/TCP
examplenp-svc NodePort 10.99.185.47 25672:31216/TCP,4369:32531/TCP,5671:31512/TCP,5672:30747/TCP,15671:31929/TCP,15672:32183/TCP
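
(As a general sanity check when a Service doesn't answer, it's worth confirming it actually has endpoints; the commands below assume the default namespace and the Service name shown above.)

kubectl get endpoints example-svc
# an empty ENDPOINTS column means the Service selector matches no Ready pod
kubectl describe svc example-svc | grep -E 'Selector|Endpoints'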

telnet from a node out of the cluster:

[user@mysqhot]$ telnet kubenode01 30747
Trying 192.168.0.101...
Connected to kubenode01.
Escape character is '^]'.

Cluster information:

Kubernetes version: v1.18.5
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Centos 7
CNI and version: flannel:v0.11.0-amd64
CRI and version: docker-ce-19.03.4-3.el7.x86_64

Can your pods reach each other (e.g. ping, without the Service in the middle)?

Yes.

Status: Running
IP: 10.244.4.33
IPs:
IP: 10.244.4.33

I used kubectl describe pod to get the IP of the pod, and from another pod I ran ping successfully.
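
(Roughly, the check looks like this; the target pod name is a placeholder, and the IPs are the ones shown above.)

kubectl describe pod <target-pod> | grep ^IP      # e.g. 10.244.4.33
kubectl exec -it curl-5858f4ff79-s86n7 -- ping -c 3 10.244.4.33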


Does that work when your pods are on different nodes?

Yes. Ping from the curl pod that is on kubenode01 to api-core that is on kubenode04 works fine!


Telnet to the service port doesn't work even when the pods are on different nodes.

What is your service CIDR set to? This is fishy:

[ root@curl-5858f4ff79-s86n7:/ ]$ nslookup example-svc
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

10.96.0.10 is a reasonable looking service IP (usually .10 is the DNS IP, just by convention), which would suggest 10.96.0.0/16 or something smaller is your service CIDR.

But the reply:

Name: example-svc
Address 1: 10.103.131.13 example-svc.svc.cluster.local

That suggests that either:

  • The service is headless (no proxy in the way)
  • Your service CIDR is very large
  • Something else is wrong

Then this:

kubectl get svc
example-svc ClusterIP 10.103.131.13 25672/TCP,4369/TCP,5671/TCP,5672/TCP,15671/TCP,15672/TCP
examplenp-svc NodePort 10.99.185.47 25672:31216/TCP,4369:32531/TCP,5671:31512/TCP,5672:30747/TCP,15671:31929/TCP,15672:32183/TCP

…also suggests that the service CIDR is very, very large (to cover both 10.99.x.x and 10.103.x.x it would have to be something like 10.64.0.0/10?).

That doesn’t explain the problem, but it’s hard to see past.

Are you using kube-proxy in ipvs or iptables mode?

You might also run through this:

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
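
(For reference, on a kubeadm cluster the kube-proxy mode is usually visible in its ConfigMap; the name and namespace below are the kubeadm defaults, so adjust if yours differ.)

kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'
# an empty value means the default, which is iptables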

Hey,

I'm using kube-proxy in iptables mode.

The CIDR is the Kubernetes default; I didn't change anything related to the CIDR.

For flannel I used --pod-network-cidr=10.244.0.0/16 in kubeadm init.

I have figured out that the pods can communicate with each other both directly and through the ClusterIP service ports. Calling the NodePort service type doesn't work.

I was using telnet for the tests, but telnet didn't respond; when I tested with wget or curl I got the response from the destination pod normally.

That was strange. I'm still trying to understand why telnet doesn't work for testing purposes and why a NodePort can't be reached from a pod (calling the NodePort from outside the cluster works fine).
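
(A sketch of the comparison, using the HTTP management port 15672 so that wget gets a readable reply; the node IP is the one from the telnet test above, and the exact pod/port pairing is an assumption.)

# ClusterIP from inside a pod: answers
kubectl exec -it curl-5858f4ff79-s86n7 -- wget -qO- http://example-svc:15672
# NodePort from inside a pod: no answer, although the same NodePort
# works from a machine outside the cluster
kubectl exec -it curl-5858f4ff79-s86n7 -- wget -qO- http://192.168.0.101:32183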

lgomesd wrote on July 22:

> Hey,
>
> I'm using kube-proxy in iptables mode.
>
> The CIDR is the Kubernetes default; I didn't change anything related to the CIDR.

There isn't really a Kubernetes default. This gets passed into kube-apiserver, and all of the examples use a /16 or smaller. Can you check that?
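
(On a kubeadm cluster the value is passed to the API server as --service-cluster-ip-range; the manifest path and label below are kubeadm defaults, so treat them as assumptions.)

grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml
# or, without node access:
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep service-cluster-ip-range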

> For flannel I used --pod-network-cidr=10.244.0.0/16 in kubeadm init.

> I have figured out that the pods can communicate with each other both directly and through the ClusterIP service ports. Calling the NodePort service type doesn't work.
>
> I was using telnet for the tests, but telnet didn't respond; when I tested with wget or curl I got the response from the destination pod normally.
>
> That was strange. I'm still trying to understand why telnet doesn't work for testing purposes and why a NodePort can't be reached from a pod (calling the NodePort from outside the cluster works fine).

That is strange, maybe a tcpdump would help?
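
(Something along these lines, run on the node hosting the client pod; cni0 and flannel.1 are the default flannel interface names, so adjust if yours differ.)

tcpdump -ni cni0 port 5672        # traffic on the pod bridge
tcpdump -ni flannel.1 port 5672   # traffic entering/leaving the VXLAN overlay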

Hello, was a solution to this found?

I’m experiencing the same exact issue.

My cluster settings are basically the same, except that the versions of k8s, flannel, etc. are more recent.

The container runtime is CRI-O (in case it matters).

Hi,
What are the symptoms? Are there any valid endpoints available for the service?

@fox-md @thockin

The Issue:
Any pod can reach any other Pod (regardless of the node) via the Pods’ IP.
Service IPs aren’t reachable from within the Pods (the failure message is always a variation of “network unreachable”), but are reachable from the hosts.

Another thing to note is that CoreDNS isn’t working properly, but that’s just a symptom of the aforementioned issue, because CoreDNS tries to reach the K8s API via the kubernetes.default service IP, which fails (as a special case of what I described in the previous paragraph):

[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.96.0.1:443/version": dial tcp 10.96.0.1:443: connect: network is unreachable
[INFO] plugin/ready: Still waiting on: "kubernetes"

I tried all the steps in https://kubernetes.io/docs/tasks/debug/debug-application/debug-service/ and they succeeded (both in iptables and ipvs mode), except for the hairpin section. I noticed that hairpinning wasn't enabled, so I added steps to my provisioning automation to enable it, but it still fails (i.e. a pod can't reach its own service IP even with hairpinning).
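
(For reference, hairpin mode can be read directly off the bridge ports; cni0 is flannel's default bridge name, so this is an assumption about the setup.)

for f in /sys/class/net/cni0/brif/*/hairpin_mode; do
  echo "$f: $(cat "$f")"   # 1 means hairpinning is enabled on that veth
done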

What I discovered:
In the network namespaces of the Pods, there’s neither an interface belonging to the services subnet, nor a default gateway. So traffic to IPs in the service subnet has nowhere to go.
If I enter the network namespace of a Pod and manually make the interface in that namespace (the one on the pods' subnet) the default route/gateway, then services become reachable from within that Pod.

I'm trying to understand why no default gateway/route is set up in the Pods' network namespaces: https://kubernetes.slack.com/archives/C09QYUH5W/p1701189954362389. Apparently flannel is the culprit.
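
(One way to see this, assuming the pod image ships the ip tool; the pod name and the "healthy" routes in the comments are illustrative, not from this cluster.)

kubectl exec -it <some-pod> -- ip route
# with a working flannel setup this typically includes a default route, e.g.
#   default via 10.244.x.1 dev eth0
#   10.244.0.0/16 via 10.244.x.1 dev eth0
# here, the default route was missing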

My cluster setup:
I have a baremetal K8s 1.28.2 cluster on ubuntu 22.04 using CRI-O v1.28.1~0 backed by cri-o-runc v1.0.1~2 (the versions are those of the apt packages I’m downloading).

I’m deploying flannel v0.23.0 via the manifests at https://github.com/flannel-io/flannel/releases/download/v0.23.0/kube-flannel.yml, but I’m also downloading the flannel CNI plugin v1.2.0 and writing the file /etc/cni/net.d/10-crio.conf to

        {
          "name": "crio",
          "type": "flannel"
        }

as per instructions here.
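
(For comparison, the stock kube-flannel conflist passes bridge options through the flannel plugin's delegate block; whether that's what is missing here is only a guess, but a variant along these lines is one thing to try.)

        {
          "name": "crio",
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        }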

I tried both iptables and ipvs; the issue appears with both.

My pod CIDR is 10.244.0.0/16, my service CIDR is 10.96.0.0/24 (originally it was 10.96.0.0/12, the issue appears in both cases).