LoadBalancer service not working

#1

I’m trying to figure out why my service (type: LoadBalancer) isn’t working. This is running on AWS.

I followed the steps in Debug Services - Kubernetes but can’t find any smoking gun.

In particular, when ssh’ing into a cluster node, the following all work:
nslookup <fq service name> <dns ip>
curl <service ip>:<port>
nslookup <external load balancer hostname>
kube-proxy is running
curl localhost:<nodeport>

What doesn’t work:
curl <external load balancer hostname>:<port>: Empty reply from server
curl <external load balancer ip>:<port>: Empty reply from server
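
A note on the symptom: curl reports "Empty reply from server" when the TCP connection is established but then closed before any response bytes arrive. That is different from a refused or timed-out connection. A minimal Python sketch to tell these cases apart, assuming plain HTTP on the port; the function name `probe` and its return strings are made up for illustration:

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Open a TCP connection, send a tiny HTTP request, and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            request = b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
            sock.sendall(request)
            try:
                data = sock.recv(1024)
            except socket.timeout:
                return "connected, but no data before timeout"
            except ConnectionResetError:
                return "connected, then reset"
            # recv() returning b"" means the peer closed without sending anything,
            # which is the situation curl calls "Empty reply from server".
            return "got response" if data else "empty reply"
    except ConnectionRefusedError:
        return "connection refused"
    except socket.timeout:
        return "connect timed out"
    except OSError as exc:
        return f"error: {exc}"
```

Calling `probe` with the external load balancer hostname and port, and then with a node address and the nodePort, shows at which hop the behavior changes.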

I have another service running in the system that works just fine (meaning curl <load balancer host>:<port> returns something). I looked at the iptables entries for both services, but there doesn’t appear to be anything obvious missing for the non-working service.

(The one thing that didn’t agree with what Debug Services - Kubernetes expects is the content of /etc/resolv.conf:
domain us-west-2.compute.internal
search us-west-2.compute.internal
nameserver 172.20.0.2

However, this doesn’t appear to impact the working service.)

Can anyone suggest some next steps to get to the heart of the problem?

#2

What exactly is the problem you see? Has a load balancer been created on AWS (for example, does it show up in the AWS console)? Are there any events on the Service object (svc)?

If the load balancer has been created on AWS, look at the listener in its config. Which port is it listening on? And to which port on the Kubernetes nodes does it forward traffic? Is that the port listed in the service as nodePort?

Do the Kubernetes nodes accept traffic from the security group the load balancer is using? And if so, does the load balancer’s security group accept traffic from the internet on the ports it is listening on?
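
To make the port questions concrete, here is a minimal Python sketch of the comparison. It works on plain dicts so it runs without AWS or cluster access. The keys `LoadBalancerPort` and `InstancePort` follow the classic ELB listener description, and `port` and `nodePort` follow the Service spec. The function name and the sample values are hypothetical:

```python
def check_listeners(listeners, service_ports):
    """Compare load balancer listeners with Service ports.

    Returns a list of problem descriptions; an empty list means every
    listener forwards to the nodePort of the matching service port.
    """
    node_port_by_port = {p["port"]: p["nodePort"] for p in service_ports}
    problems = []
    for listener in listeners:
        lb_port = listener["LoadBalancerPort"]
        instance_port = listener["InstancePort"]
        expected = node_port_by_port.get(lb_port)
        if expected is None:
            problems.append(f"listener port {lb_port} has no matching service port")
        elif instance_port != expected:
            problems.append(
                f"listener port {lb_port} forwards to {instance_port}, "
                f"but the service nodePort is {expected}"
            )
    return problems


# Hypothetical values, for illustration only.
listeners = [{"LoadBalancerPort": 80, "InstancePort": 31380}]
service_ports = [{"port": 80, "nodePort": 30080}]
for problem in check_listeners(listeners, service_ports):
    print(problem)
```

The security group questions need a separate check; this sketch only covers the listener and nodePort mapping.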

I think one of those should be causing this issue, if I had to bet. But please try them all, and report back if it works or it doesn’t :slight_smile:

#3

Hi Rodrigo, thanks for your response.

The one thing I noticed is that all of the instances for the broken service’s load balancer are listed as ‘out of service’ (and - not surprisingly - for the healthy service they are ‘in service’), so I need to dig into that.

#4

Great. And which port is it using for the health check? Is it using the nodePort of the service?

If it is, then my next bet would be that the load balancer’s security group can’t connect to the workers on that port. So, I’d make sure the workers’ security group accepts incoming connections from the load balancer’s security group. Can you check that? :slight_smile:
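
As a rough illustration of the health check question, the Python sketch below parses a classic ELB health check target string (the `PROTOCOL:port` or `PROTOCOL:port/path` form) and checks whether its port is one of the service’s nodePorts. It assumes the simple case where the health check is expected to hit a nodePort of the service. The function names and port numbers are hypothetical:

```python
def parse_health_target(target):
    """Split a target such as 'HTTP:31380/healthz' into (protocol, port, path)."""
    protocol, sep, rest = target.partition(":")
    if not sep:
        raise ValueError(f"not a PROTOCOL:port target: {target!r}")
    port_text, slash, path = rest.partition("/")
    return protocol.upper(), int(port_text), slash + path


def health_check_problem(target, node_ports):
    """Return a description of the mismatch, or None if the port is a nodePort."""
    _, port, _ = parse_health_target(target)
    if port not in node_ports:
        return (
            f"health check probes port {port}, which is not among "
            f"the service nodePorts {sorted(node_ports)}"
        )
    return None


# Hypothetical values, for illustration only.
print(health_check_problem("TCP:31999", {30080}))
```

Even when the port matches, the instances stay out of service if the security group blocks the load balancer from reaching that port.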

#5

Hi Rodrigo, your guess was correct: the health check was configured incorrectly (in this case, nothing was being served on the nodePort it was using). When I manually reconfigured the load balancer to use a different port, one that is actually being listened on, it worked.

Thanks for your help!

A follow-on question: I noticed that the load balancers have every EC2 instance in the cluster registered. In my specific case that is undesirable because the (single) pod that’s backing the service can only run on a single node (via a node selector). Was it a conscious choice to register all cluster nodes with every load balancer, even when the backing pods can only run on a subset of the nodes?

#6

Glad it worked!

Yes, registering all nodes is intended. Here is the reasoning:

The ELB does not know about pods; it only knows about VM instances. So you can only register instances, not pods. How, then, do you distribute traffic evenly across pods? One option is to register all nodes as backends and let Kubernetes do the load balancing between pods (as Kubernetes does know about pods).

This is basically what is happening, and it is the most common setup. The nodePort, as you saw in the iptables rules, load balances between pods.

If you have a good reason to not want that (but I would recommend really having a good reason), you can change the behavior by setting the service’s externalTrafficPolicy to Local. If you do that, the health check will fail on all nodes except the ones that are running a pod for the application.

That way, you will see only one node as healthy. But then, if the pod is rescheduled to some other node, you need to wait for the AWS load balancer to mark that node as healthy (according to its health check parameters), which might introduce downtime and other unwanted effects.
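
A toy Python model of that difference, as a sketch only: it assumes every node is reachable and running kube-proxy, and it ignores health check timing. The function name, node names, and the "Cluster"/"Local" policy strings as function arguments are illustrative:

```python
def nodes_in_service(nodes, pod_nodes, policy):
    """Return the nodes the load balancer would consider healthy.

    nodes:     all nodes registered with the load balancer
    pod_nodes: nodes that currently run a ready pod of the service
    policy:    "Cluster" (any node can forward to a pod) or
               "Local" (only nodes with a local pod pass the check)
    """
    if policy == "Cluster":
        return sorted(nodes)
    if policy == "Local":
        return sorted(n for n in nodes if n in pod_nodes)
    raise ValueError(f"unknown policy: {policy!r}")


nodes = {"node-a", "node-b", "node-c"}

# The single pod runs on node-a.
print(nodes_in_service(nodes, {"node-a"}, "Cluster"))  # all three nodes
print(nodes_in_service(nodes, {"node-a"}, "Local"))    # only node-a

# The pod is rescheduled to node-b: under "Local" the healthy set changes,
# and the load balancer only notices after its next successful health checks.
print(nodes_in_service(nodes, {"node-b"}, "Local"))    # only node-b
```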

IMHO, it’s better to let kubernetes do the load balancing for pods, as it knows about pods. Amazon ELB doesn’t, so it’s not the best layer for the true load balancing.
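
To illustrate why, here is a small Python calculation of the traffic share each pod would get. It is a simplified model, not a measurement: it assumes the ELB splits requests evenly across healthy nodes and that each node picks uniformly among the pods it may forward to. The function name and pod placement are made up:

```python
from fractions import Fraction


def pod_shares(pods_by_node, policy="Cluster"):
    """Return {pod: fraction of total traffic} under the simplified model."""
    all_pods = [pod for pods in pods_by_node.values() for pod in pods]
    if not all_pods:
        return {}
    if policy == "Cluster":
        backends = list(pods_by_node)  # every node is a healthy backend
    elif policy == "Local":
        backends = [n for n, pods in pods_by_node.items() if pods]
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    shares = {pod: Fraction(0) for pod in all_pods}
    per_node = Fraction(1, len(backends))
    for node in backends:
        # "Cluster": a node may forward to any pod; "Local": only to its own.
        targets = all_pods if policy == "Cluster" else pods_by_node[node]
        for pod in targets:
            shares[pod] += per_node / len(targets)
    return shares


placement = {"node-a": ["pod-1"], "node-b": ["pod-2", "pod-3"], "node-c": []}
print(pod_shares(placement, "Cluster"))  # each pod gets 1/3
print(pod_shares(placement, "Local"))    # pod-1 gets 1/2, the others 1/4 each
```

With uneven pod placement, the node-level split alone leaves pod-1 with twice the traffic of the other two pods, while letting Kubernetes pick the pod evens it out.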
