Kubernetes frontend service latency astronomically higher than Docker Compose-based setup

I am seeing a huge latency difference between my Kubernetes setup and a Docker Compose setup. A request to the frontend service on Kubernetes takes close to 10 seconds to respond, while the same request against a Docker Compose instance of the exact same image takes about 200 ms.

Kubernetes Node Components:

  • Nginx server
  • API service
  • Postgres
  • Redis
  • Web frontend service

Network Configuration:

  • Using Calico’s Tigera operator; the machine is on subnet 192.168.88.0/24 and the pod network (Calico IP pool) is 10.10.0.0/16
  • No other customizations apart from the subnet.
  • Followed the Calico quickstart guide.

Additionally:

  • None of the system’s deployments specify resource limits.
  • No Ingress resource is specified; instead, a NodePort is declared on the nginx service. However, I cannot access it externally using the hostname nginx.my-namespace.svc.cluster.local (nslookup reports no such hostname). Since this suggests some sort of DNS resolution issue, I’m wondering whether it could be related to the latency (see the check sketched after this list).
  • The latency issue is consistent even when testing inside the frontend pod with curl http://localhost....
  • There is no difference in latency between static pages or pages with dynamically generated content.
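
For reference, my (possibly wrong) understanding is that *.svc.cluster.local names only resolve through the cluster’s internal DNS, so from outside the cluster the NodePort should be reached via the node’s own IP, roughly like this (the port and IP below are placeholders):

$ kubectl -n my-namespace get svc nginx      # shows the assigned NodePort, e.g. 80:3XXXX/TCP
$ curl http://<node-ip>:<node-port>/         # <node-ip> = the machine's 192.168.88.x address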

I’m convinced it’s NOT an application issue and suspect it to be related to the network.

What could be causing this significant latency in my Kubernetes setup, and what steps can I take to diagnose and resolve this issue?

Any insights or diagnostic steps would be greatly appreciated!

Cluster information:

Machine: bare-metal Ubuntu 22.04, 8 cores, 32 GiB RAM
Kubernetes version: v1.28.11 (registry.k8s.io/kube-apiserver:v1.28.11)
CNI: Calico v3.28.0 (docker.io/calico/apiserver:v3.28.0)
CRI: containerd (containerd.io) 1.6.33

10 seconds is not a performance issue, it’s a configuration issue. It’s not like “k8s is less efficient with the network” or something 🙂

Something somewhere is timing out while it waits for a response that never comes. It’s hard to pin down who or why without disassembling the application layer by layer.

I’d first figure out which part of the app is slow by comparing log timestamps, or even just watching the logs in real time.
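
For example (the deployment names here are just placeholders):

$ kubectl logs -f --timestamps deploy/frontend     # when does the frontend start and finish handling the request?
$ kubectl logs -f --timestamps deploy/api          # when does the backend actually see the call?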

It so happens that many DNS configs have a default 5-second timeout, so it’s possible that the heart of the problem is that DNS is not set up properly.
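
Since you already have curl in the pod, a quick timing breakdown can confirm whether the wait is in name resolution (the URL and port below are placeholders):

$ kubectl exec deploy/frontend -- \
    curl -o /dev/null -s -w 'lookup=%{time_namelookup}s connect=%{time_connect}s total=%{time_total}s\n' \
    http://api:8080/
# if lookup is ~5 s (or ~10 s after a retry) while connect and total add almost nothing on top, it's DNS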

Thanks for the insight, @thockin! I didn’t know about the DNS 5-second timeout but it does seem to point more strongly towards issues with DNS resolution.

Do you happen to know of a good guide or resource for setting up DNS properly in a Kubernetes cluster, especially when using Calico? Any recommendations or best practices would be greatly appreciated!

DNS shouldn’t be hard to set up.

  1. Can all your pods talk to each other, across nodes?
  2. Do kube Services work?
  3. Do you have a DNS service (in cluster or out)?
  4. Are your pods correctly using that DNS service (via kubelet config)?

This doc may help: Debug Services | Kubernetes
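
Concretely, assuming a kubeadm-style cluster with CoreDNS in kube-system (names, IPs, and ports below are placeholders), those checks might look like:

# 1. can pods reach each other across nodes?
$ kubectl get pods -o wide                                   # note pod IPs and nodes
$ kubectl exec deploy/frontend -- curl -s http://<pod-ip-on-other-node>:<port>/

# 2. do Services work when addressed by ClusterIP?
$ kubectl get svc api
$ kubectl exec deploy/frontend -- curl -s http://<api-cluster-ip>:<port>/

# 3. is a cluster DNS actually running?
$ kubectl -n kube-system get pods -l k8s-app=kube-dns

# 4. are pods pointed at it?
$ kubectl exec deploy/frontend -- cat /etc/resolv.conf       # nameserver should be the kube-dns ClusterIP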

I was just able to confirm our suspicions that DNS misconfiguration is involved. I ran tcpdump -i any port 53, then went inside the relevant service pod and hit an endpoint directly on 127.0.0.1. Sure enough, I got a bunch of DNS queries for api.NAMESPACE.svc.cluster.local. This is expected because the pod does talk to the api service. What I don’t quite understand is why it is querying the fully qualified name, because the code explicitly references api, not api.NAMESPACE.svc.cluster.local. Some sort of resolution magic seems to be happening.

I can also confirm that making requests across nodes does work, but only when referencing the service’s name.

The only DNS service that I use is my router’s DNS server:

$ grep -P ^DNS /etc/systemd/resolved.conf 
DNS=192.168.88.1 1.1.1.1 8.8.8.8 8.8.4.4
$ nslookup wonga.com 192.168.88.1
Server:		192.168.88.1
Address:	192.168.88.1#53

Non-authoritative answer:
Name:	wonga.com
Address: 104.18.79.205

Seems like I will have to fix the *.cluster.local DNS resolution after all!
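
The checks I’m planning to run to see what is actually serving cluster DNS (paths assume a kubeadm-installed kubelet, which is what the Calico quickstart uses):

# on the node: which DNS server kubelet hands to pods
$ sudo grep -A 2 clusterDNS /var/lib/kubelet/config.yaml

# in the cluster: is CoreDNS running, and what is its Service IP?
$ kubectl -n kube-system get deploy coredns
$ kubectl -n kube-system get svc kube-dns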

When running a pod, we drop a bunch of DNS search paths into its resolv.conf, so api first looks for a kube Service named api in the same namespace as the client.
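
For illustration, a pod’s /etc/resolv.conf typically looks something like this (the nameserver is whatever your cluster DNS Service uses; 10.96.0.10 is just a common kubeadm default):

$ kubectl exec deploy/frontend -- cat /etc/resolv.conf
search my-namespace.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
# with ndots:5, a short name like "api" is tried against each search domain in turn,
# so the first query on the wire is api.my-namespace.svc.cluster.local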

If you don’t want that, you can use the Pod’s dnsConfig to override it.
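
A minimal sketch of such an override (pod name, namespace, and image are placeholders; lowering ndots makes names that already contain a dot resolve directly instead of being expanded through every cluster search domain first):

$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: frontend-dns-tuned            # placeholder name
  namespace: my-namespace
spec:
  containers:
  - name: app
    image: my-frontend:latest         # placeholder image
  dnsPolicy: ClusterFirst             # keep using the in-cluster DNS
  dnsConfig:
    options:
    - name: ndots
      value: "1"                      # short names like "api" still use the search list
EOF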