Kubenet network performance degraded, solved using hostNetwork: true, with unicorn app

Hi!

I’m trying to debug an issue that is solved by using hostNetwork: true. The k8s installation is using kubenet, and the k8s version is 1.9.8.

The installation is done with kops on AWS, using m4.xlarge and c4.xlarge instances.

The problem is the following:

When we migrated this application to Kubernetes, the response time (95th percentile) for a certain endpoint increased by about 20-30%.

This issue is solved, though, when using hostNetwork: true in the yaml. The performance is then the same as it was on the VMs for this endpoint, i.e. the 95th percentile of the response time is the same as before.
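For reference, the change is roughly just this in the pod spec (a minimal sketch; the names and image are made up, hostNetwork: true is the only relevant line):

```yaml
# Minimal sketch of the workaround; names and image are placeholders, not the real manifest.
apiVersion: v1
kind: Pod
metadata:
  name: unicorn-app                      # hypothetical name
spec:
  hostNetwork: true                      # pod uses the node's network namespace, bypassing kubenet
  containers:
    - name: app
      image: example/unicorn-app:latest  # placeholder image
```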

I asked about this in the Kubernetes office hours on July 18th (yeah, a while ago!) and the hostNetwork: true workaround came up there. They told me to cc them if I created an issue on GitHub, but I’m starting here as the clues are really vague. Just using the tag :smiley:

The pod has 3 containers:

  • Nginx
  • A log collector
  • The app (a ruby app running with unicorn)

The same three components also run together in the VM setup.

What I tried:

  • Found a way to reproduce it using ab (Apache Benchmark)
  • ab -c 1 -n 1000 'https://...
  • The same happens with http, instead of https
  • I tried removing the nginx container, but it didn’t change anything.
  • The log collector is accessed over localhost, and the very same setup is used on the VMs, which do not exhibit the problem.
  • I tried using unix sockets between nginx and the app, instead of localhost, but that didn’t change anything either.
  • Tried using the same instance type (m4.xlarge) with EKS: the same thing happens, although the performance cost of not using hostNetwork: true is lower, about 10%. Note that EKS does not use kubenet; it uses its own network overlay based on an open source project.
  • Tried another endpoint that just returns a string (puts “Ok”), and the issue does not happen.
  • Tried an endpoint that returns a few MBs (like "Die" * 10 * 1024 * 1024), and the issue does not happen either.
  • Tried the same endpoint that has the issue with different query string params, so the response is large (9 MB) or small (130 KB); both reliably reproduce the issue.
  • Tried a nodejs application that returns similar JSON from similar sources, and the issue is not present (neither with short nor with long responses).

What I might do next:

So, I’m trying to debug this issue to understand what it is and, hopefully, stop using hostNetwork: true. There seem to be a few paths to dig further:

  • Try other CNIs (EKS showed less performance degradation) to see if the performance changes

  • See what this endpoint does or how it interacts with unicorn and the whole stack. One big difference is that unicorn handles one request per process (synchronous), while nodejs does not.

  • Try newer instance types (m5/c5) to see if they mitigate the performance hit. But as the issue is not present when the same instance types are used as plain VMs, it seems that, even if it helps, it would only hide the problem.

The endpoint that has the perf problem is a ruby endpoint that reads a database and returns a JSON response. The database, the host and the network all seem fine (monitoring CPU, disk IO, swap, etc. with vmstat, our regular tools and the AWS console, and checking kern.log, syslog and so on).

By any chance, have you had a similar experience? Or do you have any other ideas on how to continue debugging this issue?

Any ideas or any kind of help is more than welcome!

Rodrigo

The problem seems to be https://github.com/kubernetes/kubernetes/issues/56903

The workarounds mentioned there (like dnsPolicy: Default) solve the issue for me.

These two posts explain the problem in detail: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts and https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/

And also provide some workarounds.

Long story short: there is a race condition in netfilter/conntrack that affects connectionless protocols (like UDP) when doing DNAT/SNAT. The Weave folks have sent a patch that fixes most of the races. To work around it you can use an external DNS server (i.e. not kube-dns, as it is exposed via a service and therefore uses DNAT), set resolver flags for glibc (they don’t work for musl), add a minimal delay with tc, etc.
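For the glibc flags, one way to set them per pod is via dnsConfig (just a sketch; spec.dnsConfig has to be available/enabled on your cluster version, and the names are placeholders):

```yaml
# Sketch: inject the glibc resolver option into the pod's /etc/resolv.conf.
# Requires pod dnsConfig support (feature-gated/beta on older Kubernetes versions).
apiVersion: v1
kind: Pod
metadata:
  name: unicorn-app                       # hypothetical name
spec:
  dnsConfig:
    options:
      - name: single-request-reopen       # glibc-only; musl (e.g. Alpine) ignores it
  containers:
    - name: app
      image: example/unicorn-app:latest   # placeholder image
```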

Note: Using dnsPolicy: Default does the trick because it uses an external DNS server (i.e. one not hosted in Kubernetes and not accessed via a service, so no DNAT is involved).
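In pod spec terms, the workaround that solves it for me is roughly this (sketch, placeholder names):

```yaml
# Sketch: resolve DNS through the node's resolv.conf instead of the kube-dns Service,
# so lookups don't go through the conntrack DNAT path.
apiVersion: v1
kind: Pod
metadata:
  name: unicorn-app                       # hypothetical name
spec:
  dnsPolicy: Default                      # despite the name, the actual default is ClusterFirst
  containers:
    - name: app
      image: example/unicorn-app:latest   # placeholder image
```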

I’ll test the glibc flags for my cluster: although dnsPolicy: Default does solve the issue for me, we rely on k8s DNS service resolution in some apps, so I can’t use it everywhere.


If you use k8s DNS names, you can’t use dnsPolicy: Default, right?

We’re looking at a DNS architecture proposal that will avoid the NAT issues and focus on reliability.


Hi Tim!

Yeah, I can’t really use that. I’ll try the glibc flags. If that works for me too, it’s enough to get around this problem (it’s better than hostNetwork :-D). We are not using musl :slight_smile:

What architecture proposal? I’d be interested to have a look, and maybe even collaborate if possible :-)

Thanks for taking a look, Tim!


Working to have a doc out very soon with a proposal that covers the range of problems.


KEP coming soon.


The glibc flags (to use TCP, the reopen option, etc.) didn’t work for me. They helped, as I was seeing the conntrack race condition in the logs, but they didn’t completely solve it for me. Using an external DNS server with dnsPolicy: Default, though, does completely solve it.

I’ll look into this further later. Maybe even into CoreDNS, which is going to be the default in 1.13. I’m not sure if this is an issue with kube-dns and, if it is, whether it’s worth investigating instead of just moving to the brand new DNS for Kubernetes.

Thanks again! :slight_smile:
