Weird performance issue in GKE


Hi all, I am facing latency issues with an application running in a pod on GKE. The application runs a gRPC server that receives audio data over the network, processes it (somewhat compute-intensive), and then returns the result over gRPC. The server is started from the Docker entrypoint. Once the pod comes up, I send audio to the server, but the responses are severely delayed (each takes a couple of seconds). When I ran "top" inside the pod, CPU utilization was going over 130%. However, when I open a shell inside the pod, start the server manually, and then send requests from the client, the response time drops to 70 ms and CPU utilization is only around 40%.
Interestingly, when I kill the server and start it from outside via kubectl exec, the response time is still bad (the same as with the entrypoint).
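Since the only thing that changes between the fast and slow cases is how the server process is launched, one way to narrow this down is to capture the launch context in each case and diff the results. The following is a minimal sketch, assuming Python 3 is available in the image; `snapshot` and `diff_snapshots` are hypothetical helper names, and the set of fields collected is a guess at what might differ, not a confirmed cause.

```python
import json
import os
import resource
import sys


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def snapshot():
    """Collect launch-context details of the current process."""
    # sched_getaffinity is Linux-only; fall back to None elsewhere.
    if hasattr(os, "sched_getaffinity"):
        affinity = sorted(os.sched_getaffinity(0))
    else:
        affinity = None
    return {
        "env": dict(os.environ),
        "cpu_affinity": affinity,
        "cpu_count": os.cpu_count(),
        "cgroup": read_text("/proc/self/cgroup"),
        "nofile_limit": list(resource.getrlimit(resource.RLIMIT_NOFILE)),
        "stdin_is_tty": sys.stdin is not None and sys.stdin.isatty(),
    }


def diff_snapshots(a, b):
    """Return {key: (a_value, b_value)} for every field that differs.

    Environment variables are compared one by one and reported as "env.NAME".
    """
    changes = {}
    for key in sorted(set(a) | set(b)):
        if key != "env" and a.get(key) != b.get(key):
            changes[key] = (a.get(key), b.get(key))
    env_a, env_b = a.get("env", {}), b.get("env", {})
    for name in sorted(set(env_a) | set(env_b)):
        if env_a.get(name) != env_b.get(name):
            changes["env." + name] = (env_a.get(name), env_b.get(name))
    return changes


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Two saved snapshots given: print only the differences.
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            print(json.dumps(diff_snapshots(json.load(fa), json.load(fb)), indent=2))
    else:
        # No arguments: print a snapshot of this process as JSON.
        print(json.dumps(snapshot(), indent=2, sort_keys=True))
```

The idea would be to run it once from the entrypoint and once from an interactive shell inside the pod, save each output to a file, and then run it again with the two file names as arguments to see which fields differ.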


  1. My pod is in the Guaranteed QoS class, with integer values for requests and limits.
  2. I have tried disabling the CFS quota and setting the CFS quota period to 1 ms. Neither of these helped.
  3. Removing the limits from the pod spec improved the response time to around 900 ms, but that is still nowhere close to the 70 ms I get when I start the server manually from inside the pod.
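To check whether CFS throttling is actually involved in either launch mode, the kernel's throttling counters can be read from the container's `cpu.stat` file. This is a rough sketch, assuming Python 3 in the image; the candidate paths are assumptions about where the cgroup CPU controller is mounted inside the container, and `parse_cpu_stat` and `throttled_fraction` are hypothetical helper names.

```python
# Assumed mount points for the cgroup CPU controller; adjust for the node image.
CANDIDATE_PATHS = [
    "/sys/fs/cgroup/cpu/cpu.stat",
    "/sys/fs/cgroup/cpu,cpuacct/cpu.stat",
    "/sys/fs/cgroup/cpu.stat",
]


def parse_cpu_stat(text):
    """Parse 'name value' lines from cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of CFS enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    for path in CANDIDATE_PATHS:
        try:
            with open(path) as f:
                stats = parse_cpu_stat(f.read())
        except OSError:
            continue  # Not mounted at this location; try the next one.
        print(path, stats, "throttled fraction:", throttled_fraction(stats))
        break
    else:
        print("no cpu.stat found at the assumed paths")
```

Reading the counters before and after a test request in each launch mode would show whether `nr_throttled` increases only in the slow case.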

Requesting some guidance on this issue.

Cluster information:

Kubernetes version: 1.12.6-gke.10
Cloud being used: Google Cloud
Installation method:
Host OS: Ubuntu
CNI and version:
CRI and version: