Hi all, I am facing latency issues with an application running in a pod on GKE. The application runs a gRPC server that receives audio data over the network, processes it (somewhat compute-intensive), and returns the result over gRPC. The server is started from the Docker entrypoint.

The performance problem is odd. Once the pod comes up, I send audio to the server, but the responses are severely delayed (a couple of seconds each). Running “top” inside the pod shows CPU utilization going above 130%. However, when I go inside the pod, start the server manually, and then send requests from the client, the response time drops to 70 ms and CPU utilization is only around 40%.
Interestingly, when I kill the server and start it from outside the pod via kubectl exec, the response time is still bad (the same as with the entrypoint).
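One way to narrow this down would be to compare what the server process looks like in each launch mode. The following is a minimal sketch, assuming Python 3 is available in the image and the PID of the running server is known. The helper names (`parse_environ`, `parse_status`, `diff_maps`, `snapshot`) are made up for illustration. It only reads standard Linux `/proc/<pid>` files: the environment the process was started with, plus its thread count and allowed CPU list.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE pairs found in /proc/<pid>/environ."""
    env = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, _, value = item.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def parse_status(text: str, wanted=("Threads", "Cpus_allowed_list")) -> dict:
    """Pick a few fields out of the 'Key:\tvalue' lines of /proc/<pid>/status."""
    out = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            out[key] = value.strip()
    return out


def diff_maps(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def snapshot(pid: int) -> dict:
    """Collect the start-up environment and selected status fields of one process."""
    proc = Path("/proc") / str(pid)
    snap = {}
    for key, value in parse_environ((proc / "environ").read_bytes()).items():
        snap["env." + key] = value
    for key, value in parse_status((proc / "status").read_text()).items():
        snap["status." + key] = value
    return snap
```

For example, `snapshot(pid)` could be taken once for the entrypoint-started server and once for the manually started one, and the two results passed to `diff_maps` to list only the differences.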
Notes:
- My pod is in the Guaranteed QoS class, with integer values for requests and limits.
- I have tried disabling the CFS quota and setting the CFS quota period to 1 ms. Neither helped.
- Removing the limits from the pod spec improved my response time to around 900 ms, but that is still nowhere near the 70 ms I get when I start the server manually from inside the pod.
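To check whether CFS throttling is actually happening in each case, the container's CPU cgroup counters could be read before and after a test request. This is a minimal sketch, assuming cgroup v1 is mounted at `/sys/fs/cgroup/cpu` inside the container (a common location, but it may differ on a given node image) and that Python 3 is available in the image. The function names are illustrative.

```python
from pathlib import Path

# Assumed cgroup v1 mount point inside the container; adjust if it differs.
CPU_CGROUP = Path("/sys/fs/cgroup/cpu")


def parse_cpu_stat(text: str) -> dict:
    """Parse cpu.stat lines such as 'nr_throttled 12' into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def cpu_limit_cores(quota_us: int, period_us: int):
    """Return the CPU limit in cores, or None when no quota is set (-1)."""
    if quota_us <= 0 or period_us <= 0:
        return None
    return quota_us / period_us


def read_cpu_cgroup(base: Path = CPU_CGROUP) -> dict:
    """Read the quota, period, and throttling counters of the current cgroup."""
    quota = int((base / "cpu.cfs_quota_us").read_text())
    period = int((base / "cpu.cfs_period_us").read_text())
    info = parse_cpu_stat((base / "cpu.stat").read_text())
    info["limit_cores"] = cpu_limit_cores(quota, period)
    return info


def throttle_delta(before: dict, after: dict) -> dict:
    """Difference in the throttling counters between two cpu.stat readings."""
    keys = ("nr_periods", "nr_throttled", "throttled_time")
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}
```

Calling `read_cpu_cgroup()` before and after sending one audio request, then passing both results to `throttle_delta`, would show how many scheduling periods were throttled during that request (`throttled_time` is reported in nanoseconds).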
I would appreciate any guidance on this issue.
Cluster information:
Kubernetes version: 1.12.6-gke.10
Cloud being used: Google Cloud
Installation method:
Host OS: Ubuntu
CNI and version:
CRI and version: