GKE ignores readiness probe from pod during high load

Cluster information:

Kubernetes version: 1.17.14
Cloud being used: Google
Host OS: Ubuntu 18.04

I have an app running in Kubernetes across a couple of pods. I’m trying to improve our deployment experience (we’re using rolling deployments), which is currently painful.

What I want to achieve:

  • each pod first goes not ready, so it gets no more traffic
  • then it will finish the requests it’s processing currently
  • then it can be removed
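
Roughly, this is the sequence I have in mind, as a sketch of what the app could do itself on SIGTERM (Go is used only for illustration - our actual app isn’t shown here; the /ready and /alive paths match the probes in the manifest below, while the wait and timeout values are just placeholders):

package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var draining atomic.Bool

	mux := http.NewServeMux()
	// Readiness probe: return 503 once shutdown has started so the
	// endpoints controller takes this pod out of the Service.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/alive", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// ... real application handlers go here ...

	srv := &http.Server{Addr: ":8080", Handler: mux}

	done := make(chan struct{})
	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGTERM)
		<-sig // kubelet sends SIGTERM when the pod starts terminating

		draining.Store(true)         // readiness probe now fails
		time.Sleep(10 * time.Second) // give endpoints/LB time to notice

		srv.SetKeepAlivesEnabled(false) // close idle keep-alive connections
		// Stop accepting new connections and wait for in-flight requests,
		// bounded so we stay inside terminationGracePeriodSeconds.
		ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
		defer cancel()
		srv.Shutdown(ctx)
		close(done)
	}()

	srv.ListenAndServe()
	<-done // keep the process alive until in-flight requests have drained
}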

This should all be possible and should just work: you create a deployment with readiness and liveness probes, and the load balancer picks these up and routes traffic accordingly. However, when I test my deployment, I see pods getting requests even after switching to not ready. Specifically, it looks like the load balancer won’t update while a lot of traffic is coming in. I can see pods going “not ready” when I signal them - and if a pod isn’t receiving traffic at the moment it switches state, it correctly gets no traffic afterwards. But if it is receiving traffic while switching, the load balancer just ignores the state change.

I’m starting to wonder how to handle this, because I can’t see what I’m missing - it must be possible to host a high-traffic app on Kubernetes with pods going “not ready” without losing tons of requests.

My configurations

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-service
  name: my-service
  namespace: mine
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
    app: my-service
    env: production
    spec:
      containers:
      - name: my-service
    image: IMAGE ID
    imagePullPolicy: Always
    volumeMounts:
    - name: credentials
      mountPath: "/run/credentials"
      readOnly: true
    securityContext:
      privileged: true
    ports:
    - containerPort: 8080
      protocol: TCP
    lifecycle:
      preStop:
        exec:
          command: ["/opt/app/bin/graceful-shutdown.sh"]
    readinessProbe:
      httpGet:
         path: /ready
         port: 8080
      periodSeconds: 1
      initialDelaySeconds: 5
      failureThreshold: 1
    livenessProbe:
      httpGet:
         path: /alive
         port: 8080
      periodSeconds: 1
      failureThreshold: 2
      initialDelaySeconds: 60
    resources:
      requests:
        memory: "500M"
        cpu: "250m"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      nodeSelector:
    cloud.google.com/gke-nodepool: stateful

Service/loadbalancer

apiVersion: v1
kind: Service
metadata:
  name: stock-service-loadbalancer
  namespace: stock
spec:
  selector:
    app: stock-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer

There are a few things that could be going on here.

Once a pod starts its deletion cycle (the API receives a delete operation), it begins “graceful termination”. I can’t tell if it’s just the paste here, but your YAML indentation (sigh) is wrong, and fields like terminationGracePeriodSeconds might be ignored.

Once a pod is marked as deleting, that grace period starts. At roughly the same time, watchers are notified: the kubelet will soon send a SIGTERM (by default), and the endpoints controller will consider the pod not-ready and remove it from the endpoints set.

That should stop the LBs from routing to the terminating pod, but this is all a little async. If you are still seeing traffic to the pod after 5-10 seconds, either some of the routing infra is slow or stalled, or (more likely) you have a client with an open connection that is still sending traffic on that socket.

You should be able to “prove” this by looking at the Endpoints for your Service during this time.
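
For example, watching the Endpoints object while the rollout happens (using the Service name from your manifest; adjust name and namespace to your setup):

# watch ready addresses change as pods go not-ready / terminate
kubectl get endpoints stock-service-loadbalancer -n stock -w
# or dump the object to see "addresses" vs "notReadyAddresses"
kubectl get endpoints stock-service-loadbalancer -n stock -o yaml

A pod that fails its readiness probe should move from addresses to notReadyAddresses, and a terminating pod should drop out of the object entirely.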

I’m using kubectl logs to see what’s happening - the app logs all requests, so I can see if/when the traffic stops. Which it doesn’t - it just keeps going. However, I wasn’t aware that an open socket could keep the load balancer from updating. That could be the explanation - will check for that.

Thanks!!

Many LBs do connection reuse, as do many clients. Often I see this problem reported as “all my traffic goes to one backend”, and it turns out they have a load-generating client that reuses connections.
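
As a hypothetical illustration (not your setup), here is the difference between a client that reuses connections and one that opens a new connection per request:

package main

import (
	"fmt"
	"io"
	"net/http"
)

func hammer(client *http.Client, label string) {
	for i := 0; i < 5; i++ {
		resp, err := client.Get("http://stock-service-loadbalancer/") // hypothetical Service URL
		if err != nil {
			fmt.Println(label, "error:", err)
			continue
		}
		io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
		resp.Body.Close()
		fmt.Println(label, "status:", resp.StatusCode)
	}
}

func main() {
	// Default client: keep-alive connections are reused, so consecutive
	// requests tend to keep hitting the same backend pod through an L4 LB.
	reusing := &http.Client{}

	// DisableKeepAlives forces a fresh TCP connection per request, so each
	// request is balanced again (at the cost of connection setup overhead).
	fresh := &http.Client{Transport: &http.Transport{DisableKeepAlives: true}}

	hammer(reusing, "reused-connection client:")
	hammer(fresh, "new-connection client:")
}

With an L4 load balancer, the balancing decision is made when the connection is established, so a client that keeps one connection open keeps hitting the same pod regardless of its readiness state.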

We have the same issue in our deployment. We also want a pod to stop receiving traffic once it goes into an unhealthy state (via a readiness probe), BUT we want it to be able to continue running until another condition is met, which will make the readiness probe succeed again. I have created an issue (as a general question) here, because I’m not sure whether I understand the function of the readiness probe correctly (kube-proxy: persistent connection kept alive although readiness checks fails (and POD not ready) · Issue #100492 · kubernetes/kubernetes · GitHub).

So once a pod (in our case) goes above a disk-usage threshold, it should not get any more traffic from clients (for the service it provides), but it should still be able to process events until disk usage falls back below a threshold (possibly a different one), which then makes the readiness probe succeed again. But once a connection has been established, even if the pod is reported unhealthy, it keeps receiving data for as long as that connection stays active.
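
For what it’s worth, the readiness handler I have in mind looks roughly like this - a Go sketch for Linux with made-up paths and thresholds, not our actual service:

package main

import (
	"net/http"
	"syscall"
)

const (
	dataDir         = "/var/lib/myapp" // hypothetical data directory
	notReadyAbove   = 0.90             // stop taking traffic above 90% disk usage
	readyAgainBelow = 0.80             // only become ready again below 80%
)

var notReady bool // current readiness state, kept across probes for hysteresis

// diskUsage returns the used fraction of the filesystem containing path.
func diskUsage(path string) (float64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	total := float64(st.Blocks) * float64(st.Bsize)
	avail := float64(st.Bavail) * float64(st.Bsize)
	return (total - avail) / total, nil
}

func readyHandler(w http.ResponseWriter, r *http.Request) {
	usage, err := diskUsage(dataDir)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// Hysteresis: flip to not-ready above one threshold, and only flip
	// back once usage has dropped below a lower one.
	if usage > notReadyAbove {
		notReady = true
	} else if usage < readyAgainBelow {
		notReady = false
	}
	if notReady {
		http.Error(w, "disk almost full", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/ready", readyHandler)
	http.ListenAndServe(":8080", nil)
}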

Thanks
Robert