GKE ignores readiness probe from pod during high load

Cluster information:

Kubernetes version: 1.17.14
Cloud being used: Google
Host OS: Ubuntu 18.04

I have an app running in Kubernetes on a couple of pods. I'm trying to improve our deployment experience (we're using rolling deployments), which is currently causing us pain.

What I want to achieve:

  • each pod first goes not ready, so it gets no more traffic
  • then it will finish the requests it’s processing currently
  • then it can be removed

This should all be possible and just work: you create a Deployment with readiness and liveness probes, and the load balancer picks them up and routes traffic accordingly. However, when I test my deployment, I see pods receiving requests even after they switch to "not ready". Specifically, it looks like the load balancer doesn't update while a lot of traffic is coming in. I can see pods going "not ready" when I signal them, and if they aren't receiving traffic at the moment they switch state, they receive none afterwards. But if they are receiving traffic while switching, the load balancer just ignores the state change.

I'm starting to wonder how to handle this, because I can't see what I'm missing: it must be possible to host a high-traffic app on Kubernetes, with pods going "not ready", without losing tons of requests.
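For context, this is roughly how I'm watching the rollout (names taken from the manifests below; the exact invocations are from memory, so treat them as approximate):

# watch pod readiness flip while the rollout progresses
kubectl get pods -n mine -l app=my-service -w

# wait for the rollout itself to finish
kubectl rollout status deployment/my-service -n mine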

My configurations

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-service
  name: my-service
  namespace: mine
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-service
        env: production
    spec:
      containers:
      - name: my-service
        image: IMAGE ID
        imagePullPolicy: Always
        volumeMounts:
        - name: credentials
          mountPath: "/run/credentials"
          readOnly: true
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
          protocol: TCP
        lifecycle:
          preStop:
            exec:
              command: ["/opt/app/bin/graceful-shutdown.sh"]
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 1
          initialDelaySeconds: 5
          failureThreshold: 1
        livenessProbe:
          httpGet:
            path: /alive
            port: 8080
          periodSeconds: 1
          failureThreshold: 2
          initialDelaySeconds: 60
        resources:
          requests:
            memory: "500M"
            cpu: "250m"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      nodeSelector:
        cloud.google.com/gke-nodepool: stateful

Service/loadbalancer

apiVersion: v1
kind: Service
metadata:
  name: stock-service-loadbalancer
  namespace: stock
spec:
  selector:
    app: stock-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer

There are a few things that could be going on here.

Once a pod starts its deletion cycle (the API server receives a delete operation), it begins "graceful termination". One thing I can't verify from a paste is whether the YAML indentation in your real manifest is correct (sigh, YAML); if fields like terminationGracePeriodSeconds end up at the wrong nesting level, they are silently ignored.
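For reference, that field belongs on the pod spec, as a sibling of containers, roughly like this:

spec:
  template:
    spec:
      # pod-spec level: a sibling of containers, not nested under a container
      terminationGracePeriodSeconds: 60
      containers:
      - name: my-service
        image: IMAGE ID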

Once a pod is marked as deleting, that grace period starts. At roughly the same time, watchers are notified: the kubelet will soon send a SIGTERM (by default), and the endpoints controller will consider the pod not-ready and remove it from the Endpoints set.

That should stop the LBs from routing to the terminating pod, but this is all a little async. If you are still seeing traffic to the pod after 5-10 seconds, either some of the routing infra is slow or stalled, or (more likely) you have a client with an open connection that is still sending traffic on that socket.
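If it turns out to be propagation lag rather than a stuck client, the usual workaround is to keep the old pod serving for a few seconds after the delete, so the endpoint removal can reach the data path before the app stops accepting work. I don't know what your graceful-shutdown.sh does, so this is only the generic pattern, with a guessed sleep duration:

lifecycle:
  preStop:
    exec:
      # keep the container alive and serving while the endpoint removal propagates;
      # keep the sleep well under terminationGracePeriodSeconds
      command: ["sh", "-c", "sleep 15"]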

You should be able to “prove” this by looking at the Endpoints for your Service during this time.
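For example, using the names from your Service paste (adjust if they differ in your real cluster):

# watch addresses drop out of the endpoint list as pods go not-ready/terminating
kubectl get endpoints stock-service-loadbalancer -n stock -w

# or inspect which addresses are currently ready vs. not ready
kubectl describe endpoints stock-service-loadbalancer -n stock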

I’m using kubectl logs to see what’s happening - the app logs all requests, so I can see if/when the traffic stops. Which it doesn’t - it just keeps going. However, I wasn’t aware that an open socket could keep the load balancer from updating. That could be the explanation - will check for that.

Thanks!!

Many LBs do connection reuse, as do many clients. Often I see this problem reported as “all my traffic goes to one backend”, and it turns out they have a load-generating client that reuses connections.
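For instance, a load generator with HTTP keep-alive enabled will keep sending requests down sockets it opened before the rollout started, regardless of readiness changes. A hypothetical ApacheBench run that behaves this way:

# -k enables HTTP keep-alive: the 10 concurrent connections are opened once
# and reused for subsequent requests instead of being re-balanced per request
ab -k -c 10 -n 10000 http://<load-balancer-ip>/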