Configuring working liveness and readiness probes for high load pods

Kubernetes version: 1.27.3
Cloud being used: GKE

I'm running a Laravel PHP app in a container built from php-fpm, with an nginx container handling traffic to php-fpm.

Everything works fine until I load test the app. When a pod is under high load, HTTP response times can exceed 15 seconds. During that time the readiness probe starts to fail first, and then the liveness probe fails as well and the container is restarted. I'm a bit stuck on how to configure the probes correctly, and on how to handle situations where the site is under heavy load, even to the point that all my resources are in use.

I think the correct probe type in my case is HTTP, because then I know for sure that the application is running. But when PHP is under load, response times get longer.

What this means in practice is that all the pods start to fail and restart, to the point where no pods are available. Kubernetes gets a pod up, then its readiness probe fails again and the pod is restarted after some time.

It is acceptable for request times to be high in these situations, but I need the probes to keep working correctly as well.

This is my current config for them:

          livenessProbe:
            httpGet:
              path: /hc
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /hc
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
          startupProbe:
            httpGet:
              path: /hc
              port: 80
            failureThreshold: 30
            periodSeconds: 15

Failing a liveness probe literally means “my process is so messed up it needs to be killed and restarted”. If you consider a 15s latency “healthy”, you can set timeoutSeconds to something higher than that.

The downside is that this timeout sets the minimum time it takes to notice that your app has actually deadlocked or otherwise hung.
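
For illustration, here is a minimal sketch of what a more lenient liveness probe could look like. The values (20-second timeout and period, 3 failures) are assumptions picked only to sit above the 15-second latency mentioned in the question. They are not tested recommendations:

          livenessProbe:
            httpGet:
              path: /hc
              port: 80
            # Assumed value: longer than the ~15s worst-case latency seen under load.
            timeoutSeconds: 20
            periodSeconds: 20
            # 3 is the Kubernetes default; written out here to make the math visible.
            failureThreshold: 3
            # Rough time before a truly hung container is restarted:
            # failureThreshold x periodSeconds, plus up to one timeout
            # = 3 x 20s + 20s = about 80s

With numbers like these, a deadlocked container would only be restarted after roughly a minute or more, which is the trade-off described above.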

Many people add custom logic so that the probe URL is handled with maximum priority, to avoid this.
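
One way this is sometimes done with php-fpm, sketched here under several assumptions: give the probe its own small php-fpm pool and use php-fpm's built-in ping page, so the check does not queue behind busy application workers. The ConfigMap name, the pool name "health", port 9001 and the path /fpm-ping are all made up for this example. It also assumes that nginx and php-fpm run in the same pod (so 127.0.0.1 reaches php-fpm) and that php-fpm runs workers as www-data. How the two files get mounted into the containers is left out:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: probe-config            # hypothetical name
    data:
      # Extra php-fpm pool reserved for health checks (hypothetical pool name and port).
      health-pool.conf: |
        [health]
        user = www-data
        group = www-data
        listen = 127.0.0.1:9001
        pm = static
        pm.max_children = 1
        ping.path = /fpm-ping
        ping.response = pong
      # nginx snippet, meant to be included inside the existing server block.
      health-location.conf: |
        location = /fpm-ping {
            access_log off;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass 127.0.0.1:9001;
        }

The liveness probe's httpGet path would then point at /fpm-ping instead of /hc. Note that a ping like this only shows that php-fpm itself is responsive. It does not run any Laravel code, so it is probably better suited to liveness than to readiness.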

Thanks for the answer, I had the same thoughts about it. About the 15-second response time: with PHP and no cache this happens quite quickly, even when the resources are not overloaded. I'm handling data that cannot be cached, so I have to run the code for every request, and the resources needed for that can get quite high. So at this point it's OK to wait 15 seconds for a request rather than scale the whole app up that much.

Also, this does not happen very often, so it is not a big deal.

As for the app, I think I need to look into better ways to build the health check (/hc). Its response is quite slow right now, even without load.