I run the WordPress stack for normanyap on a private Kubernetes cluster. The site uses PHP-FPM, NGINX, and MariaDB. Media uploads trigger Imagick to create many thumbnails and WebP variants.
During bulk uploads or when regenerating thumbnails, some requests return 502 or 504 from the NGINX Ingress. The PHP-FPM pods show timeouts, and the WordPress pods occasionally fail readiness probes for a few minutes. Latency returns to normal after the burst.
I raised the Ingress read timeout and increased pm.max_children in PHP-FPM. I also bumped pod CPU limits, which helped, but probes still flap when image jobs spike. Moving the PVC to faster storage is planned, but I would like a Kubernetes-first design fix.
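For context, the Ingress tuning so far looks roughly like this (the annotation values and object name are examples, not the exact production values; `pm.max_children` was raised in the FPM pool config alongside this):

```yaml
# nginx-ingress annotations used to raise proxy timeouts (values are examples)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wordpress            # placeholder name
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
spec: {}                     # routing rules omitted for brevity
```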
Questions:
- Should I move image generation to a separate Job/Queue deployment so web pods stay lightweight?
- Is it better to isolate PHP-FPM as its own Deployment behind an internal Service, rather than running NGINX and PHP-FPM as a sidecar pair in one pod, so that probe failures in one container do not take the other out of rotation?
- For readiness and liveness, what endpoints and thresholds are recommended for WordPress under bursty CPU and I/O?
- For autoscaling, is CPU the wrong signal here? Would a custom metric such as queue depth or PHP-FPM active processes produce more stable scaling behavior than CPU-based HPA?
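For question 1, the shape I have in mind is a dedicated worker Deployment that runs WP-Cron (and therefore the image jobs) off the web path. This is a sketch under assumptions: the image tag, PVC name, and loop interval are placeholders, and it assumes the workers share the uploads volume with the web pods:

```yaml
# Hypothetical worker Deployment: runs due WP-Cron events outside the web pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-worker
spec:
  replicas: 1
  selector:
    matchLabels: { app: wordpress-worker }
  template:
    metadata:
      labels: { app: wordpress-worker }
    spec:
      containers:
      - name: worker
        image: wordpress:cli          # WP-CLI image; placeholder tag
        command: ["sh", "-c", "while true; do wp cron event run --due-now; sleep 30; done"]
        volumeMounts:
        - name: uploads
          mountPath: /var/www/html/wp-content/uploads
      volumes:
      - name: uploads
        persistentVolumeClaim:
          claimName: wp-uploads       # placeholder PVC name
```

With `DISABLE_WP_CRON` set on the web pods, thumbnail regeneration would then burn CPU here instead of in request-serving pods.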
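For question 3, this is the kind of probe split I am considering: readiness via a cheap HTTP endpoint with a generous failure threshold, liveness as a process-alive TCP check so a CPU-starved but healthy pod is not restarted mid-burst. The path, port, and thresholds are guesses I would like validated:

```yaml
# Probe fragment for the web container (path and thresholds are assumptions)
readinessProbe:
  httpGet:
    path: /wp-login.php        # or a dedicated lightweight health endpoint
    port: 80
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6          # tolerate ~1 min of slow responses before going unready
livenessProbe:
  tcpSocket:
    port: 80                   # process-alive check only; avoids killing busy pods
  periodSeconds: 20
  failureThreshold: 3
```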
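For question 4, the HPA I am picturing scales on a PHP-FPM process metric rather than CPU. This assumes a php-fpm Prometheus exporter plus prometheus-adapter are in place; the metric name, target value, and object names below are placeholders:

```yaml
# Hypothetical HPA on active PHP-FPM processes instead of CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wordpress-fpm           # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wordpress-fpm         # placeholder
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: phpfpm_active_processes   # assumes an exporter exposes this per pod
      target:
        type: AverageValue
        averageValue: "8"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # damp flapping after short bursts
```

Does scaling on a signal like this behave better than CPU for short, spiky image workloads, or does the scrape/adapter latency make it just as laggy?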