Hi everyone, I’m running my restaurant-focused website (which features Texas Roadhouse menu pages, reviews, and coupon listings) inside a Kubernetes cluster on a managed cloud service. The setup uses Nginx as a reverse proxy, a Node.js backend API, and a MySQL database deployed via StatefulSet. Everything was running fine for a few months, but recently, I’ve started noticing random pod restarts, failed health checks, and unstable traffic routing through my services.
The main issue is that my Node.js application pods occasionally fail the liveness probe. Kubernetes automatically restarts them, but the logs don’t show any critical application-level errors, just normal request handling followed by a sudden termination. Checking the pod events with kubectl describe pod, I see “Readiness probe failed: connection refused” shortly before each restart. Increasing the initial delay and timeout values helped temporarily, but the issue returns every few hours.
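For context, the kind of probe tuning I’ve been experimenting with looks roughly like this (the /healthz path, port 3000, and all timing values below are illustrative, not my exact config):

```yaml
# Illustrative probe settings for the Node.js Deployment.
# /healthz, port 3000, and every timing value are example numbers.
livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 30   # give the app time to boot before liveness kicks in
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3       # ~30s of consecutive failures before a restart
readinessProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2       # drop the pod from Service endpoints quickly
```

My understanding is that readiness should be quicker to fail (so traffic stops) while liveness should be slower to fail (so a brief stall doesn’t trigger a restart), but I’d appreciate a sanity check on that.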
I also noticed that CPU and memory usage spike right before a restart. I’ve already added resource limits and requests to the deployment YAML, but the metrics in kubectl top pods show short random bursts that push the pods against their CPU limits and trigger throttling. The Node.js app handles dynamic data (menu updates, images, and API calls), so I wonder if these bursts are linked to higher traffic or inefficient memory usage. However, the same code runs smoothly in a standalone Docker environment, so I suspect this might be related to Kubernetes resource management or autoscaling behavior.
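For what it’s worth, the resource block on the Deployment is shaped roughly like this (the numbers are illustrative, not my real values). As I understand it, CPU over the limit is throttled rather than killed, while exceeding the memory limit gets the container OOM-killed, and V8 doesn’t size its heap from the cgroup limit on its own, so I’ve been considering capping it explicitly:

```yaml
# Illustrative requests/limits for the Node.js container;
# all numbers are example values, not my production settings.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"       # CPU above this is throttled, not killed
    memory: "512Mi"   # exceeding this OOM-kills the container
env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=384"  # keep the V8 heap below the memory limit
```

If the liveness failures coincide with throttling, maybe the probe is simply timing out while the container is being throttled, rather than the app actually hanging?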
Another concern is with persistent storage. My MySQL pod is using a PersistentVolumeClaim with an SSD-backed disk, but occasionally the application logs show “database connection lost” errors for a few seconds. The database container doesn’t crash, but it seems like there’s a temporary I/O freeze. I’ve verified that the PVC isn’t being rescheduled, so I’m unsure if this is a networking issue between pods or something related to storage throttling.
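In the meantime, I’ve been thinking about making the app tolerate these brief drops instead of surfacing them to users. A generic retry-with-backoff wrapper like the sketch below is what I have in mind for wrapping query calls (plain Node.js, no specific MySQL client assumed; the names withRetry and baseDelayMs are mine):

```javascript
// Retry an async operation with exponential backoff.
// Intended for transient "database connection lost" errors during
// short I/O stalls; function and parameter names are illustrative.
async function withRetry(fn, { retries = 3, baseDelayMs = 100 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt === retries) break;
      // Wait 100ms, 200ms, 400ms, ... between attempts.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```

Wrapping each database call this way would at least paper over a few-second freeze, at the cost of some added latency, though it obviously doesn’t fix the underlying storage or networking issue.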
I also noticed that when one pod restarts, the load balancer doesn’t immediately remove it from the Service endpoints, which causes a few 502 errors during user requests. I’m using a standard ClusterIP service with an Nginx Ingress controller. I thought readiness probes were supposed to handle that gracefully, but maybe my configuration isn’t tuned properly. I’ve been experimenting with shorter probe intervals and longer thresholds, but I haven’t found a balance that avoids downtime completely.
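One pattern I’ve read about for the 502s is delaying container shutdown so the endpoint removal has time to propagate to the Ingress before the process stops accepting connections, roughly like this (the 10-second sleep and grace period are arbitrary example values):

```yaml
# Illustrative: delay SIGTERM so endpoint updates reach the Ingress
# before the pod stops serving; values are examples only.
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
```

Does anyone know if this is still the recommended approach, or whether tuning the readiness probe alone should be enough?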
Has anyone else experienced similar issues where pods intermittently fail health checks and restart even though the app itself isn’t crashing? I’d love some advice on best practices for configuring readiness and liveness probes for Node.js apps, handling storage latency in MySQL StatefulSets, and ensuring smoother traffic routing during pod restarts. My goal is to make the website’s Kubernetes deployment as stable and resilient as it used to be before these issues began. Sorry for the long post!