Kube-apiserver restarted more than 400 times (each) with exit code 137 (not OOM-killed)

We have a cluster where all the API servers are restarting regularly (about 400 times each).

When I get a description of the pods, I get something like:

    State:          Running
      Started:      Mon, 03 Apr 2023 03:00:39 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 06 Mar 2023 03:00:41 +0100
      Finished:     Mon, 03 Apr 2023 03:00:38 +0200
    Ready:          True
    Restart Count:  450
      cpu:        250m
    Liveness:     http-get https://*.*.*.*:6443/livez delay=10s timeout=15s period=10s #success=1 #failure=8
    Readiness:    http-get https://*.*.*.*:6443/readyz delay=0s timeout=15s period=1s #success=1 #failure=3
    Startup:      http-get https://*.*.*.*:6443/livez delay=10s timeout=15s period=10s #success=1 #failure=30

Looking on the web, it seems that this exit code means the process was killed (exit code 137 = 128 + 9, i.e. SIGKILL).
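That mapping can be checked from a shell; this is standard exit-status encoding, not anything Kubernetes-specific:

```shell
# Exit codes above 128 encode a fatal signal: code = 128 + signal number.
# bash's `kill -l` can translate an exit status back to the signal name.
kill -l 137   # prints KILL on bash (137 - 128 = 9 = SIGKILL)
```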

However, if it were an OOM kill, the reason should be OOMKilled; in our case the reason is Error.

Furthermore, there’s no trace of any process having been OOM killed.
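For reference, this is roughly where one would look for such a trace (a sketch; log locations and retention vary by distro):

```shell
# The kernel logs every OOM kill; if nothing shows up around the restart
# time, the SIGKILL likely came from somewhere else (e.g. the kubelet).
dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
journalctl -k | grep -iE 'oom|killed process'
```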

How can I troubleshoot this further?
Were the pods killed because of the readiness/liveness probes?

Thanks in advance for any pointer.

Hey @jaep, OOM means out of memory. How much resources did you assign in the requests and limits sections of your YAML file for the pod? Try increasing the pod's resources, but before that, check your node's resources.

Our current configuration is a CPU request of 200m. This should not have an impact on the node, since a request is only evaluated at scheduling time, and the value is really low.
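For reference, this is how requests and limits are declared in a manifest (an illustrative sketch, not our actual manifest):

```yaml
# Requests only affect scheduling; limits are enforced at runtime.
# With no memory limit set, the container has no cgroup memory cap, so an
# OOM kill could only come from node-level memory pressure, and that would
# be reported as OOMKilled rather than Error.
resources:
  requests:
    cpu: 200m      # matches our configuration; no limits are set
```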

Reconfigure the CNI. In your case you said it was an OOM error, which means you have to increase your pod's memory and CPU resources. Delete the pods and they will be recreated by the StatefulSet.