How can we tell if the OOMKilled in k8s is because the node is running out of memory and thus killing the pod, or if the pod itself is being killed because the memory it has requested exceeds the limt declaration limit?

ponponon · December 4, 2023, 2:03am

How can I tell if a k8s OOMKilled is because the node is running out of memory and thus killing the pod, or if the pod itself is being killed because it is requesting more memory than the limt declaration limit?

Can I see it directly from kubectl describe?

From the output of the describe subcommand below, you can only see OOMKilled, but you can’t tell if the pod itself is OOMKilled or if it’s external.

─➤  kb describe -n nononoxx pod image-vector-api-server-prod-5fffcd4884-j9447                                             
Name:             image-vector-api-server-prod-5fffcd4884-j9447
Namespace:        mediawise
Priority:         0
Service Account:  default
Node:             nononox.nononoxx/nononox
Start Time:       Wed, 01 Nov 2023 17:25:54 +0800
Labels:           app=image-vector-api
                  pod-template-hash=5fffcd4884
Annotations:      kubernetes.io/psp: ack.privileged
Status:           Running
IP:               nononoxx
IPs:
  IP:           nononoxx
Controlled By:  ReplicaSet/image-vector-api-server-prod-5fffcd4884
Containers:
  image-vector-api:
    Container ID:  docker://78dc88a880d769d5cb4a553672d8a4b4a0b69b720fcbf9380096a77d279c5645
    Image:         registry-vpc.cn-nononox.nononox.com/nonono-cn/image-vector:master-nononononono
    Image ID:      docker-pullable://nononoxx.nononox.com/nonono-cn/image-vector@sha256:058c43265845a975d7cc537911ddcc203fa26f608714fe8b388d5dfd1eb02d92
    Port:          9205/TCP
    Host Port:     0/TCP
    Command:
      python
      api.py
    State:          Running
      Started:      Wed, 01 Nov 2023 18:35:49 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 01 Nov 2023 18:25:34 +0800
      Finished:     Wed, 01 Nov 2023 18:35:47 +0800
    Ready:          True
    Restart Count:  8
    Limits:
      cpu:     2
      memory:  2000Mi
    Requests:
      cpu:        10m
      memory:     1000Mi
    Liveness:     http-get http://:9205/ delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:    http-get http://:9205/ delay=60s timeout=1s period=30s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2kwj9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-2kwj9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

Want to analyse this kind of problem without introducing external tools (like prometheus)

Alexandru_Lazarev · December 13, 2024, 9:09pm

@ponponon
In Pods are running container(s) - usualy one.

Limits are settled at the level of container, if container exceeded MEM limits then it will be killed by OOMKille Linux Kernel GCroup sub-sytem with signal SIGKILL (9), BUT pod WILL NOT BE KILLED - container will be restarted (if restartPolicy allow) within the same Pod - so here is in your case is not becuase of Node, but because of containewr within a pod exceeded limit. Exit code is 128 + Killed Signal = 128 + 9 = 137 in your case, and “OOMKilled” refferes to container.

When Node is Out Of Memory and failed to reclaim something, it will evict (kill) entire pode, usually, which consume more memory than requested, and you’ll no more see old pod. You can see it in node conditions (kubect describe node my-node1)

Topic		Replies	Views
Execution of OOMkill General Discussions	0	637	August 10, 2020
Node "not ready" state when sum of all running pods exceed node capacity General Discussions	13	8085	September 29, 2019
Kubelet doesn't recognize child process oom General Discussions	3	2208	June 16, 2019
Correctly handle OOM killed job General Discussions	1	1712	June 13, 2019
Kubectl execution in pod getting OOMKilled General Discussions development	4	1359	September 26, 2023

How can we tell if the OOMKilled in k8s is because the node is running out of memory and thus killing the pod, or if the pod itself is being killed because the memory it has requested exceeds the limt declaration limit?

Related topics