How can I tell if a k8s OOMKilled happened because the node itself ran out of memory and the kernel killed the pod's process, or because the pod exceeded its own declared memory limit?
Can I see it directly from kubectl describe?
From the output of the describe subcommand below, I can only see OOMKilled; it doesn't tell me whether the container was killed for exceeding its own limit or for an external, node-level reason.
─➤ kb describe -n nononoxx pod image-vector-api-server-prod-5fffcd4884-j9447
Name:             image-vector-api-server-prod-5fffcd4884-j9447
Namespace:        mediawise
Priority:         0
Service Account:  default
Node:             nononox.nononoxx/nononox
Start Time:       Wed, 01 Nov 2023 17:25:54 +0800
Labels:           app=image-vector-api
                  pod-template-hash=5fffcd4884
Annotations:      kubernetes.io/psp: ack.privileged
Status:           Running
IP:               nononoxx
IPs:
  IP:  nononoxx
Controlled By:    ReplicaSet/image-vector-api-server-prod-5fffcd4884
Containers:
  image-vector-api:
    Container ID:   docker://78dc88a880d769d5cb4a553672d8a4b4a0b69b720fcbf9380096a77d279c5645
    Image:          registry-vpc.cn-nononox.nononox.com/nonono-cn/image-vector:master-nononononono
    Image ID:       docker-pullable://nononoxx.nononox.com/nonono-cn/image-vector@sha256:058c43265845a975d7cc537911ddcc203fa26f608714fe8b388d5dfd1eb02d92
    Port:           9205/TCP
    Host Port:      0/TCP
    Command:
      python
      api.py
    State:          Running
      Started:      Wed, 01 Nov 2023 18:35:49 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 01 Nov 2023 18:25:34 +0800
      Finished:     Wed, 01 Nov 2023 18:35:47 +0800
    Ready:          True
    Restart Count:  8
    Limits:
      cpu:     2
      memory:  2000Mi
    Requests:
      cpu:     10m
      memory:  1000Mi
    Liveness:   http-get http://:9205/ delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:  http-get http://:9205/ delay=60s timeout=1s period=30s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2kwj9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-2kwj9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
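For reference, I can also read the same termination state straight off the pod status and the event stream with stock kubectl (the jsonpath below targets the standard containerStatuses field; index 0 assumes the single container shown above):

─➤ kubectl get pod image-vector-api-server-prod-5fffcd4884-j9447 -n nononoxx \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
─➤ kubectl get events -n nononoxx \
      --field-selector involvedObject.name=image-vector-api-server-prod-5fffcd4884-j9447

But these just echo the same reason=OOMKilled / exitCode=137 pair, and the event list for this pod is already empty.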
I want to analyse this kind of problem without introducing external tools (like Prometheus).
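The only lead I have so far, assuming SSH access to the node (the exact log wording below is an assumption and varies by kernel version): the kernel OOM killer is supposed to log a cgroup-limit kill differently from a node-level kill, e.g.

# run on the node itself, not via kubectl
─➤ journalctl -k --since "2023-11-01 18:00" | grep -iE 'out of memory|oom'
# a cgroup (limit) kill typically reads:  Memory cgroup out of memory: Killed process <pid> (python) ...
# a node-level kill typically reads:      Out of memory: Killed process <pid> (python) ...

Is something equivalent visible from the API server / kubectl alone?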