Kubelet doesn't recognize child process OOM

Hi, I've run into an issue where kubelet doesn't recognize an OOM kill of a child process.

From testing, I found it's related to the child process: when the main process is the one killed, the pod restarts correctly.

My current assumption is that the pod should be killed if the OOM killer hits any process inside it.

What should I do? Is this a bug?

How should I handle this issue?
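
For reference, here is roughly how the problem can be reproduced (a minimal sketch under my own assumptions, not my real workload, which is php-fpm): a parent process that stays alive while a child allocates memory past the container limit. The child gets OOM-killed (kernel log below), but the pod is never restarted.

```python
#!/usr/bin/env python3
"""Sketch of a reproducer: the parent stays alive while a child allocates
memory until the kernel OOM killer removes it. Run it as the entrypoint of
a pod with a low memory limit."""
import os
import time

pid = os.fork()
if pid == 0:
    # Child: allocate memory in 10 MiB chunks until the cgroup limit is hit
    # and the OOM killer sends SIGKILL.
    chunks = []
    while True:
        chunks.append(bytearray(10 * 1024 * 1024))
        time.sleep(0.1)
else:
    # Parent (the container's PID 1): note the child's death but keep
    # running, so kubelet sees a healthy container and never restarts it.
    os.waitpid(pid, 0)
    while True:
        time.sleep(60)
```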

Jun 13 13:28:28 online-node-81-113 kernel: php invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
Jun 13 13:28:28 online-node-81-113 kernel: php cpuset=4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc mems_allowed=0-1
Jun 13 13:28:28 online-node-81-113 kernel: CPU: 28 PID: 25345 Comm: php Not tainted 4.14.15-1.el7.elrepo.x86_64 #1
Jun 13 13:28:28 online-node-81-113 kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.3.4 11/08/2016
Jun 13 13:28:28 online-node-81-113 kernel: Call Trace:
Jun 13 13:28:28 online-node-81-113 kernel:  dump_stack+0x63/0x85
Jun 13 13:28:28 online-node-81-113 kernel:  dump_header+0x9f/0x234
Jun 13 13:28:28 online-node-81-113 kernel:  ? mem_cgroup_scan_tasks+0x96/0xf0
Jun 13 13:28:28 online-node-81-113 kernel:  oom_kill_process+0x21c/0x430
Jun 13 13:28:28 online-node-81-113 kernel:  out_of_memory+0x114/0x4a0
Jun 13 13:28:28 online-node-81-113 kernel:  mem_cgroup_out_of_memory+0x4b/0x80
Jun 13 13:28:28 online-node-81-113 kernel:  mem_cgroup_oom_synchronize+0x2f9/0x320
Jun 13 13:28:28 online-node-81-113 kernel:  ? get_mctgt_type_thp.isra.30+0xc0/0xc0
Jun 13 13:28:28 online-node-81-113 kernel:  pagefault_out_of_memory+0x36/0x7c
Jun 13 13:28:28 online-node-81-113 kernel:  mm_fault_error+0x65/0x152
Jun 13 13:28:28 online-node-81-113 kernel:  __do_page_fault+0x456/0x4f0
Jun 13 13:28:28 online-node-81-113 kernel:  do_page_fault+0x38/0x130
Jun 13 13:28:28 online-node-81-113 kernel:  ? page_fault+0x36/0x60
Jun 13 13:28:28 online-node-81-113 kernel:  page_fault+0x4c/0x60
Jun 13 13:28:28 online-node-81-113 kernel: RIP: 0033:0x7f4861d086bf
Jun 13 13:28:28 online-node-81-113 kernel: RSP: 002b:00007ffc7c9fa7d8 EFLAGS: 00010206
Jun 13 13:28:28 online-node-81-113 kernel: RAX: 00007f485b01b040 RBX: 0000000002800000 RCX: 00000000003a9208
Jun 13 13:28:28 online-node-81-113 kernel: RDX: 0000000002800000 RSI: 00007f48592d2000 RDI: 00007f485bad2000
Jun 13 13:28:28 online-node-81-113 kernel: RBP: 00007f4861f33420 R08: 0000000006440000 R09: 00007f485ec1b048
Jun 13 13:28:28 online-node-81-113 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00007f485881b040
Jun 13 13:28:28 online-node-81-113 kernel: R13: 00007f485ec1b040 R14: 0000000006400000 R15: 000055e9863099e0
Jun 13 13:28:28 online-node-81-113 kernel: Task in /docker/4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc killed as a result of limit of /docker/4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc
Jun 13 13:28:28 online-node-81-113 kernel: memory: usage 57344kB, limit 57344kB, failcnt 28
Jun 13 13:28:28 online-node-81-113 kernel: memory+swap: usage 57344kB, limit 114688kB, failcnt 0
Jun 13 13:28:28 online-node-81-113 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jun 13 13:28:28 online-node-81-113 kernel: Memory cgroup stats for /docker/4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc: cache:0KB rss:57344KB rss_huge:49152KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:57344KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 13 13:28:28 online-node-81-113 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 13 13:28:28 online-node-81-113 kernel: [25345]     0 25345    42688    17068      63       3        0             0 php
Jun 13 13:28:28 online-node-81-113 kernel: Memory cgroup out of memory: Kill process 25345 (php) score 1195 or sacrifice child
Jun 13 13:28:28 online-node-81-113 kernel: Killed process 25345 (php) total-vm:170752kB, anon-rss:57176kB, file-rss:11096kB, shmem-rss:0kB
Jun 13 13:28:28 online-node-81-113 kernel: oom_reaper: reaped process 25345 (php), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

For more OOM examples, see https://pastebin.com/ECdhtg2z

Cluster information:

Kubernetes version: v1.14.1
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: CentOS Linux release 7.4.1708, kernel 4.14.15-1.el7.elrepo.x86_64
CNI and version: kube-router:v0.3.0
CRI and version: Docker 18.06.2-ce

I understand it is not a bug: the container only dies (and gets restarted) when the entrypoint process finishes.

If the main process is alive, all is good (the container just won't be able to use more memory than permitted). The main process, though, should keep an eye on its children, I think (as it should on any system, containers or not).
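
To illustrate what I mean by "keep an eye on its children", here is a minimal sketch (my own example, assuming the workload can be wrapped and that an OOM kill shows up as the child dying from SIGKILL) of a parent that exits when its child is killed, so kubelet restarts the container:

```python
#!/usr/bin/env python3
"""Minimal sketch: run the real workload as a child and propagate its death.
Assumptions: this wrapper is the container entrypoint, and an OOM kill of
the child shows up as termination by SIGKILL (signal 9)."""
import subprocess
import sys

# Hypothetical workload command; replace with whatever the container runs.
child = subprocess.Popen(["php-fpm", "--nodaemonize"])
rc = child.wait()

if rc < 0:
    # A negative return code means the child was terminated by a signal;
    # SIGKILL (9) is what the OOM killer sends. Exit non-zero ourselves so
    # kubelet restarts the container.
    sig = -rc
    print(f"child killed by signal {sig} (SIGKILL usually means OOM); "
          "exiting so kubelet restarts the container", file=sys.stderr)
    sys.exit(128 + sig)  # conventional "killed by signal" exit status

sys.exit(rc)
```

Note that a php-fpm master already reaps and respawns its own workers, so a wrapper like this alone would not notice a worker being OOM-killed; it only illustrates the general "parent watches its children" idea.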

Am I missing something?

Thanks for your reply.

I have created the issue:

I think we should do something, but I don't know how to do it properly:

  1. Manually restart the pod? (our current approach; a sketch of automating the restart follows this list)
    // I'm not sure php-fpm can limit its own max memory usage; I'll try it, though.

  2. I think we should handle the child OOM event too? I'd leave this design-related concern to the designers.
    // Maybe an annotation saying whether the pod should be killed on a child OOM?
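
Here is that sketch for option 1: a check that could be wired up as an exec liveness probe, failing once the container's memory cgroup reports any OOM kill. Assumptions on my side: cgroup v1 with the memory controller visible at /sys/fs/cgroup/memory inside the container, and a kernel new enough (>= 4.13, so the 4.14 kernel here qualifies) to expose the oom_kill counter in memory.oom_control.

```python
#!/usr/bin/env python3
"""Sketch of an exec liveness-probe command: exit non-zero once any process
in the container's memory cgroup has been OOM-killed, so kubelet restarts
the container even though PID 1 is still alive."""
import sys

OOM_CONTROL = "/sys/fs/cgroup/memory/memory.oom_control"

def oom_kill_count() -> int:
    # The file looks like:
    #   oom_kill_disable 0
    #   under_oom 0
    #   oom_kill 1
    with open(OOM_CONTROL) as f:
        for line in f:
            key, _, value = line.partition(" ")
            if key == "oom_kill":
                return int(value)
    return 0  # counter not exposed on this kernel

if __name__ == "__main__":
    sys.exit(1 if oom_kill_count() > 0 else 0)
```

Since a restart creates a fresh container (and a fresh cgroup), the counter starts from zero again afterwards, so the probe does not need to remember any state between calls.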

Another way to look at this (and why I think there is no bug):

Imagine the scenario without containers involved. You have your PHP process, and one of its children consumes enough memory to trigger the kernel OOM killer.

In that scenario, the very same thing happens: the child process is killed, and that's it.

If the parent process can’t manage that situation correctly without containers, I don’t see why containers should behave differently :-/

What is the real problem you are seeing? I understand this is just a symptom.