Hi, I've run into an issue: the kubelet doesn't seem to recognize an OOM kill of a child process.
In my tests, the problem is specific to child processes: when the container's main process is OOM-killed, the pod restarts correctly, but when a child process is OOM-killed, nothing happens.
I expected the container to be killed and restarted whenever any process inside it is OOM-killed.
Is this a bug?
How should I handle this issue?
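To make the scenario concrete, here is a minimal sketch of a wrapper that runs the worker as a child process and turns a SIGKILL death (which is what the OOM killer delivers) into a non-zero exit of the parent. It assumes that a restart is only triggered when the container's main process exits, which matches what my test showed. It is not something the kubelet provides, and the names `child_was_sigkilled` and `run_and_propagate` are hypothetical.

```python
import signal
import subprocess
from typing import Sequence


def child_was_sigkilled(returncode: int) -> bool:
    """Return True if a subprocess return code means 'terminated by SIGKILL'."""
    # On POSIX, subprocess reports death-by-signal as the negated signal number.
    return returncode == -signal.SIGKILL


def run_and_propagate(cmd: Sequence[str]) -> int:
    """Run cmd as a child and return the exit code the parent should use.

    A child killed by SIGKILL (for example by the OOM killer) is mapped to
    137 (128 + 9), the conventional shell exit code for that signal.
    Any other return code is passed through unchanged.
    """
    proc = subprocess.Popen(list(cmd))
    returncode = proc.wait()
    if child_was_sigkilled(returncode):
        return 128 + int(signal.SIGKILL)
    return returncode
```

If something like this were the container's main process, it could call `sys.exit(run_and_propagate([...]))` with the real worker command, so that the death of the child also ends the container. Whether that is the right way to handle it is exactly what I am unsure about.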
Jun 13 13:28:28 online-node-81-113 kernel: php invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Jun 13 13:28:28 online-node-81-113 kernel: php cpuset=4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc mems_allowed=0-1
Jun 13 13:28:28 online-node-81-113 kernel: CPU: 28 PID: 25345 Comm: php Not tainted 4.14.15-1.el7.elrepo.x86_64 #1
Jun 13 13:28:28 online-node-81-113 kernel: Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.3.4 11/08/2016
Jun 13 13:28:28 online-node-81-113 kernel: Call Trace:
Jun 13 13:28:28 online-node-81-113 kernel: dump_stack+0x63/0x85
Jun 13 13:28:28 online-node-81-113 kernel: dump_header+0x9f/0x234
Jun 13 13:28:28 online-node-81-113 kernel: ? mem_cgroup_scan_tasks+0x96/0xf0
Jun 13 13:28:28 online-node-81-113 kernel: oom_kill_process+0x21c/0x430
Jun 13 13:28:28 online-node-81-113 kernel: out_of_memory+0x114/0x4a0
Jun 13 13:28:28 online-node-81-113 kernel: mem_cgroup_out_of_memory+0x4b/0x80
Jun 13 13:28:28 online-node-81-113 kernel: mem_cgroup_oom_synchronize+0x2f9/0x320
Jun 13 13:28:28 online-node-81-113 kernel: ? get_mctgt_type_thp.isra.30+0xc0/0xc0
Jun 13 13:28:28 online-node-81-113 kernel: pagefault_out_of_memory+0x36/0x7c
Jun 13 13:28:28 online-node-81-113 kernel: mm_fault_error+0x65/0x152
Jun 13 13:28:28 online-node-81-113 kernel: __do_page_fault+0x456/0x4f0
Jun 13 13:28:28 online-node-81-113 kernel: do_page_fault+0x38/0x130
Jun 13 13:28:28 online-node-81-113 kernel: ? page_fault+0x36/0x60
Jun 13 13:28:28 online-node-81-113 kernel: page_fault+0x4c/0x60
Jun 13 13:28:28 online-node-81-113 kernel: RIP: 0033:0x7f4861d086bf
Jun 13 13:28:28 online-node-81-113 kernel: RSP: 002b:00007ffc7c9fa7d8 EFLAGS: 00010206
Jun 13 13:28:28 online-node-81-113 kernel: RAX: 00007f485b01b040 RBX: 0000000002800000 RCX: 00000000003a9208
Jun 13 13:28:28 online-node-81-113 kernel: RDX: 0000000002800000 RSI: 00007f48592d2000 RDI: 00007f485bad2000
Jun 13 13:28:28 online-node-81-113 kernel: RBP: 00007f4861f33420 R08: 0000000006440000 R09: 00007f485ec1b048
Jun 13 13:28:28 online-node-81-113 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00007f485881b040
Jun 13 13:28:28 online-node-81-113 kernel: R13: 00007f485ec1b040 R14: 0000000006400000 R15: 000055e9863099e0
Jun 13 13:28:28 online-node-81-113 kernel: Task in /docker/4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc killed as a result of limit of /docker/4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc
Jun 13 13:28:28 online-node-81-113 kernel: memory: usage 57344kB, limit 57344kB, failcnt 28
Jun 13 13:28:28 online-node-81-113 kernel: memory+swap: usage 57344kB, limit 114688kB, failcnt 0
Jun 13 13:28:28 online-node-81-113 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jun 13 13:28:28 online-node-81-113 kernel: Memory cgroup stats for /docker/4ed8f6475d229008a46f46d7fd1e33d7ff591f176c293cdfaa4ec240619ecddc: cache:0KB rss:57344KB rss_huge:49152KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:57344KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 13 13:28:28 online-node-81-113 kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 13 13:28:28 online-node-81-113 kernel: [25345] 0 25345 42688 17068 63 3 0 0 php
Jun 13 13:28:28 online-node-81-113 kernel: Memory cgroup out of memory: Kill process 25345 (php) score 1195 or sacrifice child
Jun 13 13:28:28 online-node-81-113 kernel: Killed process 25345 (php) total-vm:170752kB, anon-rss:57176kB, file-rss:11096kB, shmem-rss:0kB
Jun 13 13:28:28 online-node-81-113 kernel: oom_reaper: reaped process 25345 (php), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
For more OOM examples, see https://pastebin.com/ECdhtg2z
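For anyone who wants to check their own nodes for the same pattern, the log above can be scanned with a short script. This is only a sketch: it assumes the kernel 4.14 message format shown in this post ("Task in /docker/<id> killed ..." followed by "Killed process <pid> (<name>) ..."), and the names `OomKill` and `parse_oom_kills` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class OomKill(NamedTuple):
    container_id: Optional[str]  # Docker cgroup ID, if a "Task in" line preceded the kill
    pid: int
    comm: str                    # process name as printed by the kernel
    anon_rss_kb: int


# "Task in /docker/<hex id> killed as a result of limit of ..."
CGROUP_RE = re.compile(r"Task in /docker/(?P<container_id>[0-9a-f]+) killed")
# "Killed process 25345 (php) total-vm:170752kB, anon-rss:57176kB, ..."
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:\d+kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Collect one OomKill per 'Killed process' line, tagged with the
    most recently seen /docker/<id> cgroup (None if there was none)."""
    kills: List[OomKill] = []
    current_container: Optional[str] = None
    for line in lines:
        cgroup = CGROUP_RE.search(line)
        if cgroup:
            current_container = cgroup.group("container_id")
            continue
        killed = KILLED_RE.search(line)
        if killed:
            kills.append(
                OomKill(
                    container_id=current_container,
                    pid=int(killed.group("pid")),
                    comm=killed.group("comm"),
                    anon_rss_kb=int(killed.group("anon_rss_kb")),
                )
            )
            current_container = None  # do not reuse the ID for an unrelated kill
    return kills
```

Feeding it the lines of `journalctl -k` output (or the log above) yields one record per kill, which makes it easy to compare the killed PID with the container's main process.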
Cluster information:
Kubernetes version: v1.14.1
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: CentOS Linux release 7.4.1708, kernel 4.14.15-1.el7.elrepo.x86_64
CNI and version: kube-router:v0.3.0
CRI and version: Docker 18.06.2-ce