Yeah, so this was an OOM: spawning bash as a subprocess pushed the pod's total memory use over its limit, and the kernel's OOM killer on the node terminated the subprocess. I watched the kernel log with `sudo dmesg -wH`, then varied the resource limits on our dev cluster and observed that:
- if the main process exceeds `limits.memory`, the pod is restarted
- if a subprocess pushes memory usage over `limits.memory`, the subprocess is killed but the pod keeps running
- if `limits.memory` is high enough, the pod is not restarted and the subprocess runs fine
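If your main process is Python, one way to notice the second case is to look at how the child exited. The OOM killer terminates a process with SIGKILL, and Python's `subprocess` reports "killed by signal N" as a negative return code (`-9` for SIGKILL). Here's a minimal sketch, assuming a Python parent process; `run_and_report` is a hypothetical helper. A SIGKILL alone doesn't prove an OOM kill, since other things can send it too, so `dmesg` is still the place to confirm it.

```python
import signal
import subprocess
import sys


def run_and_report(cmd: list[str]) -> str:
    """Hypothetical helper: run cmd as a child process and describe how it exited."""
    result = subprocess.run(cmd)
    if result.returncode == -signal.SIGKILL:
        # Negative return code means the child was killed by that signal (POSIX only).
        # SIGKILL is what the OOM killer sends, but it isn't proof of an OOM kill.
        return "killed by SIGKILL (possible OOM kill)"
    if result.returncode < 0:
        return f"killed by signal {-result.returncode}"
    return f"exited with code {result.returncode}"


if __name__ == "__main__":
    # Simulate the kill without exhausting memory: the child sends SIGKILL to itself.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_report(child))
    # The parent is still alive here, just like the pod in the second case.
    print("parent still running")
```

If the subprocess is launched through a shell instead, the shell reports the same kill as exit status 137 (128 + 9).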
Thanks again!