Subprocess Killed with a 137 error

Yeah, so this was an OOM kill: starting bash as a subprocess tipped the pod over its memory limit, and the kernel's OOM killer on the node terminated the subprocess. That matches the exit code, since 137 is 128 + 9, i.e. the process died from SIGKILL. I watched dmesg with

sudo dmesg -wH

and then varied the resource limits on our dev cluster and observed that

  • if the main process exceeds limits.memory, the pod is restarted
  • if a subprocess pushes memory usage over limits.memory, the subprocess is killed but the pod keeps running
  • if limits.memory is high enough, the pod is not restarted and the subprocess executes OK
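
For anyone else hitting this, here is a minimal sketch of how the 137 can be decoded from the calling side. The `describe_exit` helper is a made-up name for illustration, not part of any tool, and the self-inflicted `kill -KILL` only stands in for the OOM killer, which sends the same signal.

```shell
#!/usr/bin/env bash
# describe_exit is a hypothetical helper, not a standard command.
# A shell reports "128 + signal number" when a child dies from a signal,
# so 137 means signal 9 (SIGKILL), which is what the OOM killer sends.
describe_exit() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}

# Stand-in for an OOM kill: the child shell sends SIGKILL to itself.
bash -c 'kill -KILL $$'
describe_exit $?   # prints "killed by signal 9"
```

A 137 on its own only says the process received SIGKILL; the dmesg output is what confirms it was the OOM killer and not something else.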

Thanks again :slight_smile: