General appetite for custom Restart Policies or an AlwaysUnlessBlocked restart policy

Hi folks,

I have a rather generic question involving restart policies. I am very new to the Kube codebase so please forgive me if this is already covered and I am naive to the coverage.

While working on my day job I want to give users of our tooling the ability to use a restart policy that is something like AlwaysUnlessBlockedBySecurity.

The idea here is for Kube to be contextually aware that the container itself did not necessarily exit with some bad status, but instead that, while the container may (or may not) have been fine, its legs were kicked out from underneath it by some compliance/security tooling, and Kube should not try to reschedule the workload anymore.

This would be fundamentally different from cases where the application/workload/environment itself was the cause of the crash/failure… in those cases we would want the usual “Always” restart policy to behave normally, trying to reschedule the workload wherever possible.

To my limited knowledge, this seems like a good opportunity for a generic, non-vendor-specific restart policy mechanism that allows for AlwaysUnlessSecurityDenial or some similar concept.

I guess you could genericize things even further and have custom restart policies like AlwaysUnless${MyDescribedCondition}, so the feature isn’t limited to a security denial context, but I am not sure what other applications would be obvious here.

Thanks in advance, and please let me know if this is already possible and I missed it… or if there is upstream appetite for merging such a concept. I’d be happy to dedicate some cycles to a PR.

Cheers!

When looking at Pod Lifecycles, how is this different from OnFailure? If you set this to OnFailure and exit with a status of 0, things won’t be restarted.

I believe it will say “Completed”, but it’s functionally the same, isn’t it?
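
For example, something like this (the pod name and the output are illustrative):

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: exits-cleanly
spec:
  restartPolicy: OnFailure
  containers:
    - name: ok
      image: bash
      command:
        - bash
        - -c
        - |
          exit 0
EOF
pod/exits-cleanly created

$ kubectl get pod exits-cleanly
NAME            READY   STATUS      RESTARTS   AGE
exits-cleanly   0/1     Completed   0          10s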

I think it’s fundamentally different if a container crashes/oopses on an exception in the code or has a resource allocation issue vs. a pod that was taken out/stopped via a security mechanism.

i.e. if a container fails because of a resource hiccup you definitely want the workload rescheduled.

If a container fails because some security mechanism stepped in and killed it on purpose, since security policies state “this is not allowed here”, I would not want Kube to reschedule the workload. Or at least let that be an option… it may not apply in all cases.
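
To make that concrete, a resource-related kill is already visible in the container status today. An OOM-killed container, for example, looks like this (a sketch, assuming a pod named somepod whose container was OOM-killed and restarted):

$ kubectl get pod somepod -o json | jq '.status.containerStatuses[].lastState.terminated | {reason, exitCode}'
{
  "reason": "OOMKilled",
  "exitCode": 137
}

A container killed from the outside by a security tool, on the other hand, generally just shows up as reason "Error" with whatever exit code the runtime reported, so nothing in the pod status says why it died.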

That make sense?

I understand.

A pod is the equivalent in Kubernetes to a program in a traditional OS. Any non-zero return code is a failure, regardless of whether the failure is because of a security mechanism.

You can get the return codes, though, if you want them. In the example below I create a pod that fails with an exit code of 42 and then pull the exit code from the pod object.

Does this help?

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: somepod
spec:
  containers:
    - name: will-fail
      image: bash
      command:
        - bash
        - -c
        - |
          exit 42
EOF
pod/somepod created

$ kubectl get pods somepod -o json | jq '.status.containerStatuses[].lastState.terminated.exitCode'
42
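
One small note on that: with the default restartPolicy of Always the kubelet restarts the container, which is why the code lives under lastState.terminated. If the same pod were created with restartPolicy: Never instead, the value would show up under state.terminated:

$ kubectl get pods somepod -o json | jq '.status.containerStatuses[].state.terminated.exitCode'
42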

Another detail I didn’t mention (although I am not sure it matters a ton here) is that in my case I am actually controlling the containerd gRPC response.

So I am actually responding with an error on the CreateContainer request, in a way where I can control the gRPC response.

I’m not sure that we should limit this to my case, though.

But still, there is no way to handle such a case at the restart policy level, eh?

You are definitely right. As-is, you could write an operator that watches for pod failures and the operator can do whatever task is necessary, such as delete/recreate the pod.

I honestly can’t say if it makes sense or not to extend the functionality for the restartPolicy in a way to account for error codes. Though it seems that we do have all the tools necessary to handle this particular use-case.
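
To sketch what that could look like, here’s a rough “lazy operator” in bash (the 42 exit code convention and the clean-up action are just assumptions for illustration, not anything built in):

$ cat <<'EOF' > denial-watcher.sh
#!/usr/bin/env bash
# Rough sketch: stream pod updates, look for containers that terminated with a
# hypothetical "denied by security" exit code, and react. The reaction here is
# just a delete; in practice you might scale down or annotate the owning
# workload so the pod isn't simply recreated.
kubectl get pods --all-namespaces --watch -o json |
jq --unbuffered -r '
  [ .metadata.namespace,
    .metadata.name,
    ( .status.containerStatuses[0].lastState.terminated.exitCode
      // .status.containerStatuses[0].state.terminated.exitCode
      // "" ) ]
  | @tsv' |
while IFS=$'\t' read -r ns name code; do
  if [ "$code" = "42" ]; then
    echo "pod $ns/$name was denied, cleaning up"
    kubectl delete pod -n "$ns" "$name" --ignore-not-found
  fi
done
EOF
$ bash denial-watcher.sh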

Ahh yeah, thank you. That compromise seems sane, I think it’s a reasonable alternative to a first class option.

If you need something to kind of springboard off of, I wrote this in an attempt to restart pods when ConfigMaps were updated, before I learned that the Downward API auto-updated volume files. What it’s good for is getting an idea of how to use bash for a lazy operator. Also, I don’t know if the role actually works; I found out recently that Docker for Desktop deploys with a role that just gives all service accounts full access.

Almost forgot to share the operator pattern documentation: Operator pattern | Kubernetes

If you detect a pod is not allowed because of security, don’t just crash it - delete it. Have a webhook which validates pods as they are created. The relevant workload controller will start getting errors as the pods fail to start. That’s where a human will be looking, anyway.

IMO restartPolicy is the wrong place to be enforcing this.
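
For reference, the registration side of that looks roughly like the sketch below (the service name, namespace, and path are hypothetical, and the TLS/caBundle wiring for the webhook’s serving cert is omitted). The webhook service just returns allowed: false with a message when the pod spec violates policy, and the workload controller surfaces that message in its events.

$ cat <<EOF | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: deny-blocked-pods
webhooks:
  - name: pods.security.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        namespace: security-system
        name: pod-validator
        path: /validate
EOF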

That’s a pretty solid solution; that would just require invoking RuntimeService.RemoveContainer(…) while RuntimeService.CreateContainer(...) is happening, right?

@iphands I was thinking, is this something you could handle with an admission webhook to vet the pod spec beforehand?

Throkin, thanks, that is a good idea. In this case though I am working at the containerd interface level, and I deny the request during the CreateContainer call when I see it’s not allowed… there I return an error instead of passing the request on to containerd at all. So there is really no workload container to delete.

But you are talking about the pod. I guess I could invoke RemoveContainer on another container that is holding the pod namespace open? Is that what you are saying?

Sorry, I’m fairly new to Kube internals. I’ve been hosting workloads on Kube, but I’m new to digging in.
Thanks again to both of ya.

I think containerd is absolutely the wrong layer to be doing security policy. It might play a role in the enforcement of policy, but I don’t see how it is superior to simply deleting pods at the kube API level.

For sure. I am using it for enforcement not policy. Though it’s not the only place I am doing enforcement.

The solution I am working on should also work for platforms sitting above containerd that are not Kube.