Deleting over 300,000 secrets fast

Today I stumbled across a cluster that had over 300,000 secrets left over from tokens being rotated.

The Symptoms

An ingress-nginx controller deployment was failing readiness checks with 500 errors; the pods were getting stuck in crash loops.

Killing the pods caused them to be rescheduled, at which point they would just start working. The logs were not very helpful on startup.

Reviewing the CPU and memory usage, my next best guess was that the pod was being scheduled somewhere it was getting starved of resources. It is one of a handful of workloads that still needed requests and limits set.
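
For reference, requests and limits can be set directly on the deployment with kubectl set resources; the values below are illustrative, not what this cluster actually needed:

kubectl -n ingress-nginx set resources deployment ingress-nginx-controller \
  --requests=cpu=500m,memory=512Mi \
  --limits=cpu=1,memory=1Gi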

Now the pods were staying up longer, but still failing the readiness check. The logs still didn’t seem very telling (see references below). When the health checks keep failing, the container eventually gets a kill signal. The hint was right after the first message about this signal:

E1123 18:51:38.291628       7 store.go:177] timed out waiting for caches to sync

At this point it seemed that the pod was just slow. Guaranteeing the resources it needed had not worked, so the next thought was that it was waiting on something that took too long.

The obvious things it pulls are configmaps and secrets. Running kubectl get cm -A worked fine. Then trying kubectl get secrets -A was hanging.

I was able to see the secrets in the ingress-nginx namespace, so the problem wasn’t with all secrets. This was the test I devised to see which namespace hung:

for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  echo trying ${ns} ...;
  kubectl -n ${ns} get secrets;
done

This halted at the kube-system namespace. I then dumped a copy of its contents to inspect; this took more than 10 minutes to complete and yielded a 1.8G file.

kubectl -n kube-system get secrets -oyaml > /tmp/kube-system-secrets.yaml

Now it was possible to start reviewing it with yq '.items[] | .metadata.name'. It turned out there were many abandoned service account tokens. Over 300,000 of them. This was what was slowing down the ingress-nginx controller.
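
Legacy service account token secrets are typically named <serviceaccount>-token-<suffix>, so a rough count could be pulled from the dump with something like this (the -token- pattern is an assumption about the exact naming):

yq '.items[] | .metadata.name' /tmp/kube-system-secrets.yaml | grep -c -- '-token-'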

Challenges

The obvious approach is a time-consuming nightmare. Suppose you pre-generate kubectl commands like the one below; each is a blocking process, and the full run will take a very long time.

kubectl -n kube-system delete secret <NAME>

Batching the deletes like this isn’t any better either; kubectl still wants to do get requests, which significantly slows down the deletion process.

kubectl -n kube-system delete secret <NAME> <NAME> <NAME> <NAME> <NAME> <NAME> <NAME>

Why not run these in parallel in the background and wait? Or perhaps do what this guy did?

kubectl -n kube-system delete secret <NAMES-HERE> &
...
wait

It’s slow… and, assuming you have reasonable RBAC settings, you get throttled by the API server or even run into auth problems if you are using an AWS role.

Resolution

I managed to delete over 380,000 secrets in less than 2 hours. For this I ended up pulling down some break-glass credentials bound to cluster-admin, since my AWS IAM role was getting in my way; alternatively, you can make a service account that uses the cluster-admin ClusterRole.
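
If you go the service account route, a minimal sketch looks like this (the break-glass-admin name is made up for the example):

kubectl -n kube-system create serviceaccount break-glass-admin
kubectl create clusterrolebinding break-glass-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:break-glass-admin

On a cluster this old the matching token secret is created automatically and can be wired into a kubeconfig.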

Earlier I had made a file with all of the secrets in the namespace. From that, I filtered for the relevant secrets that were older than the oldest pod using the related service account.

yq '.items[] | select(.metadata.name | contains("<REDACTED-SERVICE-ACCOUNT-NAME>")) | select(.metadata.creationTimestamp < "2022-10-28T03:48:23Z") | ([.metadata.name, .metadata.creationTimestamp ] | join(" "))' /tmp/kube-system-secrets.yaml > /tmp/kube-system-deletable-secrets.yaml

With the filtered list, I generated a bash script with awk to delete the secrets.

While my first deletion script was running, I was looking for a cause in the audit logs. This is where I saw that kubectl was doing a get and then a delete for every secret. That was extra work, and I believed it could be done faster.
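
A raw delete skips the extra get and hits the API path directly; a single one looks like this (the name is a placeholder):

kubectl delete --raw /api/v1/namespaces/kube-system/secrets/<NAME>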

The final iteration took 2 hours to delete over 380,000 secrets. This is the awk script that generated my bash script, which queues raw delete requests in the background, 1,000 at a time.

awk 'BEGIN {
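  # startAfter: leave empty to start from the first line, or set it to the last
  # secret already deleted to resume after it.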
  startAfter="<EMPTY-OR-NAME-OF-LAST-DELETED>";
  cont=0;
  ct=0;
  remaining=403140;

  waitEvery=1000;
  waitAfter=waitEvery;

  print "#!/bin/bash";
  print "set -m";
  print "set -e";
} {
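  # Main block: runs once per input line; $1 is the secret name.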
  ct++;
  remaining--;

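  # After every waitEvery emitted deletes, pause and wait for the backgrounded
  # kubectl processes to finish before queuing more.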
  if(waitAfter < 1) {
    waitAfter=waitEvery;
    print "echo reached " waitEvery " requests... waiting for background processes to finish";
    print "echo done=" ct " remaining=" remaining;
    print "wait";
    print "echo continuing...";
  }

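  # Only emit a delete once we are past startAfter (or right away when it is empty).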
  if (cont > 0 || startAfter == "") {
    waitAfter--;
    print "kubectl delete --raw /api/v1/namespaces/kube-system/secrets/" $1 " >> /tmp/kube-system-deletable.log &";
  }

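  # Mark the resume point; deletes begin with the next input line.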
  if ($1 == startAfter) {
    cont=1;
  }
} END {
  print "wait";
  print "echo done....";
}' /tmp/kube-system-deletable-secrets.yaml > /tmp/kube-system-delete-secrets.sh
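
For illustration, the generated script repeats chunks shaped roughly like this (names and counts are placeholders):

#!/bin/bash
set -m
set -e
kubectl delete --raw /api/v1/namespaces/kube-system/secrets/<NAME-1> >> /tmp/kube-system-deletable.log &
kubectl delete --raw /api/v1/namespaces/kube-system/secrets/<NAME-2> >> /tmp/kube-system-deletable.log &
...
echo reached 1000 requests... waiting for background processes to finish
echo done=<COUNT> remaining=<REMAINING>
wait
echo continuing...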

The final result was a script that could be run: bash /tmp/kube-system-delete-secrets.sh

Further Thoughts

This only addresses an extreme symptom and does nothing to prevent it moving forward. The idea should apply to any object type that may need pruning.
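
For example, the same raw-path pattern works for other resource types; the exact path depends on the API group and version (these are illustrative):

kubectl delete --raw /api/v1/namespaces/<NAMESPACE>/configmaps/<NAME>
kubectl delete --raw /apis/batch/v1/namespaces/<NAMESPACE>/jobs/<NAME>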

Understanding why the resources exist requires knowing what is causing them to be generated. Audit logs can help you identify that cause. Now that my symptom has a treatment, that’s what I’m doing next.

Hopefully someone finds this helpful. The title says “fast”, and in context, I was able to reduce what was about to amount to roughly 40 hours of waiting to less than 5 hours.

References

Ingress Nginx Controller Logs

$ kubectl -n ingress-nginx logs ingress-nginx-controller-28d4aaa96e-tlz4e
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.4.0
  Build:         50be2bf95fd1ef480420e2aa1d6c5c7c138c95ea
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

W1123 18:50:48.605225       7 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1123 18:50:48.605341       7 main.go:209] "Creating API client" host="https://<REDACTED-IP>:443"
I1123 18:50:48.613579       7 main.go:253] "Running in Kubernetes cluster" major="1" minor="21" git="v1.21.14" state="clean" commit="0f77da5bd4809927e15d1658fb4aa8f13ad890a5" platform="linux/amd64"
I1123 18:50:48.784302       7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I1123 18:50:48.800315       7 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I1123 18:50:48.811468       7 nginx.go:260] "Starting NGINX Ingress controller"
I1123 18:50:48.813881       7 store.go:521] "ignoring ingressclass as the spec.controller is not the same of this ingress" ingressclass="nginx-internal"
I1123 18:50:48.826469       7 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"9ae00822-3145-4ef6-8b1d-8b8e36fd8598", APIVersion:"v1", ResourceVersion:"965776375", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I1123 18:51:38.291443       7 sigterm.go:36] "Received SIGTERM, shutting down"
I1123 18:51:38.291478       7 nginx.go:379] "Shutting down controller queues"
E1123 18:51:38.291628       7 store.go:177] timed out waiting for caches to sync
I1123 18:51:38.291660       7 trace.go:205] Trace[1423673000]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.25.2/tools/cache/reflector.go:169 (23-Nov-2022 18:50:48.811) (total time: 49480ms):
Trace[1423673000]: [49.480004904s] [49.480004904s] END
I1123 18:51:38.312622       7 nginx.go:387] "Stopping admission controller"
I1123 18:51:38.312643       7 nginx.go:395] "Stopping NGINX process"
2022/11/23 18:51:38 [notice] 51#51: signal process started
2022/11/23 18:51:38 [error] 51#51: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory)
W1123 18:51:38.316740       7 sigterm.go:40] Error during shutdown: exit status 1
I1123 18:51:38.316754       7 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds
E1123 18:51:39.292724       7 store.go:203] timed out waiting for caches to sync
I1123 18:51:39.292778       7 nginx.go:303] "Starting NGINX process"
I1123 18:51:39.292805       7 leaderelection.go:248] attempting to acquire leader lease ingress-nginx/ingress-nginx-external...
E1123 18:51:39.293234       7 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:initial-sync,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ManagedFields:[]ManagedFieldsEntry{},}"
I1123 18:51:39.293258       7 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
E1123 18:51:39.293283       7 nginx.go:326] "Error listening for TLS connections" err="http: Server closed"
I1123 18:51:39.306449       7 status.go:84] "New leader elected" identity="ingress-nginx-controller-28d4aaa96e-aslk"
I1123 18:51:48.317211       7 sigterm.go:47] "Exiting" code=1