### What happened?
Some of our developers were running a workload that regularly does `kubectl exec` into pods running in an EKS cluster with a command that collects some metrics, then exits.
We noticed an increase in memory usage by the containerd process on the host running the pod that `kubectl exec` connected to. Enabling the containerd debug socket and comparing the output of `ctr pprof goroutines` run multiple times over the span of several minutes showed that goroutines were leaking inside containerd. All of the leaked goroutines had a stack trace like this:
```
goroutine 82060 [chan receive]:
k8s.io/apiserver/pkg/util/wsstream.(*Conn).Open(0xc006ba0af0, {0x55fe91acc880?, 0xc0055d8ee0}, 0xc00a1bcb00)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/vendor/k8s.io/apiserver/pkg/util/wsstream/conn.go:185 +0xc5
github.com/containerd/containerd/pkg/cri/streaming/remotecommand.createWebSocketStreams(0xc00a1bcb00?, {0x55fe91acc880, 0xc0055d8ee0}, 0xc0085c17f4, 0xd18c2e28000)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/remotecommand/websocket.go:114 +0x32c
github.com/containerd/containerd/pkg/cri/streaming/remotecommand.createStreams(0xc00a1bcb00, {0x55fe91acc880, 0xc0055d8ee0}, 0xc0085c1690?, {0x55fe926be380, 0x4, 0x4}, 0x203001?, 0xc006d8bb00?)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/remotecommand/httpstream.go:126 +0x9b
github.com/containerd/containerd/pkg/cri/streaming/remotecommand.ServeExec({0x55fe91acc880?, 0xc0055d8ee0?}, 0x6?, {0x55fe91ab5af8, 0xc0006b0ed0}, {0x0, 0x0}, {0x0, 0x0}, {0xc006b918c0, ...}, ...)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/remotecommand/exec.go:61 +0xc5
github.com/containerd/containerd/pkg/cri/streaming.(*server).serveExec(0xc00040a090, 0xc004cfa630, 0xc006be77a0)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/pkg/cri/streaming/server.go:302 +0x19e
github.com/emicklei/go-restful.(*Container).dispatch(0xc00040a1b0, {0x55fe91acc880, 0xc0055d8ee0}, 0xc00a1bcb00)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/vendor/github.com/emicklei/go-restful/container.go:288 +0x8c8
net/http.HandlerFunc.ServeHTTP(0x0?, {0x55fe91acc880?, 0xc0055d8ee0?}, 0x0?)
/usr/lib/golang/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x72?, {0x55fe91acc880, 0xc0055d8ee0}, 0xc00a1bcb00)
/usr/lib/golang/src/net/http/server.go:2462 +0x149
github.com/emicklei/go-restful.(*Container).ServeHTTP(0x0?, {0x55fe91acc880?, 0xc0055d8ee0?}, 0xc0074c0000?)
/builddir/build/BUILD/containerd-1.6.19-1.amzn2.0.1/src/github.com/containerd/containerd/vendor/github.com/emicklei/go-restful/container.go:303 +0x27
net/http.serverHandler.ServeHTTP({0xc004cfa510?}, {0x55fe91acc880, 0xc0055d8ee0}, 0xc00a1bcb00)
/usr/lib/golang/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc0054b1a40, {0x55fe91acdbd8, 0xc003dc2420})
/usr/lib/golang/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
/usr/lib/golang/src/net/http/server.go:3071 +0x4db
```
The number of leaked goroutines can be checked with:
```console
$ ctr pprof goroutines | grep createWebSocketStreams | wc -l
```
It looks as if the WebSocket connection from the kubelet to containerd gets stuck somehow; the `kubectl exec` command itself, however, completes just fine.
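The leaked goroutines are all parked in `chan receive` inside `wsstream.(*Conn).Open`, which suggests the per-request handler is waiting forever for the WebSocket stream setup to complete. The following minimal Go sketch (not the actual containerd or apiserver code; the `conn` type and `ready` channel are illustrative) shows the general pattern that produces this kind of leak: a goroutine blocks on a channel that is only closed once the peer finishes the expected negotiation, with nothing to bound the wait.
```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// conn mimics the shape of the leaking handler: Open blocks until the peer
// finishes its side of the stream negotiation and something closes `ready`.
type conn struct {
	ready chan struct{}
}

// Open is a stand-in for the real stream-setup call. If nothing ever closes
// c.ready (for example because the client speaks a protocol variant the
// server does not handle as expected), the receive blocks forever and the
// goroutine that called Open leaks -- the "chan receive" state in the trace.
func (c *conn) Open() {
	<-c.ready
}

func main() {
	for i := 0; i < 100; i++ {
		c := &conn{ready: make(chan struct{})}
		go c.Open() // one leaked goroutine per "exec" request
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines:", runtime.NumGoroutine()) // grows with every iteration
}
```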
The version of kubectl used by the workload is `v1.30.2`, while the cluster itself is running AWS EKS `v1.26.15-eks-db838b0` (kubelet: `v1.26.4-eks-0a21954`) and we could reproduce this as well on a cluster running `v1.28.11-eks-db838b0` (kubelet: `v1.28.11-eks-1552ad0`).
After testing a few versions, it looks like this behavior started with kubectl `v1.30.0`; it was not yet present in `v1.29.4`.
We are aware that this is an unsupported version skew, but it should not be possible for a normal user of the Kubernetes API to trigger a memory leak in the container runtime simply by repeatedly running `kubectl exec` against a pod with an incompatible kubectl version.
### What did you expect to happen?
Neither kubectl nor any other Kubernetes API client should be able to cause a memory leak in the container runtime of the host running the affected pod.
### How can we reproduce it (as minimally and precisely as possible)?
1. Start an EKS cluster running v1.26 or v1.28 (1.27, earlier versions, and plain Kubernetes clusters are probably affected as well, but we could not check this) and ensure some nodes are up and that at least one pod is running on them.
2. SSH into a node and enable the containerd debug socket by appending the following to `/etc/containerd/config.toml`:
```toml
[debug]
address = "/run/containerd/debug.sock"
```
3. Restart containerd with `systemctl restart containerd`, then run:
```console
$ watch "ctr pprof goroutines | grep createWebSocketStreams | wc -l"
```
4. In a second terminal, repeatedly run `kubectl exec <pod name> -- true` or similar against any pod running on the node you SSH'd into (a client-go sketch that exercises the same WebSocket code path follows this list).
5. Notice the number reported by the `watch` command increase for every execution of `kubectl`.
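For completeness, the same code path that kubectl v1.30 prefers can be exercised directly from client-go. The sketch below is an untested reproduction helper under the assumption that the WebSocket executor (`remotecommand.NewWebSocketExecutor`, available in recent client-go releases) is what triggers the leak; the kubeconfig path, namespace, and pod name are placeholders and must point at a pod scheduled on the node you are watching.
```go
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/remotecommand"
)

func main() {
	// Placeholders: adjust these for your cluster.
	kubeconfig := os.Getenv("KUBECONFIG")
	namespace, pod := "default", "my-pod"

	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Build the exec subresource URL, the same way kubectl does.
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(namespace).Name(pod).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Command: []string{"true"},
			Stdout:  true,
			Stderr:  true,
		}, scheme.ParameterCodec)

	for i := 0; i < 50; i++ {
		// WebSocket executor: the transport newer kubectl versions prefer.
		exec, err := remotecommand.NewWebSocketExecutor(cfg, "GET", req.URL().String())
		if err != nil {
			panic(err)
		}
		err = exec.StreamWithContext(context.Background(), remotecommand.StreamOptions{
			Stdout: os.Stdout,
			Stderr: os.Stderr,
		})
		fmt.Println("exec", i, "err:", err)
	}
}
```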
### Anything else we need to know?
We were unsure where the right place to report this issue would be. We believe containerd should have a timeout on this goroutine to abort it if something gets stuck. On the other hand, this can only be triggered with certain newer versions of kubectl, which also hints at a bug in the Kubernetes API server or the kubelet.
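As an illustration of the kind of guard we have in mind (a sketch only, not a patch against the actual containerd code; the names are ours), the blocking wait could be bounded with a `select` on a timer and a request-scoped context, so that a stalled negotiation frees the handler goroutine instead of leaking it:
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// openWithTimeout wraps a blocking "wait until the client streams are ready"
// step with an upper bound. `ready` stands in for the channel the real
// handler blocks on; the names are illustrative, not containerd's.
func openWithTimeout(ctx context.Context, ready <-chan struct{}, timeout time.Duration) error {
	t := time.NewTimer(timeout)
	defer t.Stop()

	select {
	case <-ready:
		return nil // streams established, proceed as before
	case <-t.C:
		return errors.New("timed out waiting for client streams")
	case <-ctx.Done():
		return ctx.Err() // request cancelled / connection closed
	}
}

func main() {
	// Simulate a client that never completes the negotiation.
	ready := make(chan struct{})
	err := openWithTimeout(context.Background(), ready, 2*time.Second)
	fmt.Println("result:", err) // the goroutine is released instead of leaking
}
```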
### Kubernetes version
<details>
1.26 cluster:
```console
$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.15-eks-db838b0
WARNING: version difference between client (1.30) and server (1.26) exceeds the supported minor version skew of +/-1
```
1.28 cluster:
```console
$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.11-eks-db838b0
WARNING: version difference between client (1.30) and server (1.28) exceeds the supported minor version skew of +/-1
```
</details>
### Cloud provider
<details>
AWS EKS
</details>
### OS version
<details>
1.26 cluster:
```console
$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
$ uname -a
Linux ip-a-b-c-d.us-east-2.compute.internal 5.10.179-168.710.amzn2.x86_64 #1 SMP Mon May 22 23:10:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
```
1.28 cluster:
```console
$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
$ uname -a
Linux ip-a-b-c-d.us-east-1.compute.internal 5.10.220-209.869.amzn2.x86_64 #1 SMP Wed Jul 17 15:10:20 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```
</details>
### Install tools
<details>
</details>
### Container runtime (CRI) and version (if applicable)
<details>
Verified on containerd `1.6.19` and `1.7.11`
</details>
### Related plugins (CNI, CSI, ...) and versions (if applicable)
<details>
</details>