Guaranteed pod much slower than a nearly identical burstable pod?

I have a pod that does nothing but allocate a large array (68 GiB), write a value to each array element, and then sleep endlessly. The array is allocated with malloc (the pod’s command is a small C program), so I expect the pod’s working set to grow incrementally toward 68 GiB as more and more array elements are written.

I create the pod on a cluster node whose kubelet is configured with the static policies for the CPU and memory managers, and the best-effort policy for the topology manager. No other pods are running on the node.
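The relevant part of the kubelet configuration looks like this (the `reservedMemory` values are placeholders for my actual reservation, which the static memory manager policy requires):

```yaml
# Excerpt of the node's KubeletConfiguration
cpuManagerPolicy: static
memoryManagerPolicy: Static
topologyManagerPolicy: best-effort
reservedMemory:           # placeholder values; Static memory manager
  - numaNode: 0           # requires a reservation per NUMA node
    limits:
      memory: 1Gi
```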

If I run the pod as a guaranteed pod requesting 20 CPUs and 69 GiB of memory, it runs incredibly slowly and its working set never even reaches 1 GiB (it was taking so long that I gave up waiting).

However, if I run the pod as a burstable pod (by raising the CPU limit from 20 to 21 while keeping the request at 20), it runs far faster, and I can quickly see its working set reach 68 GiB.
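For concreteness, the only difference between the two pod specs is the CPU limit (this is an illustrative fragment, not my exact manifest):

```yaml
# Guaranteed: requests equal limits for every resource.
resources:
  requests:
    cpu: "20"
    memory: 69Gi
  limits:
    cpu: "20"        # changing this to "21" makes the pod Burstable,
    memory: 69Gi     # since requests no longer equal limits
```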

Even stranger: when running the pod as guaranteed, I recorded which CPUs the kubelet had exclusively allocated to it and which NUMA nodes it was allowed to take memory from (by inspecting the cgroup and cpuset the kubelet created). If I then run the same code as an ordinary process on the node, inside a manually created cgroup/cpuset restricting it to those same CPUs and NUMA nodes, it runs as fast as in the burstable case, and its working set again quickly reaches 68 GiB.
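The manual reproduction looked roughly like this (cgroup v2; the CPU list and NUMA node are placeholders for the values I read out of the guaranteed pod’s cgroup, and `memtest` stands for the same binary the pod runs):

```shell
# Create a cgroup mirroring what the kubelet set up for the guaranteed pod.
sudo mkdir /sys/fs/cgroup/memtest
echo "+cpuset" | sudo tee /sys/fs/cgroup/cgroup.subtree_control

# Same exclusive CPUs and NUMA memory nodes the kubelet assigned (placeholders).
echo "0-19" | sudo tee /sys/fs/cgroup/memtest/cpuset.cpus
echo "0"    | sudo tee /sys/fs/cgroup/memtest/cpuset.mems

# Move this shell into the cgroup, then run the same binary the pod runs.
echo $$ | sudo tee /sys/fs/cgroup/memtest/cgroup.procs
./memtest
```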

What might be causing this issue?

Cluster information:

Kubernetes version: v1.33.0
Cloud being used: bare metal
Installation method: manual (standalone kubelet)
Host OS: Ubuntu 22.04.4 LTS, Linux kernel 6.6.0
CNI and version: CRIO 1.0.0
CRI and version: CRI-O 1.34.0