I’m currently tasked with deploying some applications that use a lot of computing resources, and am setting up a K8S deployment that reserves a single CPU socket per pod on some rather expensive hardware:
Each of the 4 sockets will be populated with a Skylake 4116 12-core CPU, for a total of 24 logical CPUs per socket due to Hyperthreading.
I would like to accomplish the following:
Run one instance of my “super greedy app” per CPU socket (i.e. reserve 95% of the computing ability of an entire CPU socket for my app, leaving the remaining 5% for other things like Flannel, CoreDNS, etc.). All 24 logical cores assigned to the app must be on the same physical CPU/socket.
Ensure that, in order to maximize cache hits and avoid inter-CPU overhead, a pod never uses logical cores from different CPUs/sockets.
If it’s not trivial to reserve 95% of the compute capacity of 24 logical cores on the same CPU/socket, it’s fine to just reserve 23 cores on the same CPU/socket and leave the last remaining core for Flannel, CoreDNS, etc.
Question: is it possible, in K8S, to create “CPU groups” that would allow me to accomplish the above-noted goals? There appears to be an Intel plugin for just this purpose, but we may try the same project/approach in the future with an ARM-based board, and I need to make sure this is a vendor/platform-independent solution.
The API, as far as I know, does not expose a way to request that logical CPUs land on the same (or different) physical cores, but if you reserve 23 out of the 24 logical CPUs on a socket, at least 22 of them are guaranteed to share a physical core with another thread of the same app.
Would something like this be enough for your use case?
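If the mechanism being referred to here is the kubelet’s static CPU Manager policy (an assumption on my part), a minimal sketch of that approach might look like the following; the pod name, image, and the reservedSystemCPUs value are placeholders:

```yaml
# KubeletConfiguration snippet: the static CPU Manager policy gives
# Guaranteed-QoS containers with integer CPU requests exclusive logical CPUs.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
reservedSystemCPUs: "0"        # keep one logical CPU for kubelet, Flannel, CoreDNS, etc.
---
# Guaranteed-QoS pod: requests == limits and the CPU count is an integer,
# so the container gets 23 exclusive logical CPUs.
apiVersion: v1
kind: Pod
metadata:
  name: super-greedy-app       # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/super-greedy-app:latest   # hypothetical image
    resources:
      requests:
        cpu: "23"
        memory: "8Gi"
      limits:
        cpu: "23"
        memory: "8Gi"
```

Note that this pins the container to 23 exclusive logical CPUs, but it does not, by itself, let you name a particular socket; whether they all land on one socket depends on the CPU Manager’s allocation heuristics.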
This does help address one of the use cases (thank you), but there are some cases where I need to reserve specific cores for specific apps, which would require additional capabilities.
At this point in k8s, I think your best option would be to run a privileged container (init container or not) and program the cgroup interfaces yourself. We do not have the sort of high-powered APIs for sub-machine resource pinning that you are asking for (yet). This has been a topic for a long time, and while we have some work here (as linked), it’s not as fine-grained as you seem to need.
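As a rough illustration of that approach (a sketch, not a recipe), below is a privileged pod that mounts the host cgroup filesystem and writes a cpuset by hand. Everything node-specific — the cgroup v1 cpuset layout, the kubepods path, the CPU range 12-23, and the target pod UID — is an assumption that has to be adapted to the actual node and cgroup driver:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpuset-pinner          # hypothetical name
spec:
  containers:
  - name: pinner
    image: busybox:1.36
    securityContext:
      privileged: true
    env:
    - name: TARGET_POD_UID
      value: "REPLACE-WITH-TARGET-POD-UID"
    command:
    - /bin/sh
    - -c
    - |
      # Assumed cgroup v1 layout with the cgroupfs driver; systemd-driver
      # nodes and cgroup v2 use different paths.
      TARGET=/host-cgroup/cpuset/kubepods/pod${TARGET_POD_UID}
      echo 12-23 > ${TARGET}/cpuset.cpus   # logical CPUs of the second socket (assumed numbering)
      echo 1     > ${TARGET}/cpuset.mems   # keep memory on the matching NUMA node
      sleep 3600
    volumeMounts:
    - name: cgroup
      mountPath: /host-cgroup
  volumes:
  - name: cgroup
    hostPath:
      path: /sys/fs/cgroup
```

The same commands could live in a privileged init container inside the greedy app’s own pod; either way, the cluster has to allow privileged containers and a hostPath mount of /sys/fs/cgroup.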