How does the Horizontal Pod Autoscaler (HPA) make sense?

I think I misunderstand the Horizontal Pod Autoscaler (HPA).
Or rather, I don't yet understand the main reason for it to exist.

Pods run on Nodes. Nodes are VMs with, let's say, fixed hardware, e.g. 4 CPU cores and 16 GB RAM.

When my app receives too much traffic, the HPA scales out by replicating my app in the form of another Pod. The only case where HPA would make sense is when my app can handle only a fixed number of requests, right?

So my thoughts are these:

If the app uses threading for concurrency, enabling HPA seems useless to me: with 4 CPU cores, no more than 4 requests can be processed truly in parallel at any moment. When HPA scales out, no additional CPU cores become available, so it is arguably even worse: the scale-out itself consumes resources, while 2 Pods together can still only use 4 cores.
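To make that reasoning concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes purely CPU-bound requests that each occupy one core, and that all Pods run on the same node; the function name and the thread count per Pod are hypothetical.

```python
def max_parallel_requests(node_cores: int, pods: int, threads_per_pod: int) -> int:
    """Upper bound on CPU-bound requests executing at the same instant on one node.

    Assumption: every request keeps exactly one core busy, and all Pods
    are scheduled onto the same node.
    """
    runnable_threads = pods * threads_per_pod
    # The node cannot run more threads in parallel than it has cores.
    return min(node_cores, runnable_threads)


one_pod = max_parallel_requests(node_cores=4, pods=1, threads_per_pod=4)
two_pods = max_parallel_requests(node_cores=4, pods=2, threads_per_pod=4)
print(one_pod, two_pods)  # 4 4
```

Under these assumptions the second Pod adds nothing, which is exactly what confuses me.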

So what use cases or benefits are there (apart from redundancy if one Pod fails)?

The prometheus-adapter docs state:

…except you’re not sure that just one instance will handle all the traffic once it goes viral. Thankfully, you’ve got Kubernetes

And then they refer to the HPA. That makes it sound as if adding another Pod would easily fix it.

Isn't the ultimate reason for a scale-out always a resource shortage?
Too many requests => too little CPU/RAM
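As far as I understand the Kubernetes documentation, the HPA's core rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Below is a rough sketch of that rule only; the function name is my own, and details such as the tolerance band, stabilization windows and min/max replica bounds are deliberately left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Simplified version of the replica calculation described in the HPA docs.

    Assumption: no tolerance, no stabilization window, no min/max bounds.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# 2 Pods at 90% average CPU utilization with a 50% target: scale out.
print(desired_replicas(2, 90, 50))  # 4
# 4 Pods at 25% average CPU utilization with a 50% target: scale in.
print(desired_replicas(4, 25, 50))  # 2
```

Note that the rule only looks at the metric relative to its target; it says nothing about whether the node actually has spare cores, which is the part I am unsure about.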

I currently have a cluster for only one kind of application.
I could imagine that HPA is useful if there is a very powerful underlying VM, e.g. with 36 CPU cores and 128 GB RAM, which acts as a "reservoir" of computing power. If the app receives too much traffic, HPA adds a Pod on the node, and that Pod receives some of the leftover computing power of the VM (each Pod could have resource limits).
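A quick sketch of how many Pods such a node could hold and how much would sit idle. The per-Pod sizes (2 cores, 8 GiB) are made-up numbers, I treat them as resource requests since, as far as I know, the scheduler places Pods based on requests rather than limits, and I ignore whatever the system and kubelet reserve for themselves.

```python
def pods_that_fit(node_cpu_m: int, node_mem_mib: int, pod_cpu_m: int, pod_mem_mib: int) -> int:
    """Number of identical Pods whose requests fit on one node.

    CPU is in millicores, memory in MiB. Assumption: nothing else runs
    on the node and no resources are reserved for the system.
    """
    return min(node_cpu_m // pod_cpu_m, node_mem_mib // pod_mem_mib)


def idle_cpu_m(node_cpu_m: int, pod_cpu_m: int, running_pods: int) -> int:
    """Millicores not requested by any Pod, i.e. the paid-for but unused headroom."""
    return node_cpu_m - running_pods * pod_cpu_m


# Hypothetical node: 36 cores, 128 GiB. Hypothetical Pod: 2 cores, 8 GiB.
print(pods_that_fit(36_000, 128 * 1024, 2_000, 8 * 1024))  # 16 (memory is the limit)
print(idle_cpu_m(36_000, 2_000, running_pods=3))           # 30000
```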

But then there would always be "unused" CPU and RAM, which would be costly…

This is not a precise question but rather me thinking out loud and asking for information, so any food for thought is appreciated.