Kubernetes version: 1.12.8-gke.10
Cloud being used: GKE
Host OS: (machine type) n1-standard-1
CNI and version: default
CRI and version: default
During node scaling, the HPA couldn't get the CPU metric. At the same time, `kubectl top pod` and `kubectl top node` output was:

```
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
```
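For reference, whether the aggregated metrics API is serving can be checked with commands like these (the `k8s-app=metrics-server` label is what I believe is the GKE default, and `my-app` is a placeholder HPA name):

```
# Look at the Available condition of the metrics APIService
kubectl describe apiservice v1beta1.metrics.k8s.io

# Check the metrics-server pods backing that API
kubectl -n kube-system get pods -l k8s-app=metrics-server

# HPA events show FailedGetResourceMetric while metrics are unavailable
kubectl describe hpa my-app
```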
In more detail, here is the flow in which the problem occurs:
- Suddenly, many requests arrive at the GKE cluster (generated with a load-testing tool).
- The HPA detects that current CPU usage is above the target CPU usage (50%), so it tries to scale pods up (a sketch of the kind of HPA I mean follows this list).
- An `Insufficient CPU` warning occurs when the new pods are created, so GKE tries to scale nodes up.
- Soon the HPA fails to get the metric, and `kubectl top node` and `kubectl top pod` no longer get a response.
- At this time one or more `OutOfcpu` pods are found, and several pods are stuck in `Pending`.
- After node scale-up is complete and some time has elapsed (a few minutes), the HPA starts to fetch the CPU metric successfully and tries to scale up/down based on it.
- The same thing happens during node scale-down.
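For context, my HPA is of this general shape (a minimal sketch; the name and replica bounds are placeholders, only the 50% CPU target matches what I described above):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder Deployment name
  minReplicas: 2              # placeholder bounds
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50  # the 50% target mentioned above
```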
This causes pod scaling to stop and leads to failures in responding to clients' requests. Is this normal?
I think the HPA should be able to get the CPU metric (or other metrics) from running pods even during node scaling, to keep track of the optimal number of pods at each moment. Then, when node scaling is done, the HPA could create all the necessary pods at once (rather than incrementally).
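As I understand it, the HPA computes `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`, so as long as metrics are available it jumps straight to the computed size; for example, at 150% measured utilization against the 50% target, 2 replicas would become `ceil(2 * 150 / 50) = 6` in one step.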
Can I make my cluster work like this?