Hi Team,
We are currently using external metrics (ExternalMetrics) with HPA to scale our application workloads.
Is there a way to prevent a scale-in when the external metrics source, such as AWS Container Insights or Datadog, goes down and stops sending data to the cluster?
Recently we experienced an outage of our external metrics endpoint (AWS Container Insights), and all the "current" values in the HPA dropped to zero, triggering a scale-down of pods.
Screenshot: https://i.stack.imgur.com/VHUZ1.png
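For illustration, the kind of safeguard we are hoping for looks roughly like the scale-down behavior sketch below. This assumes the behavior field of autoscaling/v2beta2 (available since Kubernetes 1.18); it is only a sketch, not part of our current manifests, and we are not sure it helps when the metric itself reports zero rather than going missing, which is essentially the question.

# Sketch only: autoscaling/v2beta2 HPA scale-down behavior (not our current manifest).
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 minutes of low recommendations before scaling in
      policies:
        - type: Pods
          value: 1                      # remove at most one pod per period
          periodSeconds: 60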
Cluster information:
Kubernetes version: 1.19
Cloud being used: AWS
Installation method: EKS Console
Host OS: Debian
CNI and version: vpc-cni v1.8.0-eksbuild.1
CRI and version: docker:19.3.13
Sample ExternalMetric:
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: sample-app-cpu
  namespace: ns-stagingv1
spec:
  name: sample-app-cpu
  queries:
    - id: sample_app_cpu
      metricStat:
        metric:
          dimensions:
            - name: PodName
              value: sample-app
            - name: ClusterName
              value: EKS-Staging
            - name: Namespace
              value: ns-stagingv1
          metricName: pod_cpu_utilization_over_pod_limit
          namespace: ContainerInsights
        period: 120
        stat: Average
        unit: Percent
      returnData: true
  resource:
    resource: deployment
HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-cpu-hpa
  namespace: ns-stagingv1
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sample-app
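For completeness, the HPA output above is in autoscaling/v1 form and does not show the external metrics stanza; in autoscaling/v2beta2 form the HPA references the ExternalMetric above roughly like the sketch below. The target value of "60" is only a placeholder, not our real threshold.

# Sketch only: how the sample-app-cpu external metric is referenced in autoscaling/v2beta2 form.
spec:
  metrics:
    - type: External
      external:
        metric:
          name: sample-app-cpu          # matches the ExternalMetric defined above
        target:
          type: Value
          value: "60"                   # placeholder threshold, in Percent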