How can we prevent HPA scale-in when external metrics are missing?

Hi Team,

We currently feed ExternalMetrics to the HPA to scale our application workloads.

Is there a way to prevent a scale-in when an ExternalMetrics source such as AWS Container Insights or Datadog goes down and stops sending data to the cluster?

Recently, we experienced an outage of our external metrics endpoint (AWS Container Insights). All the “current” values in the HPA dropped to zero, which triggered a scale-down of the pods.

Screenshot: https://i.stack.imgur.com/VHUZ1.png
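
For context, here is a minimal sketch of why a zero reading leads to a scale-down, as far as we understand the replica calculation described in the Kubernetes HPA documentation: desired = ceil(currentReplicas * currentValue / targetValue), skipped when the ratio is within a tolerance of 1.0, then clamped to minReplicas/maxReplicas. The target of 60, the starting count of 6 replicas, and the function name are assumptions for illustration only; they are not taken from our manifests, and the real controller applies additional rules (stabilization window, pod readiness) that this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative sketch only)."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical numbers: 6 running pods, target value 60.
print(desired_replicas(6, 75, 60))  # normal load above target
print(desired_replicas(6, 0, 60))   # metric source down, value reported as 0
```

With these assumed numbers, a reported value of 0 drives the result all the way down to minReplicas, which matches what we saw during the outage: the HPA cannot tell "no load" apart from "no data" when the adapter returns zero instead of an error.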

Cluster information:

Kubernetes version: 1.19
Cloud being used: AWS
Installation method: EKS Console
Host OS: Debian
CNI and version: vpc-cni v1.8.0-eksbuild.1
CRI and version: docker:19.3.13

Sample ExternalMetric:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: sample-app-cpu
  namespace: ns-stagingv1
spec:
  name: sample-app-cpu
  queries:
    - id: sample_app_cpu
      metricStat:
        metric:
          dimensions:
            - name: PodName
              value: sample-app
            - name: ClusterName
              value: EKS-Staging
            - name: Namespace
              value: ns-stagingv1
          metricName: pod_cpu_utilization_over_pod_limit
          namespace: ContainerInsights
        period: 120
        stat: Average
        unit: Percent
      returnData: true
  resource:
    resource: deployment

HPA:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-cpu-hpa
  namespace: ns-stagingv1
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sample-app