DNS lookup failure. (AWS EKS)

Cluster information:

Kubernetes version: 1.24
Cloud being used: yes, AWS EKS
Installation method: Terraform
Host OS: Amazon Linux
CNI and version: v1.11.4-eksbuild.1

I have a working EKS cluster in AWS.
I'm using Metricbeat to gather data from this cluster and send it to an Elasticsearch server (which is on-prem). I'm getting some data in Elasticsearch, but not everything.

So I have deployed an app (in the default namespace) and exposed its Prometheus endpoint on port 9090.
I've updated the Metricbeat configuration to use the prometheus module so that Metricbeat can scrape data from that endpoint…

However, the data is not coming in; the Metricbeat logs show errors like this:

{"log.level":"warn","@timestamp":"2023-03-01T13:14:19.142Z","log.logger":"transport","log.origin":{"file.name":"transport/tcp.go","file.line":52},"message":"DNS lookup failure \"node\": lookup node on no such host","service.name":"metricbeat","ecs.version":"1.6.0"}

This has me puzzled; I'm not sure what is going wrong here.
Initially I thought that "node" was being taken literally in the Metricbeat configuration, but as far as I understand that isn't the case.

I've done various checks to see if I have a DNS issue in my cluster, but it seems this is not the case.
I’ve followed:

  • The EKS DNS troubleshooter (here)
    ** result was that CoreDNS is working correctly
  • An AWS troubleshooting guide (here)
    ** also here the result was that CoreDNS is working correctly.

Then I thought this might be due to the fact that I didn't create a Service for Metricbeat to use, so I created the following Service:

apiVersion: v1
kind: Service
metadata:
  name: testservice
spec:
  selector:
    app: myapp
  ports:
    - port: 9090
  type: ClusterIP

However, the error persisted.

Some details:

  • The test app is deployed in the default namespace
  • Metricbeat is deployed in the kube-system namespace
  • CoreDNS is deployed in the kube-system namespace
  • I'm using the prometheus module for Metricbeat to gather data:
- module: prometheus
  period: 10s
  hosts: ["node:9090"]
  metrics_path: /metrics

Sorry for the wall of text, but I tried to be as complete as possible.

Thanks for any help/advice!

Is Metricbeat trying to look up "node" as the hostname? That doesn't seem right to me. Could you share your pod spec and maybe the related ConfigMap for Metricbeat?

Just take care to replace anything sensitive with <redacted>, like passwords and identifying names.

Thanks for the reply protosam. This is the ConfigMap for Metricbeat.

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-config
  namespace: kube-system
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          node: ${NODE_NAME}
          # In large Kubernetes clusters consider setting unique to false
          # to avoid using the leader election strategy and
          # instead run a dedicated Metricbeat instance using a Deployment in addition to the DaemonSet
          unique: true
          templates:
            - config:
                - module: kubernetes
                  hosts: ["kube-state-metrics:8080"]
                  period: 10s
                  add_metadata: true
                  metricsets:
                    - state_node
                    - state_deployment
                    - state_daemonset
                    - state_replicaset
                    - state_pod
                    - state_container
                    - state_job
                    - state_cronjob
                    - state_resourcequota
                    - state_statefulset
                    - state_service
                    - state_persistentvolume
                    - state_persistentvolumeclaim
                    - state_storageclass
                  # If `https` is used to access `kube-state-metrics`, uncomment following settings:
                  # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  # ssl.certificate_authorities:
                  #   - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
                - module: kubernetes
                  metricsets:
                    - apiserver
                  hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.certificate_authorities:
                    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                  period: 30s
                # Uncomment this to get k8s events:
                #- module: kubernetes
                #  metricsets:
                #    - event
                - module: prometheus
                  period: 10s
                  hosts: ["${NODE_NAME}:9090"]
                  metrics_path: /metrics
        # To enable hints based autodiscover uncomment this:
        #- type: kubernetes
        #  node: ${NODE_NAME}
        #  hints.enabled: true

    processors:
      - add_cloud_metadata:
      - add_fields:
          target: orchestrator.cluster
          fields:
            name: cluster_name

    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      ssl.protocol: "https"
      ssl.certificate_authorities: ["/somepath/elasticsearch-ca.crt"]
      ssl.certificate: "/somepath/elasticsearch.crt"
      ssl.key: "/somepath/elasticsearch.key"

    setup.kibana:
      host: https://a_url:5601

Please note: the complete YAML can be found here. I'm using the standard YAML provided by Elastic.


Regarding the pod spec, can you elaborate on what you mean?

Thanks for any help!

Update 1:

  • Even when I hard-code the DNS name of the service I created into the Metricbeat config, Metricbeat is not able to resolve the hostname.
  • So I logged in to the Metricbeat container manually and did a curl to the service I created; it fails.
  • When I add the cluster IP and the hostname of the service to /etc/hosts (in the Metricbeat container), I am able to curl the URL testservice:9090/metrics (and I see the expected data).

The Service YAML is:

apiVersion: v1
kind: Service
metadata:
  name: testservice
spec:
  selector:
    app: myapp
  ports:
    - port: 9090
  type: ClusterIP

So this leads me to believe that there is a DNS issue.

For reference, here is the /etc/resolv.conf file of the Metricbeat container:

search kube-system.svc.cluster.local svc.cluster.local cluster.local <redacted>.internal eu-central-1.compute.internal
options ndots:5
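That resolv.conf actually explains the symptom: with `options ndots:5`, any name containing fewer than five dots is first tried with each `search` domain appended, so from a pod in kube-system a bare `testservice` is looked up as `testservice.kube-system.svc.cluster.local` first — and that record doesn't exist when the Service lives in default. A rough sketch of that expansion logic (a simplified model of glibc resolver behaviour, not Metricbeat code):

```python
# Simplified model of how a glibc-style resolver expands a name
# using the "search" domains and "ndots" option from resolv.conf.
def candidate_names(name, search_domains, ndots=5):
    """Return the fully qualified names the resolver tries, in order."""
    candidates = []
    if name.count(".") >= ndots:
        # Names with enough dots are tried as-is first.
        candidates.append(name)
    # Otherwise each search domain is appended in order...
    for domain in search_domains:
        candidates.append(f"{name}.{domain}")
    if name.count(".") < ndots:
        # ...and the bare name is only tried last.
        candidates.append(name)
    return candidates

# The search line from the Metricbeat container above:
search = ["kube-system.svc.cluster.local", "svc.cluster.local",
          "cluster.local"]

print(candidate_names("testservice", search)[0])
# -> testservice.kube-system.svc.cluster.local  (wrong namespace)
```

This also shows why CoreDNS itself checked out fine in the troubleshooters: DNS was answering correctly, it was simply being asked for a name in the wrong namespace.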

So I figured it out: since the app and Metricbeat were in different namespaces, I used the wrong URL to connect. I connected to:


but instead I should have connected to:
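In case it helps anyone else hitting this: because Metricbeat runs in kube-system and the Service lives in default, the short service name has to be qualified with the target namespace. A sketch of what the corrected prometheus module entry would look like, assuming the testservice Service shown above (the suffix follows the standard cluster-DNS form `<service>.<namespace>.svc.<cluster-domain>`):

```yaml
- module: prometheus
  period: 10s
  # <service>.<namespace>.svc.<cluster-domain> resolves from any namespace
  hosts: ["testservice.default.svc.cluster.local:9090"]
  metrics_path: /metrics
```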


Sorry to leave you hanging last week. Glad to see you figured it out. :slightly_smiling_face:

Your solution makes sense to me if this is happening across namespaces.

If everything was inside the default namespace, I don’t think you need to specify that.

On this note, if you do this in a production setup, take advantage of partitioning things logically between namespaces. It will make managing things easier long-term.

np :slight_smile: thanks for the reply and advice!