DNS lookup failure (AWS EKS)

Cluster information:

Kubernetes version: 1.24
Cloud being used: yes, AWS EKS
Installation method: Terraform
Host OS: Amazon Linux
CNI and version: v1.11.4-eksbuild.1

Description:
I have a working EKS cluster in AWS.
I'm using Metricbeat to gather data from this cluster and send it to an Elasticsearch server (which is on-prem). I'm getting some data into Elasticsearch, but not everything.

So I have deployed an app (default namespace) and exposed its Prometheus endpoint on port 9090.
I've updated the Metricbeat configuration to use the prometheus module so that Metricbeat can scrape data from the endpoint.

However, the metrics are not coming in, and the Metricbeat logs show this kind of error:

{"log.level":"warn","@timestamp":"2023-03-01T13:14:19.142Z","log.logger":"transport","log.origin":{"file.name":"transport/tcp.go","file.line":52},"message":"DNS lookup failure \"node\": lookup node on 10.12.0.10:53: no such host","service.name":"metricbeat","ecs.version":"1.6.0"}

This has me puzzled; I'm not sure what is going wrong here.
Initially I thought that "node" was being taken literally from the Metricbeat configuration, but as far as I understand that isn't the case.

I've done various checks to see if I have a DNS issue in my cluster, but it seems this is not the case (a throwaway debug pod, sketched after this list, makes such checks easy to repeat). I've followed:

  • The EKS DNS troubleshooter (here)
    ** result was that CoreDNS is working correctly
  • An AWS troubleshooting guide (here)
    ** here too, the result was that CoreDNS is working correctly.
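For anyone wanting to repeat these checks by hand, a minimal sketch of a throwaway debug pod, adapted from the dnsutils pod in the upstream Kubernetes DNS-debugging docs (I've put it in kube-system to reproduce Metricbeat's view of DNS; the image tag may differ):

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: kube-system    # same namespace as Metricbeat
spec:
  containers:
    - name: dnsutils
      image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
      command: ["sleep", "infinity"]
  restartPolicy: Always

With that running, kubectl exec -ti dnsutils -n kube-system -- nslookup kubernetes.default should succeed if CoreDNS is healthy.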

Then I thought this might be because I hadn't created a Service for Metricbeat to use, so I created the following Service:

apiVersion: v1
kind: Service
metadata:
  name: testservice
spec:
  selector:
    app: myapp
  ports:
    - port: 9090
  type: ClusterIP

However, the error persisted.
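For context, a Service like this only gets endpoints if there are pods carrying the app: myapp label; a minimal sketch of what the Deployment side would need to look like (name and image are hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                      # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp                   # must match the Service selector above
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest      # hypothetical image
          ports:
            - containerPort: 9090  # with no targetPort set, the Service forwards to port 9090

That said, a ClusterIP Service gets its DNS record whether or not it has endpoints, so missing endpoints alone would not explain a lookup failure.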

Some details:

  • The test app is deployed in the default namespace
  • Metricbeat is deployed in the kube-system namespace
  • CoreDNS is deployed in the kube-system namespace
  • I'm using the prometheus module for Metricbeat to gather data (how the node host gets populated is sketched after this snippet):
- module: prometheus
  period: 10s
  hosts: ["node:9090"]
  metrics_path: /metrics
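For what it's worth, in the standard Elastic manifest this host is not a literal: the DaemonSet injects a NODE_NAME environment variable via the downward API, roughly like this container env entry:

env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

If that variable were empty, or the placeholder were taken literally, the module would try to resolve the bare name node, which is exactly what the error above shows.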

Sorry for the wall of text, but I tried to be as complete as possible.

Thanks for any help/advice!

Is Metricbeat trying to look up "node" as the hostname? That doesn't seem right to me. Could you share your pod spec and maybe the related ConfigMap for Metricbeat?

Just take care to replace anything sensitive with <redacted>, like passwords and identifying names.

Thanks for the reply, protosam. This is the ConfigMap of Metricbeat.

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-config
  namespace: kube-system
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          node: ${NODE_NAME}
          # In large Kubernetes clusters consider setting unique to false
          # to avoid using the leader election strategy and
          # instead run a dedicated Metricbeat instance using a Deployment in addition to the DaemonSet
          unique: true
          templates:
            - config:
                - module: kubernetes
                  hosts: ["kube-state-metrics:8080"]
                  period: 10s
                  add_metadata: true
                  metricsets:
                    - state_node
                    - state_deployment
                    - state_daemonset
                    - state_replicaset
                    - state_pod
                    - state_container
                    - state_job
                    - state_cronjob
                    - state_resourcequota
                    - state_statefulset
                    - state_service
                    - state_persistentvolume
                    - state_persistentvolumeclaim
                    - state_storageclass
                  # If `https` is used to access `kube-state-metrics`, uncomment following settings:
                  # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  # ssl.certificate_authorities:
                  #   - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
                - module: kubernetes
                  metricsets:
                    - apiserver
                  hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.certificate_authorities:
                    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                  period: 30s
                # Uncomment this to get k8s events:
                #- module: kubernetes
                #  metricsets:
                #    - event
                - module: prometheus
                  period: 10s
                  hosts: ["${NODE_NAME}:9090"]
                  metrics_path: /metrics
        # To enable hints based autodiscover uncomment this:
        #- type: kubernetes
        #  node: ${NODE_NAME}
        #  hints.enabled: true

    processors:
      - add_cloud_metadata:
      - add_fields:
          target: orchestrator.cluster
          fields:
            name: cluster_name

    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
      ssl.protocol: "https"
      ssl.certificate_authorities: ["/somepath/elasticsearch-ca.crt"]
      ssl.certificate: "/somepath/elasticsearch.crt"
      ssl.key: "/somepath/elasticsearch.key"
    setup.kibana:
      host: https://a_url:5601

Please note, the complete YAML can be found here; I'm using the standard YAML provided by Elastic:

https://raw.githubusercontent.com/elastic/beats/8.5/deploy/kubernetes/metricbeat-kubernetes.yaml

Related to the pod spec, can you elaborate on what you mean?

Thanks for any help!

Update 1:

  • Even when I hard-code the DNS name of the service I created in the Metricbeat config, Metricbeat is not able to resolve the hostname.
  • So I logged in manually to the Metricbeat container and did a curl to the service I created; it fails.
  • When I add the cluster IP and the hostname of the service to /etc/hosts (in the Metricbeat container), I am able to curl testservice:9090/metrics (and I see the expected data).

The Service YAML is:

apiVersion: v1
kind: Service
metadata:
  name: testservice
spec:
  selector:
    app: myapp
  ports:
    - port: 9090
  type: ClusterIP

So this leads me to believe that there is a DNS issue.

For reference, here is the /etc/resolv.conf of the Metricbeat container:

search kube-system.svc.cluster.local svc.cluster.local cluster.local <redacted>.internal eu-central-1.compute.internal
nameserver 10.12.0.10
options ndots:5
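Given that resolv.conf, a bare testservice has fewer than five dots, so the search list is applied before the name is tried as-is, roughly:

# testservice.kube-system.svc.cluster.local    -> NXDOMAIN (the Service lives in default)
# testservice.svc.cluster.local                -> NXDOMAIN
# testservice.cluster.local                    -> NXDOMAIN
# testservice.<redacted>.internal              -> NXDOMAIN
# testservice.eu-central-1.compute.internal    -> NXDOMAIN
# testservice.                                 -> NXDOMAIN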

So I figured it out: since the app and Metricbeat were in different namespaces, I used the wrong URL to connect. I connected to:

testservice

but instead I should have connected to:

testservice.default
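So the corrected prometheus module entry looks like this (the fully qualified testservice.default.svc.cluster.local:9090 works as well):

- module: prometheus
  period: 10s
  hosts: ["testservice.default:9090"]
  metrics_path: /metrics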

Sorry to leave you hanging last week. Glad to see you figured it out. 🙂

Your solution makes sense to me if this is happening across namespaces.

If everything were inside the default namespace, I don't think you would need to specify it.

On this note, if you do this in a production setup, take advantage of partitioning things logically between namespaces. It will make managing things easier long-term.

np 🙂 thanks for the reply and advice!