Cluster information:
Kubernetes version: 1.24
Cloud being used: AWS (EKS)
Installation method: Terraform
Host OS: Amazon Linux
CNI and version: v1.11.4-eksbuild.1
Description:
I have a working EKS cluster in AWS.
I’m using Metricbeat to gather data from this cluster and send it to an Elasticsearch server (which is on-prem). Some data is arriving in Elasticsearch, but not everything.
I have deployed an app in the default namespace and exposed its Prometheus endpoint on port 9090.
I’ve updated the Metricbeat configuration to use the Prometheus module so that Metricbeat can scrape data from that endpoint.
However, that data is not coming in, and the Metricbeat logs show errors like this one:
{"log.level":"warn","@timestamp":"2023-03-01T13:14:19.142Z","log.logger":"transport","log.origin":{"file.name":"transport/tcp.go","file.line":52},"message":"DNS lookup failure \"node\": lookup node on 10.12.0.10:53: no such host","service.name":"metricbeat","ecs.version":"1.6.0"}
This has me puzzled; I’m not sure what is going wrong.
Initially I thought that “node” was being taken literally as a hostname in the Metricbeat configuration, but as far as I understand, this isn’t the case.
I’ve done various checks to see whether I have a DNS issue in my cluster, but it seems I don’t.
I’ve followed:
- The EKS DNS troubleshooter (here)
  - Result: CoreDNS is working correctly.
- An AWS troubleshooting guide (here)
  - Result: here too, CoreDNS is working correctly.
Then I thought this might be because I hadn’t created a Service for Metricbeat to use, so I created the following Service:
apiVersion: v1
kind: Service
metadata:
  name: testservice
spec:
  selector:
    app: myapp        # must match the labels on the app's pods
  ports:
    - port: 9090      # targetPort defaults to this value when omitted
  type: ClusterIP
However, the error persisted.
Some details:
- The test app is deployed in the default namespace.
- Metricbeat is deployed in the kube-system namespace.
- CoreDNS is deployed in the kube-system namespace.
- I’m using the Metricbeat Prometheus module to gather data:
    - module: prometheus
      period: 10s
      hosts: ["node:9090"]
      metrics_path: /metrics
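For comparison, a minimal sketch of the same module pointed at the `testservice` Service instead of the bare name `node`. It is an untested variant, not a confirmed fix, and it assumes the Service was created in the `default` namespace and that the cluster uses the default `cluster.local` domain. Because Metricbeat runs in `kube-system`, a short name like `testservice` would be searched for in `kube-system` first, so the namespace-qualified form `<service>.<namespace>.svc.cluster.local` is used here.

```yaml
# Hypothetical variant: scrape through the testservice Service rather than "node".
# Assumes the Service lives in the default namespace and the cluster domain is cluster.local.
- module: prometheus
  period: 10s
  hosts: ["testservice.default.svc.cluster.local:9090"]
  metrics_path: /metrics
```

Going through a ClusterIP Service load-balances each scrape across the pods behind it, so with several replicas every 10s scrape may land on a different pod; a single-replica test app avoids that ambiguity.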
Sorry for the wall of text, but I tried to be as complete as possible.
Thanks for any help or advice!