Hello! I'm new to k8s and Prometheus. Could you please give me some advice on what the problem could be and what to check?
I've created 3 VMs with Ubuntu Server 22.04 in VirtualBox, each with one NAT and one host-only adapter. I installed Docker 24.0.5 and Kubernetes 1.22.10-00 with the flannel network.
(from https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml)
Then I installed kube-prometheus-stack from https://prometheus-community.github.io/helm-charts.
So I have these pods in the monitoring namespace:
kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 8 (107m ago) 8h
kube-prometheus-stack-grafana-5d5986498-j4cdx 3/3 Running 15 (107m ago) 10h
kube-prometheus-stack-kube-state-metrics-787df9c684-xlpnh 1/1 Running 15 (107m ago) 2d10h
kube-prometheus-stack-operator-7b8745877f-7qrnb 1/1 Running 15 (107m ago) 2d10h
kube-prometheus-stack-prometheus-node-exporter-4csc8 1/1 Running 15 (107m ago) 2d10h
kube-prometheus-stack-prometheus-node-exporter-76gzt 1/1 Running 14 (107m ago) 2d10h
kube-prometheus-stack-prometheus-node-exporter-k4x9d 1/1 Running 20 (107m ago) 2d10h
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 30 (107m ago) 2d10h
I want to send messages to Telegram on alerts. I created a Telegram bot, added it to a new channel, and copied its token and chat id. When I try to test a new Telegram contact point in the Grafana web interface I get an error like this: Failed to send test alert.: the receiver timed out: failed to send telegram message: Post “https://api.telegram.org/bot/sendMessage”: context deadline exceeded
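Two things stand out in that error (assuming the URL was logged as-is): the path is bot/sendMessage with nothing between "bot" and "/sendMessage", so the token may not have reached the notifier (or Grafana stripped it when logging), and the request timed out, which points at the network rather than Telegram itself. To rule out the bot itself, I could test the token and chat id from my workstation, outside the cluster (the values below are placeholders, not real ones):

```shell
# Verify the bot token and chat id directly against the Telegram API.
# Placeholders - substitute real values before running.
BOT_TOKEN="<MyTelegramBotToken>"
CHAT_ID="<MyTelegramChatId>"
# getMe verifies the token; sendMessage verifies the chat id.
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getMe"
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
  -d chat_id="${CHAT_ID}" -d text="test"
```

If both calls succeed from outside the cluster, the bot and chat are fine and the problem is in-cluster.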
In the Grafana log I see this:
logger=ngalert.notifier.telegram notifierUID=d46231dc-72cb-4da5-8729-349c0e65b582 t=2023-12-13T18:08:20.414859711Z level=error msg="Missing receiver"
logger=ngalert.notifier.telegram notifierUID=d46231dc-72cb-4da5-8729-349c0e65b582 t=2023-12-13T18:08:20.414965142Z level=error msg="Missing group labels"
logger=cleanup t=2023-12-13T18:08:31.533358788Z level=info msg="Completed cleanup jobs" duration=9.174282ms
logger=context userId=1 orgId=1 uname=admin t=2023-12-13T18:08:35.396509325Z level=info msg="Request Completed" method=POST path=/api/alertmanager/grafana/config/api/v1/receivers/test status=408 remote_addr=127.0.0.1 time_ms=15015 durat
ion=15.015160343s size=565 referer="http://192.168.56.10:8080/alerting/notifications/receivers/telegram/edit?alertmanager=grafana" handler=/api/alertmanager/grafana/config/api/v1/receivers/test
logger=grafana.update.checker t=2023-12-13T18:08:51.532006493Z level=error msg="Update check failed" error="failed to get latest.json repo from github.com: Get \"https://raw.githubusercontent.com/grafana/grafana/main/latest.json\": dial
tcp: lookup raw.githubusercontent.com: i/o timeout" duration=10.002185432s
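The update-checker line above also shows a DNS lookup timing out inside the Grafana pod, so the Telegram timeout may be a cluster DNS/egress problem rather than a Grafana one. A quick in-cluster check (the pod name and image are arbitrary choices):

```shell
# Run a throwaway pod and try to resolve an external name from inside the cluster.
kubectl run nettest --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup api.telegram.org
# Also check that the CoreDNS pods are healthy:
kubectl get pods -n kube-system -l k8s-app=kube-dns
```

If the lookup times out here too, the issue is cluster networking, not the Grafana contact point.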
Is the problem that I didn't configure a receiver in Alertmanager?
I've also tried to configure Alertmanager via a ConfigMap like this:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'my-telegram'
receivers:
  - name: 'my-telegram'
    telegram_configs:
      - bot_token: '<MyTelegramBotToken>'
        api_url: 'https://api.telegram.org'
        chat_id: <MyTelegramChatId>
        parse_mode: ''
But with no success, and that way is harder to debug. So I wonder whether there is an easier way to check it, like with the Grafana web interface.
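As far as I understand, with kube-prometheus-stack the operator renders the Alertmanager configuration from a Secret, so a hand-made ConfigMap is typically ignored. One way to pass the same config (a sketch based on the chart's `alertmanager.config` value; the placeholders are mine) is through Helm values:

```yaml
# values.yaml fragment for kube-prometheus-stack (sketch)
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'severity']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'my-telegram'
    receivers:
      - name: 'my-telegram'
        telegram_configs:
          - bot_token: '<MyTelegramBotToken>'
            api_url: 'https://api.telegram.org'
            chat_id: <MyTelegramChatId>
            parse_mode: ''
```

Applied with `helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml`; the rendered config should then show up in the `alertmanager-kube-prometheus-stack-alertmanager` Secret, which is a way to confirm it was actually picked up.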
I've also checked the Prometheus web interface. These alerts are firing:
AlertmanagerClusterDown (1 active)
etcdMembersDown (1 active)
etcdInsufficientMembers (1 active)
TargetDown (6 active)
Watchdog (1 active)
NodeClockNotSynchronising (3 active)
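NodeClockNotSynchronising on all three nodes presumably just means the VMs have no working NTP sync. A sketch of a check and fix on Ubuntu 22.04 (assuming systemd-timesyncd is the time service in use):

```shell
# On each VM: check and enable NTP synchronisation.
timedatectl status             # "System clock synchronized: yes" is the goal
sudo timedatectl set-ntp true  # turns on systemd-timesyncd
```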
Thanks in advance!
UPD.
prometheus logs:
ts=2023-12-14T08:24:48.725Z caller=notifier.go:530 level=error component=notifier alertmanager=http://10.244.1.30:9093/api/v2/alerts count=6 msg="Error sending alert" err="Post \"http://10.244.1.30:9093/api/v2/alerts\": dial tcp 10.
alertmanager logs:
ts=2023-12-14T07:21:52.314Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9093
ts=2023-12-14T07:21:52.314Z caller=tls_config.go:313 level=info msg="TLS is disabled." http2=false address=[::]:9093
ts=2023-12-14T07:21:57.279Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-12-14T07:21:57.280Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
If I try to reach Alertmanager from the Prometheus pod I get "No route to host":
kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 -n monitoring -- wget -O- http://10.244.1.30:9093
Connecting to 10.244.1.30:9093 (10.244.1.30:9093)
wget: can't connect to remote host (10.244.1.30): No route to host
command terminated with exit code 1
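Given the two-adapter VirtualBox setup, I suspect flannel picked the NAT adapter (which has the same 10.0.2.x address on every VM), so cross-node pod traffic has no route. A common fix is to pin flannel to the host-only interface with --iface in the kube-flannel DaemonSet args in kube-flannel.yml; enp0s8 below is a guess, the real name can be checked with `ip -o -4 addr`:

```yaml
# kube-flannel.yml: kube-flannel container args (sketch; interface name is a guess)
args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=enp0s8   # host-only adapter carrying the 192.168.56.0/24 network
```

After changing the manifest, re-apply it and restart the flannel pods; the nodes should also advertise their host-only IPs (kubelet's --node-ip) rather than the NAT ones.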