Highly Available Prometheus using Thanos Sidecar

I’m looking for a solid monitoring solution for Kubernetes that lets me pinpoint issues even when the entire cluster goes down. The problem is that if Prometheus crashes along with the cluster, Grafana has nothing left to show about what went wrong. My idea is to build a monitoring setup that stays operational even when the cluster itself is unhealthy. Here’s a simplified breakdown of how I envision it working; I’m open to suggestions on making it more effective:

  1. Collecting Metrics:
    1.1 - Prometheus exporters gather metrics from various sources.
    1.2 - Short-lived jobs push their metrics to the Pushgateway before they exit.

  2. Pulling Metrics:
    2.1 - The Prometheus server scrapes the exporters.
    2.2 - It also scrapes the metrics that were pushed to the Pushgateway (see the scrape-config sketch below).
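
For steps 1 and 2, a minimal scrape configuration might look like the sketch below. The job names and target addresses (`node-exporter:9100`, `pushgateway:9091`) are placeholders for whatever actually runs in the cluster; `honor_labels: true` keeps the job/instance labels that short-lived jobs attach when they push.

```yaml
# prometheus.yml (sketch) -- scrape exporters directly, plus the Pushgateway
# for metrics pushed by short-lived jobs. Targets are placeholders.
global:
  scrape_interval: 30s
  external_labels:
    cluster: my-cluster      # lets Thanos tell this Prometheus's data apart later
    replica: prometheus-0

scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: pushgateway
    honor_labels: true        # keep the labels the short-lived jobs pushed
    static_configs:
      - targets: ["pushgateway:9091"]
```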

  3. Data Storage:
    • Prometheus stores its TSDB (time-series database) on a local SSD.

  4. Thanos Integration:
    • A Thanos Sidecar runs next to the Prometheus server and exposes that data over the Store API.
    • It also uploads completed TSDB blocks to S3 for long-term storage (see the pod-spec sketch below).
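
For steps 3 and 4, the Sidecar typically runs as a second container in the Prometheus pod, reading the same TSDB volume and shipping finished 2h blocks to S3. This is only a sketch of the `containers` section of that pod spec: the image tags, volume/secret names and S3 details are assumptions, and the equal `--storage.tsdb.min/max-block-duration` values follow the usual Thanos recommendation to disable local compaction so the Sidecar can upload blocks.

```yaml
# Sketch: containers section of the Prometheus pod spec with a Thanos sidecar.
# Image tags, volume names and paths are assumptions.
containers:
  - name: prometheus
    image: prom/prometheus:v2.51.0
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus            # the SSD-backed volume
      - --storage.tsdb.retention.time=6h           # local retention can stay short; S3 holds history
      - --storage.tsdb.min-block-duration=2h       # equal min/max disables local compaction
      - --storage.tsdb.max-block-duration=2h       #   so the sidecar uploads finished blocks
    volumeMounts:
      - { name: tsdb, mountPath: /prometheus }

  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.34.0
    args:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/objstore.yml
      - --grpc-address=0.0.0.0:10901               # Store API used later by Thanos Query
    volumeMounts:
      - { name: tsdb, mountPath: /prometheus }
      - { name: objstore, mountPath: /etc/thanos }
```

The `objstore.yml` referenced above is the bucket configuration shared by the Sidecar, Store and Compactor; bucket name, endpoint and region below are placeholders, and credentials are left to env vars or IAM:

```yaml
# objstore.yml (sketch)
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.eu-west-1.amazonaws.com
  region: eu-west-1
```
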
  5. Optimization and Retrieval:
    • The Thanos Compactor compacts and downsamples the blocks in S3 and applies retention, keeping the bucket efficient.
    • The Thanos Store (store gateway) serves the historical data in S3 to the query layer (sketched below).
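
For step 5, both components only need the same bucket config. A rough sketch of running them outside the cluster follows, using docker-compose purely as an example; hostnames, data directories, retention values and the image tag are assumptions.

```yaml
# docker-compose.yml sketch for the components that live outside the cluster.
services:
  thanos-store:
    image: quay.io/thanos/thanos:v0.34.0
    command:
      - store
      - --data-dir=/var/thanos/store                # local cache of index data
      - --objstore.config-file=/etc/thanos/objstore.yml
      - --grpc-address=0.0.0.0:10901
    volumes:
      - ./objstore.yml:/etc/thanos/objstore.yml:ro

  thanos-compact:
    image: quay.io/thanos/thanos:v0.34.0
    command:
      - compact
      - --data-dir=/var/thanos/compact
      - --objstore.config-file=/etc/thanos/objstore.yml
      - --wait                                      # keep running; compact/downsample continuously
      - --retention.resolution-raw=30d              # example retention per resolution
      - --retention.resolution-5m=90d
      - --retention.resolution-1h=1y
    volumes:
      - ./objstore.yml:/etc/thanos/objstore.yml:ro
```

One design note: only a single Compactor should run against a given bucket, so it does not need to be highly available itself.
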
  6. Querying Data:
    • When someone queries through Grafana, the request goes to Thanos Query, which fans out to the Sidecar (recent data) and the Store gateway (historical data in S3) and merges the results (sketched below).
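
For step 6, note that Thanos Query does not read S3 directly: it fans out over gRPC to the Sidecar and the Store gateway and deduplicates what comes back. Continuing the compose sketch above, with the endpoint addresses as placeholders (in particular, the in-cluster Sidecar has to be reachable from outside somehow):

```yaml
# Continues the docker-compose sketch; endpoint addresses are placeholders.
services:
  thanos-query:
    image: quay.io/thanos/thanos:v0.34.0
    command:
      - query
      - --http-address=0.0.0.0:9090                        # Prometheus-compatible API that Grafana queries
      - --grpc-address=0.0.0.0:10901
      - --endpoint=thanos-store:10901                      # store gateway (history in S3)
      - --endpoint=prometheus.my-cluster.example.com:10901 # the in-cluster sidecar (placeholder address)
      - --query.replica-label=replica                      # dedup if HA Prometheus replicas are added later
```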

To ensure that dashboard access isn’t lost if the cluster goes down, I’ve placed Grafana, Thanos Query, the Compactor, and the Store outside of the main cluster, with Grafana pointing at Thanos Query as its data source (sketched below). I’m looking for the best approach to keep the monitoring stack available and useful regardless of the cluster’s state.

I don’t know whether this is possible or not. I’m open to suggestions and would greatly appreciate any feedback on the reliability of this architecture. Does anyone have insights or proposed enhancements?
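
On the Grafana side, the only thing that changes compared to a plain Prometheus setup is the datasource URL, since Thanos Query speaks the standard Prometheus HTTP API. A provisioning sketch, with the URL assumed to match the compose example above:

```yaml
# grafana/provisioning/datasources/thanos.yml (sketch)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus
    access: proxy
    url: http://thanos-query:9090
    isDefault: true
```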