Uber's M3 vs Thanos

achilles42 · November 18, 2018, 2:04pm

Hello Everyone,

We are currently using Prometheus to monitor Kubernetes components and the services running on it. Because we run one Prometheus per Kubernetes cluster (we have around 30 K8s clusters), we are facing issues, mostly around global search, high availability of monitoring data, and data retention.

I was exploring Thanos and M3.

Thoughts from the community?

brancz · November 21, 2018, 10:15am

Hi,

I would say it depends a bit on your requirements. If you are looking for a general-purpose time-series database, M3 looks like a great choice, but it is also a distributed database, which comes with maintenance cost. That said, it has only recently been open sourced, so really no one but the people from Uber can comment on long-term usage.

In terms of performance, Prometheus’s tsdb and M3’s embedded time-series database have been benchmarked against each other, and their performance seems to be close to identical (unsurprising, as they are based on the same paper). They deviate a bit, but that is because they make different trade-offs: for example, Uber’s tsdb always offloads time-series ID tracking to the application, and it has different atomicity semantics. [0]

What I appreciate about Thanos is that its setup and maintenance cost is incredibly low. You start by just adding the sidecar to your existing Prometheus server, which essentially acts like a backup mechanism to an S3-like storage. Then you deploy the storage gateway and a querier, and you suddenly have all the features you asked for without a distributed storage system. Queriers can be deployed hierarchically, so you can also just have a querier per cluster and one that connects to all of those queriers. That said, Thanos does somewhat require that you have direct network access between its components, which may be an additional setup burden for multi-cluster setups.
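
To make the hierarchical querier idea a bit more concrete, here is a rough toy model in Python. It is not the Thanos API: `Store`, `Querier`, `series`, and the `replica` label name are all made up for illustration, and real Thanos deduplication is more involved than "first value wins". It only shows the shape of the fan-out: a per-cluster querier reads from sidecars and a storage gateway, and a global querier reads from the per-cluster queriers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A series is identified by its labels, stored as a sorted tuple of pairs.
Labels = Tuple[Tuple[str, str], ...]
Sample = Tuple[Labels, int, float]  # (labels, timestamp, value)


def series(name: str, **labels: str) -> Labels:
    """Build a hashable label set (hypothetical helper)."""
    return tuple(sorted({"__name__": name, **labels}.items()))


@dataclass
class Store:
    """Toy stand-in for a sidecar or a storage gateway."""
    samples: List[Sample]

    def select(self, metric: str) -> List[Sample]:
        return [s for s in self.samples if dict(s[0]).get("__name__") == metric]


@dataclass
class Querier:
    """Toy querier: fans out to children, which may be stores or other queriers."""
    children: list = field(default_factory=list)
    replica_label: str = "replica"  # assumed label name for HA replicas

    def select(self, metric: str) -> List[Sample]:
        merged: Dict[Tuple[Labels, int], float] = {}
        for child in self.children:
            for labels, ts, value in child.select(metric):
                # Drop the replica label so HA pairs collapse into one series.
                key = tuple(kv for kv in labels if kv[0] != self.replica_label)
                merged.setdefault((key, ts), value)  # first value wins
        return sorted((lbls, ts, v) for (lbls, ts), v in merged.items())


# Cluster "a": an HA pair of Prometheus servers plus older data in object storage.
cluster_a = Querier([
    Store([(series("up", cluster="a", replica="0"), 200, 1.0)]),  # sidecar 0
    Store([(series("up", cluster="a", replica="1"), 200, 1.0)]),  # sidecar 1
    Store([(series("up", cluster="a", replica="0"), 100, 1.0)]),  # storage gateway
])
# Cluster "b": a single Prometheus server.
cluster_b = Querier([Store([(series("up", cluster="b", replica="0"), 200, 0.0)])])

# The global querier only talks to the per-cluster queriers.
global_querier = Querier([cluster_a, cluster_b])
result = global_querier.select("up")
for labels, ts, value in result:
    print(dict(labels), ts, value)
```

Because a querier exposes the same `select` interface as a store in this sketch, stacking them needs no special case, which is roughly why the per-cluster plus global layout is cheap to set up.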

My personal preference would be to use Thanos as it is operationally much simpler, but I also know the Prometheus and Thanos codebase quite well, so I might be biased.

[0] https://github.com/prometheus/tsdb/pull/445

achilles42 · March 24, 2019, 4:29am

Thanks Brancz.

We ended up using a central Cortex with Prometheus remote write configs, as our requirement is to build monitoring as a service internally.
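
For anyone curious what "remote write configs" could look like across roughly 30 clusters, here is a minimal sketch in Python that renders one `prometheus.yml` fragment per cluster. The Cortex hostname, the push path, the cluster names, and the choice of a `cluster` external label are all assumptions for illustration, not details from this thread; check the push endpoint for your Cortex version.

```python
# Hypothetical values: the thread gives neither cluster names nor a Cortex address.
CORTEX_PUSH_URL = "http://cortex.example.internal/api/prom/push"


def render_prometheus_config(cluster: str, push_url: str = CORTEX_PUSH_URL) -> str:
    """Return a minimal prometheus.yml fragment for one cluster.

    The external label lets the central store tell clusters apart;
    remote_write points every Prometheus at the same central endpoint.
    """
    return (
        "global:\n"
        "  external_labels:\n"
        f"    cluster: {cluster}\n"
        "remote_write:\n"
        f"  - url: {push_url}\n"
    )


# One config per cluster, e.g. k8s-01 ... k8s-30 (made-up names).
cluster_names = [f"k8s-{i:02d}" for i in range(1, 31)]
configs = {name: render_prometheus_config(name) for name in cluster_names}

print(configs["k8s-01"])
```

Each Prometheus keeps scraping locally and only the write path is centralized, so the per-cluster difference is reduced to a single label value.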
