Databases on Kubernetes

Hi All,

I have been speaking about deploying databases on Kubernetes, and how it might become a very popular option in the future with the rise of operators and with Local Persistent Storage going GA.

I would like to hear about your experiences with it and what your vision is for the next year.

PS: If you don’t mind, I would like to leave a link to one of my articles about it:
https://blog.couchbase.com/databases-on-kubernetes/

*Contrary opinions are more than welcome.

I’ve had some time to play around with getting SQL Server 2017 running on persistent storage; the long-term goal is to upgrade our SQL Server environment in Kubernetes. I’m waiting on Microsoft to develop better AD authentication integration with SQL Server / pods.

In the next year I’ll be exploring how to migrate our current Mongo / SQL Server environments and supporting a new Postgres environment in Kubernetes.

I work on a project which automates PostgreSQL on Kubernetes: https://github.com/zalando/patroni

My experience is that where Kubernetes shines as a platform for databases is when your databases are small and single-application, and it becomes increasingly difficult with larger, more central-to-the-enterprise databases. Zalando, the majority contributors to Patroni, for example, have succeeded with it because they’ve gone “full DevOps” where each development team owns their own database. Kubernetes-based automation makes this model possible.

@jberkus I watched your talk at KubeCon, congrats.

Well, I meet a lot of people who argue against running databases on Docker, which is why I brought this topic here.

About Kubernetes shining just for small databases, it might be a problem of the database itself. My experience with cloud-native databases is exactly the opposite: it shines when there are a lot of instances to manage.

I’m facing the same arguments from my own team. A lot of it is training / becoming familiar with the new tech; some DBAs like to dip their toes in new waters, some do not. I try to explain that containers may not work for every database solution, but trying them should be a starting point to see whether they do.

@deniswsrosa well, I think this applies to distributed databases almost equally. That is, if you have a huge Cassandra instance that processes 100K writes per second, you’re going to have more problems running that on Kube than on bare metal. The main difference with distributed databases is that they are harder to install/manage than single-master databases, and Kubernetes helps with installation and node management.

Also, I fully expect the “big database” exception to be temporary. As Kubernetes storage options improve, and as cgroups improve, we will get to the stage where it will be easier to run databases of all sizes on Kube. We’re just not there yet.

Through some of our customer engagements at Kasten we see folks who are actually going with both of the models described above: (1) using lots of small databases embedded in each application or microservice (more common with folks building out new microservice-style applications from scratch) or (2) using Kubernetes to actually build out larger centralized databases. You can find a more detailed discussion of this at https://kasten.io/databases/. @Justin_Hartman happy to share some more detailed observations if that will be helpful as you are looking to move some of your environment. We have seen quite a bit of Mongo and Postgres in this space.

Agree with @jberkus that as storage options and support for stateful workloads continue to evolve we will see a lot more databases on Kubernetes as people look to further consolidate their infrastructure.

I also expect to see more solutions (operator-based, or focused on a unifying framework such as https://github.com/kanisterio/kanister) that aim to simplify the ongoing management of such workloads, whether we are talking about many small instances or fewer larger instances.

I could add to this topic that we’ve developed a system together with LXC, which we’ve integrated into CRI to manage stateful services in a robust and performant way. Kubernetes itself is not a show stopper for handling data. The container layer itself is more or less implemented in a good way to handle data (i.e. a restart of Docker in case of updating the Docker daemon).

Docker’s stateless model is also a bit difficult when it comes to updates, because a production SQL server shouldn’t just stop because one file has to be updated in the container. So it makes sense to handle state properly. :) That is another reason to choose a container engine which is able to handle state.

However, we observed that a lot of interfaces/assumptions exist in Kubernetes around Docker. For example, we saw that the CRI doesn’t contain an interface to transport a hostname in case of host networking. We’ve opened an issue on GitHub for that.

We plan to open up some parts of our development if there is interest.

We (Zalando) run PostgreSQL (powered by Patroni as mentioned by @jberkus) and Elasticsearch in production on Kubernetes. We created custom operators (Open Source) for both:
