Microservice Configuration Research Question

Hello Kubernetes Community!

My name is Gavin Kliger, and I am a cloud computing and microservice researcher at UC Berkeley’s NetSys Lab.

In our research, we assume that microservices are configured independently, even though each one is a single component of a larger application. These configurations might include the maximum number of requests or the request backoff time. We are interested in identifying cases where the independence assumption breaks down, resulting in unexpected interactions (e.g., cascading failures) between different microservices.
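
As a rough illustration of how independently chosen settings can interact, here is a minimal sketch. It assumes a hypothetical chain of services in which every call fails and each caller uses up all of its configured attempts; the function name and the numbers are made up for illustration and do not come from any real deployment.

```python
def worst_case_calls(attempts_per_layer):
    """Worst-case number of calls that reach the last service in a chain
    for one original request, assuming every call fails and each caller
    exhausts its configured number of attempts (hypothetical model)."""
    total = 1
    for attempts in attempts_per_layer:
        # Each attempt made by a caller triggers a full round of attempts
        # by the next caller down the chain, so the counts multiply.
        total *= attempts
    return total


# Three callers in a row, each independently configured with 3 attempts.
print(worst_case_calls([3, 3, 3]))  # 27
```

Each setting looks reasonable when considered alone, but under this assumption the service at the end of the chain sees the product of the attempt counts rather than any single one of them.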

For example, Target runs logging sidecars with all of its applications. When Kafka became semi-unresponsive due to intermittent network connectivity, all of these logging sidecars woke up and began to use CPU time. The resulting high load on the Docker daemons caused containers to report as unhealthy and be repeatedly rescheduled, which overwhelmed Consul. When Vault found it could no longer communicate quickly with Consul, it sealed itself. You can read in more detail here.

We’d love to talk about your experiences working through issues like this in your deployments.

Thank you for your time,

Gavin Kliger
Berkeley NetSys Lab

@gkliger I’m not sure how much of a response you’ll get, but you might find a bit more in what you’re looking for from the chaos engineering folks.

@mrbobbytables Thanks for the reply! Can you drop me a pointer to a better forum/community for this sort of question?

There is a #chaosengineering channel in the CNCF Slack. This awesome list also has a slew of resources:

Great resource, appreciate it!