Hello Kubernetes Community!
My name is Gavin Kliger, and I am a cloud computing and microservice researcher at UC Berkeley’s NetSys Lab.
In our research, we assume that microservices are configured independently, even though each one is a single component of a larger application. These configurations might include the maximum number of requests or the request backoff time. We are interested in identifying cases where the independence assumption breaks down, resulting in unexpected interactions (e.g., cascading failures) between different microservices.
For example, Target runs logging sidecars alongside all of its applications. When Kafka became semi-unresponsive due to intermittent network connectivity, all of these logging sidecars woke up and began consuming CPU time. The resulting high load on the Docker daemons caused containers to report themselves as unhealthy and to be repeatedly rescheduled, which overwhelmed Consul. When Vault found it could no longer communicate quickly with Consul, it sealed itself. You can read about this incident in more detail here.
We’d love to talk about your experiences working through issues like this in your deployments.
Thank you for your time,
Berkeley NetSys Lab