Lorne Kligerman

Director of Product at Gremlin

Lorne leads the product team at Gremlin, where he helps companies improve reliability and avoid outages by running proactive chaos engineering experiments. Previously, he was a Product Manager on App Engine at Google Cloud, helping developers build applications on a fully managed, resilient platform.

Talk Title: Monitoring Graceful Failure of a Distributed System

How can you be sure that your team is alerted of a failure before it causes an outage for your users?

The move from monoliths to microservices has allowed individual pieces of functionality to be deployed independently and on demand. Because functionality is isolated, one microservice can fail without bringing down the whole system. However, this shift has also made releasing services, and monitoring the API calls they make to one another, more complex.

Whether you’re launching a new product or iterating on a feature, delivering a delightful experience is crucial to your success. If something does fail, ideally your users never notice. Think carefully about how your system will degrade, how to inject failure to verify that design, and how you will monitor it.

In this talk, Lorne Kligerman, Director of Product at Gremlin, will present graceful failure as an engineering goal that can be tested and monitored with confidence using Chaos Engineering. By deliberately failing one service at a time in a controlled environment, you can safely observe how the system responds and react quickly enough to limit the impact on end users.