Convincing a company to embrace the theory and practice of Chaos Engineering is an uphill battle. When a site keeps breaking, Gremlin’s plan is to break things intentionally. How do you introduce chaos as a step toward making things better?
Today, we’re talking to Ho Ming Li, lead solutions architect at Gremlin. He takes a strategic approach to deliver holistic solutions, often diving into the intersection of people, process, business, and technology. His goal is to enable everyone to build more resilient software by means of Chaos Engineering practices.
Some of the highlights of the show include:
Ho Ming Li previously worked as a technical account manager (TAM) at Amazon Web Services (AWS) to offer guidance on architectural/operational best practices
Differences between the solutions architect and TAM roles at AWS, and transitioning between them
Role of TAM as the voice and face of AWS for customers
Ultimate goal is to bring services back up and make sure customers are happy
Amazon Leadership Principles: It is mutually beneficial when customers get what they want, are happy with the service, and achieve success
Chaos Engineering isn’t about breaking things to prove a point
Chaos Engineering takes a scientific approach
Other than during carefully staged disaster recovery (DR) exercises, DR plans usually don’t work
Availability Theater: A passive data center is not enough; exercise the DR plan
Chaos Engineering brings DR testing down to a level where you exercise it regularly to build resiliency
Start small when dealing with availability
Chaos Engineering is a journey of verifying, validating, and catching surprises in a safe environment
Get started with Chaos Engineering by asking: What could go wrong?
Embrace failure and prepare for it; business process resilience
Gremlin’s GameDay and Chaos Conf allow people to share experiences