SLOs for Everyone

2022/5/1

The Cloudcast

No Brian (Gracely) for a Sunday Perspective this week. Instead, we have Brian Singer (@brian_singer, CPO @nobl9inc) talking about Service Level Objectives (SLOs): what they are, why they matter, and how to use them to balance innovation against technical debt.

SHOW: 613

**CLOUD NEWS OF THE WEEK** - http://bit.ly/cloudcast-cnotw

**CHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"**

SHOW SPONSORS:

  • SLOConf - Free virtual event May 9th-12th - Register Today!
  • Revelo: Sidestep the competitive US talent market by hiring remote engineers in Latin America. Source, hire, and pay Latin American engineers in US time zones with one service. Revelo manages all the paperwork including benefits, payroll, and compliance. Hire a full-time engineer with a 14-day trial. Revelo.com/cloudcast
  • Datadog Synthetic Monitoring: Frontend and Backend Modern Monitoring
  • Ensure frontend issues don’t impair user experience by detecting user-facing issues with API and browser tests with a free 14-day Datadog trial. Listeners of The Cloudcast will also receive a free Datadog T-shirt.
  • strongDM - Secure infrastructure access for the modern stack.
  • Manage access to any server, database, or Kubernetes instance in minutes. Fully auditable, replayable, secure, and drag-and-drop easy. Try it free for 14 days - www.strongdm.com/signup

SHOW NOTES:

  • Nobl9 - website
  • The Cloudcast #502 - Reliability as a Service

**Topic 1** - Brian, welcome to the show! We had Alex on the podcast last year and look forward to continuing the SLO conversation. Give everyone a brief introduction.

**Topic 2** - If someone isn’t familiar with SLOs (Service Level Objectives), how do you define them? Why do they matter? What problem do they solve? How are they different from SLAs?

**Topic 3** - Is this a shift from pursuing maximum reliability to treating errors as a “budget”? How can you manage a certain window of unreliability and keep customers happy?
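
For listeners new to the “budget” idea, the arithmetic can be sketched as follows. This is a minimal illustration, not something taken from the episode: it assumes a simple time-based availability SLO, and the function names `error_budget_minutes` and `budget_remaining` are hypothetical.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of allowed unreliability for an availability SLO.

    slo_target is a fraction (e.g. 0.999 for "three nines") and
    window_days is the length of the SLO window. Both names are
    illustrative assumptions.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    # The budget is whatever share of the window the target does not cover.
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int, bad_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999, 30)
    print(f"99.9% over 30 days allows about {budget:.1f} minutes of downtime")
    print(f"Budget left after 10.8 bad minutes: {budget_remaining(0.999, 30, 10.8):.0%}")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of budget. Teams commonly use the remaining fraction to decide whether to keep shipping features or to slow down and work on reliability.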

**Topic 4** - How do you create SLOs? Who creates them? Is this an SRE connecting up to existing systems, or new tooling and plumbing? Does it fit into an existing GitOps workflow, for instance SLOs-as-Code? Are there automation triggers that fire when conditions are met?

**Topic 5** - How does an SRE know which metrics matter? I would imagine not all downtime is equal. How does this correlate with business KPIs? Do you fine-tune over time?

**Topic 6** - The big question is always the focus on technical debt vs. innovation. Does this help, and if so, how?

FEEDBACK?

  • Email: show at the cloudcast dot net
  • Twitter: @thecloudcastnet