How do distributed systems work? If you’ve got a database spread over three servers, how do they elect a leader? How does that change when we spread those machines out across data centers around the globe? Do we even need to understand how it works, or can we relegate those problems to an off-the-shelf tool like ZooKeeper?
Joining me this week is Distributed Systems Doctor Benjamin Bengfort, for a deep dive into consensus algorithms. We start off by discussing how much of “the clustering problem” is your problem, and how much can be handled by a library. We go through many of the constraints and tradeoffs that you need to understand either way. And we eventually reach Benjamin’s surprising message: maybe the time is ripe to roll your own. Should we be writing our own bespoke Raft implementations? And if so, how hard would that be? What guidance can he offer us?
Somewhere in the recording of this episode, I decided I wanted to sit down and try to implement a leader election protocol. Maybe you will too. And if not, you’ll at least have a better appreciation for what it takes. Distributed systems used to be rocket science, but they’re becoming deployment as usual. This episode should help us all to keep up!
--
KubeCon talk on the etcd data inconsistency bugs: https://kccncna2022.sched.com/event/182N9/lessons-learned-from-etcd-the-data-inconsistency-issues-marek-siarkowicz-google-benjamin-wang-vmware
The Raft paper by Diego Ongaro and John Ousterhout: https://raft.github.io/raft.pdf
The EPaxos Algorithm: https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf
LevelDB: https://github.com/google/leveldb
Benjamin on Twitter: https://twitter.com/bbengfort
Benjamin on LinkedIn: https://www.linkedin.com/in/bbengfort
Benjamin on GitHub: https://github.com/bbengfort
Rotational Labs: https://rotational.io (check out the blog!)
Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/