Ep11. Designing Data-Intensive Applications - Partitioning

2022/2/21
Eng Cafe

In this episode we discuss our study notes on the partitioning chapter of Designing Data-Intensive Applications.

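🔴 This episode leans technical, and we use a lot of English for technical terms. Some listeners have told us that mixing Chinese and English makes the show harder to follow, so we ask for your understanding if that bothers you. Corrections are welcome if anything is inaccurate or out of date.

# Show Notes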

  • 📕 Designing Data-Intensive Applications

  • What is partitioning?

  • A partition is a division of a logical database or its constituent elements into distinct independent parts.

  • Main reason: scalability - the query load can be distributed across many processors.

  • YouTube / Vitess scaling story

  • Single MySQL → add read replicas → writes can’t keep up → partition

  • How to partition?

  • Partitioning by Key Range (e.g., Bigtable)

  • Assign a continuous range of keys to each partition

  • Pros: range scans are easy, good data locality

  • Con: certain access patterns lead to hot spots (e.g., with timestamp keys, all current writes land on one partition)

  • Con: finding split points and managing rebalancing is hard
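To make the key-range trade-offs above concrete, here is a minimal sketch. The split points and function names are hypothetical and not taken from Bigtable or any real system. It shows why a range scan touches only a contiguous run of partitions, and why timestamp keys send every current write to the same partition.

```python
import bisect
from typing import List

# Hypothetical split points. Partition i holds keys in
# [SPLIT_POINTS[i-1], SPLIT_POINTS[i]); the first and last are open-ended.
SPLIT_POINTS = ["2022-01-01", "2022-02-01", "2022-03-01"]  # 4 partitions

def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)

def partitions_for_range(start: str, end: str) -> List[int]:
    """Partitions a scan over [start, end] has to touch (a contiguous run)."""
    return list(range(partition_for(start), partition_for(end) + 1))

# Range scan: only the partitions overlapping the range are read.
scan = partitions_for_range("2022-01-15", "2022-02-10")

# Hot spot: every write made "today" maps to the same partition.
todays_writes = {partition_for(f"2022-02-21T10:{m:02d}") for m in range(60)}
```

Here `scan` covers two neighbouring partitions, while `todays_writes` contains a single partition index, which is the timestamp hot spot described in the notes.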

  • Partitioning by Hash

  • A good hash function distributes keys uniformly across partitions

  • Con: no easy range queries

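  • Cassandra uses a KKV model (partition key, sort key, value): the partition key is hashed to pick the partition, and rows are kept sorted by sort key within it

  • Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber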
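The hash-partitioning and KKV points above can be sketched as follows. This is an illustration under assumptions, not Cassandra's implementation: the partition count, the `Table` class, and the use of MD5 as a stable stand-in hash are all hypothetical. The partition key is hashed to choose a partition, and range queries work only on the sort key within one partition key.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count

def partition_for(partition_key: str) -> int:
    """Hash the partition key to a partition index (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

class Table:
    """Toy KKV table: partition key -> sort key -> value."""

    def __init__(self):
        # partition index -> {partition_key: {sort_key: value}}
        self.partitions = defaultdict(lambda: defaultdict(dict))

    def put(self, partition_key, sort_key, value):
        rows = self.partitions[partition_for(partition_key)][partition_key]
        rows[sort_key] = value

    def range(self, partition_key, lo, hi):
        """Range query over sort keys; it touches exactly one partition."""
        rows = self.partitions[partition_for(partition_key)][partition_key]
        return [(k, rows[k]) for k in sorted(rows) if lo <= k <= hi]

table = Table()
table.put("alice", "2022-02-21", "c")
table.put("alice", "2022-01-05", "a")
table.put("alice", "2022-02-01", "b")
rows = table.range("alice", "2022-01-01", "2022-02-10")
```

A range query across different partition keys would still have to visit every partition, which is the "no easy range queries" cost. Hashing also does nothing for a single very hot key such as a celebrity account, since all of its traffic still maps to one partition.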

  • Secondary indexes: Local index

  • Efficient writes, expensive reads (a query must scatter/gather across all partitions)

  • Elasticsearch

  • Secondary indexes: Global index

  • Efficient reads, expensive writes (one write may update index entries on other partitions)

  • Using Global Secondary Indexes in DynamoDB (correction: we misspoke in the episode; DynamoDB supports 20 global secondary indexes per table)
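The read/write trade-off between local and global secondary indexes can be sketched by counting which partitions each operation touches. Everything here is a hypothetical toy model (class names, the `color` attribute, the hash functions), not how Elasticsearch or DynamoDB are implemented.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical

def doc_partition(doc_id: int) -> int:
    """Documents are partitioned by primary key."""
    return doc_id % NUM_PARTITIONS

def term_partition(term: str) -> int:
    """Global index entries are partitioned by the indexed term (toy hash)."""
    return sum(term.encode("utf-8")) % NUM_PARTITIONS

class LocalIndexStore:
    """Each partition indexes only its own documents."""

    def __init__(self):
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, color):
        p = doc_partition(doc_id)
        self.index[p][color].add(doc_id)
        return {p}  # partitions touched: just one

    def read(self, color):
        touched = set(range(NUM_PARTITIONS))  # scatter/gather over all
        ids = set()
        for p in touched:
            ids |= self.index[p].get(color, set())
        return ids, touched

class GlobalIndexStore:
    """One index covers all documents and is itself partitioned by term."""

    def __init__(self):
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, color):
        p_idx = term_partition(color)
        self.index[p_idx][color].add(doc_id)
        return {doc_partition(doc_id), p_idx}  # document + index partition

    def read(self, color):
        p = term_partition(color)
        return set(self.index[p].get(color, set())), {p}

local, global_ = LocalIndexStore(), GlobalIndexStore()
for doc_id in (1, 2, 5):
    local.write(doc_id, "red")
    global_.write(doc_id, "red")
local_ids, local_touched = local.read("red")
global_ids, global_touched = global_.read("red")
```

Both stores return the same document ids, but the local-index read visits every partition while the global-index read visits one. The cost moves to the write path, where a document with several indexed fields could touch several index partitions.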

  • Rebalancing partitions

  • Move load from one node to other nodes

  • Fixed number of partitions

  • New node steals partitions from every existing node

  • Notion: 480 partitions

  • Dynamic partitioning

  • 📈: split partition into 2

  • 📉: merge 2 partitions into 1

  • Fixed number of partitions per node

  • https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30

  • Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
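As a rough illustration of the "fixed number of partitions" strategy above, the sketch below lets a new node steal partitions from the existing nodes until it holds a fair share. The partition count reuses the 480 figure from the notes; the node names and the `add_node` helper are hypothetical. Note that the key-to-partition mapping never changes, only the partition-to-node assignment.

```python
from collections import Counter

NUM_PARTITIONS = 480  # fixed up front; far more partitions than nodes

def initial_assignment(nodes):
    """Spread partitions round-robin over the initial nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def add_node(assignment, new_node):
    """Return (new assignment, moved partitions) after `new_node` joins."""
    assignment = dict(assignment)
    owned = {}
    for partition, node in assignment.items():
        owned.setdefault(node, []).append(partition)
    fair_share = NUM_PARTITIONS // (len(owned) + 1)
    moved = []
    while len(moved) < fair_share:
        # Always steal from whichever node currently holds the most.
        donor = max(owned, key=lambda n: len(owned[n]))
        partition = owned[donor].pop()
        assignment[partition] = new_node
        moved.append(partition)
    return assignment, moved

before = initial_assignment(["n0", "n1", "n2", "n3"])
after, moved = add_node(before, "n4")
load = Counter(after.values())
```

Only the stolen partitions move across the network; every other partition stays where it was. That is the main advantage over hashing keys modulo the number of nodes, where adding a node would remap most keys.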

  • Request Routing

  • 3 approaches: nodes talk to each other, separate routing tier, smart client

  • Separate coordination service such as ZooKeeper
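The routing-tier approach backed by a coordination service can be sketched as below. The `Coordinator` class is an in-memory stand-in with a made-up API, not the ZooKeeper client interface; it only models the idea that routers subscribe to the partition-to-node mapping and are notified when it changes.

```python
import hashlib

class Coordinator:
    """In-memory stand-in for a coordination service (hypothetical API)."""

    def __init__(self, assignment):
        self._assignment = dict(assignment)  # partition -> node
        self._watchers = []

    def watch(self, callback):
        """Register a callback and immediately send it the current mapping."""
        self._watchers.append(callback)
        callback(dict(self._assignment))

    def reassign(self, partition, node):
        """Record a partition move and notify every subscriber."""
        self._assignment[partition] = node
        for callback in self._watchers:
            callback(dict(self._assignment))

class Router:
    """Routing tier: forwards each request to the node owning the key."""

    def __init__(self, coordinator, num_partitions):
        self.num_partitions = num_partitions
        self.assignment = {}
        coordinator.watch(self._on_change)

    def _on_change(self, assignment):
        self.assignment = assignment

    def partition_for(self, key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def route(self, key: str) -> str:
        return self.assignment[self.partition_for(key)]

coord = Coordinator({0: "node-a", 1: "node-b", 2: "node-a", 3: "node-b"})
router = Router(coord, num_partitions=4)
owner_before = router.route("user:42")
coord.reassign(router.partition_for("user:42"), "node-c")
owner_after = router.route("user:42")
```

The same subscription logic could live inside a smart client instead of a separate routing tier; the difference between the approaches is mainly where the mapping is cached.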

  • Notes by xg

Contact