
Ep11. Designing Data-Intensive Applications - Partitioning

2022/2/21

Eng Cafe


In this episode we discuss our study notes on the partitioning chapter of Designing Data-Intensive Applications.

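🔴 This episode leans technical, and we use a lot of English for technical terms. Some listeners have told us that mixing Chinese and English makes an episode harder to follow, so we ask for your understanding if that bothers you. Corrections are welcome if anything is inaccurate or out of date.

# Show Notes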

📕 Designing Data-Intensive Applications
What is partitioning?
A partition is a division of a logical database or its constituent elements into distinct independent parts.
Main reason: scalability - the query load can be distributed across many processors.
YouTube / Vitess scaling story
Single MySQL → add read replicas → writes can't catch up → partition
How to partition?
Partitioning by Key Range (e.g., Bigtable)
Assign a continuous range of keys to each partition
Pro: efficient range scans and good data locality
Con: certain access patterns create hot spots (e.g., with timestamp keys, all current writes land on one partition)
Con: choosing split points and managing rebalancing is hard
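To make the key-range trade-offs above concrete, here is a minimal sketch. The split points, key values, and function names are all made up for illustration; real systems such as Bigtable manage split points for you.

```python
import bisect

# Hypothetical split points. Partition i holds keys k with
# SPLIT_POINTS[i-1] <= k < SPLIT_POINTS[i]; the first and last
# partitions are open-ended.
SPLIT_POINTS = ["g", "n", "t"]  # 3 split points -> 4 partitions


def partition_for(key, split_points=SPLIT_POINTS):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


def partitions_for_scan(start, end, split_points=SPLIT_POINTS):
    """Partitions that a scan over [start, end] touches: a contiguous run."""
    first = partition_for(start, split_points)
    last = partition_for(end, split_points)
    return list(range(first, last + 1))


# Range scan: only the partitions overlapping the range are contacted.
print(partitions_for_scan("h", "p"))  # [1, 2]

# Hot spot: with time-ordered keys, every write for "today" hits one partition.
monthly = ["2022-01-01", "2022-02-01", "2022-03-01"]  # illustrative split points
writes = ["2022-02-21T10:00", "2022-02-21T10:01", "2022-02-21T10:02"]
print({partition_for(ts, monthly) for ts in writes})  # {2}
```

The binary search over sorted split points is what makes range scans cheap, and the same ordering is what concentrates time-ordered writes on a single partition.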
Partitioning by Hash
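A good hash function distributes keys uniformly across partitions
Con: no easy range queries
Cassandra does KKV (partition key, sort key, value): the partition key is hashed to pick the partition, and rows are kept sorted by the sort key within it
Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber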
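A rough sketch of hash partitioning combined with the KKV idea, assuming an in-memory store. `KKVStore`, `hash_partition`, and the sample keys are hypothetical names for illustration, not Cassandra's API. MD5 is used only because it is stable across processes (Python's built-in `hash()` for strings is not), not for security.

```python
import bisect
import hashlib


def hash_partition(key, num_partitions):
    """Map a key to a partition using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class KKVStore:
    """Toy (partition key, sort key, value) store; illustration only."""

    def __init__(self, num_partitions=8):
        self.partitions = [{} for _ in range(num_partitions)]

    def _rows(self, partition_key):
        # Only the partition key is hashed, so all rows for it live together.
        p = hash_partition(partition_key, len(self.partitions))
        return self.partitions[p].setdefault(partition_key, [])

    def put(self, partition_key, sort_key, value):
        # Rows stay sorted by sort key within one partition key.
        bisect.insort(self._rows(partition_key), (sort_key, value))

    def scan(self, partition_key, lo, hi):
        """Range scan over sort keys in [lo, hi] for one partition key."""
        return [(s, v) for s, v in self._rows(partition_key) if lo <= s <= hi]


store = KKVStore()
store.put("user42", "2022-02-21", "post c")
store.put("user42", "2022-02-19", "post a")
store.put("user42", "2022-02-20", "post b")
print(store.scan("user42", "2022-02-20", "2022-02-21"))
# [('2022-02-20', 'post b'), ('2022-02-21', 'post c')]
```

Range queries across partition keys are still not possible, but a range query over the sort key within a single partition key touches exactly one partition.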
Secondary indexes: Local index
Efficient writes, expensive reads (a query must scatter/gather across all partitions)
Example: Elasticsearch
Secondary indexes: Global index
Efficient reads, expensive writes (a single write may touch several index partitions)
Using Global Secondary Indexes in DynamoDB (correction: we misspoke in the episode; DynamoDB supports 20 global secondary indexes per table)
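The read/write trade-off between the two index styles can be sketched by counting how many partitions each operation touches. This is a toy model under stated assumptions: the class names, the four-partition cluster, and the `color` field are invented, and only inserts are handled (no updates or deletes).

```python
import hashlib
from collections import defaultdict

N = 4  # hypothetical number of partitions


def part(key):
    """Stable hash partitioner shared by both toy clusters."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N


class LocalIndexCluster:
    """Each partition indexes only its own documents."""

    def __init__(self):
        self.index = [defaultdict(set) for _ in range(N)]

    def write(self, doc_id, color):
        p = part(doc_id)
        self.index[p][color].add(doc_id)
        return {p}  # partitions touched: always exactly one

    def find(self, color):
        hits = set()
        for p in range(N):  # scatter/gather over every partition
            hits |= self.index[p].get(color, set())
        return hits, set(range(N))


class GlobalIndexCluster:
    """The index itself is partitioned by the indexed term."""

    def __init__(self):
        self.docs = [{} for _ in range(N)]
        self.index = [defaultdict(set) for _ in range(N)]

    def write(self, doc_id, color):
        p_doc, p_idx = part(doc_id), part(color)
        self.docs[p_doc][doc_id] = color
        self.index[p_idx][color].add(doc_id)
        return {p_doc, p_idx}  # may span two partitions

    def find(self, color):
        p = part(color)
        return set(self.index[p].get(color, set())), {p}


local, glob = LocalIndexCluster(), GlobalIndexCluster()
cars = [("car1", "red"), ("car2", "blue"), ("car3", "red")]
local_writes = [local.write(d, c) for d, c in cars]
global_writes = [glob.write(d, c) for d, c in cars]

local_hits, local_touched = local.find("red")
global_hits, global_touched = glob.find("red")
print(len(local_touched), len(global_touched))  # 4 1
```

Both clusters return the same documents for a query; the difference is only in how many partitions a read or a write has to contact.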
Rebalancing partitions
Move load to other nodes
Fixed number of partitions
New node steals partitions from every existing node
Notion: 480 partitions
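A minimal sketch of the "new node steals partitions" idea with a fixed partition count. The 480 comes from the note above; the four starting nodes, the node names, and the `add_node` helper are assumptions for illustration, not how Notion or any particular database implements it.

```python
from collections import Counter


def add_node(assignment, new_node):
    """Return a new partition -> node map in which `new_node` has taken
    its fair share of partitions from the most loaded existing nodes."""
    result = dict(assignment)
    load = Counter(result.values())
    fair_share = len(result) // (len(load) + 1)
    for _ in range(fair_share):
        # Pick the most loaded existing node as the donor.
        donor = max((n for n in load if n != new_node), key=lambda n: load[n])
        pid = next(p for p in sorted(result) if result[p] == donor)
        result[pid] = new_node
        load[donor] -= 1
        load[new_node] += 1
    return result


nodes = ["node0", "node1", "node2", "node3"]
before = {pid: nodes[pid % 4] for pid in range(480)}
after = add_node(before, "node4")

moved = [pid for pid in before if before[pid] != after[pid]]
print(len(moved))                      # 96
print(sorted(Counter(after.values()).items()))
```

The key-to-partition mapping never changes here; only whole partitions move between nodes, and only as many as the new node needs (96 of 480 in this run).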
Dynamic partitioning
📈: split partition into 2
📉: merge 2 partitions into 1
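The split and merge behaviour can be sketched with key-range partitions that react to their own size. The thresholds, the class name, and the assumption that keys are distinct are all illustrative; real systems split on data size (bytes), not key counts.

```python
import bisect


class DynamicPartitions:
    """Toy key-range partitions that split when large and merge when empty.
    Assumes distinct keys. Partition i covers [bounds[i-1], bounds[i])."""

    def __init__(self, max_keys=4, min_keys=1):
        self.bounds = []   # sorted split points
        self.parts = [[]]  # one sorted key list per partition
        self.max_keys = max_keys
        self.min_keys = min_keys

    def _index(self, key):
        return bisect.bisect_right(self.bounds, key)

    def insert(self, key):
        i = self._index(key)
        bisect.insort(self.parts[i], key)
        if len(self.parts[i]) > self.max_keys:
            self._split(i)

    def delete(self, key):
        i = self._index(key)
        self.parts[i].remove(key)
        if len(self.parts) > 1 and len(self.parts[i]) < self.min_keys:
            self._merge(i)

    def _split(self, i):
        keys = self.parts[i]
        mid = len(keys) // 2
        self.bounds.insert(i, keys[mid])  # median key becomes a split point
        self.parts[i:i + 1] = [keys[:mid], keys[mid:]]

    def _merge(self, i):
        j = i - 1 if i > 0 else i         # merge partitions j and j + 1
        self.parts[j:j + 2] = [self.parts[j] + self.parts[j + 1]]
        del self.bounds[j]


dp = DynamicPartitions(max_keys=4, min_keys=1)
for k in "abcde":
    dp.insert(k)
print(dp.bounds, dp.parts)  # ['c'] [['a', 'b'], ['c', 'd', 'e']]

dp.delete("a")
dp.delete("b")
print(dp.bounds, dp.parts)  # [] [['c', 'd', 'e']]
```

The number of partitions follows the data volume, which is the main appeal of this approach compared with choosing a fixed count up front.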
Fixed number of partitions per node
https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30
Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
Request Routing
3 approaches: nodes talk to each other, separate routing tier, smart client
Separate coordination service such as ZooKeeper
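As a sketch of the routing-tier approach backed by a coordination service: the `Coordinator` class below is an in-memory stand-in written for this example, not the ZooKeeper client API, and the partition count and node names are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical


def partition_of(key):
    """Stable hash of the key, reduced to a partition id."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class Coordinator:
    """In-memory stand-in for a coordination service: it holds the
    authoritative partition -> node map and notifies subscribers."""

    def __init__(self, assignment):
        self._assignment = dict(assignment)
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)
        callback(dict(self._assignment))  # deliver the current state

    def reassign(self, partition, node):
        self._assignment[partition] = node
        for callback in self._watchers:
            callback(dict(self._assignment))


class RoutingTier:
    """Keeps a local copy of the map and forwards each key to its node."""

    def __init__(self, coordinator):
        self.table = {}
        coordinator.watch(self._on_change)

    def _on_change(self, assignment):
        self.table = assignment

    def route(self, key):
        return self.table[partition_of(key)]


coordinator = Coordinator({p: f"node{p % 2}" for p in range(NUM_PARTITIONS)})
router = RoutingTier(coordinator)

p = partition_of("user42")
print(router.route("user42"))     # node0 or node1, depending on the hash
coordinator.reassign(p, "node2")  # rebalancing moves this partition
print(router.route("user42"))     # node2
```

A smart client would embed the same `RoutingTier` logic in the client library, while the "nodes talk to each other" approach would let any node answer the lookup and forward the request.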
Notes by xg

Contact

Website: eng.cafe
WeChat official account: Eng Cafe
Twitter: @engcafefm
YouTube: Eng Cafe
Xiaoyuzhou (小宇宙) podcast app
Other podcast clients: eng.cafe/subscribe
Email: [email protected]


Chapters

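What is partitioning?

What is the main reason for partitioning?

YouTube / Vitess scaling story

How to partition?

Partitioning by key range (e.g., Bigtable)

Partitioning by hash

Cassandra's KKV partitioning

Hot spots: Justin Bieber's impact on Twitter's servers

Secondary indexes: local index

Secondary indexes: global index

Rebalancing partitions

Fixed number of partitions

Notion: 480 partitions

Dynamic partitioning

Fixed number of partitions per node

Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)

Request routing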


Episode length: 33:46
