Consistent hashing
Hash both keys and nodes onto a ring. Rebalance moves ~1/N of keys when a node joins or leaves. Virtual nodes smooth the distribution.
TL;DR
Sharding splits a dataset across nodes. Interview points: consistent hashing minimizes churn on rebalance; hot shards (one user dominates traffic) need mitigation; cross-shard queries require scatter-gather or denormalization.
Hash both keys and nodes onto a ring. Rebalance moves ~1/N of keys when a node joins or leaves. Virtual nodes smooth the distribution.
One key (e.g., one celebrity user's tweets) can dominate a shard. Mitigate: split by secondary key, separate hot-user path, or denormalize the hot dataset into its own tier.
Queries that span shards require scatter-gather or a materialized cross-shard index. Narrate the query pattern before picking the shard key.
Adding capacity requires moving data. Consistent hashing minimizes movement; double-write + backfill + cut-over is the standard live-migration playbook.
Book a 45-minute system design mock with a transcript. Included in the $19/mo subscription.