System design · primitive

Sharding — consistent hashing, resharding, hot-shard pain.

TL;DR

Sharding splits a dataset across nodes. Interview points: consistent hashing minimizes churn on rebalance; hot shards (one user dominates traffic) need mitigation; cross-shard queries require scatter-gather or denormalization.

Consistent hashing

Hash both keys and nodes onto a ring. Rebalance moves ~1/N of keys when a node joins or leaves. Virtual nodes smooth the distribution.

Hot shards

One key (e.g., one celebrity user's tweets) can dominate a shard. Mitigate: split by secondary key, separate hot-user path, or denormalize the hot dataset into its own tier.

Cross-shard queries

Queries that span shards require scatter-gather or a materialized cross-shard index. Narrate the query pattern before picking the shard key.

Resharding

Adding capacity requires moving data. Consistent hashing minimizes movement; double-write + backfill + cut-over is the standard live-migration playbook.

Frequently asked questions

What is consistent hashing?
A sharding scheme where both keys and nodes are hashed onto a ring. Adding or removing a node only moves ~1/N of keys, instead of remapping everything.
How do I pick a shard key?
Pick the key your queries filter by most. The shard key determines which shard a query hits — a bad choice forces scatter-gather on every read.
How do I mitigate a hot shard?
Options: split by secondary key (user_id + bucket), separate hot-user path with its own replica set, or denormalize the hot data into a specialized store.

Practice this live.

Book a 45-minute system design mock with a transcript. Included in the $19/mo subscription.