System Design · 19 min read

Design a Distributed Cache: Eviction, Coherency, and Stampede Control

Memcached vs Redis patterns, single-flight, and what 'TTL' implies for your consistency story.

3,750 words

Design a Distributed Cache: Eviction, Coherency, and Stampede Control. Memcached vs Redis patterns, single-flight, and what 'TTL' implies for your consistency story. This long-form guide sits in the Alpha Code library because interview prep should feel structured, not superstitious: we anchor advice to what loops actually measure, how time pressure distorts judgment, and how to rehearse behaviors that stay stable under stress. You will find six concrete chapters below, each with checklists and recovery patterns you can reuse across companies and levels. We wrote it for candidates who already know the basics but want a disciplined narrative — the kind of document you can skim before a phone screen and deep-read before an onsite. Expect explicit tradeoffs, not cheerleading: some strategies cost time, some require partners, and some only make sense at certain seniority bands. If a section does not apply to your target loop, skip it without guilt; the goal is optionality, not completionism. By the end, you should be able to describe your prep plan to a mentor in five minutes and sound like you have a system, not a pile of bookmarks.

placement and topology — what interviewers measure in the first five minutes

This section focuses on placement and topology — what interviewers measure in the first five minutes. Candidates preparing for Design a Distributed Cache often underestimate how much interviewers infer from process: how you decompose the prompt, name tradeoffs, and verify before you optimize. The behaviors that look boring — restating constraints, proposing a baseline, testing a tiny example — are exactly what separates hire from no-hire when two solutions have similar asymptotics. We connect this theme to what hiring committees actually write in feedback forms, not abstract advice. Treat the next paragraphs as a script you can steal: say the quiet parts out loud, label your invariants, and narrate recovery when you misread a constraint. Practice until it feels mechanical, because stress will strip your polish unless the habits are automatic.

Time management is where strong candidates lose offers. You do not get partial credit for a perfect approach you never finished. A working solution that passes tests beats an elegant idea that lives only on the whiteboard. Practice cutting scope early: start with brute force if it clarifies invariants, then tighten. Interviewers often prefer a clean linear scan plus verbalized next steps over a half-written optimal algorithm.

Start every design with users and workloads. Who reads, who writes, and what latency matters? Without those anchors, caching and sharding discussions float uselessly. A social feed and a payment ledger have different consistency requirements — say that explicitly before drawing boxes.

Language choice matters less than fluency. Pick one primary interview language and know its standard library idioms cold: heaps, ordered maps, string handling, and common pitfalls. Switching languages mid-loop to chase marginal performance gains usually costs more in mistakes than it saves in asymptotics. Fluency is the optimization target.

The best onsite performances look boring from the outside: clear steps, explicit assumptions, and a solution that actually finishes.
Composite feedback from mock interview coaches
  • Restate the heart of "placement and topology — what interviewers measure in the first five minutes" and confirm inputs, outputs, and edge cases.
  • Propose a brute-force or baseline you can finish — name its complexity honestly.
  • Walk a hand trace on a small example; only then refactor toward the optimal structure.
  • Reserve the final minutes for tests: null/empty, duplicates, extremes, and off-by-one boundaries.
  • Close with a one-sentence summary of tradeoffs and what you would monitor in production.

Start every design with users and workloads. Who reads, who writes, and what latency matters? Without those anchors, caching and sharding discussions float uselessly. A social feed and a payment ledger have different consistency requirements — say that explicitly before drawing boxes.

Time management is where strong candidates lose offers. You do not get partial credit for a perfect approach you never finished. A working solution that passes tests beats an elegant idea that lives only on the whiteboard. Practice cutting scope early: start with brute force if it clarifies invariants, then tighten. Interviewers often prefer a clean linear scan plus verbalized next steps over a half-written optimal algorithm.

First moves: framing eviction policies before you reach for code

This section focuses on First moves: framing eviction policies before you reach for code. Candidates preparing for Design a Distributed Cache often underestimate how much interviewers infer from process: how you decompose the prompt, name tradeoffs, and verify before you optimize. The behaviors that look boring — restating constraints, proposing a baseline, testing a tiny example — are exactly what separates hire from no-hire when two solutions have similar asymptotics. We connect this theme to what hiring committees actually write in feedback forms, not abstract advice. Treat the next paragraphs as a script you can steal: say the quiet parts out loud, label your invariants, and narrate recovery when you misread a constraint. Practice until it feels mechanical, because stress will strip your polish unless the habits are automatic.

Rubrics differ by level. Junior loops emphasize implementation correctness and learning speed. Mid-level loops add system reasoning and collaboration. Senior-plus loops trade some coding intensity for scope, ambiguity, and multi-team tradeoffs. If you are preparing for a Staff loop with only LeetCode hards, you are misaligned. If you are preparing for an L4 coding screen with only architecture blog posts, you are also misaligned. Match the tool to the level.

Data models should match access patterns. Normalize when invariants matter; denormalize when read latency dominates and you can tolerate eventual consistency. Event sourcing is powerful but operational overhead is real — do not propose it unless the problem benefits from auditability or replay.

Communication is a first-class deliverable. Even solo coding rounds are graded partly on whether a hiring manager could follow your reasoning six months later from notes. That means naming variables honestly, stating assumptions explicitly, and checking in before you disappear into twenty minutes of silence. If you are remote, narrate a little more than feels natural — the interviewer cannot see your facial cues.

  • Restate the heart of "First moves: framing eviction policies before you reach for code" and confirm inputs, outputs, and edge cases.
  • Propose a brute-force or baseline you can finish — name its complexity honestly.
  • Walk a hand trace on a small example; only then refactor toward the optimal structure.
  • Reserve the final minutes for tests: null/empty, duplicates, extremes, and off-by-one boundaries.
  • Close with a one-sentence summary of tradeoffs and what you would monitor in production.

Data models should match access patterns. Normalize when invariants matter; denormalize when read latency dominates and you can tolerate eventual consistency. Event sourcing is powerful but operational overhead is real — do not propose it unless the problem benefits from auditability or replay.

Rubrics differ by level. Junior loops emphasize implementation correctness and learning speed. Mid-level loops add system reasoning and collaboration. Senior-plus loops trade some coding intensity for scope, ambiguity, and multi-team tradeoffs. If you are preparing for a Staff loop with only LeetCode hards, you are misaligned. If you are preparing for an L4 coding screen with only architecture blog posts, you are also misaligned. Match the tool to the level.

MomentWhat to say
StartI'll restate the goal, then propose a baseline I can complete in time.
MidpointHere's the invariant I'm maintaining — I'll verify it on the example.
StuckI'm stuck on X; I'll try a smaller case and see what breaks.
EndI'll run these edge cases, then summarize complexity and tradeoffs.

Tradeoffs, pitfalls, and honest complexity around invalidation stories

This section focuses on Tradeoffs, pitfalls, and honest complexity around invalidation stories. Candidates preparing for Design a Distributed Cache often underestimate how much interviewers infer from process: how you decompose the prompt, name tradeoffs, and verify before you optimize. The behaviors that look boring — restating constraints, proposing a baseline, testing a tiny example — are exactly what separates hire from no-hire when two solutions have similar asymptotics. We connect this theme to what hiring committees actually write in feedback forms, not abstract advice. Treat the next paragraphs as a script you can steal: say the quiet parts out loud, label your invariants, and narrate recovery when you misread a constraint. Practice until it feels mechanical, because stress will strip your polish unless the habits are automatic.

Data structures are not Pokemon; you do not collect them for their own sake. You pick the structure that makes the operations your algorithm needs cheap. If you need fast membership and order does not matter, a set or map is the conversation. If you need order statistics, heaps or balanced trees enter. If the problem is about connectivity, graphs are near. Practice explaining that mapping in one sentence before you write code.

Data models should match access patterns. Normalize when invariants matter; denormalize when read latency dominates and you can tolerate eventual consistency. Event sourcing is powerful but operational overhead is real — do not propose it unless the problem benefits from auditability or replay.

Rubrics differ by level. Junior loops emphasize implementation correctness and learning speed. Mid-level loops add system reasoning and collaboration. Senior-plus loops trade some coding intensity for scope, ambiguity, and multi-team tradeoffs. If you are preparing for a Staff loop with only LeetCode hards, you are misaligned. If you are preparing for an L4 coding screen with only architecture blog posts, you are also misaligned. Match the tool to the level.

  • Restate the heart of "Tradeoffs, pitfalls, and honest complexity around invalidation stories" and confirm inputs, outputs, and edge cases.
  • Propose a brute-force or baseline you can finish — name its complexity honestly.
  • Walk a hand trace on a small example; only then refactor toward the optimal structure.
  • Reserve the final minutes for tests: null/empty, duplicates, extremes, and off-by-one boundaries.
  • Close with a one-sentence summary of tradeoffs and what you would monitor in production.

Data models should match access patterns. Normalize when invariants matter; denormalize when read latency dominates and you can tolerate eventual consistency. Event sourcing is powerful but operational overhead is real — do not propose it unless the problem benefits from auditability or replay.

Data structures are not Pokemon; you do not collect them for their own sake. You pick the structure that makes the operations your algorithm needs cheap. If you need fast membership and order does not matter, a set or map is the conversation. If you need order statistics, heaps or balanced trees enter. If the problem is about connectivity, graphs are near. Practice explaining that mapping in one sentence before you write code.

When hot key mitigation goes sideways: recovery scripts that still score

This section focuses on When hot key mitigation goes sideways: recovery scripts that still score. Candidates preparing for Design a Distributed Cache often underestimate how much interviewers infer from process: how you decompose the prompt, name tradeoffs, and verify before you optimize. The behaviors that look boring — restating constraints, proposing a baseline, testing a tiny example — are exactly what separates hire from no-hire when two solutions have similar asymptotics. We connect this theme to what hiring committees actually write in feedback forms, not abstract advice. Treat the next paragraphs as a script you can steal: say the quiet parts out loud, label your invariants, and narrate recovery when you misread a constraint. Practice until it feels mechanical, because stress will strip your polish unless the habits are automatic.

Time management is where strong candidates lose offers. You do not get partial credit for a perfect approach you never finished. A working solution that passes tests beats an elegant idea that lives only on the whiteboard. Practice cutting scope early: start with brute force if it clarifies invariants, then tighten. Interviewers often prefer a clean linear scan plus verbalized next steps over a half-written optimal algorithm.

Observability is part of design, not an appendix. Metrics for latency percentiles, error budgets, tracing across services, and structured logs for debugging — pick two to emphasize based on the prompt. Staff interviewers want to know how you would operate what you designed.

Language choice matters less than fluency. Pick one primary interview language and know its standard library idioms cold: heaps, ordered maps, string handling, and common pitfalls. Switching languages mid-loop to chase marginal performance gains usually costs more in mistakes than it saves in asymptotics. Fluency is the optimization target.

The best onsite performances look boring from the outside: clear steps, explicit assumptions, and a solution that actually finishes.
Composite feedback from mock interview coaches
  • Restate the heart of "When hot key mitigation goes sideways: recovery scripts that still score" and confirm inputs, outputs, and edge cases.
  • Propose a brute-force or baseline you can finish — name its complexity honestly.
  • Walk a hand trace on a small example; only then refactor toward the optimal structure.
  • Reserve the final minutes for tests: null/empty, duplicates, extremes, and off-by-one boundaries.
  • Close with a one-sentence summary of tradeoffs and what you would monitor in production.

Observability is part of design, not an appendix. Metrics for latency percentiles, error budgets, tracing across services, and structured logs for debugging — pick two to emphasize based on the prompt. Staff interviewers want to know how you would operate what you designed.

Time management is where strong candidates lose offers. You do not get partial credit for a perfect approach you never finished. A working solution that passes tests beats an elegant idea that lives only on the whiteboard. Practice cutting scope early: start with brute force if it clarifies invariants, then tighten. Interviewers often prefer a clean linear scan plus verbalized next steps over a half-written optimal algorithm.

A two-week drill plan with milestones tied to consistency caveats

This section focuses on A two-week drill plan with milestones tied to consistency caveats. Candidates preparing for Design a Distributed Cache often underestimate how much interviewers infer from process: how you decompose the prompt, name tradeoffs, and verify before you optimize. The behaviors that look boring — restating constraints, proposing a baseline, testing a tiny example — are exactly what separates hire from no-hire when two solutions have similar asymptotics. We connect this theme to what hiring committees actually write in feedback forms, not abstract advice. Treat the next paragraphs as a script you can steal: say the quiet parts out loud, label your invariants, and narrate recovery when you misread a constraint. Practice until it feels mechanical, because stress will strip your polish unless the habits are automatic.

Testing your solution should be habitual, not heroic. Walk a small example by hand, then translate that walk into asserts or print debugging if the environment allows. If tests fail, read the failure mode: off-by-one errors cluster at boundaries; infinite loops often mean your termination condition moved; wrong answers without crashes often mean a logic gap in state updates. Label those categories in your post-mortem so you see patterns across problems.

Sharding strategies range from hash by key to range partitions. Hot partitions can negate sharding gains; mention salting or dynamic resharding if traffic skew is likely. For global systems, region placement and cross-region replication costs belong in the first ten minutes, not as an afterthought.

Mock interviews fail when they are too polite. The point is not confidence; the point is diagnostic signal. You want a partner who will interrupt, ask why you chose a data structure, and force you to state invariants explicitly. Record audio if you can. The gap between what you think you explained and what you actually said is where most surprises live.

  • Restate the heart of "A two-week drill plan with milestones tied to consistency caveats" and confirm inputs, outputs, and edge cases.
  • Propose a brute-force or baseline you can finish — name its complexity honestly.
  • Walk a hand trace on a small example; only then refactor toward the optimal structure.
  • Reserve the final minutes for tests: null/empty, duplicates, extremes, and off-by-one boundaries.
  • Close with a one-sentence summary of tradeoffs and what you would monitor in production.

Sharding strategies range from hash by key to range partitions. Hot partitions can negate sharding gains; mention salting or dynamic resharding if traffic skew is likely. For global systems, region placement and cross-region replication costs belong in the first ten minutes, not as an afterthought.

Testing your solution should be habitual, not heroic. Walk a small example by hand, then translate that walk into asserts or print debugging if the environment allows. If tests fail, read the failure mode: off-by-one errors cluster at boundaries; infinite loops often mean your termination condition moved; wrong answers without crashes often mean a logic gap in state updates. Label those categories in your post-mortem so you see patterns across problems.

Day-of checklist: monitoring hit rate, timeboxing, and how to close strong

This section focuses on Day-of checklist: monitoring hit rate, timeboxing, and how to close strong. Candidates preparing for Design a Distributed Cache often underestimate how much interviewers infer from process: how you decompose the prompt, name tradeoffs, and verify before you optimize. The behaviors that look boring — restating constraints, proposing a baseline, testing a tiny example — are exactly what separates hire from no-hire when two solutions have similar asymptotics. We connect this theme to what hiring committees actually write in feedback forms, not abstract advice. Treat the next paragraphs as a script you can steal: say the quiet parts out loud, label your invariants, and narrate recovery when you misread a constraint. Practice until it feels mechanical, because stress will strip your polish unless the habits are automatic.

Time management is where strong candidates lose offers. You do not get partial credit for a perfect approach you never finished. A working solution that passes tests beats an elegant idea that lives only on the whiteboard. Practice cutting scope early: start with brute force if it clarifies invariants, then tighten. Interviewers often prefer a clean linear scan plus verbalized next steps over a half-written optimal algorithm.

Start every design with users and workloads. Who reads, who writes, and what latency matters? Without those anchors, caching and sharding discussions float uselessly. A social feed and a payment ledger have different consistency requirements — say that explicitly before drawing boxes.

Language choice matters less than fluency. Pick one primary interview language and know its standard library idioms cold: heaps, ordered maps, string handling, and common pitfalls. Switching languages mid-loop to chase marginal performance gains usually costs more in mistakes than it saves in asymptotics. Fluency is the optimization target.

  • Restate the heart of "Day-of checklist: monitoring hit rate, timeboxing, and how to close strong" and confirm inputs, outputs, and edge cases.
  • Propose a brute-force or baseline you can finish — name its complexity honestly.
  • Walk a hand trace on a small example; only then refactor toward the optimal structure.
  • Reserve the final minutes for tests: null/empty, duplicates, extremes, and off-by-one boundaries.
  • Close with a one-sentence summary of tradeoffs and what you would monitor in production.

Start every design with users and workloads. Who reads, who writes, and what latency matters? Without those anchors, caching and sharding discussions float uselessly. A social feed and a payment ledger have different consistency requirements — say that explicitly before drawing boxes.

Time management is where strong candidates lose offers. You do not get partial credit for a perfect approach you never finished. A working solution that passes tests beats an elegant idea that lives only on the whiteboard. Practice cutting scope early: start with brute force if it clarifies invariants, then tighten. Interviewers often prefer a clean linear scan plus verbalized next steps over a half-written optimal algorithm.

MomentWhat to say
StartI'll restate the goal, then propose a baseline I can complete in time.
MidpointHere's the invariant I'm maintaining — I'll verify it on the example.
StuckI'm stuck on X; I'll try a smaller case and see what breaks.
EndI'll run these edge cases, then summarize complexity and tradeoffs.

Stop grinding. Start patterning.

Alpha Code is a patterns-first interview prep platform — coding, system design, behavioral, mocks, and ML/AI engineering all under one $19/mo subscription.