CAP Theorem, PACELC & Replication — The Ultimate System Design Guide

By Ishwar Ambare · Published February 25, 2026 · 24 min read · Source: Level Up Coding

Part 3

Topics: Sharding, Replication, CAP Theorem, PACELC Theorem, Master-Slave Architecture, Consistency vs Availability

📋 Table of Contents

Part 1: Foundations

  1. The Big Picture Architecture
  2. Sharding — Distributing Data
  3. Replication — Copying Data
  4. Sharding vs Replication — Side by Side

Part 2: CAP Theorem

  1. What is CAP Theorem?
  2. Consistency (C)
  3. Availability (A)
  4. Partition Tolerance (P)
  5. CAP in Practice — The Hardeep & Nikhil Story
  6. Choosing CP vs AP

Part 3: PACELC Theorem

  1. What is PACELC Theorem?
  2. Latency vs Consistency Trade-off
  3. Real-World Examples

Part 4: Master-Slave Replication

  1. Master-Slave Architecture
  2. Read Replicas
  3. Failover & Recovery

Part 5: Summary & Interview Prep

  1. Quick Reference Cheatsheet
  2. Practice Exercises & Interview Questions
  3. Solutions
  4. References and Resources

PART 1: FOUNDATIONS

🏗️ The Big Picture Architecture

Before diving into CAP Theorem, let’s ground ourselves in how large-scale systems look:

┌─────────────────────────────────┐
│        CLIENT (Browser)         │
└───────────────┬─────────────────┘
                │ DNS Resolution
┌───────────────▼─────────────────┐
│               DNS               │
└───────────────┬─────────────────┘
                │ IP Address returned
┌───────────────▼─────────────────┐
│     GATEWAY / LOAD BALANCER     │ ◄── SSL termination,
│    (First point of contact)     │     protocol translation,
└─────────┬───────────┬───────────┘     rate limiting, etc.
          │           │
┌─────────▼──────┐ ┌──▼─────────────┐
│  App Server 1  │ │  App Server 2  │
│(Compute Layer) │ │(Compute Layer) │
└─────────┬──────┘ └──┬─────────────┘
          │           │
          └─────┬─────┘
┌───────────────▼─────────────────┐
│          GLOBAL CACHE           │ ◄── Redis / Memcached
└───────────────┬─────────────────┘
┌───────────────▼─────────────────┐
│       STORAGE / DATABASE        │ ◄── Source of Truth
└─────────────────────────────────┘
Key insight: The Gateway box is called a “gateway” — not just a load balancer. It does more than load balancing. It handles SSL termination, protocol translation, rate limiting, authentication, and more.

Why Multiple App Servers?


Elasticity is the ability to adjust resources. You can add more compute when traffic is high and remove it when traffic is low. Cloud platforms like AWS support this via Auto Scaling.

💡 Storage ≠ Compute — Compute is highly elastic (scale up/down rapidly). Storage is deliberately static — you plan ahead, you don’t rapidly add/remove DB boxes on the fly.

🔀 Sharding — Distributing Data

The Problem: Too Much Data for One Machine

Consider Facebook’s User Database:

Users Table (3 Billion Entries):
┌─────────┬──────┬────────┬─────────────────┬─────────┬──────────────┐
│ user_id │ name │ gender │ relationship_st │ DOB     │ last_updated │
├─────────┼──────┼────────┼─────────────────┼─────────┼──────────────┤
│ 107     │ Raju │ Male   │ Married         │ 1990-01 │ 2024-01-01   │
│ 998     │ ...  │ ...    │ ...             │ ...     │ ...          │
│ 1070    │ ...  │ ...    │ ...             │ ...     │ ...          │
└─────────┴──────┴────────┴─────────────────┴─────────┴──────────────┘

User Friends Table (Billions more entries):
┌─────────┬───────────┐
│ user_id │ friend_id │
├─────────┼───────────┤
│ 107     │ 998       │ ← Raju is friends with user 998
│ 107     │ 1070      │ ← Raju is friends with user 1070
│ 107     │ 66        │ ← Raju is friends with user 66
└─────────┴───────────┘

A single machine CANNOT handle:

  * Storing billions of rows of user and friendship data on one disk
  * Serving the read/write traffic that much data attracts

What is Sharding?

Sharding = Distributing data across multiple machines in a Mutually Exclusive, Collectively Exhaustive (MECE) way.
UNIVERSE OF ALL DATA
┌──────────────────────────────────────────────────────────────────┐
│ user:107 user:998 user:1070 user:66 user:442 user:5001 ... │
└──────────────────────────────────────────────────────────────────┘

SHARDING splits this into:

┌──────────┬──────────┬──────────┬──────────┬──────────┐
▼ ▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ SHARD A │ │ SHARD B │ │ SHARD C │ │ SHARD D │ │ SHARD E │
│ u:107 │ │ u:998 │ │ u:1070 │ │ u:66 │ │ u:442 │
│ u:28 │ │ u:319 │ │ u:2050 │ │ u:7123 │ │ u:9001 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘

✅ All shards TOGETHER = ALL the data (Collectively Exhaustive)
✅ No two shards have the SAME data (Mutually Exclusive)

Sharding Keys

The sharding key determines which shard a piece of data goes to.

💡 Connection to Consistent Hashing: Remember the circle with servers placed around it? Each server on that circle is a shard. The request/user ID is hashed to a position on the ring, and you move clockwise to find the shard.
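The ring the note above refers to can be sketched in a few lines of Python. This is a minimal illustration: the shard names and the MD5-based hash function are arbitrary choices, not from the original.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map any key to a stable position on the ring (0 .. 2^32 - 1).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Minimal consistent-hash ring: shards sit at hashed positions;
    a key is served by the first shard clockwise from its own hash."""
    def __init__(self, shards):
        self.ring = sorted((_hash(s), s) for s in shards)

    def shard_for(self, key: str) -> str:
        pos = _hash(key)
        idx = bisect.bisect_right(self.ring, (pos, ""))
        # Wrap around past the last shard back to the first.
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["shard-A", "shard-B", "shard-C"])
print(ring.shard_for("user:107"))  # the same key always lands on the same shard
```

The payoff of the ring over plain `hash(key) % N`: adding or removing a shard only remaps the keys between that shard and its neighbor, instead of reshuffling everything.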

Why Keep Related Data in the Same Shard?

This is the concept of co-location. Consider this example:

Query: "Get all friends of user 107"

❌ BAD (friends in different shards):
→ Query Shard A for 107's friends
→ Query Shard B for 107's friends
→ Query Shard C for 107's friends
→ Aggregate results
= INTERSHARD query (slow, network-heavy, inefficient)

✅ GOOD (user + their friends in same shard):
→ Query Shard A only (107 AND all his friend entries are here)
= INTRASHARD query (fast, single machine, efficient)
Rule of thumb: Design your sharding key so that frequently joined data lives in the same shard. This converts expensive inter-shard queries into cheap intra-shard queries.
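The rule of thumb can be shown concretely: if both tables use user_id as the shard key, a user's profile row and all of their friendship rows are guaranteed to land on the same shard. NUM_SHARDS and the modulo scheme below are illustrative assumptions, not from the original.

```python
NUM_SHARDS = 5

def shard_for(user_id: int) -> int:
    # Both tables shard on user_id, so a user's profile row and
    # all of their (user_id, friend_id) rows share one shard.
    return user_id % NUM_SHARDS

# User 107's profile and every one of his friendship edges co-locate:
friend_rows = [(107, 998), (107, 1070), (107, 66)]
assert all(shard_for(u) == shard_for(107) for u, _ in friend_rows)
```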

Intrashard vs Intershard Requests

┌─────────────┬──────────────────────────────────────┬─────────────┬────────────┐
│ Type        │ Description                          │ Performance │ Preferred? │
├─────────────┼──────────────────────────────────────┼─────────────┼────────────┤
│ Intra-shard │ Query handled within a single shard  │ Fast        │ ✅ Yes     │
│ Inter-shard │ Query must touch multiple shards     │ Slow        │ ❌ Avoid   │
└─────────────┴──────────────────────────────────────┴─────────────┴────────────┘

The Hot Shard Problem 🔥

A hot shard occurs when one shard receives disproportionately more traffic than others.

Example: Justin Bieber's data on a celebrity-heavy shard

Shard A (celebrities): 💥💥💥💥💥💥💥 (millions of read requests)
Shard B (normal users): 💡 (low traffic)
Shard C (normal users): 💡 (low traffic)

Solutions:

  1. Hot-specific sharding — Put popular users in dedicated shards with higher replication
  2. Caching — Cache celebrity posts so DB is not hit every time (Facebook News Feed approach)
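The caching solution can be sketched as a cache-aside read path. The dict stands in for Redis/Memcached, and `fetch_posts_from_db` plus `CACHE_TTL` are hypothetical names for illustration.

```python
import time

cache = {}          # in-memory stand-in for Redis / Memcached
CACHE_TTL = 5.0     # seconds; a hypothetical tuning knob

def fetch_posts_from_db(user_id):
    # Stand-in for the expensive query against the hot shard.
    return [f"post-by-{user_id}"]

def get_posts(user_id):
    """Cache-aside read: hit the shard only on a miss or expiry, so
    millions of reads for one celebrity don't all reach the DB."""
    entry = cache.get(user_id)
    if entry and time.time() - entry[0] < CACHE_TTL:
        return entry[1]                    # cache hit: no DB touched
    posts = fetch_posts_from_db(user_id)   # cache miss: one DB read
    cache[user_id] = (time.time(), posts)
    return posts
```

With a 5-second TTL, even a shard receiving a million requests per second serves the database roughly once every five seconds per celebrity.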

🔁 Replication — Copying Data

The Problem: Too Many Requests for One Machine

Scenario: Justin Bieber's posts stored on a single machine

User 1 ──┐
User 2 ──┤
User 3 ──┤──► Single DB Machine ◄── 💥 Overloaded!
User 4 ──┤ (Millions of read requests)
User 5 ──┘

Even after sharding, a single machine per shard may be overwhelmed by read requests.

What is Replication?

Replication = Creating multiple identical copies of the same data on different machines so more users can read from different copies simultaneously.
BEFORE Replication:
Users ──► [Single Copy of Justin Bieber's Posts] ◄── 💥 Overloaded

AFTER Replication:
Users 1-100M ──► [Copy A — Original] ◄── ✅ Manageable!
Users 100M-200M ► [Copy B — Replica 1] ◄── ✅ Manageable!
Users 200M-300M ► [Copy C — Replica 2] ◄── ✅ Manageable!

Sharding + Replication Together

These two concepts can and should exist simultaneously:

3 Billion Facebook Users


┌──────────────────────────────────────────┐
│ SHARDING │
│ (Because storage is too large) │
└──────┬─────────────┬──────────────┬──────┘
│ │ │
▼ ▼ ▼
[SHARD 1] [SHARD 2] [SHARD 3]
Users:1-1B Users:1B-2B Users:2B-3B
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────┐
│ REPLICATION │
│ (Because request volume is too high) │
└──────────────────────────────────────────┘
│ │ │
┌─────┤ ┌─────┤ ┌────┤
│ │ │ │ │ │
[S1 R1][S1 R2] [S2 R1][S2 R2] [S3 R1][S3 R2]
Shard1 Shard1 Shard2 Shard2 Shard3 Shard3
Replica1 Repl2 Repl1 Repl2 Repl1 Repl2
💡 Think of each shard as a subset of your data universe, and each replica as a clone of that shard.

⚖️ Sharding vs Replication — Key Differences

SHARDING:                           REPLICATION:
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ A-F │ │ G-M │ │ N-Z │ │ ALL │ │ ALL │ │ ALL │
│data │ │data │ │data │ │data │ │data │ │data │
└─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘
Different data on each Same data on each

PART 2: CAP THEOREM

🧩 What is CAP Theorem?

CAP Theorem states that in any distributed system, you can only guarantee two out of these three properties simultaneously:
* C — Consistency
* A — Availability
* P — Partition Tolerance

              CONSISTENCY (C)
                    /\
                   /  \
                  / ⚠️ \
                 / Pick \
                /2 of 3  \
               /          \
  AVAILABILITY ──────────── PARTITION
      (A)                  TOLERANCE (P)

┌────────┬─────────────────────────────────┬─────────────────────┐
│ Choice │ Properties Guaranteed           │ Sacrificed          │
├────────┼─────────────────────────────────┼─────────────────────┤
│ CA     │ Consistent + Available          │ Partition Tolerance │
│ CP     │ Consistent + Partition Tolerant │ Availability        │
│ AP     │ Available + Partition Tolerant  │ Consistency         │
└────────┴─────────────────────────────────┴─────────────────────┘

⚠️ Critical Insight: In practice, network partitions ALWAYS happen (networks are unreliable). Therefore, partition tolerance is non-negotiable in distributed systems. The real choice is always between CP or AP.

✅ C — Consistency

Consistency means every read returns the latest write, OR equivalently, every machine has the same latest view of the truth.
CONSISTENT System:
WRITE: "Raju's DOB = 1990"

┌──────▼──────┐
│ Machine 1 │ ── "Raju DOB = 1990" ✅
└─────────────┘
┌─────────────┐
│ Machine 2 │ ── "Raju DOB = 1990" ✅
└─────────────┘
READ from any machine → same answer ✅
INCONSISTENT System:
WRITE went to Machine 1 only:
┌─────────────┐
│ Machine 1 │ ── "Raju DOB = 1990" ✅
└─────────────┘
┌─────────────┐
│ Machine 2 │ ── "Raju DOB = ???" ❌ (stale)
└─────────────┘
READ from Machine 2 → wrong/missing answer ❌

Formal Definition

Every READ must return the result of the most recent WRITE
(or return an error — never stale data)

Real-World Analogy

Your bank balance: whichever branch or app you check, the number must be identical. If the bank cannot be sure, it refuses to answer rather than guess.

✅ A — Availability

Availability means the system always responds to every request — even if the response isn’t the most current data. A system is available if it responds without error.
⚠️ Availability does NOT guarantee the response is correct — just that there IS a response.
AVAILABLE System (even if stale):
User asks: "How many views does this YouTube video have?"

Machine 1 says: "10,242 views" ← May not be exactly right
Machine 2 says: "10,238 views" ← May not be exactly right

But BOTH respond! No error. = AVAILABLE ✅

NOT AVAILABLE System:
User asks: "What is my ATM balance?"

System says: "Sorry, try again later" ← ERROR response = NOT AVAILABLE ❌

Real-World Analogy

A scoreboard operator who always shouts out a score when asked, even if the number is a few seconds behind the actual play.

✅ P — Partition Tolerance

Partition = A network partition — a scenario where two machines temporarily cannot communicate with each other (the network link between them is broken).
NETWORK PARTITION:

┌─────────────┐ ✂️ BROKEN ✂️ ┌─────────────┐
│ Machine 1 │ ─────────── X ────────── │ Machine 2 │
└─────────────┘ └─────────────┘

- M1 can still talk to clients ✅
- M2 can still talk to clients ✅
- M1 and M2 CANNOT talk to each other ❌
Partition Tolerance = The system continues to function even when a network partition occurs.

Why Partitions Always Happen

Networks are always unreliable, even inside a single data center:

  * Cables get unplugged or cut; switches and routers fail
  * Machines crash, reboot, or stall (e.g., long GC pauses)
  * Congestion delays packets until timeouts make healthy peers look dead

This is a fundamental truth of distributed computing: partitions are not optional.

📖 CAP in Practice — The Hardeep & Nikhil Story

Let’s walk through a concrete example to understand CP vs AP.

The Scenario

Hardeep runs an “Event Reminder Service”: people call in to register events, then call back later to recall them. As the service grows, Hardeep hires Nikhil; now two people answer calls behind a load balancer.

              CLIENTS
             /       \
    ┌────────────────────────┐
    │     LOAD BALANCER      │
    └───────────┬────────────┘
         ┌──────┴──────┐
         │             │
    ┌────▼────┐   ┌────▼────┐
    │ HARDEEP │   │ NIKHIL  │
    │ (Diary) │   │ (Diary) │
    └─────────┘   └─────────┘

Option 1 — AP System (Available, Partition Tolerant — NOT Consistent)

Protocol: Each person writes only in their own diary.

WRITE:
Shiv calls in → Load balancer routes to Hardeep
Hardeep writes: "Shiv: girlfriend's wedding, 28 Feb 7pm"
✅ Responds immediately

READ (later):
Shiv calls back → Load balancer routes to Nikhil (randomly!)
Nikhil checks his diary: "I have nothing for Shiv"
❌ Returns wrong answer — "No events found!"

Result:

  * ✅ Available: every call gets an immediate answer
  * ✅ Partition tolerant: Hardeep and Nikhil never need to talk
  * ❌ Not consistent: the two diaries disagree

Option 2 — CP System (Consistent, Partition Tolerant — NOT Available)

Protocol (updated): Before acknowledging any write, the receiver writes in BOTH diaries.

WRITE:
Shiv calls in → routes to Hardeep
Hardeep writes in his own diary ✓
Hardeep then calls Nikhil: "Please write this too"
Nikhil writes ✓ and confirms
ONLY THEN: Hardeep tells Shiv "Your event is registered" ✅

READ (later):
Shiv calls → routes to Nikhil
Nikhil checks diary: "Shiv: girlfriend's wedding, 28 Feb 7pm"
✅ Returns correct answer

Problem: What if Nikhil goes on holiday (the machine goes down)?

Nikhil is OFFLINE (vacation in Lonavala! 🏖️)

Shiv calls in → routes to Hardeep
Hardeep writes in his own diary ✓
Hardeep calls Nikhil: [no answer] ❌

HARDEEP CANNOT ACKNOWLEDGE THE WRITE →
System is UNAVAILABLE for writes ❌

Result:

  * ✅ Consistent: every read returns the latest write
  * ✅ Partition tolerant: inconsistency is refused, never served
  * ❌ Not available: writes fail whenever one machine is down

Option 3 — AP System (with eventual sync)

Protocol: Write locally when the partner is down; sync when they come back.

Nikhil is OFFLINE:
Shiv registers event → Hardeep writes locally, acknowledges immediately ✅
(Shiv is happy — quick response)

When Nikhil comes BACK ONLINE:
Hardeep sends all pending entries to Nikhil
Nikhil writes them ✓ — now both are synced

BUT: If network partition happens DURING sync:
→ Some entries may not reach Nikhil
→ Nikhil comes online, gets some requests
→ Nikhil returns stale data ❌

Result:

  * ✅ Available: writes are acknowledged immediately
  * ✅ Partition tolerant: sync resumes once the partition heals
  * ❌ Not strongly consistent: stale reads are possible until sync completes

Summary: The Three Options

┌──────────────────────────────────────────────────────────────────────┐
│ Option 1: AP (each writes locally, no sync) │
│ ✅ Always responds (Available) │
│ ✅ Handles partitions fine (Partition Tolerant) │
│ ❌ Different machines return different data (Not Consistent) │
│ │
│ Option 2: CP (write to both, fail if one is down) │
│ ✅ Every read returns latest write (Consistent) │
│ ✅ Handles partitions by refusing to accept inconsistency │
│ ❌ System is unavailable when one machine is down (Not Available) │
│ │
│ Option 3: AP with eventual consistency (write locally, sync later) │
│ ✅ Always responds (Available) │
│ ✅ Handles partitions (Partition Tolerant) │
│ ❌ Stale reads possible during/after partition (Not Consistent) │
└──────────────────────────────────────────────────────────────────────┘
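The three options above can be condensed into one toy simulation. `Node`, `write`, and the mode names are illustrative, not from the original.

```python
class Node:
    def __init__(self, name):
        self.name, self.diary, self.up = name, {}, True

def write(primary, partner, key, value, mode):
    """mode: 'ap_local' — write own diary only            (Option 1)
             'cp_both'  — fail unless partner also writes (Option 2)
             'ap_sync'  — write locally, sync best-effort (Option 3)"""
    primary.diary[key] = value
    if mode == "cp_both":
        if not partner.up:
            del primary.diary[key]   # refuse the write: consistent but unavailable
            return "ERROR: try again later"
        partner.diary[key] = value
    elif mode == "ap_sync" and partner.up:
        partner.diary[key] = value   # best-effort sync; stays pending if partner is down
    return "OK"

hardeep, nikhil = Node("Hardeep"), Node("Nikhil")
nikhil.up = False                    # Nikhil is on holiday
print(write(hardeep, nikhil, "Shiv", "28 Feb 7pm", "cp_both"))  # ERROR (CP: unavailable)
print(write(hardeep, nikhil, "Shiv", "28 Feb 7pm", "ap_sync"))  # OK (AP: stale reads possible)
```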

🎯 Choosing CP vs AP

In distributed systems, Partition Tolerance is always required because networks inevitably fail. The real choice is CP vs AP.

Choose CP (Consistency + Partition Tolerance) when:

  * Wrong data is worse than no data: banking, payments, inventory
  * Double-booking must be impossible: flight/train seat reservations
  * A brief error or wait is acceptable; serving stale data is not

Choose AP (Availability + Partition Tolerance) when:

  * A slightly stale answer is acceptable: social feeds, like/view counts
  * Downtime hurts more than inaccuracy: search, recommendations, DNS
  * Eventual consistency is good enough for the user experience

CA (without Partition Tolerance) — Only on Single Machines

CA systems exist in practice, but only for specific use cases within a larger distributed system — e.g., a single MySQL box handling blacklist lookups (consistent + available, but no need for partition tolerance because it’s ONE machine).

PART 3: PACELC THEOREM

⚡ What is PACELC Theorem?

PACELC extends CAP theorem by adding a second trade-off that applies even when there is no partition.

PACELC = Partition → Availability or Consistency; Else → Latency or Consistency
┌─────────────────────────────────────────────────────────┐
│ PACELC THEOREM │
├─────────────────────────────────────────────────────────┤
│ │
│ IF Partition Happens: │
│ → Choose between AVAILABILITY or CONSISTENCY │
│ (this is the CAP theorem part) │
│ │
│ ELSE (Normal operation, no partition): │
│ → Choose between LOW LATENCY or CONSISTENCY │
│ (this is the NEW insight) │
│ │
└─────────────────────────────────────────────────────────┘

⏱️ Latency vs Consistency Trade-off

What is Latency?

Latency = the time it takes for a request to be completed.
* Low Latency = Fast (good ✅)
* High Latency = Slow (bad ❌)

Why Does Consistency Cost Latency?

Going back to the Hardeep & Nikhil example:

HIGH CONSISTENCY Protocol (CP):
Client writes → Hardeep writes locally → Hardeep WAITS for Nikhil to confirm

EXTRA TIME = EXTRA LATENCY

Total Time: local write + network round trip + Nikhil's write + confirmation

LOW LATENCY Protocol (AP):
Client writes → Hardeep writes locally → Hardeep immediately responds ✅

Total Time: just local write (fast!)
But: consistency is sacrificed

The Trade-off Visualized

HIGH CONSISTENCY ←──────────────────────────────────→ LOW LATENCY

  Banking      Booking       Social Feed      DNS         View Counts
  ATMs         Systems                        (TTL)       (approx)
    │             │               │             │             │
    ▼             ▼               ▼             ▼             ▼
  Must be      Must be        Eventual      Stale OK      Stale OK
  consistent   consistent     consistency   (fast!)       (fast!)
  (slow OK)    (slow OK)      (fast!)
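Many real systems expose this spectrum as a tunable knob (e.g., Cassandra's consistency levels, mentioned at the end of this guide). The classic quorum rule: with N replicas, a read that queries R copies is guaranteed to overlap the W copies that acknowledged the write exactly when R + W > N, and larger R and W mean more round trips, i.e., more latency. A tiny check of the rule:

```python
def quorum_is_consistent(n: int, w: int, r: int) -> bool:
    # A read of R copies must overlap the W copies that acknowledged
    # the write; that overlap is guaranteed exactly when R + W > N.
    return r + w > n

# N = 3 replicas:
assert quorum_is_consistent(3, 2, 2)      # QUORUM write + QUORUM read → consistent
assert not quorum_is_consistent(3, 1, 1)  # ONE + ONE → fast, but stale reads possible
```

Lowering W and R buys latency at the cost of consistency: exactly the ELC half of PACELC.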

🌍 Real-World Examples

IRCTC / Flight Booking (CP + High Latency is OK)

User clicks "Book Seat 12A" on IRCTC:

→ Request goes to system
→ System must ensure NO TWO USERS book the same seat
→ Uses distributed lock / 2-phase commit
→ This takes 2-3 seconds ← HIGH LATENCY

But: ✅ Nobody ever gets double-booked
You’ve experienced this! That 2–3 second wait on IRCTC after clicking “Book” — that’s the system ensuring consistency at the cost of latency.

YouTube Views (AP + Low Latency)

User visits video:
→ System returns view count immediately (maybe slightly stale)
→ No waiting for all replica counts to sync
→ Response in < 50ms ← LOW LATENCY

But: Different users may see slightly different view counts ❌ (Not consistent)

Zerodha Stock Prices (CP even with latency)

Trader checks Adani stock price:
→ System MUST return latest price
→ Worth a small wait to ensure accuracy
→ Wrong data = wrong trades = financial loss

PACELC Classification of Common Databases

┌─────────────────────────────┬────────┬──────────────────────────────────────┐
│ System                      │ PACELC │ Behavior                             │
├─────────────────────────────┼────────┼──────────────────────────────────────┤
│ DynamoDB, Cassandra, Riak   │ PA/EL  │ Availability + low latency;          │
│                             │        │ eventual consistency                 │
│ HBase, VoltDB, ZooKeeper    │ PC/EC  │ Consistency first, even at the cost  │
│                             │        │ of availability and latency          │
│ Yahoo PNUTS                 │ PC/EL  │ Consistent under partition, but      │
│                             │        │ favors latency in normal operation   │
└─────────────────────────────┴────────┴──────────────────────────────────────┘

PART 4: MASTER-SLAVE ARCHITECTURE

🏛️ Master-Slave Architecture

Master-Slave (also called Primary-Replica) is the most common replication pattern.

                    CLIENTS

┌────────▼────────┐
│ LOAD BALANCER │
└────────┬────────┘

┌─────────────┼─────────────┐
│ │ │
READS │ READS │ READS │
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ REPLICA │ │ REPLICA │ │ MASTER │ ◄── WRITES go here ONLY
│ (Slave) │ │ (Slave) │ │(Primary)│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└─────────────┼─────────────┘

REPLICATION
(async/sync)

Key Rules

  1. ALL writes go to the Master; replicas never accept writes
  2. Reads can be served by the Master or any Replica
  3. The Master propagates every write to the Replicas (sync or async)
  4. If the Master fails, a Replica is promoted to become the new Master

Why Separate Reads from Writes?

In typical web applications, reads vastly outnumber writes (ratios of 10:1 or higher are common).

By routing reads to replicas, we dramatically reduce master load.
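A minimal read/write splitter along these lines is sketched below. The crude statement-prefix check is an illustrative simplification; real proxies such as ProxySQL parse queries properly.

```python
import itertools

class Router:
    """Route writes to the master, reads round-robin across replicas."""
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, query: str) -> str:
        is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
        return self.master if is_write else next(self._replicas)

r = Router("master", ["replica-1", "replica-2"])
print(r.route("UPDATE users SET name = 'Raju'"))  # master
print(r.route("SELECT * FROM users"))             # replica-1
print(r.route("SELECT * FROM users"))             # replica-2
```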

📖 Read Replicas

Synchronous vs Asynchronous Replication

SYNCHRONOUS REPLICATION (Strong Consistency):

Client → WRITE → Master

├─────────────► Replica 1 (write confirmed)
│ │
└─────────────► Replica 2 (write confirmed)

Only THEN → ACK to Client

✅ All replicas always in sync
❌ Write latency = slowest replica's write time


ASYNCHRONOUS REPLICATION (Eventual Consistency):

Client → WRITE → Master → ACK to Client immediately ✅

└──(async)──► Replica 1 (queued write)
└──(async)──► Replica 2 (queued write)

✅ Low write latency
❌ Replicas may lag — reads from replica may be stale

Replication Lag

Replication Lag = the time delay between a write to the master and that write appearing on all replicas.
Timeline:
T+0ms → User writes post
T+0ms → Master has the post ✅
T+50ms → Replica 1 has the post ✅
T+100ms → Replica 2 has the post ✅

During T+0ms to T+50ms:
→ If user's next READ goes to Replica 1, they DON'T see their own post! 😱
→ This is "Read Your Own Write" consistency violation

Solutions to Replication Lag Issues:

  1. Read-Your-Own-Write routing: send a user's reads to the master for a short window after their write
  2. Session stickiness: pin a user to one replica so their reads are monotonic
  3. Synchronous (or semi-synchronous) replication for data where staleness is unacceptable
  4. Monitor lag: alert and pull replicas out of rotation when they fall too far behind
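The read-your-own-write fix can be sketched as sticky routing after a write. `STICKY_SECONDS` and the in-memory map are illustrative assumptions; a real system would store the timestamp in the user's session.

```python
import time

STICKY_SECONDS = 5.0   # hypothetical window; tune it above your max replication lag
last_write = {}        # user_id -> timestamp of that user's last write

def record_write(user_id):
    last_write[user_id] = time.time()

def choose_read_target(user_id):
    """Send a user's reads to the master right after their own write,
    so they always see it; otherwise use a (possibly lagging) replica."""
    if time.time() - last_write.get(user_id, 0.0) < STICKY_SECONDS:
        return "master"
    return "replica"

record_write(107)
print(choose_read_target(107))   # master  (inside the sticky window)
print(choose_read_target(998))   # replica (no recent write)
```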

🔄 Failover & Recovery

What Happens When Master Goes Down?

BEFORE FAILURE:
Master ────(async)────► Replica 1
────(async)────► Replica 2

MASTER FAILS:
Master ✗ Replica 1 ← Promoted to new Master!
Replica 2 ← Now replicates from Replica 1

Failover Challenges

  1. Data Loss Risk: If async replication, the failed master may have had some writes that never made it to replicas
  2. Split-Brain Problem: Two nodes think they’re both Master — can lead to conflicts
  3. Detection Time: How quickly do you detect the master failed?
  4. Promotion Complexity: Which replica becomes the new master? (Usually the most up-to-date one)
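Promotion often reduces to comparing replicated log positions. A hedged sketch (the field names are illustrative): a production system would also fence the old master, for example via a consensus service like ZooKeeper, to avoid split-brain.

```python
def promote(replicas):
    """Pick the replica with the highest replicated log position as the
    new master; it has the fewest missing writes from the failed master."""
    return max(replicas, key=lambda r: r["log_position"])

replicas = [
    {"name": "replica-1", "log_position": 1042},
    {"name": "replica-2", "log_position": 998},   # lagging: promoting it would lose writes
]
print(promote(replicas)["name"])   # replica-1
```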

Semi-Synchronous Replication

A middle ground:

WRITE → Master stores + sends to at least ONE replica
→ Gets ack from that ONE replica
→ Acknowledges to client

✅ Every acknowledged write exists on at least one replica (bounded data-loss risk)
✅ Better performance than full synchronous
✅ Better durability than full async
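The three acknowledgment policies can be placed side by side in a toy model, where `replica_acks` stands in for per-replica confirmation state:

```python
def ack_write(replica_acks, mode):
    """When may the master acknowledge a write to the client?
       sync      — after ALL replicas confirm (slowest, strongest)
       semi-sync — after at least ONE replica confirms
       async     — immediately, before any replica confirms"""
    if mode == "sync":
        return all(replica_acks)
    if mode == "semi-sync":
        return any(replica_acks)
    return True                      # async: ack right away

acks = [True, False]                 # replica 1 confirmed, replica 2 still lagging
assert ack_write(acks, "sync") is False
assert ack_write(acks, "semi-sync") is True
assert ack_write(acks, "async") is True
```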

PART 5: SUMMARY & INTERVIEW PREP

📋 Quick Reference Cheatsheet

CAP Theorem Summary

┌──────────────────────────────────────────────────────────────┐
│ CAP THEOREM CHEATSHEET │
├──────────────────────┬───────────────────────────────────────┤
│ C (Consistency) │ Every read returns the latest write │
│ A (Availability) │ Every request gets a response (no err)│
│ P (Partition Tol.) │ System works despite network failures │
├──────────────────────┼───────────────────────────────────────┤
│ CP System │ Prioritizes accuracy over uptime │
│ AP System │ Prioritizes uptime over accuracy │
│ CA System │ Only possible on single machine │
├──────────────────────┼───────────────────────────────────────┤
│ In Distributed? │ P is always required! → CP or AP only │
└──────────────────────┴───────────────────────────────────────┘

PACELC Summary

┌──────────────────────────────────────────────────────────────┐
│ PACELC THEOREM CHEATSHEET │
├──────────────────────────┬───────────────────────────────────┤
│ P (Partition) │ When partition: A vs C │
│ E (Else / normal) │ No partition: L vs C │
│ A (Availability) │ Respond even if stale │
│ C (Consistency) │ Only respond if data is current │
│ L (Latency) │ Respond fast (may not sync first) │
├──────────────────────────┼───────────────────────────────────┤
│ PA/EL (Cassandra) │ AP + fast, eventual consistency │
│ PC/EC (Zookeeper) │ CP + always consistent, slower │
└──────────────────────────┴───────────────────────────────────┘

Sharding vs Replication Summary

┌────────────────┬──────────────────────┬──────────────────────┐
│ │ SHARDING │ REPLICATION │
├────────────────┼──────────────────────┼──────────────────────┤
│ What it does │ Splits data across │ Copies same data to │
│ │ multiple machines │ multiple machines │
├────────────────┼──────────────────────┼──────────────────────┤
│ Data per box │ DIFFERENT subsets │ SAME full copy │
├────────────────┼──────────────────────┼──────────────────────┤
│ Solves │ Storage too large │ Too many requests │
├────────────────┼──────────────────────┼──────────────────────┤
│ MECE? │ Yes — exclusive + │ No — all are same │
│ │ exhaustive │ │
├────────────────┼──────────────────────┼──────────────────────┤
│ Can coexist? │ YES — a shard can │ YES — a shard can │
│ │ have replicas │ also be sharded │
└────────────────┴──────────────────────┴──────────────────────┘

🎯 Practice Exercises & Interview Questions

Conceptual Questions

Q1. What is the CAP theorem? Explain with a real-world example.

Q2. Why can’t a distributed system be simultaneously consistent, available, AND partition tolerant?

Q3. A social media platform shows slightly stale “likes” count on posts. Is this CP or AP? Why?

Q4. An online banking system must ensure that a user cannot withdraw money they don’t have, even if two ATMs are used simultaneously. Is this CP or AP? Why?

Q5. Explain the difference between sharding and replication. Can they work together?

Q6. What is a “hot shard”? How would you solve it in the context of a social network like Instagram?

Q7. What is replication lag? When does it cause problems? How do you mitigate it?

Q8. What is the PACELC theorem? How does it extend CAP?

Q9. When would you choose synchronous replication vs asynchronous replication?

Q10. What is an intra-shard query vs an inter-shard query? Which is preferred and why?

System Design Application Questions

Q11. You are designing a chess game system. Should you use CP or AP for game state? Why?

Q12. You are designing a “recently viewed products” feature on Amazon. Would you use CP or AP? Why?

Q13. Design the sharding strategy for a WhatsApp chat system. What would be your sharding key?

Q14. You have a single MySQL master that’s getting overwhelmed by read queries. What would you do? Walk through the trade-offs.

Q15. Facebook stores Justin Bieber’s posts. Millions of users try to read his posts per second. How would you design this system? (Hint: hot shard + replication + caching)

📝 Solutions

Q1 — CAP Theorem Explanation

CAP Theorem states that a distributed system can only guarantee two of three properties: Consistency (every read returns latest write), Availability (every request gets a response), and Partition Tolerance (system works despite network failures).

Real-world example — IRCTC ticket booking:

IRCTC chooses CP: It’s better for the system to be temporarily slow or unavailable than to double-book a seat.

Q2 — Why No CAP Together?

When a network partition occurs (which is inevitable), a node must decide: do I respond to the client (Availability) or do I wait until the other node confirms (Consistency)?

There’s no way to guarantee both during a partition. Hence: pick one.

Q3 & Q4 — AP vs CP Classification

┌──────────────────────────┬────────┬──────────────────────────────────────────┐
│ System                   │ Choice │ Reason                                   │
├──────────────────────────┼────────┼──────────────────────────────────────────┤
│ Social media likes count │ AP     │ Stale count is fine; availability wins   │
│ Bank ATM withdrawal      │ CP     │ Wrong balance data is catastrophic;      │
│                          │        │ prefer unavailability                    │
└──────────────────────────┴────────┴──────────────────────────────────────────┘

Q5 — Sharding + Replication Together

Yes! They are complementary, not mutually exclusive.

Common pattern: Shard your data into N shards, then replicate each shard M times → N × M total machines.
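The N × M pattern can be shown as a toy addressing scheme (modulo sharding and the machine naming are illustrative):

```python
NUM_SHARDS, REPLICAS_PER_SHARD = 3, 2   # N x M = 6 machines total

def machine_for(user_id: int, replica_choice: int) -> str:
    shard = user_id % NUM_SHARDS                    # sharding: which subset of data
    replica = replica_choice % REPLICAS_PER_SHARD   # replication: which copy of it
    return f"shard-{shard}-replica-{replica}"

# Writes go to one canonical copy; reads can spread across replicas:
print(machine_for(107, 0))   # shard-2-replica-0
print(machine_for(107, 1))   # shard-2-replica-1  (same data, different machine)
```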

Q11 — Chess Game (CP)

Chess requires CP because:

  * Both players must see the exact same board state
  * A stale or conflicting move would corrupt the game
  * A brief pause is acceptable; a wrong board is not

Q12 — Recently Viewed Products (AP)

“Recently viewed” is AP because:

  * A missing or stale entry is harmless to the user
  * Speed and availability matter far more than perfect accuracy
  * Eventual consistency is perfectly acceptable here

Q13 — WhatsApp Chat Sharding Key

Sharding Key: Chat Room ID (or Conversation ID)

Why:

  * All messages of a conversation are co-located on one shard
  * Loading a chat becomes a fast intra-shard query
  * No inter-shard joins are needed to render a conversation

Consideration: Group chats with millions of members could become hot shards → may need per-group-specific replication strategy.

Q14 — MySQL Overwhelmed by Reads

  1. Add Read Replicas — Route all read queries to replicas
  2. Add a Caching Layer — Redis in front of DB for hot queries
  3. Monitor Replication Lag — Ensure replicas don’t fall too far behind
  4. Read-Your-Own-Write Routing — Route user’s reads to the master for N seconds after their write

Trade-off: Adding replicas with async replication → eventual consistency. If strong consistency is needed, → sync replication (higher latency) or always read from master.

Q15 — Justin Bieber’s Posts (Hot Shard + Replication + Caching)

Hot account problem (also called the “celebrity problem”): one user’s data attracts read traffic far beyond what a single shard can serve.

Solution layers:
1. Dedicated hot shard for celebrities (separate from normal users)
2. High replication factor for hot shards (e.g., 100 replicas vs 3 for normal users)
3. Aggressive caching — cache celebrity posts at CDN / application layer
4. Fan-out on read (pull model) — don't push to all followers; pull from celebrity shard when a follower loads feed

The Facebook News Feed solution: Use a Recent Posts DB (cache) that holds only the latest posts from celebrities, avoiding hitting the main shards on every feed load.

📌 Remember: CAP Theorem and PACELC are frameworks for thinking, not rigid dogmas. Real-world databases often allow you to tune consistency vs availability per operation (e.g., DynamoDB’s ConsistentRead, Cassandra's consistency levels). The key is understanding the trade-offs so you can make informed decisions.

