Part 3
Last Updated: February 2026 Author: Ishwar Ambare
Topics: Sharding, Replication, CAP Theorem, PACELC Theorem, Master-Slave Architecture, Consistency vs Availability

📋 Table of Contents
Part 1: Foundations
- The Big Picture Architecture
- Sharding — Distributing Data
- Replication — Copying Data
- Sharding vs Replication — Side by Side
Part 2: CAP Theorem
- What is CAP Theorem?
- Consistency (C)
- Availability (A)
- Partition Tolerance (P)
- CAP in Practice — The Hardeep & Nikhil Story
- Choosing CP vs AP
Part 3: PACELC Theorem
- What is PACELC Theorem?
- Latency vs Consistency Trade-off
- Real-World Examples
Part 4: Master-Slave Replication
- Master-Slave Architecture
- Read Replicas
- Failover & Recovery
Part 5: Summary & Interview Prep
- Quick Reference Cheatsheet
- Practice Exercises & Interview Questions
- Solutions
- References and Resources
PART 1: FOUNDATIONS
🏗️ The Big Picture Architecture
Before diving into CAP Theorem, let’s ground ourselves in how large-scale systems look:
┌─────────────────────────────────┐
│ CLIENT (Browser) │
└──────────────┬──────────────────┘
│ DNS Resolution
▼
┌─────────────────────────────────┐
│ DNS │
└──────────────┬──────────────────┘
│ IP Address returned
▼
┌─────────────────────────────────┐
│ GATEWAY / LOAD BALANCER │ ◄── SSL termination here
│ (First point of contact) │ Protocol translation
└────────┬──────────┬─────────────┘ Rate limiting, etc.
│ │
┌────────────┘ └────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ App Server 1 │ │ App Server 2 │
│ (Compute Layer) │ │ (Compute Layer) │
└────────┬─────────┘ └────────┬─────────┘
│ │
└────────────────┬───────────────────┘
│
┌──────────────▼───────────────────┐
│ GLOBAL CACHE │ ◄── Redis / Memcached
└──────────────┬───────────────────┘
│
┌──────────────▼───────────────────┐
│ STORAGE / DATABASE │ ◄── Source of Truth
└──────────────────────────────────┘
Key insight: The Gateway box is called a “gateway” — not just a load balancer. It does more than load balancing. It handles SSL termination, protocol translation, rate limiting, authentication, and more.
Why Multiple App Servers?
Elasticity is the ability to adjust resources on demand: you add more compute when traffic is high and remove it when traffic is low. Cloud platforms like AWS support this via Auto Scaling.
💡 Storage ≠ Compute — Compute is highly elastic (scale up/down rapidly). Storage is deliberately static — you plan ahead, you don’t rapidly add/remove DB boxes on the fly.
🔀 Sharding — Distributing Data
The Problem: Too Much Data for One Machine
Consider Facebook’s User Database:
Users Table (3 Billion Entries):
┌─────────┬────────┬────────┬─────────────────┬──────────┬───────────────┐
│ user_id │ name │ gender │ relationship_st │ DOB │ last_updated │
├─────────┼────────┼────────┼─────────────────┼──────────┼───────────────┤
│ 107 │ Raju │ Male │ Married │ 1990-01 │ 2024-01-01 │
│ 998 │ ... │ ... │ ... │ ... │ ... │
│ 1070 │ ... │ ... │ ... │ ... │ ... │
└─────────┴────────┴────────┴─────────────────┴──────────┴───────────────┘
User Friends Table (Billions more entries):
┌─────────┬───────────┐
│ user_id │ friend_id │
├─────────┼───────────┤
│ 107 │ 998 │ ← Raju is friends with user 998
│ 107 │ 1070 │ ← Raju is friends with user 1070
│ 107 │ 66 │ ← Raju is friends with user 66
└─────────┴───────────┘
A single machine CANNOT handle:
- Storage: ~100 TB+ for billions of users (each with profile, posts, media links…)
- Load: Billions of requests per day
What is Sharding?
Sharding = Distributing data across multiple machines in a Mutually Exclusive, Collectively Exhaustive (MECE) way.
UNIVERSE OF ALL DATA
┌──────────────────────────────────────────────────────────────────┐
│ user:107 user:998 user:1070 user:66 user:442 user:5001 ... │
└──────────────────────────────────────────────────────────────────┘
│
SHARDING splits this into:
│
┌──────────┬──────────┬──────────┬──────────┬──────────┐
▼ ▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ SHARD A │ │ SHARD B │ │ SHARD C │ │ SHARD D │ │ SHARD E │
│ u:107 │ │ u:998 │ │ u:1070 │ │ u:66 │ │ u:442 │
│ u:28 │ │ u:319 │ │ u:2050 │ │ u:7123 │ │ u:9001 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
✅ All shards TOGETHER = ALL the data (Collectively Exhaustive)
✅ No two shards have the SAME data (Mutually Exclusive)
Sharding Keys
The sharding key determines which shard a piece of data goes to.
💡 Connection to Consistent Hashing: Remember the circle with servers placed around it? Each server on that circle is a shard. The request/user ID is hashed to a position on the ring, and you move clockwise to find the owning shard.
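The clockwise lookup can be sketched in a few lines of Python. This is a toy illustration, not any particular library: `HashRing`, its `vnodes` parameter, and the MD5-based position function are all choices made here for the example.

```python
import bisect
import hashlib

def _position(key: str) -> int:
    # Map any string to a stable position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring: each shard owns the arc ending at its point."""

    def __init__(self, shards, vnodes=100):
        self._points = []   # sorted ring positions
        self._owners = {}   # ring position -> shard name
        for shard in shards:
            # Virtual nodes give each shard many points, smoothing distribution.
            for v in range(vnodes):
                p = _position(f"{shard}#{v}")
                self._points.append(p)
                self._owners[p] = shard
        self._points.sort()

    def shard_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the next shard point,
        # wrapping around at the end of the ring.
        i = bisect.bisect(self._points, _position(key)) % len(self._points)
        return self._owners[self._points[i]]
```

Because the hash is deterministic, the same key always routes to the same shard; adding or removing a shard only remaps the keys on that shard's arcs.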
Why Keep Related Data in the Same Shard?
This is the concept of co-location. Consider this example:
Query: "Get all friends of user 107"
❌ BAD (friends in different shards):
→ Query Shard A for 107's friends
→ Query Shard B for 107's friends
→ Query Shard C for 107's friends
→ Aggregate results
= INTERSHARD query (slow, network-heavy, inefficient)
✅ GOOD (user + their friends in same shard):
→ Query Shard A only (107 AND all his friend entries are here)
= INTRASHARD query (fast, single machine, efficient)
Rule of thumb: Design your sharding key so that frequently joined data lives in the same shard. This converts expensive inter-shard queries into cheap intra-shard queries.
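As a minimal sketch of this rule of thumb, assume both tables are sharded by `user_id` (all names here are hypothetical): because the friends table is keyed by the same column, every friendship row for user 107 lands on the same shard as user 107's profile, so "get all friends of 107" touches exactly one machine.

```python
NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    # Both tables use user_id as the sharding key, so a user's profile row
    # and all of that user's friendship rows land on the same shard.
    return user_id % NUM_SHARDS

# Friendship rows are sharded by the FIRST column (the owning user_id).
friend_rows = [(107, 998), (107, 1070), (107, 66)]

def shard_of_row(row) -> int:
    return shard_for(row[0])

# Every row for user 107 lives on one shard -> an intra-shard query.
shards_touched = {shard_of_row(r) for r in friend_rows}
```

Had the rows been sharded by `friend_id` instead, the same query would have fanned out to several shards.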
Intra-shard vs Inter-shard Requests
┌─────────────┬──────────────────────────────────────────┬─────────────┬────────────┐
│ Type        │ Description                              │ Performance │ Preferred? │
├─────────────┼──────────────────────────────────────────┼─────────────┼────────────┤
│ Intra-shard │ Query handled within a single shard      │ Fast        │ ✅ YES     │
│ Inter-shard │ Query requires going to multiple shards  │ Slow        │ ❌ Avoid   │
└─────────────┴──────────────────────────────────────────┴─────────────┴────────────┘
The Hot Shard Problem 🔥
A hot shard occurs when one shard receives disproportionately more traffic than others.
Example: Justin Bieber's data on a celebrity-heavy shard
Shard A (celebrities): 💥💥💥💥💥💥💥 (millions of read requests)
Shard B (normal users): 💡 (low traffic)
Shard C (normal users): 💡 (low traffic)
Solutions:
- Dedicated hot shards — Put popular users in their own shards with a higher replication factor
- Caching — Cache celebrity posts so DB is not hit every time (Facebook News Feed approach)
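The caching solution above can be sketched as a read-through cache with a short TTL sitting in front of the hot shard. `ReadThroughCache` and the `fetch` callback are hypothetical names for this illustration:

```python
import time

class ReadThroughCache:
    """Tiny TTL cache in front of a hot shard.

    `fetch` is whatever function actually hits the database; the cache
    absorbs repeated reads for the same hot key within the TTL window.
    """

    def __init__(self, fetch, ttl_seconds=5.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._store = {}        # key -> (expires_at, value)
        self.db_hits = 0        # how many reads actually reached the DB

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]     # served from cache: DB never touched
        self.db_hits += 1
        value = self._fetch(key)
        self._store[key] = (now + self._ttl, value)
        return value
```

With a 5-second TTL, a million reads of a celebrity's latest post in that window cost the shard a single query; the price is up to 5 seconds of staleness, which is exactly the AP-style trade this section describes.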
🔁 Replication — Copying Data
The Problem: Too Many Requests for One Machine
Scenario: Justin Bieber's posts stored on a single machine
User 1 ──┐
User 2 ──┤
User 3 ──┤──► Single DB Machine ◄── 💥 Overloaded!
User 4 ──┤ (Millions of read requests)
User 5 ──┘
Even after sharding, a single machine per shard may be overwhelmed by read requests.
What is Replication?
Replication = Creating multiple identical copies of the same data on different machines so more users can read from different copies simultaneously.
BEFORE Replication:
Users ──► [Single Copy of Justin Bieber's Posts] ◄── 💥 Overloaded
AFTER Replication:
Users 1-100M ──► [Copy A — Original] ◄── ✅ Manageable!
Users 100M-200M ► [Copy B — Replica 1] ◄── ✅ Manageable!
Users 200M-300M ► [Copy C — Replica 2] ◄── ✅ Manageable!
Sharding + Replication Together
These two concepts can and should exist simultaneously:
3 Billion Facebook Users
│
▼
┌──────────────────────────────────────────┐
│ SHARDING │
│ (Because storage is too large) │
└──────┬─────────────┬──────────────┬──────┘
│ │ │
▼ ▼ ▼
[SHARD 1] [SHARD 2] [SHARD 3]
Users:1-1B Users:1B-2B Users:2B-3B
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────┐
│ REPLICATION │
│ (Because request volume is too high) │
└──────────────────────────────────────────┘
│ │ │
┌─────┤ ┌─────┤ ┌────┤
│ │ │ │ │ │
[S1 R1][S1 R2] [S2 R1][S2 R2] [S3 R1][S3 R2]
Shard1 Shard1 Shard2 Shard2 Shard3 Shard3
Replica1 Repl2 Repl1 Repl2 Repl1 Repl2
💡 Think of each shard as a subset of your data universe, and each replica as a clone of that shard.
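Putting the two together, addressing a machine becomes a (shard, replica) pair: the sharding key picks the shard, and the operation type decides how many replicas are touched. A minimal sketch, assuming 3 shards with 2 replicas each as in the diagram (the function names are invented for the example):

```python
import random

NUM_SHARDS = 3
REPLICAS_PER_SHARD = 2   # total machines = NUM_SHARDS * REPLICAS_PER_SHARD

def shard_for(user_id: int) -> int:
    # The sharding key decides WHICH subset of the data universe owns this user.
    return user_id % NUM_SHARDS

def write_targets(user_id: int):
    # A write must eventually reach EVERY replica of the owning shard.
    s = shard_for(user_id)
    return [(s, r) for r in range(REPLICAS_PER_SHARD)]

def read_target(user_id: int):
    # A read can be served by ANY single replica of that shard.
    s = shard_for(user_id)
    return (s, random.randrange(REPLICAS_PER_SHARD))
```

Sharding caps how much data each machine stores; replication caps how many reads each machine serves.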
⚖️ Sharding vs Replication — Key Differences
SHARDING:                    REPLICATION:
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ A-F │ │ G-M │ │ N-Z │ │ ALL │ │ ALL │ │ ALL │
│data │ │data │ │data │ │data │ │data │ │data │
└─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘
Different data on each Same data on each
PART 2: CAP THEOREM
🧩 What is CAP Theorem?
CAP Theorem states that in any distributed system, you can only guarantee two out of these three properties simultaneously:
* C — Consistency
* A — Availability
* P — Partition Tolerance
🎬 Visual Overview — CAP Theorem & PACELC Animated Diagram

The diagram animates the full CAP triangle, database placements (CP / AP / CA), and the PACELC decision flow — all on a single reference poster.
CONSISTENCY (C)
△
/ \
/ \
/ ⚠️ \
/ Pick \
/ 2 of 3 \
/ \
AVAILABILITY ───────────── PARTITION
(A) TOLERANCE (P)
┌────────┬──────────────────────────────────┬─────────────────────┐
│ Choice │ Properties Guaranteed            │ Sacrificed          │
├────────┼──────────────────────────────────┼─────────────────────┤
│ CA     │ Consistent + Available           │ Partition Tolerance │
│ CP     │ Consistent + Partition Tolerant  │ Availability        │
│ AP     │ Available + Partition Tolerant   │ Consistency         │
└────────┴──────────────────────────────────┴─────────────────────┘
⚠️ Critical Insight: In practice, network partitions ALWAYS happen (networks are unreliable). Therefore, partition tolerance is non-negotiable in distributed systems. The real choice is always between CP or AP.
✅ C — Consistency
Consistency means every read returns the latest write, OR equivalently, every machine has the same latest view of the truth.
CONSISTENT System:
WRITE: "Raju's DOB = 1990"
│
┌──────▼──────┐
│ Machine 1 │ ── "Raju DOB = 1990" ✅
└─────────────┘
┌─────────────┐
│ Machine 2 │ ── "Raju DOB = 1990" ✅
└─────────────┘
READ from any machine → same answer ✅
INCONSISTENT System:
WRITE went to Machine 1 only:
┌─────────────┐
│ Machine 1 │ ── "Raju DOB = 1990" ✅
└─────────────┘
┌─────────────┐
│ Machine 2 │ ── "Raju DOB = ???" ❌ (stale)
└─────────────┘
READ from Machine 2 → wrong/missing answer ❌
Formal Definition
Every READ must return the result of the most recent WRITE
(or return an error — never stale data)
Real-World Analogy
- 🏦 Bank ATM: You deposited ₹10 lakh. Your ATM must show ₹10 lakh when you check the balance — not show ₹500. Consistency is non-negotiable here.
- 💹 Zerodha (Stock Price): If Adani stock is ₹1200, showing ₹1800 to someone else is unacceptable. Wrong price = wrong trade decisions.
✅ A — Availability
Availability means the system always responds to every request — even if the response isn’t the most current data. A system is available if it responds without error.
⚠️ Availability does NOT guarantee the response is correct — just that there IS a response.
AVAILABLE System (even if stale):
User asks: "How many views does this YouTube video have?"
Machine 1 says: "10,242 views" ← May not be exactly right
Machine 2 says: "10,238 views" ← May not be exactly right
But BOTH respond! No error. = AVAILABLE ✅
NOT AVAILABLE System:
User asks: "What is my ATM balance?"
System says: "Sorry, try again later" ← ERROR response = NOT AVAILABLE ❌
Real-World Analogy
- 📺 YouTube view count: Does not show exact real-time count, but ALWAYS shows something. Prioritizes availability.
- 📺 Hotstar live viewers: “2.4 million watching” might not be exact, but the count is always there.
- 🏦 ATM: Better to say “temporarily unavailable” than show a wrong balance. Prioritizes consistency over availability.
✅ P — Partition Tolerance
Partition = A network partition — a scenario where two machines temporarily cannot communicate with each other (the network link between them is broken).
NETWORK PARTITION:
┌─────────────┐ ✂️ BROKEN ✂️ ┌─────────────┐
│ Machine 1 │ ─────────── X ────────── │ Machine 2 │
└─────────────┘ └─────────────┘
- M1 can still talk to clients ✅
- M2 can still talk to clients ✅
- M1 and M2 CANNOT talk to each other ❌
Partition Tolerance = The system continues to function even when a network partition occurs.
Why Partitions Always Happen
Networks are always unreliable. Even inside a single data center:
- Cables can fail
- Network ports can drop
- Packets can be lost
- Switches can fail
- Software bugs can block communication
Partitions are therefore not an edge case; they are a fundamental reality of distributed computing.
📖 CAP in Practice — The Hardeep & Nikhil Story
Let’s walk through a concrete example to understand CP vs AP.
The Scenario
Hardeep runs an “Event Reminder Service” — people call in to register events; they call back later to recall them. As the service grows, Hardeep hires Nikhil. Now two people serve calls via a load balancer.
CLIENTS
/ \
/ \
┌────────────────────────┐
│ LOAD BALANCER │
└────────┬───────────────┘
│
┌──────┴──────┐
│ │
┌───▼────┐ ┌───▼────┐
│ HARDEEP│ │ NIKHIL │
│ (Diary)│ │ (Diary)│
└────────┘ └────────┘
Option 1 — AP System (Available, Partition Tolerant — NOT Consistent)
Protocol: Each person writes only in their own diary.
WRITE:
Shiv calls in → Load balancer routes to Hardeep
Hardeep writes: "Shiv: girlfriend's wedding, 28 Feb 7pm"
✅ Responds immediately
READ (later):
Shiv calls back → Load balancer routes to Nikhil (randomly!)
Nikhil checks his diary: "I have nothing for Shiv"
❌ Returns wrong answer — "No events found!"
Result:
- ✅ Available — Both Hardeep and Nikhil always respond (no errors)
- ❌ Not Consistent — Nikhil returns wrong data
- ✅ Partition Tolerant — Even if they stop talking to each other, everything still “works”
Option 2 — CP System (Consistent, Partition Tolerant — NOT Available)
Protocol (updated): Before acknowledging any write, the receiver writes in BOTH diaries.
WRITE:
Shiv calls in → routes to Hardeep
Hardeep writes in his own diary ✓
Hardeep then calls Nikhil: "Please write this too"
Nikhil writes ✓ and confirms
ONLY THEN: Hardeep tells Shiv "Your event is registered" ✅
READ (later):
Shiv calls → routes to Nikhil
Nikhil checks diary: "Shiv: girlfriend's wedding, 28 Feb 7pm"
✅ Returns correct answer
Problem: What if Nikhil goes on holiday (the machine goes down)?
Nikhil is OFFLINE (vacation in Lonavala! 🏖️)
Shiv calls in → routes to Hardeep
Hardeep writes in his own diary ✓
Hardeep calls Nikhil: [no answer] ❌
HARDEEP CANNOT ACKNOWLEDGE THE WRITE →
System is UNAVAILABLE for writes ❌
Result:
- ✅ Consistent — When working, every read returns latest write
- ❌ Not Available — If one machine is down, writes fail completely
- ✅ Partition Tolerant — Even during partition, the system rejects writes (stays consistent)
Option 3 — AP System (with eventual sync)
Protocol: Write locally when the partner is down; sync when they come back.
Nikhil is OFFLINE:
Shiv registers event → Hardeep writes locally, acknowledges immediately ✅
(Shiv is happy — quick response)
When Nikhil comes BACK ONLINE:
Hardeep sends all pending entries to Nikhil
Nikhil writes them ✓ — now both are synced
BUT: If network partition happens DURING sync:
→ Some entries may not reach Nikhil
→ Nikhil comes online, gets some requests
→ Nikhil returns stale data ❌
Result:
- ✅ Available — Always responds even when partner is down
- ❌ Not Consistent — During the partition window, stale data is possible
- ✅ Partition Tolerant — Works through network outages
Summary: The Three Options
┌──────────────────────────────────────────────────────────────────────┐
│ Option 1: AP (each writes locally, no sync) │
│ ✅ Always responds (Available) │
│ ✅ Handles partitions fine (Partition Tolerant) │
│ ❌ Different machines return different data (Not Consistent) │
│ │
│ Option 2: CP (write to both, fail if one is down) │
│ ✅ Every read returns latest write (Consistent) │
│ ✅ Handles partitions by refusing to accept inconsistency │
│ ❌ System is unavailable when one machine is down (Not Available) │
│ │
│ Option 3: AP with eventual consistency (write locally, sync later) │
│ ✅ Always responds (Available) │
│ ✅ Handles partitions (Partition Tolerant) │
│ ❌ Stale reads possible during/after partition (Not Consistent) │
└──────────────────────────────────────────────────────────────────────┘
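The three diary protocols can be simulated in a few lines. This is a toy model — `Node`, the three write functions, and the `pending` queue are names invented for this sketch, not any real database API:

```python
class Node:
    """One person (machine) with a diary (local storage)."""
    def __init__(self, name):
        self.name = name
        self.diary = {}
        self.online = True

def write_ap_no_sync(receiver, other, key, value):
    # Option 1: write only the local diary, always acknowledge.
    # `other` never hears about the write -> reads from it are inconsistent.
    receiver.diary[key] = value
    return "ACK"

def write_cp(receiver, other, key, value):
    # Option 2: refuse to acknowledge unless BOTH diaries are updated.
    if not other.online:
        return "ERROR: partner down, write rejected"
    receiver.diary[key] = value
    other.diary[key] = value
    return "ACK"

def write_ap_eventual(receiver, other, key, value, pending):
    # Option 3: write locally and acknowledge; queue a sync for later
    # if the partner is unreachable (eventual consistency).
    receiver.diary[key] = value
    if other.online:
        other.diary[key] = value
    else:
        pending.append((key, value))
    return "ACK"
```

Running the Nikhil-on-holiday scenario against each function reproduces the table above: Option 1 and 3 always return `ACK` (available, possibly inconsistent), while Option 2 returns an error (consistent, unavailable).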
🎯 Choosing CP vs AP
In distributed systems, Partition Tolerance is always required. Networks always fail sometimes. The choice is: CP or AP.
Choose CP (Consistency + Partition Tolerance) when:
- Banking / Financial systems — Wrong balance is catastrophic
- Stock Trading (Zerodha) — Wrong price = bad trades
- IRCTC / Flight Booking — Double-booking a seat is unacceptable
- User Authentication / Blacklisting — If a user is blocked, they must be blocked everywhere
Choose AP (Availability + Partition Tolerance) when:
- YouTube view counts — A slightly stale count is fine; going down is not
- Hotstar live viewer count — Approximate count is OK
- Social media likes/comments — A stale like count won’t hurt anyone
- Product catalog / Prices (non-critical) — Slight staleness is acceptable
- DNS — Propagation delay is acceptable; being unavailable is not
CA (without Partition Tolerance) — Only on Single Machines
CA systems exist in practice, but only for specific use cases within a larger distributed system — e.g., a single MySQL box handling blacklist lookups (consistent + available, with no need for partition tolerance because it is ONE machine).
PART 3: PACELC THEOREM
⚡ What is PACELC Theorem?
PACELC extends CAP theorem by adding a second trade-off that applies even when there is no partition.
PACELC = Partition → Availability or Consistency; Else → Latency or Consistency
┌─────────────────────────────────────────────────────────┐
│ PACELC THEOREM │
├─────────────────────────────────────────────────────────┤
│ │
│ IF Partition Happens: │
│ → Choose between AVAILABILITY or CONSISTENCY │
│ (this is the CAP theorem part) │
│ │
│ ELSE (Normal operation, no partition): │
│ → Choose between LOW LATENCY or CONSISTENCY │
│ (this is the NEW insight) │
│ │
└─────────────────────────────────────────────────────────┘
⏱️ Latency vs Consistency Trade-off
What is Latency?
Latency = the time it takes for a request to be completed.
* Low Latency = Fast (good ✅)
* High Latency = Slow (bad ❌)
Why Does Consistency Cost Latency?
Going back to the Hardeep & Nikhil example:
HIGH CONSISTENCY Protocol (CP):
Client writes → Hardeep writes locally → Hardeep WAITS for Nikhil to confirm
↑
EXTRA TIME = EXTRA LATENCY
Total Time: local write + network round trip + Nikhil's write + confirmation
LOW LATENCY Protocol (AP):
Client writes → Hardeep writes locally → Hardeep immediately responds ✅
Total Time: just local write (fast!)
But: consistency is sacrificed
The Trade-off Visualized
HIGH CONSISTENCY ←──────────────→ LOW LATENCY
Banking ATM Booking Systems Social Feed DNS View Counts
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
Must be Must be Eventual Stale OK Stale OK
consistent consistent consistency (TTL) (approx)
(slow ok) (slow ok) (fast!) (fast!)
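A back-of-the-envelope model makes the cost concrete. Assuming a 1 ms local write, a 40 ms network round trip, a 1 ms remote write, and replicas contacted one after another (all numbers and function names are illustrative):

```python
LOCAL_WRITE_MS = 1
NETWORK_RTT_MS = 40
REMOTE_WRITE_MS = 1

def sync_write_latency(num_replicas: int) -> int:
    # Consistent protocol: acknowledge only after every replica has
    # confirmed over the network (here contacted sequentially).
    return LOCAL_WRITE_MS + num_replicas * (NETWORK_RTT_MS + REMOTE_WRITE_MS)

def async_write_latency(num_replicas: int) -> int:
    # Low-latency protocol: acknowledge right after the local write;
    # replication to the other copies happens in the background.
    return LOCAL_WRITE_MS
```

With two replicas the consistent write costs 83 ms versus 1 ms for the fast one — an 80x gap, which is why the E (Else) half of PACELC matters even when the network is healthy.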
🌍 Real-World Examples
IRCTC / Flight Booking (CP + High Latency is OK)
User clicks "Book Seat 12A" on IRCTC:
→ Request goes to system
→ System must ensure NO TWO USERS book the same seat
→ Uses distributed lock / 2-phase commit
→ This takes 2-3 seconds ← HIGH LATENCY
But: ✅ Nobody ever gets double-booked
You’ve experienced this! That 2–3 second wait on IRCTC after clicking “Book” — that’s the system ensuring consistency at the cost of latency.
YouTube Views (AP + Low Latency)
User visits video:
→ System returns view count immediately (maybe slightly stale)
→ No waiting for all replica counts to sync
→ Response in < 50ms ← LOW LATENCY
But: Different users may see slightly different view counts ❌ (Not consistent)
Zerodha Stock Prices (CP even with latency)
Trader checks Adani stock price:
→ System MUST return latest price
→ Worth a small wait to ensure accuracy
→ Wrong data = wrong trades = financial loss
PACELC Classification of Common Databases
┌──────────────────────┬────────────────┬────────────────────────────────────────┐
│ Database             │ Classification │ Meaning                                │
├──────────────────────┼────────────────┼────────────────────────────────────────┤
│ Cassandra, DynamoDB  │ PA/EL          │ Available during partitions; fast but  │
│                      │                │ eventually consistent otherwise        │
│ MongoDB              │ PA/EC          │ Available during partitions;           │
│                      │                │ consistent in normal operation         │
│ ZooKeeper, HBase     │ PC/EC          │ Consistent always, even at the cost of │
│                      │                │ availability and latency               │
└──────────────────────┴────────────────┴────────────────────────────────────────┘
PART 4: MASTER-SLAVE ARCHITECTURE
🏛️ Master-Slave Architecture
Master-Slave (also called Primary-Replica) is the most common replication pattern.
CLIENTS
│
┌────────▼────────┐
│ LOAD BALANCER │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
READS │ READS │ READS │
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│ REPLICA │ │ REPLICA │ │ MASTER │ ◄── WRITES go here ONLY
│ (Slave) │ │ (Slave) │ │(Primary)│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└─────────────┼─────────────┘
│
REPLICATION
(async/sync)
Key Rules
- Writes go to the MASTER only — it is the single source of truth
- Reads can be served by the master or by any replica
- The master propagates every write to the replicas (synchronously or asynchronously)
Why Separate Reads from Writes?
In typical web applications:
- 80–90% of traffic is reads
- 10–20% of traffic is writes
By routing reads to replicas, we dramatically reduce master load.
📖 Read Replicas
Synchronous vs Asynchronous Replication
SYNCHRONOUS REPLICATION (Strong Consistency):
Client → WRITE → Master
│
├─────────────► Replica 1 (write confirmed)
│ │
└─────────────► Replica 2 (write confirmed)
↑
Only THEN → ACK to Client
✅ All replicas always in sync
❌ Write latency = slowest replica's write time
ASYNCHRONOUS REPLICATION (Eventual Consistency):
Client → WRITE → Master → ACK to Client immediately ✅
│
└──(async)──► Replica 1 (queued write)
└──(async)──► Replica 2 (queued write)
✅ Low write latency
❌ Replicas may lag — reads from replica may be stale
Replication Lag
Replication Lag = the time delay between a write to the master and that write appearing on all replicas.
Timeline:
T+0ms → User writes post
T+0ms → Master has the post ✅
T+50ms → Replica 1 has the post ✅
T+100ms → Replica 2 has the post ✅
During T+0ms to T+50ms:
→ If user's next READ goes to Replica 1, they DON'T see their own post! 😱
→ This is "Read Your Own Write" consistency violation
Solutions to Replication Lag Issues:
- Read-your-own-write routing — send a user's reads to the master for a short window after their write
- Sticky sessions — pin a user to one replica so their reads are at least monotonic
- Lag monitoring — take replicas out of the read rotation when they fall too far behind
- Synchronous replication — for the few reads that must be fresh, at the cost of write latency
🔄 Failover & Recovery
What Happens When Master Goes Down?
BEFORE FAILURE:
Master ────(async)────► Replica 1
────(async)────► Replica 2
MASTER FAILS:
Master ✗ Replica 1 ← Promoted to new Master!
Replica 2 ← Now replicates from Replica 1
Failover Challenges
- Data Loss Risk: If async replication, the failed master may have had some writes that never made it to replicas
- Split-Brain Problem: Two nodes think they’re both Master — can lead to conflicts
- Detection Time: How quickly do you detect the master failed?
- Promotion Complexity: Which replica becomes the new master? (Usually the most up-to-date one)
Semi-Synchronous Replication
A middle ground:
WRITE → Master stores + sends to at least ONE replica
→ Gets ack from that ONE replica
→ Acknowledges to client
✅ At least one replica is guaranteed to have every acknowledged write
✅ Better performance than full synchronous
✅ Better durability than full async
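A sketch of the semi-synchronous acknowledgement rule — ack once the master plus at least one replica have the write. The data structures here are invented for the illustration; real systems such as MySQL semi-sync implement this inside the replication protocol:

```python
def semi_sync_write(master, replicas, key, value, acks_required=1):
    """Ack the client once the master plus `acks_required` replicas have it.

    `master` is a dict; each replica is a (store_dict, online_flag) pair.
    """
    master[key] = value
    acks = 0
    for store, online in replicas:
        if online:
            store[key] = value
            acks += 1
        if acks >= acks_required:
            # Remaining replicas catch up asynchronously.
            return "ACK"
    return "ERROR: not enough replicas reachable"
```

Setting `acks_required=len(replicas)` recovers full synchronous replication; `acks_required=0` would degenerate to pure async.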
PART 5: SUMMARY & INTERVIEW PREP
📋 Quick Reference Cheatsheet
CAP Theorem Summary
┌──────────────────────────────────────────────────────────────┐
│ CAP THEOREM CHEATSHEET │
├──────────────────────┬───────────────────────────────────────┤
│ C (Consistency) │ Every read returns the latest write │
│ A (Availability) │ Every request gets a response (no err)│
│ P (Partition Tol.) │ System works despite network failures │
├──────────────────────┼───────────────────────────────────────┤
│ CP System │ Prioritizes accuracy over uptime │
│ AP System │ Prioritizes uptime over accuracy │
│ CA System │ Only possible on single machine │
├──────────────────────┼───────────────────────────────────────┤
│ In Distributed? │ P is always required! → CP or AP only │
└──────────────────────┴───────────────────────────────────────┘
PACELC Summary
┌──────────────────────────────────────────────────────────────┐
│ PACELC THEOREM CHEATSHEET │
├──────────────────────────┬───────────────────────────────────┤
│ P (Partition) │ When partition: A vs C │
│ E (Else / normal) │ No partition: L vs C │
│ A (Availability) │ Respond even if stale │
│ C (Consistency) │ Only respond if data is current │
│ L (Latency) │ Respond fast (may not sync first) │
├──────────────────────────┼───────────────────────────────────┤
│ PA/EL (Cassandra) │ AP + fast, eventual consistency │
│ PC/EC (Zookeeper) │ CP + always consistent, slower │
└──────────────────────────┴───────────────────────────────────┘
Sharding vs Replication Summary
┌────────────────┬──────────────────────┬──────────────────────┐
│ │ SHARDING │ REPLICATION │
├────────────────┼──────────────────────┼──────────────────────┤
│ What it does │ Splits data across │ Copies same data to │
│ │ multiple machines │ multiple machines │
├────────────────┼──────────────────────┼──────────────────────┤
│ Data per box │ DIFFERENT subsets │ SAME full copy │
├────────────────┼──────────────────────┼──────────────────────┤
│ Solves │ Storage too large │ Too many requests │
├────────────────┼──────────────────────┼──────────────────────┤
│ MECE? │ Yes — exclusive + │ No — all are same │
│ │ exhaustive │ │
├────────────────┼──────────────────────┼──────────────────────┤
│ Can coexist? │ YES — a shard can │ YES — a shard can │
│ │ have replicas │ also be sharded │
└────────────────┴──────────────────────┴──────────────────────┘
🎯 Practice Exercises & Interview Questions
Conceptual Questions
Q1. What is the CAP theorem? Explain with a real-world example.
Q2. Why can’t a distributed system be simultaneously consistent, available, AND partition tolerant?
Q3. A social media platform shows slightly stale “likes” count on posts. Is this CP or AP? Why?
Q4. An online banking system must ensure that a user cannot withdraw money they don’t have, even if two ATMs are used simultaneously. Is this CP or AP? Why?
Q5. Explain the difference between sharding and replication. Can they work together?
Q6. What is a “hot shard”? How would you solve it in the context of a social network like Instagram?
Q7. What is replication lag? When does it cause problems? How do you mitigate it?
Q8. What is the PACELC theorem? How does it extend CAP?
Q9. When would you choose synchronous replication vs asynchronous replication?
Q10. What is an intra-shard query vs. an inter-shard query? Which is preferred and why?
System Design Application Questions
Q11. You are designing a chess game system. Should you use CP or AP for game state? Why?
Q12. You are designing a “recently viewed products” feature on Amazon. Would you use CP or AP? Why?
Q13. Design the sharding strategy for a WhatsApp chat system. What would be your sharding key?
Q14. You have a single MySQL master that’s getting overwhelmed by read queries. What would you do? Walk through the trade-offs.
Q15. Facebook stores Justin Bieber’s posts. Millions of users try to read his posts per second. How would you design this system? (Hint: hot shard + replication + caching)
📝 Solutions
Q1 — CAP Theorem Explanation
CAP Theorem states that a distributed system can only guarantee two of three properties: Consistency (every read returns latest write), Availability (every request gets a response), and Partition Tolerance (system works despite network failures).
Real-world example — IRCTC ticket booking:
- Consistency: Two users cannot book the same seat (strong consistency required)
- Availability: The booking system should respond even during peak loads
- Partition Tolerance: Network issues between servers will inevitably happen
IRCTC chooses CP: It’s better for the system to be temporarily slow or unavailable than to double-book a seat.
Q2 — Why No CAP Together?
When a network partition occurs (which is inevitable), a node must decide: do I respond to the client (Availability) or do I wait until the other node confirms (Consistency)?
- If it responds immediately → Might return stale data → Not Consistent
- If it waits for confirmation → Might timeout → Not Available
There’s no way to guarantee both during a partition. Hence: pick one.
Q3 & Q4 — AP vs CP Classification
┌───────────────────────────┬────────┬──────────────────────────────────────────────┐
│ System                    │ Choice │ Reason                                       │
├───────────────────────────┼────────┼──────────────────────────────────────────────┤
│ Social media likes count  │ AP     │ Stale count is fine; availability matters    │
│ Bank ATM withdrawal       │ CP     │ Wrong balance is catastrophic; prefer        │
│                           │        │ temporary unavailability                     │
└───────────────────────────┴────────┴──────────────────────────────────────────────┘
Q5 — Sharding + Replication Together
Yes! They are complementary, not mutually exclusive.
- Sharding handles too much data → split across machines
- Replication handles too many requests → copy to multiple machines
Common pattern: Shard your data into N shards, then replicate each shard M times → N × M total machines.
Q11 — Chess Game (CP)
Chess requires CP because:
- Both players must see the exact same game state
- An inconsistent view (e.g., player A sees different board than player B) would make the game unplayable
- Latency is acceptable (a few hundred ms is fine per move)
Q12 — Recently Viewed Products (AP)
“Recently viewed” is AP because:
- Showing a product that was viewed 2 minutes ago instead of 1 minute ago is fine
- The system must ALWAYS respond (user is shopping; showing nothing is worse than showing slightly stale list)
- Consistency is not critical here
Q13 — WhatsApp Chat Sharding Key
Sharding Key: Chat Room ID (or Conversation ID)
Why:
- All messages in a single chat belong together → intrashard queries for chat history
- Easy to route: hash(chat_id) → shard
- Avoids cross-shard joins for most common queries (fetch messages in a conversation)
Consideration: Group chats with millions of members could become hot shards → may need per-group-specific replication strategy.
Q14 — MySQL Overwhelmed by Reads
- Add Read Replicas — Route all read queries to replicas
- Add a Caching Layer — Redis in front of DB for hot queries
- Monitor Replication Lag — Ensure replicas don’t fall too far behind
- Read-Your-Own-Write Routing — Route user’s reads to the master for N seconds after their write
Trade-off: Adding replicas with async replication → eventual consistency. If strong consistency is needed, → sync replication (higher latency) or always read from master.
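The read-your-own-write routing idea from Q14 can be sketched as a small router that pins a user's reads to the master for a short window after their last write (`RywRouter`, the injectable `clock`, and the 5-second window are illustrative assumptions):

```python
import time

PIN_SECONDS = 5.0   # assumed to comfortably exceed typical replication lag

class RywRouter:
    """Pin a user's reads to the primary briefly after their own write."""

    def __init__(self, primary, replica, clock=time.monotonic):
        self.primary, self.replica = primary, replica
        self._clock = clock          # injectable for testing
        self._last_write = {}        # user_id -> time of their last write

    def on_write(self, user_id):
        self._last_write[user_id] = self._clock()

    def read_target(self, user_id):
        t = self._last_write.get(user_id)
        if t is not None and self._clock() - t < PIN_SECONDS:
            return self.primary      # replicas may still be lagging
        return self.replica
```

Only the writing user pays the master-read penalty, and only briefly; everyone else's reads keep going to the replicas.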
Q15 — Justin Bieber’s Posts (Hot Shard + Replication + Caching)
Hot account problem (also called “celebrity problem”):
Solution layers:
1. Dedicated hot shard for celebrities (separate from normal users)
2. High replication factor for hot shards (e.g., 100 replicas vs 3 for normal users)
3. Aggressive caching — cache celebrity posts at CDN / application layer
4. Fan-out on read (pull model) — don't push to all followers; pull from celebrity shard when a follower loads feed
The Facebook News Feed solution: Use a Recent Posts DB (cache) that holds only the latest posts from celebrities, avoiding hitting the main shards on every feed load.
📚 References and Resources
Academic & Formal References
- CAP Theorem — Originally proposed by Eric Brewer (2000), formalized by Gilbert & Lynch (2002)
- Paper: Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
- PACELC Theorem — Proposed by Daniel J. Abadi (2012)
- Paper: Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story
Database Documentation
- Cassandra Architecture — Data Replication
- MongoDB Replication
- MySQL Replication
- PostgreSQL High Availability
Study Resources
- Martin Kleppmann — Designing Data-Intensive Applications (DDIA) ← Must Read
- Chapter 5: Replication
- Chapter 6: Partitioning
- Chapter 9: Consistency and Consensus
- AWS — Eventual Consistency Primer
- Cloudflare — Why Network Partitions Happen
Videos & Blogs
📌 Remember: CAP Theorem and PACELC are frameworks for thinking, not rigid dogmas. Real-world databases often allow you to tune consistency vs availability per operation (e.g., DynamoDB’s ConsistentRead, Cassandra's consistency levels). The key is understanding the trade-offs so you can make informed decisions.

CAP Theorem, PACELC & Replication — The Ultimate System Design Guide was originally published in Level Up Coding on Medium.