GRANITE-DB
No reason. No startup idea. Just a weekend, a compiler, and a question I couldn’t shake.

There’s a specific kind of itch that hits you when you’ve been deep in distributed systems long enough. You’ve used the databases. You’ve debugged them at 2am. You’ve read the papers, the postmortems, the blog posts from engineers who built the things you depend on. And somewhere in the back of your head, a quiet, slightly embarrassing thought forms:
What would it actually take to build one of these?
Not a toy. Not a key-value store wrapped in a struct. A real database — document model, query engine, indexes, transactions, replication, the whole thing. I’d been sitting with that question for a while. Then one week I just… started.
The result is GraniteDB — a document-oriented NoSQL engine built entirely in Rust, with zero external database dependencies. Full WAL, MVCC, ACID transactions, B-tree and hash indexes, an aggregation pipeline, RBAC auth with AES-256-GCM encryption at rest, replication via oplog, consistent hash-based sharding, and a wire protocol with 25+ commands over async TCP.
This article is about what I built, how it works, and what the experience of writing a database from nothing actually teaches you.
Why a Document Database?
The document model is interesting precisely because it’s deceptively hard. A key-value store is a clean abstraction — you get to punt on almost everything. A document database forces you to solve the whole stack: a rich type system, nested structure traversal, schema validation, filter expressions that compose, projections, updates that don’t overwrite everything.
It’s also the model that most developers interact with daily. MongoDB’s design decisions — the query DSL, the aggregation pipeline, the update operators — are worth understanding at the implementation level. So I made GraniteDB intentionally MongoDB-like in its interface. Not a clone, but something that would feel familiar to anyone who’s written a $gt filter or an $unwind stage.
The Storage Layer: Where Everything Gets Real
The storage engine is where you find out if your abstractions hold up.
GraniteDB’s storage is built around three interleaved components: a Write-Ahead Log (WAL), an LRU buffer pool, and a page-based disk layer.
The WAL is segmented, with each entry carrying a CRC32 checksum and a Log Sequence Number (LSN). Entries are written to the current segment; once a segment hits its size ceiling, it rolls to a new one. On startup, the WAL replays from the last checkpoint — this is how you get crash durability without fsyncing every write to the data file.
WAL Segment 0 → WAL Segment 1 → WAL Segment N (current)
[LSN:0001][CRC][entry] [LSN:0002][CRC][entry] ...
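To make the framing concrete, here is a minimal sketch of the entry layout: an LSN, a CRC32 over the payload, a length prefix, then the payload. The field order and names are my own simplification for illustration, not GraniteDB's actual on-disk format:

```rust
use std::convert::TryInto;

/// Bitwise CRC-32 (IEEE polynomial); table-free for brevity.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

/// Encode one WAL entry: 8-byte LSN, 4-byte CRC over the payload, 4-byte length, payload.
fn encode_entry(lsn: u64, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(16 + payload.len());
    buf.extend_from_slice(&lsn.to_le_bytes());
    buf.extend_from_slice(&crc32(payload).to_le_bytes());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Decode and verify; None means a corrupt or torn entry.
fn decode_entry(buf: &[u8]) -> Option<(u64, Vec<u8>)> {
    if buf.len() < 16 {
        return None;
    }
    let lsn = u64::from_le_bytes(buf[0..8].try_into().unwrap());
    let crc = u32::from_le_bytes(buf[8..12].try_into().unwrap());
    let len = u32::from_le_bytes(buf[12..16].try_into().unwrap()) as usize;
    let payload = buf.get(16..16 + len)?.to_vec();
    if crc32(&payload) != crc {
        return None;
    }
    Some((lsn, payload))
}

fn main() {
    let entry = encode_entry(1, b"insert users {\"name\":\"Alice\"}");
    let (lsn, payload) = decode_entry(&entry).expect("valid entry");
    println!("replayed LSN {} ({} bytes)", lsn, payload.len());
}
```

Replay walks a segment entry by entry; the first CRC mismatch marks the torn tail of the log, which is exactly where recovery stops.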
The buffer pool is an LRU page cache. Pages are pinned before reads/writes and unpinned after — this is the classic pin/unpin lifecycle you'll recognize from any database textbook. Dirty pages are flushed on eviction. The pool tracks hit/miss ratios, which surface later in the metrics layer.
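A toy version of that pin/unpin lifecycle, assuming a simplified pool that skips dirty tracking and flushing (the struct and method names here are illustrative, not GraniteDB's):

```rust
use std::collections::HashMap;

/// Minimal sketch of an LRU buffer pool with pin counts.
/// A real pool would also track dirty bits and flush on eviction.
struct BufferPool {
    capacity: usize,
    pages: HashMap<u64, (Vec<u8>, u32)>, // page_id -> (data, pin_count)
    lru: Vec<u64>,                       // least recently used first
    pub hits: u64,
    pub misses: u64,
}

impl BufferPool {
    fn new(capacity: usize) -> Self {
        BufferPool { capacity, pages: HashMap::new(), lru: Vec::new(), hits: 0, misses: 0 }
    }

    fn touch(&mut self, id: u64) {
        self.lru.retain(|&p| p != id);
        self.lru.push(id);
    }

    /// Pin a page, calling `load` (e.g. a disk read) on a miss.
    /// Pinned pages are never chosen as eviction victims.
    fn pin(&mut self, id: u64, load: impl FnOnce() -> Vec<u8>) -> &Vec<u8> {
        if self.pages.contains_key(&id) {
            self.hits += 1;
        } else {
            self.misses += 1;
            if self.pages.len() >= self.capacity {
                // Evict the least recently used *unpinned* page.
                if let Some(pos) = self.lru.iter().position(|p| self.pages[p].1 == 0) {
                    let victim = self.lru.remove(pos);
                    self.pages.remove(&victim); // a real pool flushes here if dirty
                }
            }
            self.pages.insert(id, (load(), 0));
        }
        self.touch(id);
        let entry = self.pages.get_mut(&id).unwrap();
        entry.1 += 1;
        &entry.0
    }

    fn unpin(&mut self, id: u64) {
        if let Some(entry) = self.pages.get_mut(&id) {
            entry.1 = entry.1.saturating_sub(1);
        }
    }
}

fn main() {
    let mut pool = BufferPool::new(2);
    pool.pin(1, || vec![0; 16]);
    pool.unpin(1);
    pool.pin(1, || unreachable!()); // cache hit: the loader is never called
    println!("hits={} misses={}", pool.hits, pool.misses);
}
```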
Pages are fixed-size, 16KB by default. Each page carries an integrity checksum. Overflow pages handle documents that exceed a single page. This is important because document databases don’t have the luxury of fixed row widths — a document might be 200 bytes or 200KB.
The unified StorageEngine struct coordinates all three. Writers go WAL-first, buffer pool second, disk third. Readers hit the buffer pool first, go to disk only on a miss.
The Query Engine: Planning Before Executing
Most database query engines have the same basic shape: parse → plan → execute. GraniteDB follows that pattern.
Parsing converts a JSON filter expression into an internal FilterExpr tree. The parser handles the full set of MongoDB-style operators: comparison ($eq, $ne, $gt, $gte, $lt, $lte), membership ($in, $nin), existence ($exists), type checking ($type), array element matching ($elemMatch), regex ($regex), and logical composition ($and, $or, $not, $nor).
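Setting the JSON parsing itself aside, the shape of such a tree and its evaluator can be sketched like this. The `Value` type, the operator subset, and the flat document model are heavily simplified stand-ins for the real thing:

```rust
use std::collections::HashMap;

/// Simplified value type; real documents support nested objects and arrays.
#[derive(Clone, PartialEq, PartialOrd, Debug)]
enum Value {
    Num(f64),
    Str(String),
}

/// A hand-picked subset of the operator tree described above.
enum FilterExpr {
    Eq(String, Value),
    Gt(String, Value),
    In(String, Vec<Value>),
    Exists(String),
    And(Vec<FilterExpr>),
    Or(Vec<FilterExpr>),
    Not(Box<FilterExpr>),
}

type Doc = HashMap<String, Value>;

/// Recursive evaluation: leaves compare field values, inner nodes compose.
fn matches(expr: &FilterExpr, doc: &Doc) -> bool {
    match expr {
        FilterExpr::Eq(f, v) => doc.get(f) == Some(v),
        FilterExpr::Gt(f, v) => doc.get(f).map_or(false, |x| x > v),
        FilterExpr::In(f, vs) => doc.get(f).map_or(false, |x| vs.contains(x)),
        FilterExpr::Exists(f) => doc.contains_key(f),
        FilterExpr::And(es) => es.iter().all(|e| matches(e, doc)),
        FilterExpr::Or(es) => es.iter().any(|e| matches(e, doc)),
        FilterExpr::Not(e) => !matches(e, doc),
    }
}

fn main() {
    // { age: { $gt: 25 }, role: { $in: ["engineer", "admin"] } }
    let filter = FilterExpr::And(vec![
        FilterExpr::Gt("age".into(), Value::Num(25.0)),
        FilterExpr::In(
            "role".into(),
            vec![Value::Str("engineer".into()), Value::Str("admin".into())],
        ),
    ]);
    let doc: Doc = [
        ("age".to_string(), Value::Num(28.0)),
        ("role".to_string(), Value::Str("engineer".into())),
    ]
    .into();
    println!("matches: {}", matches(&filter, &doc));
}
```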
Planning is where it gets interesting. The planner looks at the filter expression and the available indexes, then selects an execution strategy:
- ID Lookup — if the filter specifies _id directly, skip everything and fetch by ID. O(1).
- Index Scan — if the filter touches an indexed field, use the index to narrow candidates. O(log n + k), where k is the number of matching entries.
- Collection Scan — full scan. O(n). The fallback.
The planner is also responsible for generating EXPLAIN output, which surfaces which strategy was selected and why. Useful for understanding why a query is slow before you add a covering index.
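The strategy selection boils down to a cascade. A hypothetical sketch (the real planner also weighs things like selectivity; these names are illustrative):

```rust
use std::collections::HashSet;

/// The three execution strategies described above.
#[derive(Debug, PartialEq)]
enum Plan {
    IdLookup,          // filter pins _id directly: O(1)
    IndexScan(String), // filter touches an indexed field
    CollectionScan,    // fallback: full scan, O(n)
}

/// `filter_fields`: fields the filter constrains with index-usable predicates.
fn choose_plan(filter_fields: &[&str], indexed: &HashSet<&str>) -> Plan {
    if filter_fields.contains(&"_id") {
        return Plan::IdLookup;
    }
    for f in filter_fields {
        if indexed.contains(f) {
            return Plan::IndexScan(f.to_string());
        }
    }
    Plan::CollectionScan
}

fn main() {
    let indexed: HashSet<&str> = ["age", "email"].into();
    println!("{:?}", choose_plan(&["_id"], &indexed));
    println!("{:?}", choose_plan(&["age", "role"], &indexed));
    println!("{:?}", choose_plan(&["role"], &indexed));
}
```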
Execution takes the plan, runs it, applies sort/skip/limit for pagination, and applies projections to strip fields from the output. Projections matter for performance — returning only the fields you need reduces serialization overhead on both sides of the wire.
Indexing: B-Trees and Hash Tables
GraniteDB supports two index types, each suited to different query patterns.
B-Tree indexes are the general-purpose option. They maintain keys in sorted order, which means they support range queries ($gt, $lt, range scans) efficiently. Composite keys (multi-field indexes) are supported — the B-tree orders by the first field, then the second, and so on. This is the index you use when you need { age: { $gt: 25 }, created_at: { $lt: ... } } to be fast.
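Rust's standard BTreeMap makes the composite-key idea easy to demonstrate: tuple keys sort lexicographically, which is exactly the ordering a multi-field index needs. The data and field names here are illustrative, not GraniteDB's internals:

```rust
use std::collections::BTreeMap;

// A composite index on (age, user_id), mapping index key -> document slot.
type CompositeIndex = BTreeMap<(i64, &'static str), u64>;

/// { age: { $gt: min_age } } over integers: one seek to (min_age + 1, "")
/// and then a sequential walk to the end of the tree.
fn scan_age_gt(index: &CompositeIndex, min_age: i64) -> Vec<u64> {
    index.range((min_age + 1, "")..).map(|(_, &doc_id)| doc_id).collect()
}

fn main() {
    let mut index: CompositeIndex = BTreeMap::new();
    index.insert((24, "u1"), 100);
    index.insert((28, "u2"), 200);
    index.insert((31, "u3"), 300);
    println!("{:?}", scan_age_gt(&index, 25)); // [200, 300]
}
```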
Hash indexes trade range query support for O(1) exact-match lookups. If your access pattern is almost exclusively point lookups — { user_id: "abc123" } — a hash index is faster than a B-tree because there's no tree traversal, just a hash computation and a bucket lookup.
The Index Manager handles creation, deletion, and automatic maintenance. Every insert, update, and delete propagates through the Index Manager to keep indexes consistent. This is one of the trickier parts to get right — index maintenance under concurrent write load is where many bugs hide.
Both index types support unique constraints (enforced at write time) and sparse indexes (which skip documents that don’t contain the indexed field, saving space when the field is optional).
Transactions and MVCC
This is the part where building a database gets genuinely hard.
GraniteDB implements ACID transactions with multiple isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Snapshot, and Serializable. The isolation model is built on Multi-Version Concurrency Control (MVCC).
The MVCC intuition: instead of locking a row/document while a transaction reads it, you keep multiple versions of the data timestamped by transaction ID. Readers see a consistent snapshot of the data as of their transaction’s start time. Writers create new versions. Readers and writers don’t block each other.
Document "user:123" version history:
txn_id=1 → { name: "Alice", age: 28 }
txn_id=5 → { name: "Alice", age: 29 } ← current
txn_id=8 → (in-progress, not committed)
A reader at txn_id=6 sees the txn_id=5 version. A reader at txn_id=4 sees the txn_id=1 version. The in-progress txn_id=8 is invisible until it commits.
Old versions don’t live forever. A garbage collector periodically removes MVCC versions that no active transaction can still see.
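The visibility rule from the diagram above fits in a few lines. This simplified sketch only checks "committed and no newer than my snapshot"; a full snapshot-isolation implementation also excludes transactions that were still in flight when the reader's snapshot was taken. Names are illustrative:

```rust
/// One entry in a document's version chain.
struct Version {
    txn_id: u64,
    committed: bool,
    age: u32, // stand-in for the document payload
}

/// A reader sees the newest committed version written at or before its snapshot.
fn visible<'a>(versions: &'a [Version], reader_txn: u64) -> Option<&'a Version> {
    versions
        .iter()
        .filter(|v| v.committed && v.txn_id <= reader_txn)
        .max_by_key(|v| v.txn_id)
}

fn main() {
    // The version chain from the diagram above.
    let chain = vec![
        Version { txn_id: 1, committed: true, age: 28 },
        Version { txn_id: 5, committed: true, age: 29 },
        Version { txn_id: 8, committed: false, age: 30 }, // in progress
    ];
    println!("reader 6 sees age {}", visible(&chain, 6).unwrap().age); // 29
    println!("reader 4 sees age {}", visible(&chain, 4).unwrap().age); // 28
}
```

The GC question falls straight out of this function: a version is reclaimable once no possible `reader_txn` among active transactions can still select it.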
The Transaction Manager handles begin/commit/abort, and detects write-write conflicts at commit time — if two transactions both modified the same document, one of them loses and gets aborted. Transactions also have configurable timeouts; a transaction that runs too long is automatically aborted, which prevents deadlock accumulation.
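First-committer-wins conflict detection can be sketched with commit sequence numbers standing in for the real transaction machinery (the `Store` type and its fields are hypothetical):

```rust
use std::collections::HashMap;

/// At commit time, a transaction aborts if any document in its write set
/// was committed by someone else after the transaction's snapshot was taken.
struct Store {
    last_committed: HashMap<String, u64>, // doc_id -> commit sequence number
    commit_seq: u64,
}

impl Store {
    fn new() -> Self {
        Store { last_committed: HashMap::new(), commit_seq: 0 }
    }

    /// `snapshot` is the commit sequence the transaction started from.
    fn try_commit(&mut self, snapshot: u64, write_set: &[&str]) -> Result<u64, String> {
        for doc in write_set {
            if let Some(&seq) = self.last_committed.get(*doc) {
                if seq > snapshot {
                    return Err(format!("write-write conflict on {doc}"));
                }
            }
        }
        self.commit_seq += 1;
        for doc in write_set {
            self.last_committed.insert(doc.to_string(), self.commit_seq);
        }
        Ok(self.commit_seq)
    }
}

fn main() {
    let mut store = Store::new();
    let snap = store.commit_seq; // both transactions start at the same snapshot
    store.try_commit(snap, &["user:123"]).unwrap(); // T1 commits first
    let lost = store.try_commit(snap, &["user:123"]); // T2 loses and aborts
    println!("{lost:?}");
}
```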
The Aggregation Pipeline
GraniteDB implements a 12-stage aggregation pipeline, also JSON-driven and MongoDB-compatible in structure.
[
  { "$match": { "status": "active" } },
  { "$group": { "_id": "$region", "total": { "$sum": "$revenue" } } },
  { "$sort": { "total": -1 } },
  { "$limit": 10 }
]
The stages: $match, $project, $group, $sort, $limit, $skip, $unwind, $count, $addFields, $replaceRoot, $lookup (cross-collection join), and $out (write results to a collection).
The accumulators: $sum, $avg, $min, $max, $count, $push, $addToSet, $first, $last.
The pipeline executor processes stages sequentially, passing a document stream between them. $group is the most complex stage internally — it builds a hash map keyed by the group expression, applying accumulators across all documents in each group. $unwind deconstructs arrays into individual documents. $lookup does a nested-loop join, which is expensive on large collections but correct.
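Here is the hash-map-of-accumulators idea behind $group, reduced to a single $sum accumulator over illustrative (region, revenue) pairs; the real stage dispatches over all nine accumulators and arbitrary group expressions:

```rust
use std::collections::HashMap;

/// { "$group": { "_id": "$region", "total": { "$sum": "$revenue" } } },
/// sketched over pre-extracted (region, revenue) pairs.
fn group_sum(docs: &[(&str, f64)]) -> HashMap<String, f64> {
    let mut groups: HashMap<String, f64> = HashMap::new();
    for (region, revenue) in docs {
        // One map entry per group key; fold each document into its accumulator.
        *groups.entry(region.to_string()).or_insert(0.0) += revenue;
    }
    groups
}

fn main() {
    let docs = [("eu", 10.0), ("us", 5.0), ("eu", 2.5)];
    let totals = group_sum(&docs);
    println!("{totals:?}"); // {"eu": 12.5, "us": 5.0}, in hash-map order
}
```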
Security: Auth, RBAC, and Encryption at Rest
GraniteDB ships with a full security stack, which was a deliberate choice. Security is often treated as something you bolt on after the fact. Here it’s part of the core.
Authentication uses Argon2 for password hashing. Argon2 is the winner of the Password Hashing Competition and the current recommendation over bcrypt/scrypt for new systems. Memory-hard, configurable parallelism and iteration count, resistant to GPU-accelerated cracking.
Role-Based Access Control has five built-in roles (read, readWrite, dbAdmin, userAdmin, root) plus custom role support. Each role maps to a set of 11 action types. Permission checks happen at the handler layer before any operation reaches the storage engine.
Encryption at rest uses AES-256-GCM. GCM mode provides both confidentiality and authenticated encryption — you get integrity verification for free. Each encryption operation uses a randomly generated nonce, which means identical plaintexts produce different ciphertexts (semantic security). Key management is pluggable.
Networking: Async TCP with a JSON Wire Protocol
The server is built on Tokio. Each incoming connection spawns its own task — Tokio’s async runtime schedules them cooperatively across a thread pool sized to the available CPUs.
The wire protocol is JSON over TCP. This is a deliberate tradeoff: JSON is slower to parse than a binary format (like MongoDB’s BSON or Redis’s RESP), but it’s trivially debuggable with nc or any other tool that can speak raw TCP. For a project where the goals include legibility and hackability, a JSON wire protocol is the right call.
The protocol covers 25+ command types: database management, collection CRUD, index operations, aggregation, transactions, authentication, user management, metrics, and replication control.
{
  "request_id": "req-001",
  "command": {
    "type": "aggregate",
    "database": "analytics",
    "collection": "events",
    "pipeline": [
      { "$match": { "type": "click" } },
      { "$group": { "_id": "$page", "count": { "$sum": 1 } } }
    ]
  }
}

The connection pool tracks active connections and enforces a configurable maximum. The interactive CLI is a thin wrapper over the wire protocol — it handles input parsing, session state (use dbname), and pretty-printing.
Replication and Sharding
Replication is oplog-based. Every write operation is appended to a capped operations log with a timestamp. Secondary replicas tail the oplog and apply operations in order. The replica set supports Primary/Secondary/Arbiter roles with heartbeat-based health monitoring.
This is the same fundamental architecture MongoDB uses. The oplog-as-replication-stream model has a nice property: it’s also useful for change data capture (CDC) without any additional machinery.
Sharding uses consistent hashing with virtual nodes. Virtual nodes are a trick for handling heterogeneous shard capacities and making rebalancing less disruptive — instead of mapping key ranges directly to physical shards, you map them to a large number of virtual nodes, then assign virtual nodes to physical shards. Adding or removing a shard only moves a fraction of the virtual nodes.
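A minimal ring with virtual nodes, using the standard library's hasher as a stand-in for whatever hash function the real router uses (all names here are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Consistent-hash ring: each physical shard owns `vnodes` points on the ring,
/// and a key routes to the first point at or after its hash (wrapping around).
struct Ring {
    ring: BTreeMap<u64, String>, // hash point -> shard name
    vnodes: usize,
}

fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

impl Ring {
    fn new(vnodes: usize) -> Self {
        Ring { ring: BTreeMap::new(), vnodes }
    }

    fn add_shard(&mut self, name: &str) {
        for i in 0..self.vnodes {
            self.ring.insert(hash_of(&format!("{name}#{i}")), name.to_string());
        }
    }

    fn remove_shard(&mut self, name: &str) {
        self.ring.retain(|_, s| s.as_str() != name);
    }

    fn route(&self, key: &str) -> Option<&str> {
        let h = hash_of(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next()) // wrap around to the first point
            .map(|(_, s)| s.as_str())
    }
}

fn main() {
    let mut ring = Ring::new(64);
    for s in ["shard-a", "shard-b", "shard-c"] {
        ring.add_shard(s);
    }
    println!("user:123 -> {:?}", ring.route("user:123"));
    ring.remove_shard("shard-b");
    println!("user:123 -> {:?}", ring.route("user:123"));
}
```

The point of the design is the invariant: after removing a shard, only keys that lived on its virtual nodes change owners; everything else routes exactly as before.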
The router also supports key-range sharding as an alternative, which is better when you need range scans across shards without scatter-gather overhead.
Observability: 17 Atomic Metrics
The metrics layer tracks 17 counters via atomic integers — queries executed, inserts, updates, deletes, connections, bytes read/written, WAL writes, buffer pool hits and misses, index lookups, collection scans, and active transactions.
Atomics are the right tool here. Taking a mutex for every counter bump on a hot path would add measurable overhead. With atomics you get lock-free increments, and relaxed memory ordering is sufficient: a metrics counter doesn’t need a happens-before relationship with any other memory operation.
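The pattern, sketched with two of the counters (the struct and field names are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free metrics counters. Relaxed ordering is enough: each increment
/// must be atomic, but needs no ordering relative to other memory operations.
#[derive(Default)]
struct Metrics {
    queries: AtomicU64,
    inserts: AtomicU64,
}

impl Metrics {
    fn record_query(&self) {
        self.queries.fetch_add(1, Ordering::Relaxed);
    }

    /// What a server_status-style command would return.
    fn snapshot(&self) -> (u64, u64) {
        (self.queries.load(Ordering::Relaxed), self.inserts.load(Ordering::Relaxed))
    }
}

fn main() {
    let metrics = Arc::new(Metrics::default());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let m = Arc::clone(&metrics);
            thread::spawn(move || {
                for _ in 0..1000 {
                    m.record_query(); // concurrent increments, no locks
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{:?}", metrics.snapshot()); // (4000, 0)
}
```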
Metrics are queryable via the wire protocol’s server_status command, which returns a snapshot of all counters. This is the foundation you'd build a Prometheus exporter or a dashboard on.
What Building This Actually Teaches You
Some things you don’t fully understand until you’ve implemented them:
WAL replay is where crash recovery actually lives. The concept is simple. The implementation detail that bites you is ensuring that your WAL writes are truly durable before you acknowledge a write to the client. fsync semantics, buffered I/O, write ordering — these matter in ways that are easy to handwave in architecture diagrams.
MVCC garbage collection is harder than MVCC reads. Knowing when a version is safe to delete requires knowing the minimum transaction ID of any active reader. Getting that wrong either leaks memory or causes read errors for long-running transactions.
Index maintenance is a correctness problem, not a performance problem. It’s tempting to think of indexes as an optimization layer you can add later. But once writes are concurrent, ensuring that indexes stay consistent with the underlying data under partial failures is a correctness requirement, not a performance one.
The query planner is where the database’s personality lives. The same query can be fast or catastrophically slow depending on planner decisions. Even a simple “index scan vs. collection scan” choice has edge cases — an index scan with poor selectivity can be slower than a full collection scan because of the random I/O pattern.
Rust makes the concurrency story cleaner, but not easy. The borrow checker eliminates a class of data races at compile time. But the logic of MVCC, deadlock detection, and consistent hash routing is still hard — it’s just hard in ways you can reason about, rather than hard in the “Heisenbugs in production” way.
The Code
GraniteDB is open source under Apache 2.0.
git clone https://github.com/kritarth1107/GraniteDB.git
cd GraniteDB
cargo build --release
cargo run --release -- --port 6380 --data-dir ./data
In a second terminal:
cargo run --release --bin granite-cli
granite:default> use mydb
granite:mydb> createcol users
granite:mydb> insert users {"name": "Alice", "age": 28, "role": "engineer"}
granite:mydb> find users {"age": {"$gt": 25}}
granite:mydb> find users {"role": {"$in": ["engineer", "admin"]}}
The full source is in src/ — 17 modules, each cleanly separated: storage, query, index, aggregation, transaction, auth, network, replication, sharding, metrics. Reading the module structure is a reasonable way to understand how the components fit together before diving into any individual file.
I built GraniteDB because I wanted to understand databases the way you only can by building one. Everything is leaky. Every abstraction that seems clean at the design phase has a sharp edge when you implement it. That’s not a criticism of the field — it’s what makes this problem space interesting.
If you’re curious about how the pieces fit, the code is there. Issues, PRs, and questions are welcome.
GraniteDB — solid as granite. github.com/kritarth1107/GraniteDB
I Built a Database Engine From Scratch in Rust. Here’s What I Learned. was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.