Educational LSM-tree storage engine implemented from scratch in TypeScript. It demonstrates how modern databases turn random writes into sequential I/O with a write-ahead log (WAL), memtables, immutable SSTables, Bloom filters, and compaction. Zero external services and minimal dependencies so you can read the code and learn the internals.
- Append-only writes: every mutation hits the WAL + in-memory memtable for O(1) amortized writes.
- Flush to SSTables: memtable spills to sorted-string tables with per-table Bloom filters and key-offset index for fast lookups.
- Crash safety: WAL replay restores unflushed writes on restart.
- Compaction: merges SSTables, drops obsolete versions/tombstones, and keeps a single immutable run.
- CLI + tests:
node --testcoverage for WAL replay, tombstones, compaction correctness.
MemTable— in-memory map tracking byte size; triggers flush at 256 KB by default.WriteAheadLog— JSON-line log for durability before touching memory.SSTable— immutable files (.data,.index.json,.meta.json) with Bloom filter snapshots and in-memory key→offset index.Manifest— persists table ordering and sequence numbers for deterministic recovery.LsmTree— orchestrates writes, reads (newest SSTable first), flushes, and compaction.CLI— ergonomics forput/get/del/flush/compact/status/demoplus--data <dir>override.
<dataDir>/
manifest.json # next sequence + ordered SSTable metadata
wal.log # write-ahead log for durability
sstables/
sst-42.data # JSONL records sorted by key
sst-42.index.json # key -> byte offset within .data
sst-42.meta.json # counts, min/max key, bloom snapshot
put/delete: O(1) amortized (append to WAL + memtable; flush is sequential I/O).get: O(L) where L = number of SSTables (Bloom filter gates most misses; in-memory index gives O(1) fetch per table).- Compaction: O(N log N) over total keys in participating SSTables; removes tombstones and keeps latest
seqper key. - Space: append-only until compaction; Bloom filters are fixed-size bitsets (default 4096 bits/table).
npm install
npm run build
node dist/cli.js --data ./data put user:1 alice
node dist/cli.js --data ./data get user:1
node dist/cli.js --data ./data flush
node dist/cli.js --data ./data compact
node dist/cli.js --data ./data statusnode dist/cli.js demoSample output:
After flush:
{
"memtableEntries": 0,
"sstables": [
{ "id": "sst-3", "count": 3 }
],
"dataDir": "/absolute/path/data"
}
user:1 -> alice
user:2 -> (nil)
Compacting...
{
"memtableEntries": 0,
"sstables": [
{ "id": "sst-compact-<timestamp>", "count": 2 }
],
"dataDir": "/absolute/path/data"
}
npm testUses Node's built-in node:test runner to cover WAL replay, tombstones, and compaction invariants.
- Bloom filters use four SHA-256–derived positions to cut false positives while keeping the bitset tiny.
- WAL + memtable ensures write ordering; every flush writes an immutable SSTable and clears the WAL.
- Compaction reads all SSTables, keeps the newest sequence per key, and drops tombstones to reclaim space.
- Data structures are intentionally lightweight (no external deps) so you can trace every byte from API call to disk write.
- Multi-level compaction strategy (tiered/leveled) with size thresholds.
- Pluggable memtable (skip list) and Bloom parameters per table.
- Range scans using sparse indexes.
- Metrics endpoint exposing flush/compaction counters.
Mo Shirmohammadi — github.com/mohosy