lsm-tree-storage-engine

Educational LSM-tree storage engine implemented from scratch in TypeScript. It demonstrates how modern databases turn random writes into sequential I/O with a write-ahead log (WAL), memtables, immutable SSTables, Bloom filters, and compaction. Zero external services and minimal dependencies so you can read the code and learn the internals.

Highlights

Append-only writes: every mutation hits the WAL + in-memory memtable for O(1) amortized writes.
Flush to SSTables: memtable spills to sorted-string tables with per-table Bloom filters and key-offset index for fast lookups.
Crash safety: WAL replay restores unflushed writes on restart.
Compaction: merges SSTables, drops obsolete versions/tombstones, and keeps a single immutable run.
CLI + tests: node --test coverage for WAL replay, tombstones, compaction correctness.

Architecture (clean separation)

MemTable — in-memory map tracking byte size; triggers flush at 256 KB by default.
WriteAheadLog — JSON-line log for durability before touching memory.
SSTable — immutable files (.data, .index.json, .meta.json) with Bloom filter snapshots and in-memory key→offset index.
Manifest — persists table ordering and sequence numbers for deterministic recovery.
LsmTree — orchestrates writes, reads (newest SSTable first), flushes, and compaction.
CLI — ergonomics for put/get/del/flush/compact/status/demo plus --data <dir> override.

Storage layout

<dataDir>/
  manifest.json            # next sequence + ordered SSTable metadata
  wal.log                  # write-ahead log for durability
  sstables/
    sst-42.data            # JSONL records sorted by key
    sst-42.index.json      # key -> byte offset within .data
    sst-42.meta.json       # counts, min/max key, bloom snapshot

Big-O at a glance

put/delete: O(1) amortized (append to WAL + memtable; flush is sequential I/O).
get: O(L) where L = number of SSTables (Bloom filter gates most misses; in-memory index gives O(1) fetch per table).
Compaction: O(N log N) over total keys in participating SSTables; removes tombstones and keeps latest seq per key.
Space: append-only until compaction; Bloom filters are fixed-size bitsets (default 4096 bits/table).

Quickstart

npm install
npm run build
node dist/cli.js --data ./data put user:1 alice
node dist/cli.js --data ./data get user:1
node dist/cli.js --data ./data flush
node dist/cli.js --data ./data compact
node dist/cli.js --data ./data status

Run the built-in demo

node dist/cli.js demo

Sample output:

After flush:
{
  "memtableEntries": 0,
  "sstables": [
    { "id": "sst-3", "count": 3 }
  ],
  "dataDir": "/absolute/path/data"
}
user:1 -> alice
user:2 -> (nil)
Compacting...
{
  "memtableEntries": 0,
  "sstables": [
    { "id": "sst-compact-<timestamp>", "count": 2 }
  ],
  "dataDir": "/absolute/path/data"
}

Tests

npm test

Uses Node's built-in node:test runner to cover WAL replay, tombstones, and compaction invariants.

Design notes

Bloom filters use four SHA-256–derived positions to cut false positives while keeping the bitset tiny.
WAL + memtable ensures write ordering; every flush writes an immutable SSTable and clears the WAL.
Compaction reads all SSTables, keeps the newest sequence per key, and drops tombstones to reclaim space.
Data structures are intentionally lightweight (no external deps) so you can trace every byte from API call to disk write.

Future extensions

Multi-level compaction strategy (tiered/leveled) with size thresholds.
Pluggable memtable (skip list) and Bloom parameters per table.
Range scans using sparse indexes.
Metrics endpoint exposing flush/compaction counters.

Author

Mo Shirmohammadi — github.com/mohosy

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

lsm-tree-storage-engine

Highlights

Architecture (clean separation)

Storage layout

Big-O at a glance

Quickstart

Run the built-in demo

Tests

Design notes

Future extensions

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

lsm-tree-storage-engine

Highlights

Architecture (clean separation)

Storage layout

Big-O at a glance

Quickstart

Run the built-in demo

Tests

Design notes

Future extensions

Author

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages