Skip to content

mohosy/lsm-tree-storage-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

lsm-tree-storage-engine

TypeScript Tests License: MIT

Educational LSM-tree storage engine implemented from scratch in TypeScript. It demonstrates how modern databases turn random writes into sequential I/O with a write-ahead log (WAL), memtables, immutable SSTables, Bloom filters, and compaction. Zero external services and minimal dependencies so you can read the code and learn the internals.

Highlights

  • Append-only writes: every mutation hits the WAL + in-memory memtable for O(1) amortized writes.
  • Flush to SSTables: memtable spills to sorted-string tables with per-table Bloom filters and key-offset index for fast lookups.
  • Crash safety: WAL replay restores unflushed writes on restart.
  • Compaction: merges SSTables, drops obsolete versions/tombstones, and keeps a single immutable run.
  • CLI + tests: node --test coverage for WAL replay, tombstones, compaction correctness.

Architecture (clean separation)

  • MemTable — in-memory map tracking byte size; triggers flush at 256 KB by default.
  • WriteAheadLog — JSON-line log for durability before touching memory.
  • SSTable — immutable files (.data, .index.json, .meta.json) with Bloom filter snapshots and in-memory key→offset index.
  • Manifest — persists table ordering and sequence numbers for deterministic recovery.
  • LsmTree — orchestrates writes, reads (newest SSTable first), flushes, and compaction.
  • CLI — ergonomics for put/get/del/flush/compact/status/demo plus --data <dir> override.

Storage layout

<dataDir>/
  manifest.json            # next sequence + ordered SSTable metadata
  wal.log                  # write-ahead log for durability
  sstables/
    sst-42.data            # JSONL records sorted by key
    sst-42.index.json      # key -> byte offset within .data
    sst-42.meta.json       # counts, min/max key, bloom snapshot

Big-O at a glance

  • put/delete: O(1) amortized (append to WAL + memtable; flush is sequential I/O).
  • get: O(L) where L = number of SSTables (Bloom filter gates most misses; in-memory index gives O(1) fetch per table).
  • Compaction: O(N log N) over total keys in participating SSTables; removes tombstones and keeps latest seq per key.
  • Space: append-only until compaction; Bloom filters are fixed-size bitsets (default 4096 bits/table).

Quickstart

npm install
npm run build
node dist/cli.js --data ./data put user:1 alice
node dist/cli.js --data ./data get user:1
node dist/cli.js --data ./data flush
node dist/cli.js --data ./data compact
node dist/cli.js --data ./data status

Run the built-in demo

node dist/cli.js demo

Sample output:

After flush:
{
  "memtableEntries": 0,
  "sstables": [
    { "id": "sst-3", "count": 3 }
  ],
  "dataDir": "/absolute/path/data"
}
user:1 -> alice
user:2 -> (nil)
Compacting...
{
  "memtableEntries": 0,
  "sstables": [
    { "id": "sst-compact-<timestamp>", "count": 2 }
  ],
  "dataDir": "/absolute/path/data"
}

Tests

npm test

Uses Node's built-in node:test runner to cover WAL replay, tombstones, and compaction invariants.

Design notes

  • Bloom filters use four SHA-256–derived positions to cut false positives while keeping the bitset tiny.
  • WAL + memtable ensures write ordering; every flush writes an immutable SSTable and clears the WAL.
  • Compaction reads all SSTables, keeps the newest sequence per key, and drops tombstones to reclaim space.
  • Data structures are intentionally lightweight (no external deps) so you can trace every byte from API call to disk write.

Future extensions

  • Multi-level compaction strategy (tiered/leveled) with size thresholds.
  • Pluggable memtable (skip list) and Bloom parameters per table.
  • Range scans using sparse indexes.
  • Metrics endpoint exposing flush/compaction counters.

Author

Mo Shirmohammadi — github.com/mohosy

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors