NoKV is a Go-native key-value storage engine that provides both embedded and distributed operation modes. It implements an LSM-tree architecture with value separation, supporting single-node embedded usage through the NoKV.DB API or distributed deployment via multi-Raft regions with MVCC transaction semantics.
Scope: This page introduces NoKV's purpose, runtime modes, core components, and high-level architecture. For detailed subsystem documentation:
NoKV addresses the need for a flexible storage engine that can scale from single-process embedded usage to distributed multi-node clusters without architectural rewrites. The dual-mode design allows applications to start with embedded storage and migrate to distributed operation by deploying raftstore.Server instances without changing application logic.
Primary design goals:
Sources: README.md1-234 docs/architecture.md1-208
NoKV operates in two distinct modes determined at startup:
| Mode | Entry Point | Use Case | Coordination |
|---|---|---|---|
| Embedded | NoKV.Open(opt) | Single-process applications, testing, local caching | None (standalone) |
| Distributed | nokv serve --raft-config <file> | Multi-node clusters, high availability, horizontal scaling | PD-lite for routing/TSO, Raft for replication |
The DB struct (db.go57-89) provides direct access to the storage engine. Applications call NoKV.Open with an Options struct (options.go11-194) and receive a DB handle that exposes:
- Set, SetWithTTL, Get, Del, NewIterator for application data
- ApplyInternalEntries, GetInternalEntry, NewInternalIterator for versioned operations
- WAL(), Manifest() accessors for metadata managers

Sources: db.go1-707 options.go1-275 README.md85-122
The raftstore.Server (raftstore/server/server.go) wraps the embedded core with multi-Raft coordination. Each store.Store manages multiple peer.Peer instances representing region replicas, with each peer embedding the same NoKV.DB engine.
Sources: docs/architecture.md115-157 raftstore/server/server.go README.md126-146
The embedded engine consists of four primary subsystems coordinated by the DB struct:
The wal.Manager (wal/manager.go) provides append-only durability with typed record support. Each record follows the format [len|type|payload|crc32] (docs/architecture.md58). Segment rotation occurs when the active segment exceeds size thresholds or on explicit SwitchSegment calls.
Large values (exceeding ValueThreshold) bypass the LSM tree and are written to the valueLog (vlog.go). The LSM stores a ValuePtr (kv/value.go) instead of the full payload. Garbage collection (vlog/gc.go) uses discard statistics to identify reclaimable segments.
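The routing decision can be sketched with in-memory stand-ins for the LSM and value log. The valuePtr field names and the flat byte-slice "segment" are illustrative, not NoKV's actual kv.ValuePtr or vlog layout:

```go
package main

import "fmt"

// valuePtr locates a large value in the value log (illustrative layout).
type valuePtr struct {
	Segment uint32
	Offset  uint32
	Len     uint32
}

type store struct {
	threshold int
	vlog      []byte              // flat stand-in for value-log segments
	lsm       map[string][]byte   // inline values
	ptrs      map[string]valuePtr // pointers kept in the tree for large values
}

// set routes the value: small payloads stay inline in the LSM; large ones
// go to the value log with only a pointer kept in the tree.
func (s *store) set(key string, val []byte) {
	if len(val) <= s.threshold {
		s.lsm[key] = val
		return
	}
	ptr := valuePtr{Segment: 0, Offset: uint32(len(s.vlog)), Len: uint32(len(val))}
	s.vlog = append(s.vlog, val...)
	s.ptrs[key] = ptr
}

// get resolves the pointer indirection transparently, as reads do.
func (s *store) get(key string) []byte {
	if v, ok := s.lsm[key]; ok {
		return v
	}
	p := s.ptrs[key]
	return s.vlog[p.Offset : p.Offset+p.Len]
}

func main() {
	s := &store{threshold: 4, lsm: map[string][]byte{}, ptrs: map[string]valuePtr{}}
	s.set("a", []byte("hi"))            // inline in the LSM
	s.set("b", []byte("large-payload")) // separated into the value log
	fmt.Println(string(s.get("a")), string(s.get("b")))
}
```

Keeping only pointers in the tree is what shrinks compaction I/O: large payloads are rewritten by GC, not by every level merge.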
The lsm.LSM (lsm/lsm.go) orchestrates:
The manifest.Manager (manifest/manager.go) persists version edits to a log file. The CURRENT file (manifest/manager.go) contains a pointer to the active manifest log. Edits include SSTable additions/deletions, WAL checkpoints, ValueLog heads, and Raft pointers (manifest/edit.go).
Sources: db.go119-296 docs/architecture.md54-92 README.md147-156
When operating in distributed mode, NoKV implements Percolator-style two-phase commit:
| Component | Location | Responsibility |
|---|---|---|
| percolator.MVCC | percolator/mvcc.go | Prewrite/Commit/Rollback operations with lock management |
| kv.Apply | raftstore/kv/apply.go | Bridges Raft commands to MVCC operations |
| raftstore/client.Client | raftstore/client/client.go | Leader-aware routing, retry logic, 2PC orchestration |
| PD-lite | pd/service.go | TSO allocation, region routing, heartbeat tracking |
Column Families:
- CFDefault: User data
- CFLock: Prewrite locks with primary key references
- CFWrite: Commit records with timestamps

Sources: docs/architecture.md89-93 docs/architecture.md161-177 percolator/mvcc.go
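The interplay of the three column families can be shown with an in-memory sketch of prewrite, commit, and a timestamped read. This omits write-write conflict checks, rollbacks, and persistence, and the structures are stand-ins, not NoKV's percolator.MVCC:

```go
package main

import "fmt"

// mvcc models Percolator's three column families in memory.
type mvcc struct {
	def   map[string]map[uint64][]byte // CFDefault: data keyed by startTS
	lock  map[string]string            // CFLock: key -> primary key
	write map[string]map[uint64]uint64 // CFWrite: commitTS -> startTS
}

func newMVCC() *mvcc {
	return &mvcc{
		def:   map[string]map[uint64][]byte{},
		lock:  map[string]string{},
		write: map[string]map[uint64]uint64{},
	}
}

// prewrite stages the value and takes a lock pointing at the primary key.
func (m *mvcc) prewrite(key, primary string, startTS uint64, val []byte) error {
	if _, locked := m.lock[key]; locked {
		return fmt.Errorf("key %q locked", key)
	}
	m.lock[key] = primary
	if m.def[key] == nil {
		m.def[key] = map[uint64][]byte{}
	}
	m.def[key][startTS] = val
	return nil
}

// commit releases the lock and records commitTS -> startTS in CFWrite.
func (m *mvcc) commit(key string, startTS, commitTS uint64) {
	delete(m.lock, key)
	if m.write[key] == nil {
		m.write[key] = map[uint64]uint64{}
	}
	m.write[key][commitTS] = startTS
}

// get returns the newest version committed at or before readTS.
func (m *mvcc) get(key string, readTS uint64) []byte {
	var best uint64
	for commitTS := range m.write[key] {
		if commitTS <= readTS && commitTS > best {
			best = commitTS
		}
	}
	if best == 0 {
		return nil
	}
	return m.def[key][m.write[key][best]]
}

func main() {
	m := newMVCC()
	m.prewrite("k", "k", 10, []byte("v1"))
	m.commit("k", 10, 11)
	fmt.Printf("%s\n", m.get("k", 20)) // reader at ts=20 sees the commit
	fmt.Println(m.get("k", 5) == nil)  // reader below commitTS sees nothing
}
```

The key property to notice: a read at a timestamp below the commit timestamp sees nothing, which is what gives snapshot isolation its repeatable reads.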
The following diagram illustrates how a write operation flows through the embedded engine, mapping high-level operations to concrete code entities:
Sources: db.go425-456 db_write.go336-394 db_write.go439-453 vlog.go wal/manager.go
Reads traverse the LSM hierarchy from newest to oldest, resolving ValuePtr indirections transparently:
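The traversal order can be sketched as a first-hit-wins scan over layers ordered newest to oldest (active memtable, immutable memtables, then SSTable levels). The maps below stand in for the real memtable and SSTable structures:

```go
package main

import "fmt"

// lookup scans layers newest-first and returns the first hit, which is
// how a newer write shadows an older version of the same key.
func lookup(layers []map[string][]byte, key string) ([]byte, bool) {
	for _, layer := range layers {
		if v, ok := layer[key]; ok {
			return v, true
		}
	}
	return nil, false
}

func main() {
	memtable := map[string][]byte{"a": []byte("new")}
	immutable := map[string][]byte{"a": []byte("old"), "b": []byte("b1")}
	level0 := map[string][]byte{"c": []byte("c0")}

	v, _ := lookup([]map[string][]byte{memtable, immutable, level0}, "a")
	fmt.Println(string(v)) // the memtable version shadows the older one
}
```

In the real engine each layer also consults bloom filters and block caches before touching disk, and a hit that carries a ValuePtr triggers one more read from the value log.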
Entry Lifecycle Contracts:
- DB.Get returns detached entries (db.go526); callers must NOT call DecrRef
- DB.GetInternalEntry returns borrowed entries (db.go496-509); callers MUST call DecrRef exactly once
- Borrowed entries become invalid once Close is called

Sources: db.go512-574 db.go582-605 docs/architecture.md103-111 db_hot.go37-53
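The borrowed-entry contract is a reference count: the engine holds one reference, the caller borrows a second, and the backing buffer is reclaimed only when both are released. The struct below is an illustrative sketch, not NoKV's actual entry type:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// entry sketches a refcounted value buffer.
type entry struct {
	refs  atomic.Int32
	value []byte
	freed bool
}

func (e *entry) IncrRef() { e.refs.Add(1) }

// DecrRef releases one reference; the last release reclaims the buffer
// (returned to a pool in a real engine).
func (e *entry) DecrRef() {
	if e.refs.Add(-1) == 0 {
		e.freed = true
	}
}

func main() {
	e := &entry{value: []byte("v")}
	e.IncrRef() // engine's own reference

	e.IncrRef() // borrow, as GetInternalEntry would hand out
	fmt.Println(string(e.value))
	e.DecrRef() // caller releases exactly once

	e.DecrRef() // engine drops its reference; entry is reclaimed
	fmt.Println(e.freed)
}
```

Calling DecrRef twice on a borrow, or on a detached entry from DB.Get, would release a reference the caller never owned and free a buffer still in use elsewhere, which is why the contract is asymmetric.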
NoKV implements adaptive write batching with hot-key detection to optimize throughput under skewed workloads:
The hotTracker interface (db_tracker.go) wraps hotring.Ring (hotring/ring.go) to monitor access patterns:
- Promotes a key to hot once its access count reaches the prefetch threshold (count >= prefetchHot)
- Enforces the WriteHotKeyLimit threshold, returning ErrHotKeyWriteThrottle on violation

The commit worker (db_write.go264-334) dynamically adjusts batch limits based on:
- Scaling WriteBatchMaxCount by min(backlog/limitCount, 4) to drain faster
- Applying HotWriteBatchMultiplier when isHotWrite(entries) returns true
- Extending the WriteBatchWait duration when the queue is momentarily empty

Sources: db_hot.go1-208 db_write.go264-334 options.go119-131 hotring/ring.go
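The adjustment rules can be condensed into one pure function. The option names match the configuration table, but the exact arithmetic is an assumption sketched from the rules above, not the code in db_write.go:

```go
package main

import "fmt"

// batchLimit applies the commit worker's two scaling rules: drain a
// backlog by growing the batch up to 4x, and widen hot-key batches by
// the configured multiplier.
func batchLimit(base, backlog int, hot bool, hotMultiplier int) int {
	limit := base
	if scale := backlog / base; scale > 1 {
		if scale > 4 {
			scale = 4 // min(backlog/limitCount, 4)
		}
		limit *= scale
	}
	if hot {
		limit *= hotMultiplier
	}
	return limit
}

func main() {
	fmt.Println(batchLimit(64, 64, false, 2))  // idle queue: base limit
	fmt.Println(batchLimit(64, 640, false, 2)) // deep backlog: capped at 4x
	fmt.Println(batchLimit(64, 64, true, 2))   // hot keys: multiplied
}
```

Capping the scale factor keeps latency bounded: an unbounded batch would drain the queue fastest but make individual commits arbitrarily slow.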
The Options struct (options.go11-194) provides ~50 configuration knobs grouped by subsystem:
| Category | Key Options | Default |
|---|---|---|
| Storage | MemTableSize, SSTableMaxSz, ValueThreshold | 64MB, 256MB, 1KB |
| Compaction | NumCompactors, NumLevelZeroTables, IngestCompactBatchSize | 4, 16, 4 |
| Write Path | WriteBatchMaxCount, WriteBatchWait, SyncWrites | 64, 200µs, false |
| Hot Keys | WriteHotKeyLimit, HotWriteBurstThreshold, HotRingTopK | 128, 8, 16 |
| ValueLog GC | ValueLogGCInterval, ValueLogGCDiscardRatio | 10min, 0.5 |
| WAL Watchdog | EnableWALWatchdog, WALAutoGCInterval | true, 15s |
For distributed mode, topology is defined in raft_config.json (README.md128-145):
Launch sequence:
1. nokv-config manifest --config raft_config.json bootstraps manifests
2. nokv pd --addr 127.0.0.1:2379 starts the PD-lite service
3. nokv serve --raft-config raft_config.json starts each store node

Sources: options.go1-275 README.md126-146 config/config.go
The cmd/nokv-redis binary (cmd/nokv-redis/) exposes a RESP-compatible interface:
| Mode | Backend | Transaction Path |
|---|---|---|
| Embedded | embeddedBackend (cmd/nokv-redis/backend_embedded.go) | Direct DB.Set/DB.Get calls |
| Distributed | raftBackend (cmd/nokv-redis/backend_raft.go) | raftstore/client.Client.TwoPhaseCommit |
Supported Commands:
- SET key value [NX|XX] [EX seconds|PX ms|EXAT unix|PXAT unix-ms]
- GET, MGET, DEL, INCR, DECR, EXPIRE, TTL, PING

TTL is persisted as Entry.ExpiresAt through the same 2PC write path as the value payload.
Sources: cmd/nokv-redis/server.go cmd/nokv-redis/backend_raft.go README.md220-228
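On the wire, every command reaches the gateway in standard RESP framing: an array of bulk strings. A minimal encoder shows what a client sends for SET with an EX option (the helper name and key are illustrative; the framing itself is the RESP specification):

```go
package main

import (
	"fmt"
	"strings"
)

// respCommand encodes a command as a RESP array of bulk strings:
// *<argc>\r\n followed by $<len>\r\n<arg>\r\n per argument.
func respCommand(args ...string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args))
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a)
	}
	return b.String()
}

func main() {
	wire := respCommand("SET", "session:42", "token", "EX", "30")
	fmt.Printf("%q\n", wire)
}
```

Because the gateway speaks this framing, existing Redis client libraries can target either backend without modification.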
The Stats struct (stats.go) runs a background collector (stats.go) that samples metrics every 5 seconds:
Published via expvar.Publish("NoKV.Stats", snapshot) and consumed by:
- the nokv stats --workdir <dir> CLI command
- the /debug/vars HTTP endpoint (when enabled)

| Command | Purpose | Example |
|---|---|---|
| nokv stats | Display live or offline metrics | nokv stats --workdir ./store-1 --json |
| nokv manifest | Inspect manifest edits | nokv manifest --workdir ./store-1 |
| nokv vlog | Show ValueLog segment status | nokv vlog --workdir ./store-1 |
| nokv regions | List region metadata | nokv regions --workdir ./store-1 --json |
Sources: stats.go1-400 cmd/nokv/main.go README.md206-217 docs/cli.md
| Scenario | Recommended Mode | Configuration Notes |
|---|---|---|
| Local caching | Embedded | Set ValueThreshold=0 to force value-log separation, enable aggressive GC |
| Single-node application | Embedded | Default settings, optionally enable SyncWrites for durability |
| Distributed service | RaftStore + PD-lite | Configure raft_config.json with 3+ replicas per region |
| Redis-compatible API | Either (via gateway) | Embedded for standalone, raft backend for distributed |
| Write-heavy workload | Either | Increase NumCompactors, IngestCompactBatchSize, tune WriteHotKeyLimit |
| Read-heavy workload | Either | Enable HotRingEnabled, increase BlockCacheSize, set HotRingTopK |
Sources: docs/architecture.md201-206 README.md1-234 options.go204-274