Releases: stoolap/stoolap
v0.4.0
What's New in v0.4.0
Immutable Volume Storage Engine
Stoolap now splits each table into a hot MVCC buffer (recent writes, WAL-backed) and cold frozen volumes (historical data, column-major). The query engine merges both sources transparently.
Cold volume format (STV4):
- Per-column per-row-group LZ4 compression with streaming CRC32 verification
- Zone maps (min/max per column per 64K row group) for scan pruning
- Bloom filters for point-lookup acceleration
- Dictionary encoding for text columns
- Deferred column loading with tiered eviction (hot/warm/cold memory tiers)
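The zone-map idea above can be sketched in a few lines. This is only an illustration under simplifying assumptions (a single `i64` column, a tiny row-group size, and the hypothetical names `ZoneMap`, `build_zone_maps`, and `groups_to_scan`); it is not the STV4 on-disk layout.

```rust
/// Min/max summary for one column within one row group (a "zone map").
#[derive(Debug, Clone, Copy)]
struct ZoneMap {
    min: i64,
    max: i64,
}

/// Build one zone map per row group of `group_size` rows.
fn build_zone_maps(column: &[i64], group_size: usize) -> Vec<ZoneMap> {
    assert!(group_size > 0, "row groups must hold at least one row");
    column
        .chunks(group_size)
        .map(|group| ZoneMap {
            min: *group.iter().min().expect("chunks are never empty"),
            max: *group.iter().max().expect("chunks are never empty"),
        })
        .collect()
}

/// Return the indices of row groups that might contain a value in `lo..=hi`.
/// Groups whose [min, max] range cannot overlap the predicate are skipped.
fn groups_to_scan(zones: &[ZoneMap], lo: i64, hi: i64) -> Vec<usize> {
    zones
        .iter()
        .enumerate()
        .filter(|(_, z)| z.max >= lo && z.min <= hi)
        .map(|(i, _)| i)
        .collect()
}
```

With ten sequential values and groups of four rows, a predicate such as `BETWEEN 5 AND 6` only needs the second group; the other two are never decompressed.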
Lifecycle:
- `PRAGMA CHECKPOINT` seals hot rows into immutable `.vol` files, persists manifests, truncates WAL
- Bounded compaction: sub-target volumes merge, oversized volumes split, dirty volumes rewrite
- Background compaction thread (non-blocking checkpoint cycles)
- Cutoff-filtered seal and compaction during snapshot isolation transactions
Crash safety:
- Fsync-before-rename on all atomic writes (volumes, manifests, catalog)
- Two-phase WAL: only committed transactions applied during recovery
- Manifests loaded before WAL replay for idempotent recovery
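As a rough sketch of the fsync-before-rename pattern named above, using only the Rust standard library: the helper names `atomic_write` and `sync_dir` and the `.tmp` naming are assumptions for illustration, not the engine's actual code.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Fsync a directory so a rename inside it is durable (Unix only).
#[cfg(unix)]
fn sync_dir(dir: &Path) -> io::Result<()> {
    File::open(dir)?.sync_all()
}

/// Other platforms: no directory fsync in this sketch.
#[cfg(not(unix))]
fn sync_dir(_dir: &Path) -> io::Result<()> {
    Ok(())
}

/// Atomically replace `dest` with `bytes`: write a temporary file,
/// fsync it, then rename it over the destination.
fn atomic_write(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical temp-file naming; the real engine may differ.
    let tmp = dest.with_extension("tmp");

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    // Flush contents and metadata before the rename, so a crash
    // cannot expose a renamed file with missing data.
    file.sync_all()?;
    drop(file);

    // Renaming within one directory replaces the destination atomically on POSIX.
    fs::rename(&tmp, dest)?;

    // Make the rename itself durable by syncing the parent directory.
    if let Some(parent) = dest.parent().filter(|p| !p.as_os_str().is_empty()) {
        sync_dir(parent)?;
    }
    Ok(())
}
```

A reader that opens `dest` after a crash sees either the old file or the complete new one, never a partially written file.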
Columnar Aggregate Pushdown
Filtered and grouped aggregates computed directly on raw column arrays without constructing Row objects:
- Filtered aggregates: SUM/COUNT/MIN/MAX/AVG with typed predicates on i64/f64/dictionary columns
- Grouped aggregates: Single-column GROUP BY with dictionary-indexed accumulators (zero hashing) or FxHashMap for numeric columns
- Dictionary DISTINCT: Extracts unique values from dictionary metadata without scanning rows
- Zone-map pruning: Volumes and row groups skipped when predicates prove no match
- IN list pruning: `WHERE id IN (...)` generates min/max bounds for zone-map elimination
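To illustrate the "dictionary-indexed accumulators (zero hashing)" point, here is a minimal sketch. It assumes a dictionary-encoded text column and an `i64` value column; `DictColumn` and `grouped_sum` are hypothetical names, and overflow handling is left out.

```rust
/// A dictionary-encoded text column: each row stores a small integer ID
/// that indexes into `dict`.
struct DictColumn {
    dict: Vec<String>,
    ids: Vec<u32>,
}

/// SUM(values) GROUP BY the dictionary column, without hashing:
/// the dictionary ID is used directly as an index into the accumulator array.
fn grouped_sum(keys: &DictColumn, values: &[i64]) -> Vec<(String, i64)> {
    assert_eq!(keys.ids.len(), values.len());
    let mut sums = vec![0i64; keys.dict.len()];
    let mut seen = vec![false; keys.dict.len()];
    for (&id, &v) in keys.ids.iter().zip(values) {
        sums[id as usize] += v;
        seen[id as usize] = true;
    }
    // Emit only dictionary entries that actually occurred in the scanned rows.
    keys.dict
        .iter()
        .zip(sums)
        .zip(seen)
        .filter(|(_, seen)| *seen)
        .map(|((name, sum), _)| (name.clone(), sum))
        .collect()
}
```

Because group keys are already small dense integers, the inner loop is two array writes per row; no `Row` object or hash lookup is involved.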
Query Performance
- ORDER BY PK + LIMIT: K-way merge across sorted volume row_ids, stopping after limit rows
- MIN/MAX typed scan: Direct i64/f64/timestamp access with zone-map volume pruning
- OFFSET skip: Cold scan skips row materialization for offset rows
- Parallel cold scanning: Rayon-based parallel volume processing (4+ volumes, 100K+ rows threshold)
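The ORDER BY PK + LIMIT strategy can be pictured as a heap-based k-way merge that stops early. The sketch below assumes each volume exposes its row ids as an ascending `Vec<i64>` and that ids are unique across volumes; `merge_sorted_limit` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge several ascending row-id lists and return the smallest `limit` ids.
/// The heap holds at most one cursor per volume, so work is proportional to
/// `limit * log(volumes)` rather than the total number of rows.
fn merge_sorted_limit(volumes: &[Vec<i64>], limit: usize) -> Vec<i64> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first row id of every non-empty volume.
    for (vol, ids) in volumes.iter().enumerate() {
        if let Some(&first) = ids.first() {
            heap.push(Reverse((first, vol, 0usize)));
        }
    }

    let mut out = Vec::with_capacity(limit.min(1024));
    while out.len() < limit {
        let Some(Reverse((row_id, vol, pos))) = heap.pop() else { break };
        out.push(row_id);
        // Advance the cursor of the volume we just consumed from.
        if let Some(&next) = volumes[vol].get(pos + 1) {
            heap.push(Reverse((next, vol, pos + 1)));
        }
    }
    out
}
```

Rows beyond the limit are never touched, which is where the saving over a full sort comes from.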
Foreign Key Improvements
- Recursive ON UPDATE CASCADE: Cascades through the full FK chain including grandchild tables, with depth limiting (16 levels)
- Referenced UNIQUE column cascade: FK cascade generalized from PK-only to any referenced UNIQUE column
- Pre-check RESTRICT: RESTRICT constraints checked before writing parent rows, preserving statement-level atomicity in explicit transactions
- SET NULL recursion: SET NULL arm recurses through descendants so deeper RESTRICT checks are enforced
Primary Key Update Protection
UPDATE on primary key columns is now rejected with a clear error message. The engine uses row_id == pk_value as a core invariant across ~50 code paths. This matches SQLite's behavior for rowid tables. Use DELETE + INSERT to change a row's primary key value.
Calendar-Aware INTERVAL Arithmetic
INTERVAL '1 month' and INTERVAL '1 year' now use proper calendar logic instead of 30-day/365-day approximations. Handles leap years, variable month lengths, and preserves nanosecond precision. Matches DATE_ADD behavior.
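For intuition, calendar-aware month addition can be sketched as below. The clamp-to-end-of-month rule and the function names are assumptions made for this example; the release notes do not spell out the exact rule Stoolap applies.

```rust
/// True for Gregorian leap years.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in the given month (1..=12).
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => panic!("month out of range: {month}"),
    }
}

/// Add `months` (may be negative) to a calendar date. The day is clamped
/// to the last day of the target month; that clamping rule is an assumption.
fn add_months(year: i32, month: u32, day: u32, months: i32) -> (i32, u32, u32) {
    // Work with a zero-based month index counted from year 0.
    let total = year * 12 + (month as i32 - 1) + months;
    let new_year = total.div_euclid(12);
    let new_month = total.rem_euclid(12) as u32 + 1;
    let new_day = day.min(days_in_month(new_year, new_month));
    (new_year, new_month, new_day)
}
```

Under this rule, 2024-01-31 plus one month is 2024-02-29, whereas a flat 30-day approximation would land on 2024-03-01.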
Schema Evolution Fixes
- CREATE INDEX after DROP+ADD COLUMN: `validate_cold_unique` and `populate_index_from_cold` now use column mapping to correctly translate schema indices to physical volume indices
- AS OF PK dedup: Resolves PK column through mapping for correct cold row dedup after schema evolution
- Partition grouping: Uses snapshotted `cs.mapping` instead of live lookup (immune to compaction races)
ALTER TABLE MODIFY COLUMN Validation
MODIFY COLUMN ... NOT NULL now validates existing data with a streaming IS NULL scan before applying the constraint. Returns a clear error if any NULL values exist.
Data Integrity
- Compaction TOCTOU fix: Snapshot sequence limit captured per-table (matches seal's per-table pattern)
- Manifest truncation errors: Tombstones, column renames, and dropped columns return errors instead of silent data loss on truncated manifests
- Volume corruption guards: Dictionary IDs and bytes offsets validated at deserialization, preventing panics on corrupted volume files
- Aggregation pushdown correctness: Bail on partial WHERE pushdown (prevents wrong results when memory filter is needed)
- Scanner column pruning: Materialize all columns when filter column indices can't be determined
- Seal race fix: `collect_rows_with_limit` uses `collect_hot_row_ids_into` instead of `has_row_id` point lookups
Configuration
New DSN parameters for volume storage tuning:
| Parameter | Default | Description |
|---|---|---|
| `checkpoint_interval` | 60 | Seconds between checkpoint cycles |
| `compact_threshold` | 4 | Sub-target volumes per table before merging |
| `target_volume_rows` | 1048576 | Target rows per cold volume (min 65536) |
| `checkpoint_on_close` | on | Seal all hot rows on clean shutdown |
| `volume_compression` | on | LZ4 compression for cold volume files |
| `sync_mode` | normal | none/off, normal, full (or 0, 1, 2) |
Invalid DSN parameter values now return errors instead of silently using defaults.
Migration from v0.3.7
Existing v0.3.7 databases are automatically migrated on first open:
- Legacy snapshot `.bin` files loaded into hot buffer
- WAL entries replayed
- Hot data sealed into immutable `.vol` files
- `snapshots/` directory removed
Legacy DSN parameter names (snapshot_interval, snapshot_compression) are accepted for backward compatibility.
Other Changes
- Build timestamp embedded in `version_info()` output
- CLI error paths ensure database cleanup before exit
- Stale group cache cleared in volume scanner (prevents panic at row-group boundaries)
- Eviction epoch off-by-one corrected
- WASM binary rebuilt with warning-free compilation
Full Changelog: v0.3.7...v0.4.0
v0.3.7
What's New in v0.3.7
PostgreSQL-Inspired DISTINCT ON
New DISTINCT ON (expr, ...) syntax for per-group deduplication, returning the first row for each unique combination of the specified expressions.
- Hash-based dedup with O(groups) memory, correctly handles arbitrary ORDER BY patterns including non-leading sort orders
- Pipeline order: ORDER BY, DISTINCT ON, column removal, LIMIT/OFFSET
- Works across all query paths: single-table scans, JOINs, CTEs, subqueries, and complex ORDER BY
- Supports aliased keys, computed expressions, keys not in SELECT, qualified identifiers, and NULL equality
- Guards distinct index pushdown to prevent bypassing key-based dedup
-- First (highest) order per customer
SELECT DISTINCT ON (customer) customer, amount, order_date
FROM orders
ORDER BY customer, amount DESC;
-- Per-group dedup on joins with qualified keys
SELECT DISTINCT ON (c.name) c.name, p.amount
FROM customers c JOIN purchases p ON c.id = p.customer_id
ORDER BY c.name, p.amount DESC;
ON CONFLICT Upsert (PostgreSQL-Style)
- `ON CONFLICT (cols) DO UPDATE SET` and `DO NOTHING` syntax
- `EXCLUDED` pseudo-table to reference attempted insert values
- Conflict target matching against PK and composite unique constraints
- CHECK constraint validation during upsert updates
- RETURNING clause and INSERT ... SELECT support for conflict handling
INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email;
Constant Folding and Non-Deterministic Function Tracking
- Compile-time constant folding for deterministic column-free expressions
- `FunctionInfo.deterministic` flag with registry-based lookup
- NOW, CURRENT_DATE, CURRENT_TIMESTAMP, RANDOM, SLEEP, EMBED marked non-deterministic
- Semantic cache bypass for queries with non-deterministic functions
- Pushdown evaluator for stable expressions like `NOW() - INTERVAL '24h'`
GROUP BY and Aggregation Optimizations
- 3-column GROUP BY uses tuple keys (no Vec heap allocation)
- 4+ columns use direct AHashMap (replaces hash-collision approach)
- Early termination extended to 1, 2, and 3-column paths
- FIRST/LAST aggregates support ORDER BY with O(1) sort-key tracking
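The tuple-key point can be illustrated with the standard library alone. This sketch assumes three `i64` grouping columns and uses `std::collections::HashMap` rather than the engine's own map and value types; `count_by_three` is a hypothetical name.

```rust
use std::collections::HashMap;

/// COUNT(*) GROUP BY three integer columns, keyed by a fixed-size tuple.
/// A tuple key lives inline in the map entry, whereas a `Vec<i64>` key
/// would need one heap allocation per distinct group.
fn count_by_three(a: &[i64], b: &[i64], c: &[i64]) -> HashMap<(i64, i64, i64), u64> {
    assert!(a.len() == b.len() && b.len() == c.len());
    let mut counts = HashMap::new();
    for i in 0..a.len() {
        *counts.entry((a[i], b[i], c[i])).or_insert(0u64) += 1;
    }
    counts
}
```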
Snapshot and Persistence
- Persist `default_value` in both WAL and snapshot serialization
- Re-record index and view DDL to WAL during snapshots so they survive truncation
- Skip snapshot when WAL has not grown since last snapshot
- Auto-snapshot loop sleeps `min(cleanup, snapshot)` interval
- HNSW `m` and `ef_construction` exposed on Index trait for persistence
Performance
- Restore `panic = "abort"` in release profile, recovering 5-15% across all benchmarks
- Dev profile optimizations and thin LTO for faster builds
Parser and SQL Compatibility
- CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP parsed as niladic functions (SQL standard, no parentheses required)
- Keyword identifiers fold to lowercase (PostgreSQL compatibility)
- Parse ISO 8601 timestamps with fractional seconds and UTC `Z` suffix
- Keywords accepted as column names in SET assignments
Bug Fixes
- Fix transaction INSERT with partial column lists: delegate to full executor pipeline for correct column mapping, default values, type coercion, and FK validation
- Fix qualified column ambiguity in joins: ORDER BY and DISTINCT ON with qualified identifiers (e.g., `c.name` vs `p.name`) now correctly resolve when both joined tables have the same column name
- Fix DISTINCT ON key resolution after projection: qualified keys resolve correctly when projected as bare names or under aliases
- Fix classification cache: include DISTINCT ON expressions in hash key to prevent stale cache hits
- Fix TimeTruncFunction duration cache miss on zero-duration values
Documentation
- Go driver docs synced with stoolap-go README
- Driver icons with official brand colors in docs sidebar and page headers
Full Changelog: v0.3.5...v0.3.7
v0.3.5
What's New in v0.3.5
FFI Panic Safety
- `panic = "unwind"` in release profile so that `catch_unwind` boundaries in the C FFI layer work correctly. Previously `panic = "abort"` made `catch_unwind` a no-op, meaning any Rust panic would abort the host process (MCP server, Node.js, Python, PHP, Go).
- Removed unused `release-ffi` profile. All drivers already build with `--release --features ffi`.
- Added `staticlib` crate type for the Go driver's bundled static libraries.
MCP Server Improvements
- `stoolap://sql-reference` resource added for discoverability. Delivers the same live schema and complete SQL reference as the `sql-assistant` prompt, but as an MCP resource that clients can attach without prompt support.
Go Driver Documentation
- New comprehensive Go driver documentation covering both the Direct API and the `database/sql` driver, with examples for transactions, prepared statements, vector search, bulk fetch, JSON, NULL handling, and concurrency patterns.
Documentation
- Updated all FFI build instructions from `--profile release-ffi` to `--release --features ffi` across C API docs, header file, benchmark example, building guide, and testing guide
- Updated release profile description from `panic = "abort"` to `panic = "unwind"` in building docs
- Reordered driver pages: Node.js, Python, PHP, Go, WASM, C API, MCP Server
Bug Fixes
- Fix view column aliasing: strip table alias prefix from QualifiedIdentifier output column names in post-aggregation expressions (`u.username` -> `username`)
- Fix window functions on views: materialize view rows and delegate to `execute_select_with_window_functions`
- Fix panic in projection compilation: replace `.expect()` panics in `ExprMappedResult::with_defaults` and `FilteredResult::with_defaults` with proper `Result` propagation
Full Changelog: v0.3.4...v0.3.5
v0.3.4
What's New in v0.3.4
C FFI Layer
Complete C API for embedding Stoolap in any language that can call C functions. Feature-gated with --features ffi.
- Opaque handle API with step-based iteration, per-handle error storage, and panic-safe `catch_unwind` boundaries
- Bulk fetch API (`stoolap_rows_fetch_all`) transfers entire result sets in a single packed binary buffer, eliminating per-row FFI overhead
- Prepared statements with pre-compiled plans that bypass cache lookup on every execution
- Transaction support with isolation level control (begin, commit, rollback)
- Parameter binding for both positional (`$1, $2`) and named (`:key`) parameters
- Full C header (`include/stoolap.h`) with documented binary format spec
StoolapDb *db = stoolap_open(":memory:");
stoolap_exec(db, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");
StoolapStmt *stmt = stoolap_prepare(db, "INSERT INTO users VALUES ($1, $2)");
StoolapValue params[] = { stoolap_int(1), stoolap_text("Alice") };
stoolap_stmt_execute(stmt, params, 2);
Table-Valued Functions
- `GENERATE_SERIES` for integer, float, and timestamp types
- LIMIT pushdown, WHERE range clamping, JOIN support, and EXPLAIN integration
- TVF infrastructure (parser, AST, executor, registry) for future functions
SELECT * FROM GENERATE_SERIES(1, 10);
SELECT * FROM GENERATE_SERIES('2026-01-01', '2026-12-31', '1 month');
SELECT g.value, t.name FROM GENERATE_SERIES(1, 5) g JOIN tasks t ON t.id = g.value;
Hash and String Functions
- Hash/checksum: `MD5`, `SHA1`, `SHA256`, `SHA384`, `SHA512`, `CRC32`
- String: `STARTS_WITH`, `ENDS_WITH`, `CONTAINS`
Query Optimizer Improvements
- Mixed OR hybrid optimization for queries with both indexed and non-indexed OR branches (index lookup + filtered scan with dedup)
- Relaxed multi-column index prefix rule to allow single-column prefix matching
- Trailing range scan on composite indexes after equality prefix
- BETWEEN decomposition to range comparisons for index use
- Subquery index probe safety guard with outer refs or nested subqueries
- Aggregation guard to prevent fast-path optimizations on aggregate queries
EXPLAIN Improvements
- Partition WHERE predicates to correct join sides
- Show residual filters on index scans
- Handle ROLLUP/CUBE/GROUPING SETS display
Stoolap Studio
- Documentation page with features overview, installation guide, quick tour, and keyboard shortcuts
- Responsive dark/light mode screenshots on docs site and README
Bug Fixes
- Fix undefined behavior in FFI bulk fetch buffer deallocation: `into_boxed_slice()` guarantees `len == capacity` for safe `Vec::from_raw_parts` reconstruction
- Fix CowHashMap Stacked Borrows violation in drop and backward-shift deletion
- Fix I64Map Stacked Borrows violation in backward-shift deletion
- Fix CowBTree Stacked Borrows violations across all node pointer accesses
- Fix CompactArc Stacked Borrows violation: derive data pointer via raw arithmetic instead of Deref
- Restore `Expression::Case` match arm in `process_where_subqueries` accidentally deleted by mutation testing
- Fix qualified outer column resolution in expression compiler to prevent incorrect binding when inner row shadows outer column names
- Fix parser handling of consecutive semicolons (`SELECT 1;;`)
- Reject bare expression statements in `prepare()` to catch typos like `SELECTX` at prepare time
Testing & CI
- 14 FFI aggregate function tests (COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, JOINs, subqueries)
- 67 unit tests for outer reference detection covering all match arms
- 13 integration tests for correlated subquery outer reference paths
- MVCC TransactionRegistry unit tests
- sqllogictest suite with comprehensive `.slt` test files
- Failpoint infrastructure for I/O error injection testing
- Nightly CI: mutation testing (24 shards), Miri rotation (5 module groups), stress tests, sanitizers
- TSAN inline suppression for CompactArc false positive
Documentation
- C driver page with full API reference and bulk fetch binary format spec
- Node.js driver updated for N-API C addon architecture
- Development category: testing, limitations, building from source, contributing
- Changelog page fetching releases from GitHub API
- Unified header styles and grid overlay across docs site
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's New in v0.3.3
Vector Search Engine
- VECTOR(N) column type with packed binary storage for embedding vectors of any dimension
- HNSW index for approximate nearest neighbor search with configurable parameters (`m`, `ef_construction`, `ef_search`, distance metric)
- Distance functions: `VEC_DISTANCE_L2`, `VEC_DISTANCE_COSINE`, `VEC_DISTANCE_IP`, and `<=>` operator
- Vector search optimizer with HNSW index path (O(log N)) and WHERE post-filtering, or parallel brute-force k-NN with fused distance+topK
- Utility functions: `VEC_DIMS`, `VEC_NORM`, `VEC_TO_TEXT`
CREATE TABLE documents (id INTEGER PRIMARY KEY, embedding VECTOR(384));
CREATE INDEX idx_emb ON documents(embedding)
USING HNSW WITH (m = 16, ef_construction = 200, metric = 'cosine');
SELECT id, VEC_DISTANCE_COSINE(embedding, '[0.1, 0.2, ...]') AS dist
FROM documents ORDER BY dist LIMIT 10;
Built-in Semantic Search
- `EMBED()` function generates 384-dimensional embeddings using a built-in sentence-transformer model (all-MiniLM-L6-v2), no external API calls needed
- Enable with `--features semantic` at build time
- Combine with HNSW indexing for end-to-end semantic search in pure SQL
-- Generate embeddings at insert time
INSERT INTO documents (title, embedding)
VALUES ('Password Reset Guide', EMBED('How to recover your account'));
-- Semantic search with a natural language query
SELECT title, VEC_DISTANCE_COSINE(embedding, EMBED('forgot my login')) AS dist
FROM documents ORDER BY dist LIMIT 5;
ANN Benchmark
Self-contained benchmark binary on the public Fashion-MNIST dataset (60,000 vectors, 784 dimensions, 10,000 queries, single-core, full SQL path):
| Recall | QPS | p95 Latency | Speedup vs brute-force |
|---|---|---|---|
| 95.0% | 10,410 | 0.12 ms | 733x |
| 99.0% | 6,700 | 0.19 ms | 472x |
| 99.9% | 4,159 | 0.33 ms | 293x |
| 100% | 913 | 1.59 ms | 64x |
See ANN Benchmarks for the full report.
# Run the benchmark yourself
RAYON_NUM_THREADS=1 cargo run --release --example ann_benchmark \
--features ann-benchmark -- --sweep --runs 5 --max-queries 10000
Other Changes
- VACUUM statement and `PRAGMA VACUUM` for manual cleanup of deleted rows and old versions
- Value::Extension refactoring: `Value::Json` replaced by tag-in-data pattern keeping Value at 16 bytes with room for future types
- Lexer fast path: zero-alloc string literal parsing for non-escaped strings
Bug Fixes
- Fix `#[inline(always)]` + `#[target_feature]` build error on x86_64
- Stabilize flaky CI tests (HNSW recall with deterministic PRNG, table name collision, perf threshold)
- Fix 38 broken relative links across docs (converted to Jekyll link tags)
Docs & Website
- Full website redesign with vector search spotlight, search modal (Cmd/Ctrl+K), accessibility improvements
- Blog post: Vector and Semantic Search in SQL
- Playground: vector search tables and query chips
- Python driver documentation
Full Changelog: v0.3.2...v0.3.3
v0.3.2
What's New in v0.3.2
Rust API
- Transaction named parameters: `execute_named()` / `query_named()` now available on `Transaction`, matching the `Database` API
Bug Fixes
- UPDATE SET parameter resolution: Fixed `UPDATE t SET col = col + $1 WHERE id = $2` failing because positional and named parameters were not passed to SET expression evaluation (resolved to NULL)
Internal
- Zero-copy parameter passing in UPDATE: Arc refcount bump instead of deep-cloning all parameter keys/values into the setter closure
- Unified execution path: Eliminated internal code duplication via `execute_sql_with_ctx`
- WASM binary updated
Full Changelog: v0.3.1...v0.3.2
v0.3.1
What's New in v0.3.1
SQL Engine
- Cross-type Timestamp/Text comparison: `Value::compare()` now supports Timestamp ↔ Text/Json comparison via `parse_timestamp` fallback
- CURRENT_TIME function: Returns `HH:MM:SS` format, alongside existing CURRENT_DATE and CURRENT_TIMESTAMP
- RELEASE SAVEPOINT: Full parser, AST, and executor support
- SET/BEGIN isolation level: `SET isolation_level` and `BEGIN` with isolation level now work correctly (SNAPSHOT, REPEATABLE READ, SERIALIZABLE, READ COMMITTED, READ UNCOMMITTED)
- Double-quoted pattern strings: LIKE, ILIKE, GLOB, REGEXP now accept double-quoted identifiers as pattern strings (SQLite compatibility)
- SQLite double-quote fallback: Double-quoted identifiers fall back to string literals when column resolution fails
- Improved parser errors: Context-aware messages showing actual tokens and clause context (e.g., "expected expression after WHERE")
- Implicit type coercions: Integer↔Float and Integer→Boolean coercions at the storage layer
- SHOW CREATE TABLE: Now includes FOREIGN KEY constraints in output
- DESCRIBE improvements: Shows UNI key type for single-column unique indexes
- EXTRACT fields: Added MILLISECOND, MICROSECOND, ISODOW, EPOCH
Rust API
- Cached Plan API: `Database::cached_plan()`, `execute_plan()`, `query_plan()` for pre-parsed SQL reuse without cache lookup overhead
- Prepared statement execution in transactions: `Transaction::execute_prepared()` for batch operations with pre-parsed SQL
- Zero-clone row cursor: `Rows::advance()` / `current_row()` for zero-clone row iteration (bulk serialization)
- ParamVec passthrough: `Params` impl for `ParamVec` (identity passthrough), re-exported from lib.rs
Website & Playground
- Node.js driver documentation: New "Drivers" category with complete `@stoolap/node` API reference
- WASM playground: Browser-based SQL sandbox with WebAssembly compilation support
- Immersive terminal hero: Auto-scrolling terminal with 16 SQL demo scenes
- Website redesign: Consolidated CSS, new homepage, blog, and layout templates
Documentation Fixes
- Fix `row.get("name")` → `row.get_by_name("name")` across 7 doc files
- Fix PRAGMA `create_snapshot` → `snapshot` (matches implementation)
- Add 8 missing connection string options to persistence docs
- Fix RowVersion struct in MVCC docs (remove non-existent fields)
- Fix DATEDIFF signature, CAST(NULL) behavior, CTE+INSERT limitation
- Add STRING_AGG native ORDER BY syntax, recursive CTE iteration limit
- Complete rewrite of sql-functions-reference covering all 110 functions
- Fix CLI flag `-q` → `-e` for executing queries
Internal
- Feature-gate rayon parallelism behind `parallel` feature flag with sequential fallbacks for WASM
- Gate `thread::spawn`/`sleep` for WASM targets
- Add `time_compat` shim (`std::time` vs `web_time`) for WASM Instant/SystemTime
- Fix NaN/Infinity panic in WASM value serialization
- Fix auto-increment to follow schema flag, not implicit INTEGER PK
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's New in v0.3.0
This release brings foreign key constraint enforcement, a crash-safe WAL/snapshot system, and significant MVCC performance improvements with reduced memory footprint and better concurrency.
Foreign Key Constraints
Full referential integrity enforcement with three referential actions:
- RESTRICT (default): Block parent deletion/update when children exist
- CASCADE: Propagate deletes/updates to child rows (recursive, depth limit 16)
- SET NULL: Set FK columns to NULL when parent is deleted/updated
Key features:
- Column-level `REFERENCES` and table-level `FOREIGN KEY` syntax with `ON DELETE`/`ON UPDATE`
- DDL validation: parent table must exist, referenced column must be PK/UNIQUE
- Enforcement on INSERT, UPDATE, DELETE, TRUNCATE, and DROP TABLE
- Pre-validation of constant FK values in explicit transactions to prevent dirty state
- Cached reverse FK mapping with schema epoch invalidation
- Auto-created indexes on FK columns for efficient cascade operations
- WAL + snapshot persistence for FK metadata
- DROP TABLE strips orphaned FK constraints from child schemas
Crash-Safe WAL/Snapshot System
- Safe WAL truncation: only truncate to 2nd-to-last CRC-verified snapshot
- Snapshot fallback loading: try older snapshots when latest is corrupted
- Remove stale `checkpoint.meta` when all snapshots fail to load
- Use `min(header_lsn)` across tables for crash-safe replay
- Capture `commit_seq` AFTER checkpoint to prevent data loss window
- Clean up orphaned snapshot directories from dropped tables
- CRC-aware snapshot cleanup: corrupt files don't count toward keep_count
- WAL rotation after DDL/DML commits to prevent unbounded growth
- Sort WAL files by embedded LSN instead of lexicographic order
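Why sorting by embedded LSN matters can be shown with a small sketch. The `wal-<lsn>.log` naming scheme below is purely hypothetical (the release notes do not state the real file-name format), as are the helper names.

```rust
/// Extract the LSN from a file name of the hypothetical form `wal-<lsn>.log`.
fn embedded_lsn(name: &str) -> Option<u64> {
    name.strip_prefix("wal-")?.strip_suffix(".log")?.parse().ok()
}

/// Sort WAL file names by numeric LSN; names without a parseable LSN are dropped.
fn sort_by_lsn(names: &[&str]) -> Vec<String> {
    let mut parsed: Vec<(u64, &str)> = names
        .iter()
        .filter_map(|&n| embedded_lsn(n).map(|lsn| (lsn, n)))
        .collect();
    parsed.sort_by_key(|&(lsn, _)| lsn);
    parsed.into_iter().map(|(_, n)| n.to_string()).collect()
}
```

With unpadded numbers, a plain string sort places `wal-10.log` before `wal-9.log`, so replay would visit segments out of order; comparing the parsed LSN avoids that.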
Transaction Safety
- Write WAL COMMIT marker before making changes visible in registry
- Abort transaction on Phase 3 WAL write failure to prevent registry leak
- Restore transaction on API commit failure so rollback remains possible
- Fix file lock race: acquire lock before truncating PID file
- Fix registry `override_count` underflow from mismatched `fetch_sub`
- Fix race condition where transaction was briefly absent from both maps during commit
- Fix `abort_transaction` to not resurrect already-committed transactions
ALTER TABLE
- Implement `RENAME COLUMN` and `MODIFY COLUMN` with dual schema updates (version store + cached schema)
- Column existence checks in pushdown rules to prevent invalid storage-level predicates
- Full WAL and snapshot durability for ALTER TABLE operations
Performance Improvements
CompactArc Header (24 → 16 bytes)
- Compile-time drop dispatch via `CompactArcDrop` trait with monomorphization
- Eliminates stored function pointer, replaces indirect call with direct call
CowHashMap for Transaction Registry
- O(1) snapshot cloning for lock-free iteration
- Replace `DashMap<i64, TxnState>` with `Mutex<CowHashMap<TxnState>>`
- Thread-local caching in `is_directly_visible()` minimizes lock contention
MVCC Memory Reduction
- Remove `create_time` from ArenaRowMeta (8 bytes saved per row)
- Committed transactions removed from map (implicit state)
- Pack TxnState into 16 bytes with bit manipulation
- Separate `snapshot_seqs` map for snapshot isolation commit sequences
Batch Index Operations
- `add_batch_slice` and `remove_batch_slice` for single-lock operations (O(1) locks instead of O(N))
- Two-phase commit (validate-then-modify) with rollback support
- Peak memory usage reduced by ~10% (verified with DHAT)
PkIndex: Hybrid Primary Key Index
- Hybrid bitset + I64Set with O(1) lookups
- Speculative arena probe replaces `row_arena_index` HashMap (saves ~40 bytes/row)
- CowBTree reverse iterators for O(limit) descending ORDER BY
Bug Fixes
- Fix transaction-local visibility: replace `txn_versions.get()` with `get_local_version()` in 11 lookup paths
- Fix hash collision bug in HashIndex where same hash was treated as same values
- Fix conflict detection to properly catch UPDATE conflicts via `get_latest_version_id()`
- Fix historical version `arena_idx` (must be None, slot reused by HEAD)
- Fix `next_txn_id` recovery to not skip transaction IDs
- Fix partial commit handling: `commit_all_tables` returns `(bool, Option<Error>)`
- Fix `record_commit` error propagation (was silently swallowed)
- Consolidate duplicate error variants for consistent error messages across 27 files
- CAST(NULL AS type) now returns typed NULL per SQL standard
- Add overflow guards: `checked_neg`/`checked_abs` for `i64::MIN`, `checked_add` in SUM
- Fix CAST text→float→integer to reject inf/NaN/out-of-range
- Fix ILIKE pattern matching to prevent panics on multi-byte UTF-8
- Fix AM/PM `format_timestamp` sequential replacement interference
- Fix `expressions_equivalent` for In, Like, FunctionCall, Window, Between
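The overflow guards mentioned in this list rely on standard `i64` methods. The wrappers below are a minimal sketch of how such guards behave; the function names are hypothetical and the engine's error handling is not shown.

```rust
/// Negate an integer, returning None instead of overflowing on i64::MIN.
fn safe_neg(x: i64) -> Option<i64> {
    x.checked_neg()
}

/// Absolute value, returning None for i64::MIN (whose magnitude exceeds i64::MAX).
fn safe_abs(x: i64) -> Option<i64> {
    x.checked_abs()
}

/// Sum with overflow detection: None means the running total left the i64 range.
fn safe_sum(values: &[i64]) -> Option<i64> {
    values.iter().try_fold(0i64, |acc, &v| acc.checked_add(v))
}
```

`i64::MIN` is the one value with no positive counterpart in two's complement, which is why both negation and absolute value need a guard for it.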
Other Changes
- TRUNCATE TABLE with WAL persistence and transaction safety checks
- Savepoint DDL rollback support (CREATE/DROP TABLE)
- Named parameter support (`:name`) in PK fast path and DML fast path
- EXPLAIN plan colorization in CLI with ANSI colors
- Snapshot isolation guards on arena fast paths
- i128 accumulator for SUM/AVG to prevent integer overflow
- Comprehensive durability test suite
- Expanded documentation for SQL features
Full Changelog: v0.2.4...v0.3.0
v0.2.4
What's New in v0.2.4
This release focuses on memory optimization and MVCC performance, reducing memory footprint by 33% for core data structures and introducing a copy-on-write B+ tree for O(1) MVCC snapshots.
Performance Improvements
CowBTree for O(1) MVCC Snapshots
- Replace `RwLock<BTreeMap>` with copy-on-write B+ tree enabling structural sharing
- Readers clone tree root (atomic increment) then iterate without holding locks
- Dual refcount system for correct concurrent drop coordination
- Rightmost split optimization for sequential inserts
Value Size Reduction (24 → 16 bytes, 33% smaller)
- Redesigned SmartString with packed struct and tag byte for niche optimization
- Extended CompactArc to support DSTs (str, [T]) with thin pointers (8 bytes)
- `Option<Value>` also 16 bytes with no discriminant overhead
MVCC Memory Footprint Reduction
- Remove `row_id` from RowVersion (8 bytes saved per version)
- Remove `chain_depth` from VersionChainEntry (8 bytes saved)
- Use `NonZeroU64`/`NonZeroUsize` for arena indices (RowIndex: 24 → 16 bytes)
NonZeroU64/NonZeroUsizefor arena indices (RowIndex: 24 → 16 bytes) - V2 persistence format without redundant row_id field (backward compatible)
Subquery Caching (5.3x faster)
- Cross-query subquery caching with table-based invalidation
- Cache entries track referenced tables for selective invalidation on DML
Bug Fixes
- Fix double-free in CowBTree internal node merge
- Fix dirty read vulnerability: `Database::clone()` now creates independent executor
- Fix join projection column ordering with ColumnSource enum
- Fix ORDER BY + LIMIT with Hash index fallback (thanks to @nhansiromeo)
- Fix FULL OUTER JOIN null row handling in nested loop join
Other Changes
- Migrate Value-keyed maps to AHash for HashDoS resistance
- Simplify Row storage from 3 variants to 2 (Shared/Owned)
- Add LRU bounds to thread-local caches (scalar/IN subquery: 128, semi-join: 256)
- Add comprehensive SAFETY comments to unsafe blocks
- Panic safety fixes in CompactVec (clone, extend, from_iter)
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's New in v0.2.3
SmartString - Custom SSO String
- Inline storage for strings ≤22 bytes (no heap allocation)
- Owned (`Box<str>`) for computed values, Shared (`Arc<str>`) for cloned values
- 24-byte size with O(1) clone for shared variant
I64Map - High-Performance Hashmap for i64 Keys
- Uses `i64::MIN` as empty sentinel (row/txn IDs are always ≥0)
- FxHash-based with linear probing and backward-shift deletion
- ~45% faster lookups than `FxHashMap<i64, V>`
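The sentinel-plus-linear-probing design can be sketched as follows. This is a simplified stand-in, not the actual I64Map: `SentinelMap` is a hypothetical name, the hash is an illustrative multiplicative hash rather than FxHash, and growth and backward-shift deletion are left out.

```rust
/// Sentinel marking an empty slot; valid keys are assumed to be >= 0.
const EMPTY: i64 = i64::MIN;

/// Fixed-capacity open-addressing map from i64 keys to values.
struct SentinelMap<V> {
    keys: Vec<i64>,
    values: Vec<Option<V>>,
    mask: usize,
    len: usize,
}

impl<V> SentinelMap<V> {
    /// `capacity` is rounded up to a power of two so `hash & mask` picks a slot.
    fn with_capacity(capacity: usize) -> Self {
        let cap = capacity.max(2).next_power_of_two();
        SentinelMap {
            keys: vec![EMPTY; cap],
            values: (0..cap).map(|_| None).collect(),
            mask: cap - 1,
            len: 0,
        }
    }

    /// Multiplicative hash (illustrative constant, not the engine's FxHash).
    fn slot(&self, key: i64) -> usize {
        ((key as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize & self.mask
    }

    /// Insert or overwrite; returns the previous value if the key existed.
    fn insert(&mut self, key: i64, value: V) -> Option<V> {
        assert!(key != EMPTY, "i64::MIN is reserved as the empty sentinel");
        let mut i = self.slot(key);
        loop {
            if self.keys[i] == key {
                return self.values[i].replace(value);
            }
            if self.keys[i] == EMPTY {
                // Keep one slot free so probes always terminate.
                assert!(self.len < self.mask, "map is full (growth omitted)");
                self.keys[i] = key;
                self.values[i] = Some(value);
                self.len += 1;
                return None;
            }
            i = (i + 1) & self.mask; // linear probing
        }
    }

    /// Probe from the key's home slot until the key or an empty slot is found.
    fn get(&self, key: i64) -> Option<&V> {
        let mut i = self.slot(key);
        while self.keys[i] != EMPTY {
            if self.keys[i] == key {
                return self.values[i].as_ref();
            }
            i = (i + 1) & self.mask;
        }
        None
    }
}
```

Reserving one key value as the "empty" marker removes the need for a separate occupancy bitmap or `Option<i64>` keys, so the probe loop compares plain integers.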
Value Interning
- Interned: NULL (all 7 DataTypes), booleans, integers 0-1000
- Reduces allocations for frequently used values in rows
Aggregation Fixes & Optimizations
- Fixed hash collision bug in single-column GROUP BY (was using hash as key)
- Added 2-column tuple optimization (30% faster than `Vec<Value>`)
- Single-column primitive GROUP BY uses I64Map directly
SIMD Pattern Matching
- `memchr::memmem` for LIKE '%pattern%' substring search
- Pre-compiled Finder stored in CompiledPattern
O(1) COUNT(*) and COUNT(DISTINCT) Fast Paths
- Committed row count atomic counter for O(1) COUNT(*)
- `get_distinct_count_excluding_null()` in Index trait for O(1) COUNT(DISTINCT)
- Compiled query cache for COUNT queries with schema epoch validation
- Performance: COUNT(*) ~22.5µs → ~2.9µs (7.7x faster)
Background Cleanup and Memory Management
- Background cleanup thread for periodic garbage collection
- Configurable via `CleanupConfig` (interval, retention periods)
- Arena slot reuse to prevent unbounded memory growth
- Memory at program end: 665 KB → 153.6 KB (77% reduction)
Global Pool and Cache Cleanup
- `clear_version_map_pools()` for TransactionVersions pools
- `clear_program_cache()` for expression bytecode LRU cache
- `clear_classification_cache()` for query classification LRU cache
Other Improvements
- CompactVec: new `insert()`, `retain()`, `drain()` methods
- All indexes migrated to I64Map and CompactVec
- Index trait `_into` methods to avoid allocations
_intomethods to avoid allocations - RowIdVec pooled vector for index lookups (256K max cached capacity)
- BTree index optimization using Borrow trait for lookups
- Thorough SAFETY documentation on unsafe code
Full Changelog: v0.2.2...v0.2.3