Releases: stoolap/stoolap

v0.4.0

01 Apr 21:01
9134f44

What's New in v0.4.0

Immutable Volume Storage Engine

Stoolap now splits each table into a hot MVCC buffer (recent writes, WAL-backed) and cold frozen volumes (historical data, column-major). The query engine merges both sources transparently.

Cold volume format (STV4):

  • Per-column per-row-group LZ4 compression with streaming CRC32 verification
  • Zone maps (min/max per column per 64K row group) for scan pruning
  • Bloom filters for point-lookup acceleration
  • Dictionary encoding for text columns
  • Deferred column loading with tiered eviction (hot/warm/cold memory tiers)
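
As a rough illustration of the zone-map idea above, the sketch below records min/max per row group and skips groups that cannot match a range predicate. It is not Stoolap's implementation: `ZoneMap`, `build_zone_maps`, and `groups_to_scan` are hypothetical names, and the row-group size is shrunk from 64K to 4 so the example stays readable.

```python
from dataclasses import dataclass

ROW_GROUP_SIZE = 4  # tiny for illustration; the notes cite 64K rows per group


@dataclass
class ZoneMap:
    """Min/max summary of one column within one row group."""
    lo: int
    hi: int


def build_zone_maps(column, group_size=ROW_GROUP_SIZE):
    """Split a column into fixed-size row groups and record min/max for each."""
    maps = []
    for start in range(0, len(column), group_size):
        group = column[start:start + group_size]
        maps.append(ZoneMap(min(group), max(group)))
    return maps


def groups_to_scan(zone_maps, lo, hi):
    """Return indices of row groups that may hold a value in [lo, hi]."""
    return [i for i, z in enumerate(zone_maps) if z.hi >= lo and z.lo <= hi]


column = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
maps = build_zone_maps(column)
print(groups_to_scan(maps, 11, 12))  # only the middle group overlaps the range
```

A group is skipped only when its min/max range proves that no row can match, so pruning never changes query results.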

Lifecycle:

  • PRAGMA CHECKPOINT seals hot rows into immutable .vol files, persists manifests, truncates WAL
  • Bounded compaction: sub-target volumes merge, oversized volumes split, dirty volumes rewrite
  • Background compaction thread (non-blocking checkpoint cycles)
  • Cutoff-filtered seal and compaction during snapshot isolation transactions

Crash safety:

  • Fsync-before-rename on all atomic writes (volumes, manifests, catalog)
  • Two-phase WAL: only committed transactions applied during recovery
  • Manifests loaded before WAL replay for idempotent recovery
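
The fsync-before-rename pattern named above can be sketched in a few lines. This is a generic POSIX-style illustration, not Stoolap's code; `atomic_write` is a hypothetical helper, and the directory fsync at the end is a common extra step that the release notes do not mention.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # contents are durable before the rename
        os.replace(tmp_path, path)   # atomic swap within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Best effort: persist the directory entry as well (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If the process crashes before `os.replace`, the target file is untouched and only a stray temporary file remains.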

Columnar Aggregate Pushdown

Filtered and grouped aggregates are now computed directly on raw column arrays, without constructing Row objects:

  • Filtered aggregates: SUM/COUNT/MIN/MAX/AVG with typed predicates on i64/f64/dictionary columns
  • Grouped aggregates: Single-column GROUP BY with dictionary-indexed accumulators (zero hashing) or FxHashMap for numeric columns
  • Dictionary DISTINCT: Extracts unique values from dictionary metadata without scanning rows
  • Zone-map pruning: Volumes and row groups skipped when predicates prove no match
  • IN list pruning: WHERE id IN (...) generates min/max bounds for zone-map elimination
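
To show why dictionary-indexed accumulators need no hashing in the hot loop, here is a minimal sketch of a grouped SUM over a dictionary-encoded text column. The function names and the sample data are hypothetical, and the real engine works on typed column arrays rather than Python lists.

```python
def dictionary_encode(values):
    """Encode a text column as (dictionary, ids); ids index into dictionary."""
    dictionary, index, ids = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        ids.append(index[v])
    return dictionary, ids


def grouped_sum(dictionary, ids, amounts):
    """SUM(amounts) GROUP BY the encoded column, using ids as array slots."""
    sums = [0] * len(dictionary)      # one accumulator per dictionary entry
    for dict_id, amount in zip(ids, amounts):
        sums[dict_id] += amount       # direct index, no hashing per row
    return dict(zip(dictionary, sums))


regions = ["eu", "us", "eu", "ap", "us"]
amounts = [10, 20, 5, 7, 1]
dictionary, ids = dictionary_encode(regions)
print(grouped_sum(dictionary, ids, amounts))
print(dictionary)  # DISTINCT values come straight from the dictionary
```

The same dictionary also answers a DISTINCT query on that column without touching any row.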

Query Performance

  • ORDER BY PK + LIMIT: K-way merge across sorted volume row_ids, stopping after limit rows
  • MIN/MAX typed scan: Direct i64/f64/timestamp access with zone-map volume pruning
  • OFFSET skip: Cold scan skips row materialization for offset rows
  • Parallel cold scanning: Rayon-based parallel volume processing (4+ volumes, 100K+ rows threshold)
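
The ORDER BY PK + LIMIT path can be pictured as a lazy K-way merge over per-volume sorted row_ids. The sketch below assumes each volume already exposes its row_ids in ascending order; `order_by_pk_limit` is a hypothetical name used for illustration only.

```python
import heapq
from itertools import islice


def order_by_pk_limit(volumes, limit):
    """Merge per-volume sorted row_ids lazily and stop after `limit` rows."""
    return list(islice(heapq.merge(*volumes), limit))


volumes = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]
print(order_by_pk_limit(volumes, 4))
```

Because `heapq.merge` is lazy, only about `limit` entries are pulled from the volumes; the rest are never visited.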

Foreign Key Improvements

  • Recursive ON UPDATE CASCADE: Cascades through the full FK chain including grandchild tables, with depth limiting (16 levels)
  • Referenced UNIQUE column cascade: FK cascade generalized from PK-only to any referenced UNIQUE column
  • Pre-check RESTRICT: RESTRICT constraints checked before writing parent rows, preserving statement-level atomicity in explicit transactions
  • SET NULL recursion: SET NULL arm recurses through descendants so deeper RESTRICT checks are enforced
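
A depth-limited recursive cascade can be sketched as a walk over the child-table graph. The table names, the `children` mapping, and the choice to raise an error at the limit are assumptions made for this example; the release notes only state that the depth is limited to 16 levels.

```python
MAX_CASCADE_DEPTH = 16  # depth limit quoted in the release notes


class CascadeDepthError(Exception):
    """Raised when a cascade chain is deeper than MAX_CASCADE_DEPTH."""


def cascade_order(children, table, depth=0):
    """Return the tables touched by a cascade from `table`, parent first."""
    if depth > MAX_CASCADE_DEPTH:
        raise CascadeDepthError(f"FK cascade exceeded depth {MAX_CASCADE_DEPTH}")
    touched = [table]
    for child in children.get(table, []):
        touched.extend(cascade_order(children, child, depth + 1))
    return touched


# Hypothetical schema: customers <- orders <- order_items
children = {"customers": ["orders"], "orders": ["order_items"]}
print(cascade_order(children, "customers"))
```

The depth check also stops cyclic FK graphs from recursing forever.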

Primary Key Update Protection

UPDATE on primary key columns is now rejected with a clear error message, because the engine relies on row_id == pk_value as a core invariant across ~50 code paths. The invariant mirrors SQLite's rowid tables, where an INTEGER PRIMARY KEY column is an alias for the rowid. Use DELETE + INSERT to change a row's primary key value.

Calendar-Aware INTERVAL Arithmetic

INTERVAL '1 month' and INTERVAL '1 year' now use proper calendar logic instead of 30-day/365-day approximations. Handles leap years, variable month lengths, and preserves nanosecond precision. Matches DATE_ADD behavior.
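
The difference between calendar logic and a 30-day approximation is easiest to see in code. The sketch below assumes the common convention of clamping the day to the last day of the target month; the release notes do not spell out Stoolap's exact rule, and `add_months` is a hypothetical helper. Python's datetime carries microseconds, not nanoseconds.

```python
import calendar
from datetime import datetime


def add_months(ts, months):
    """Add calendar months, clamping the day to the target month's length."""
    total = ts.year * 12 + (ts.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    # Time-of-day fields are carried over unchanged by replace().
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


print(add_months(datetime(2024, 1, 31), 1))    # lands in a leap-year February
print(add_months(datetime(2024, 2, 29), 12))   # INTERVAL '1 year' as 12 months
```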

Schema Evolution Fixes

  • CREATE INDEX after DROP+ADD COLUMN: validate_cold_unique and populate_index_from_cold now use column mapping to correctly translate schema indices to physical volume indices
  • AS OF PK dedup: Resolves PK column through mapping for correct cold row dedup after schema evolution
  • Partition grouping: Uses snapshotted cs.mapping instead of live lookup (immune to compaction races)

ALTER TABLE MODIFY COLUMN Validation

MODIFY COLUMN ... NOT NULL now validates existing data with a streaming IS NULL scan before applying the constraint. Returns a clear error if any NULL values exist.

Data Integrity

  • Compaction TOCTOU fix: Snapshot sequence limit captured per-table (matches seal's per-table pattern)
  • Manifest truncation errors: Tombstones, column renames, and dropped columns return errors instead of silent data loss on truncated manifests
  • Volume corruption guards: Dictionary IDs and bytes offsets validated at deserialization, preventing panics on corrupted volume files
  • Aggregation pushdown correctness: Bail on partial WHERE pushdown (prevents wrong results when memory filter is needed)
  • Scanner column pruning: Materialize all columns when filter column indices can't be determined
  • Seal race fix: collect_rows_with_limit uses collect_hot_row_ids_into instead of has_row_id point lookups

Configuration

New DSN parameters for volume storage tuning:

Parameter            Default   Description
checkpoint_interval  60        Seconds between checkpoint cycles
compact_threshold    4         Sub-target volumes per table before merging
target_volume_rows   1048576   Target rows per cold volume (min 65536)
checkpoint_on_close  on        Seal all hot rows on clean shutdown
volume_compression   on        LZ4 compression for cold volume files
sync_mode            normal    none/off, normal, full (or 0, 1, 2)

Invalid DSN parameter values now return errors instead of silently using defaults.
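
The strict validation described above might look roughly like the sketch below, which rejects bad values instead of falling back to defaults. The DSN shape (`file:///path?key=value`), the `off` spelling for boolean options, and the function name are assumptions; only the parameter names, defaults, and the 65536 minimum come from the table.

```python
from urllib.parse import parse_qsl, urlsplit

DEFAULTS = {
    "checkpoint_interval": 60,
    "compact_threshold": 4,
    "target_volume_rows": 1_048_576,
    "checkpoint_on_close": True,
    "volume_compression": True,
    "sync_mode": "normal",
}
SYNC_MODES = {"none": "none", "off": "none", "0": "none",
              "normal": "normal", "1": "normal",
              "full": "full", "2": "full"}
BOOLS = {"on": True, "off": False}  # "off" is an assumed spelling
MIN_TARGET_VOLUME_ROWS = 65_536


def parse_dsn_options(dsn):
    """Return validated options, raising ValueError on any invalid value."""
    opts = dict(DEFAULTS)
    for key, raw in parse_qsl(urlsplit(dsn).query, keep_blank_values=True):
        if key not in DEFAULTS:
            continue  # other keys (e.g. legacy names) are out of scope here
        if key == "sync_mode":
            table = SYNC_MODES
        elif isinstance(DEFAULTS[key], bool):
            table = BOOLS
        else:
            table = None
        if table is not None:
            if raw not in table:
                raise ValueError(f"invalid value for {key}: {raw!r}")
            opts[key] = table[raw]
        else:
            try:
                opts[key] = int(raw)
            except ValueError:
                raise ValueError(f"invalid value for {key}: {raw!r}") from None
    if opts["target_volume_rows"] < MIN_TARGET_VOLUME_ROWS:
        raise ValueError("target_volume_rows must be at least 65536")
    return opts


print(parse_dsn_options("file:///tmp/db?sync_mode=2&checkpoint_interval=30"))
```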

Migration from v0.3.7

Existing v0.3.7 databases are automatically migrated on first open:

  1. Legacy snapshot .bin files loaded into hot buffer
  2. WAL entries replayed
  3. Hot data sealed into immutable .vol files
  4. snapshots/ directory removed

Legacy DSN parameter names (snapshot_interval, snapshot_compression) are accepted for backward compatibility.

Other Changes

  • Build timestamp embedded in version_info() output
  • CLI error paths ensure database cleanup before exit
  • Stale group cache cleared in volume scanner (prevents panic at row-group boundaries)
  • Eviction epoch off-by-one corrected
  • WASM binary rebuilt with warning-free compilation

Full Changelog: v0.3.7...v0.4.0

v0.3.7

13 Mar 03:39

What's New in v0.3.7

PostgreSQL-Inspired DISTINCT ON

New DISTINCT ON (expr, ...) syntax for per-group deduplication, returning the first row for each unique combination of the specified expressions.

  • Hash-based dedup with O(groups) memory that correctly handles arbitrary ORDER BY patterns, including non-leading sort orders
  • Pipeline order: ORDER BY, DISTINCT ON, column removal, LIMIT/OFFSET
  • Works across all query paths: single-table scans, JOINs, CTEs, subqueries, and complex ORDER BY
  • Supports aliased keys, computed expressions, keys not in SELECT, qualified identifiers, and NULL equality
  • Guards distinct index pushdown to prevent bypassing key-based dedup

-- First (highest) order per customer
SELECT DISTINCT ON (customer) customer, amount, order_date
FROM orders
ORDER BY customer, amount DESC;

-- Per-group dedup on joins with qualified keys
SELECT DISTINCT ON (c.name) c.name, p.amount
FROM customers c JOIN purchases p ON c.id = p.customer_id
ORDER BY c.name, p.amount DESC;
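
The pipeline order listed above (ORDER BY first, then DISTINCT ON) can be mimicked in a few lines to show why the first row per key is the one that survives. This is a conceptual sketch with hypothetical names and data, not the engine's executor.

```python
def distinct_on(rows, key, order_by):
    """Sort rows, then keep the first row seen for each key."""
    seen = set()            # O(groups) memory: one entry per distinct key
    result = []
    for row in sorted(rows, key=order_by):
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 75},
    {"customer": "alice", "amount": 120},
    {"customer": "bob", "amount": 10},
]
top = distinct_on(
    orders,
    key=lambda r: r["customer"],
    order_by=lambda r: (r["customer"], -r["amount"]),
)
print(top)  # highest order per customer
```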

ON CONFLICT Upsert (PostgreSQL-Style)

  • ON CONFLICT (cols) DO UPDATE SET and DO NOTHING syntax
  • EXCLUDED pseudo-table to reference attempted insert values
  • Conflict target matching against PK and composite unique constraints
  • CHECK constraint validation during upsert updates
  • RETURNING clause and INSERT ... SELECT support for conflict handling

INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email;

Constant Folding and Non-Deterministic Function Tracking

  • Compile-time constant folding for deterministic column-free expressions
  • FunctionInfo.deterministic flag with registry-based lookup
  • NOW, CURRENT_DATE, CURRENT_TIMESTAMP, RANDOM, SLEEP, EMBED marked non-deterministic
  • Semantic cache bypass for queries with non-deterministic functions
  • Pushdown evaluator for stable expressions like NOW() - INTERVAL '24h'

GROUP BY and Aggregation Optimizations

  • 3-column GROUP BY uses tuple keys (no Vec heap allocation)
  • 4+ columns use direct AHashMap (replaces hash-collision approach)
  • Early termination extended to 1, 2, and 3-column paths
  • FIRST/LAST aggregates support ORDER BY with O(1) sort-key tracking

Snapshot and Persistence

  • Persist default_value in both WAL and snapshot serialization
  • Re-record index and view DDL to WAL during snapshots so they survive truncation
  • Skip snapshot when WAL has not grown since last snapshot
  • Auto-snapshot loop sleeps min(cleanup, snapshot) interval
  • HNSW m and ef_construction exposed on Index trait for persistence

Performance

  • Restore panic = "abort" in release profile, recovering 5-15% across all benchmarks
  • Dev profile optimizations and thin LTO for faster builds

Parser and SQL Compatibility

  • CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP parsed as niladic functions (SQL standard, no parentheses required)
  • Keyword identifiers fold to lowercase (PostgreSQL compatibility)
  • Parse ISO 8601 timestamps with fractional seconds and UTC Z suffix
  • Keywords accepted as column names in SET assignments

Bug Fixes

  • Fix transaction INSERT with partial column lists: delegate to full executor pipeline for correct column mapping, default values, type coercion, and FK validation
  • Fix qualified column ambiguity in joins: ORDER BY and DISTINCT ON with qualified identifiers (e.g., c.name vs p.name) now correctly resolve when both joined tables have the same column name
  • Fix DISTINCT ON key resolution after projection: qualified keys resolve correctly when projected as bare names or under aliases
  • Fix classification cache: include DISTINCT ON expressions in hash key to prevent stale cache hits
  • Fix TimeTruncFunction duration cache miss on zero-duration values

Documentation

  • Go driver docs synced with stoolap-go README
  • Driver icons with official brand colors in docs sidebar and page headers

Full Changelog: v0.3.5...v0.3.7

v0.3.5

11 Mar 13:57

What's New in v0.3.5

FFI Panic Safety

  • panic = "unwind" in release profile so that catch_unwind boundaries in the C FFI layer work correctly. Previously panic = "abort" made catch_unwind a no-op, meaning any Rust panic would abort the host process (MCP server, Node.js, Python, PHP, Go).
  • Removed unused release-ffi profile. All drivers already build with --release --features ffi.
  • Added staticlib crate type for the Go driver's bundled static libraries.

MCP Server Improvements

  • stoolap://sql-reference resource added for discoverability. Delivers the same live schema and complete SQL reference as the sql-assistant prompt, but as an MCP resource that clients can attach without prompt support.

Go Driver Documentation

  • New comprehensive Go driver documentation covering both the Direct API and the database/sql driver, with examples for transactions, prepared statements, vector search, bulk fetch, JSON, NULL handling, and concurrency patterns.

Documentation

  • Updated all FFI build instructions from --profile release-ffi to --release --features ffi across C API docs, header file, benchmark example, building guide, and testing guide
  • Updated release profile description from panic = "abort" to panic = "unwind" in building docs
  • Reordered driver pages: Node.js, Python, PHP, Go, WASM, C API, MCP Server

Bug Fixes

  • Fix view column aliasing: strip table alias prefix from QualifiedIdentifier output column names in post-aggregation expressions (u.username -> username)
  • Fix window functions on views: materialize view rows and delegate to execute_select_with_window_functions
  • Fix panic in projection compilation: replace .expect() panics in ExprMappedResult::with_defaults and FilteredResult::with_defaults with proper Result propagation

Full Changelog: v0.3.4...v0.3.5

v0.3.4

09 Mar 18:50

What's New in v0.3.4

C FFI Layer

Complete C API for embedding Stoolap in any language that can call C functions. Feature-gated with --features ffi.

  • Opaque handle API with step-based iteration, per-handle error storage, and panic-safe catch_unwind boundaries
  • Bulk fetch API (stoolap_rows_fetch_all) transfers entire result sets in a single packed binary buffer, eliminating per-row FFI overhead
  • Prepared statements with pre-compiled plans that bypass cache lookup on every execution
  • Transaction support with isolation level control (begin, commit, rollback)
  • Parameter binding for both positional ($1, $2) and named (:key) parameters
  • Full C header (include/stoolap.h) with documented binary format spec

StoolapDb *db = stoolap_open(":memory:");
stoolap_exec(db, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");

StoolapStmt *stmt = stoolap_prepare(db, "INSERT INTO users VALUES ($1, $2)");
StoolapValue params[] = { stoolap_int(1), stoolap_text("Alice") };
stoolap_stmt_execute(stmt, params, 2);

Table-Valued Functions

  • GENERATE_SERIES for integer, float, and timestamp types
  • LIMIT pushdown, WHERE range clamping, JOIN support, and EXPLAIN integration
  • TVF infrastructure (parser, AST, executor, registry) for future functions

SELECT * FROM GENERATE_SERIES(1, 10);
SELECT * FROM GENERATE_SERIES('2026-01-01', '2026-12-31', '1 month');
SELECT g.value, t.name FROM GENERATE_SERIES(1, 5) g JOIN tasks t ON t.id = g.value;

Hash and String Functions

  • Hash/checksum: MD5, SHA1, SHA256, SHA384, SHA512, CRC32
  • String: STARTS_WITH, ENDS_WITH, CONTAINS

Query Optimizer Improvements

  • Mixed OR hybrid optimization for queries with both indexed and non-indexed OR branches (index lookup + filtered scan with dedup)
  • Relaxed multi-column index prefix rule to allow single-column prefix matching
  • Trailing range scan on composite indexes after equality prefix
  • BETWEEN decomposition to range comparisons for index use
  • Safety guard for subquery index probes that contain outer references or nested subqueries
  • Aggregation guard to prevent fast-path optimizations on aggregate queries

EXPLAIN Improvements

  • Partition WHERE predicates to correct join sides
  • Show residual filters on index scans
  • Handle ROLLUP/CUBE/GROUPING SETS display

Stoolap Studio

  • Documentation page with features overview, installation guide, quick tour, and keyboard shortcuts
  • Responsive dark/light mode screenshots on docs site and README

Bug Fixes

  • Fix undefined behavior in FFI bulk fetch buffer deallocation: into_boxed_slice() guarantees len == capacity for safe Vec::from_raw_parts reconstruction
  • Fix CowHashMap Stacked Borrows violation in drop and backward-shift deletion
  • Fix I64Map Stacked Borrows violation in backward-shift deletion
  • Fix CowBTree Stacked Borrows violations across all node pointer accesses
  • Fix CompactArc Stacked Borrows violation: derive data pointer via raw arithmetic instead of Deref
  • Restore Expression::Case match arm in process_where_subqueries accidentally deleted by mutation testing
  • Fix qualified outer column resolution in expression compiler to prevent incorrect binding when inner row shadows outer column names
  • Fix parser handling of consecutive semicolons (SELECT 1;;)
  • Reject bare expression statements in prepare() to catch typos like SELECTX at prepare time

Testing & CI

  • 14 FFI aggregate function tests (COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING, JOINs, subqueries)
  • 67 unit tests for outer reference detection covering all match arms
  • 13 integration tests for correlated subquery outer reference paths
  • MVCC TransactionRegistry unit tests
  • sqllogictest suite with comprehensive .slt test files
  • Failpoint infrastructure for I/O error injection testing
  • Nightly CI: mutation testing (24 shards), Miri rotation (5 module groups), stress tests, sanitizers
  • TSAN inline suppression for CompactArc false positive

Documentation

  • C driver page with full API reference and bulk fetch binary format spec
  • Node.js driver updated for N-API C addon architecture
  • Development category: testing, limitations, building from source, contributing
  • Changelog page fetching releases from GitHub API
  • Unified header styles and grid overlay across docs site

Full Changelog: v0.3.3...v0.3.4

v0.3.3

28 Feb 18:29

What's New in v0.3.3

Vector Search Engine

  • VECTOR(N) column type with packed binary storage for embedding vectors of any dimension
  • HNSW index for approximate nearest neighbor search with configurable parameters (m, ef_construction, ef_search, distance metric)
  • Distance functions: VEC_DISTANCE_L2, VEC_DISTANCE_COSINE, VEC_DISTANCE_IP, and <=> operator
  • Vector search optimizer with HNSW index path (O(log N)) and WHERE post-filtering, or parallel brute-force k-NN with fused distance+topK
  • Utility functions: VEC_DIMS, VEC_NORM, VEC_TO_TEXT

CREATE TABLE documents (id INTEGER PRIMARY KEY, embedding VECTOR(384));

CREATE INDEX idx_emb ON documents(embedding)
  USING HNSW WITH (m = 16, ef_construction = 200, metric = 'cosine');

SELECT id, VEC_DISTANCE_COSINE(embedding, '[0.1, 0.2, ...]') AS dist
FROM documents ORDER BY dist LIMIT 10;
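
For intuition about the brute-force k-NN path, the sketch below computes a cosine distance per row and keeps the k nearest with a bounded heap. It assumes cosine distance is defined as 1 minus cosine similarity, which is the usual convention but is not stated in these notes; the function names are hypothetical, and zero-length vectors are not handled.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(rows, query, k):
    """rows: iterable of (id, vector). Returns k (distance, id) pairs, nearest first."""
    scored = ((cosine_distance(vec, query), rid) for rid, vec in rows)
    return heapq.nsmallest(k, scored)  # distance and top-k in a single pass


rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
print([rid for _, rid in knn(rows, [1.0, 0.0], 2)])
```

An HNSW index avoids this full scan by walking a graph of neighbors, trading exactness for speed.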

Built-in Semantic Search

  • EMBED() function generates 384-dimensional embeddings using a built-in sentence-transformer model (all-MiniLM-L6-v2), no external API calls needed
  • Enable with --features semantic at build time
  • Combine with HNSW indexing for end-to-end semantic search in pure SQL

-- Generate embeddings at insert time
INSERT INTO documents (title, embedding)
VALUES ('Password Reset Guide', EMBED('How to recover your account'));

-- Semantic search with a natural language query
SELECT title, VEC_DISTANCE_COSINE(embedding, EMBED('forgot my login')) AS dist
FROM documents ORDER BY dist LIMIT 5;

ANN Benchmark

Self-contained benchmark binary on the public Fashion-MNIST dataset (60,000 vectors, 784 dimensions, 10,000 queries, single-core, full SQL path):

Recall   QPS      p95 Latency   Speedup vs brute-force
95.0%    10,410   0.12 ms       733x
99.0%    6,700    0.19 ms       472x
99.9%    4,159    0.33 ms       293x
100%     913      1.59 ms       64x

See ANN Benchmarks for the full report.

# Run the benchmark yourself
RAYON_NUM_THREADS=1 cargo run --release --example ann_benchmark \
  --features ann-benchmark -- --sweep --runs 5 --max-queries 10000

Other Changes

  • VACUUM statement and PRAGMA VACUUM for manual cleanup of deleted rows and old versions
  • Value::Extension refactoring: Value::Json replaced by tag-in-data pattern keeping Value at 16 bytes with room for future types
  • Lexer fast path: zero-alloc string literal parsing for non-escaped strings

Bug Fixes

  • Fix #[inline(always)] + #[target_feature] build error on x86_64
  • Stabilize flaky CI tests (HNSW recall with deterministic PRNG, table name collision, perf threshold)
  • Fix 38 broken relative links across docs (converted to Jekyll link tags)

Docs & Website

  • Full website redesign with vector search spotlight, search modal (Cmd/Ctrl+K), accessibility improvements
  • Blog post: Vector and Semantic Search in SQL
  • Playground: vector search tables and query chips
  • Python driver documentation

Full Changelog: v0.3.2...v0.3.3

v0.3.2

19 Feb 10:50

What's New in v0.3.2

Rust API

  • Transaction named parameters: execute_named() / query_named() now available on Transaction, matching the Database API

Bug Fixes

  • UPDATE SET parameter resolution — Fixed UPDATE t SET col = col + $1 WHERE id = $2 failing because positional and named parameters were not passed to SET expression evaluation (resolved to NULL)

Internal

  • Zero-copy parameter passing in UPDATE — Arc refcount bump instead of deep-cloning all parameter keys/values into the setter closure
  • Unified execution path — Eliminated internal code duplication via execute_sql_with_ctx
  • WASM binary updated

Full Changelog: v0.3.1...v0.3.2

v0.3.1

18 Feb 11:20

What's New in v0.3.1

SQL Engine

  • Cross-type Timestamp/Text comparison: Value::compare() now supports Timestamp ↔ Text/Json comparison via parse_timestamp fallback
  • CURRENT_TIME function — Returns HH:MM:SS format, alongside existing CURRENT_DATE and CURRENT_TIMESTAMP
  • RELEASE SAVEPOINT — Full parser, AST, and executor support
  • SET/BEGIN isolation level: SET isolation_level and BEGIN with isolation level now work correctly (SNAPSHOT, REPEATABLE READ, SERIALIZABLE, READ COMMITTED, READ UNCOMMITTED)
  • Double-quoted pattern strings — LIKE, ILIKE, GLOB, REGEXP now accept double-quoted identifiers as pattern strings (SQLite compatibility)
  • SQLite double-quote fallback — Double-quoted identifiers fall back to string literals when column resolution fails
  • Improved parser errors — Context-aware messages showing actual tokens and clause context (e.g., "expected expression after WHERE")
  • Implicit type coercions — Integer↔Float and Integer→Boolean coercions at the storage layer
  • SHOW CREATE TABLE — Now includes FOREIGN KEY constraints in output
  • DESCRIBE improvements — Shows UNI key type for single-column unique indexes
  • EXTRACT fields — Added MILLISECOND, MICROSECOND, ISODOW, EPOCH

Rust API

  • Cached Plan API: Database::cached_plan(), execute_plan(), query_plan() for pre-parsed SQL reuse without cache lookup overhead
  • Prepared statement execution in transactions: Transaction::execute_prepared() for batch operations with pre-parsed SQL
  • Zero-clone row cursor: Rows::advance() / current_row() for zero-clone row iteration (bulk serialization)
  • ParamVec passthrough: Params impl for ParamVec (identity passthrough), re-exported from lib.rs

Website & Playground

  • Node.js driver documentation — New "Drivers" category with complete @stoolap/node API reference
  • WASM playground — Browser-based SQL sandbox with WebAssembly compilation support
  • Immersive terminal hero — Auto-scrolling terminal with 16 SQL demo scenes
  • Website redesign — Consolidated CSS, new homepage, blog, and layout templates

Documentation Fixes

  • Fix row.get("name") → row.get_by_name("name") across 7 doc files
  • Fix PRAGMA create_snapshot → snapshot (matches implementation)
  • Add 8 missing connection string options to persistence docs
  • Fix RowVersion struct in MVCC docs (remove non-existent fields)
  • Fix DATEDIFF signature, CAST(NULL) behavior, CTE+INSERT limitation
  • Add STRING_AGG native ORDER BY syntax, recursive CTE iteration limit
  • Complete rewrite of sql-functions-reference covering all 110 functions
  • Fix CLI flag -q → -e for executing queries

Internal

  • Feature-gate rayon parallelism behind parallel feature flag with sequential fallbacks for WASM
  • Gate thread::spawn/sleep for WASM targets
  • Add time_compat shim (std::time vs web_time) for WASM Instant/SystemTime
  • Fix NaN/Infinity panic in WASM value serialization
  • Fix auto-increment to follow schema flag, not implicit INTEGER PK

Full Changelog: v0.3.0...v0.3.1

v0.3.0

15 Feb 09:03

What's New in v0.3.0

This release brings foreign key constraint enforcement, a crash-safe WAL/snapshot system, and significant MVCC performance improvements with reduced memory footprint and better concurrency.

Foreign Key Constraints

Full referential integrity enforcement with three referential actions:

  • RESTRICT (default): Block parent deletion/update when children exist
  • CASCADE: Propagate deletes/updates to child rows (recursive, depth limit 16)
  • SET NULL: Set FK columns to NULL when parent is deleted/updated

Key features:

  • Column-level REFERENCES and table-level FOREIGN KEY syntax with ON DELETE/ON UPDATE
  • DDL validation: parent table must exist, referenced column must be PK/UNIQUE
  • Enforcement on INSERT, UPDATE, DELETE, TRUNCATE, and DROP TABLE
  • Pre-validation of constant FK values in explicit transactions to prevent dirty state
  • Cached reverse FK mapping with schema epoch invalidation
  • Auto-created indexes on FK columns for efficient cascade operations
  • WAL + snapshot persistence for FK metadata
  • DROP TABLE strips orphaned FK constraints from child schemas

Crash-Safe WAL/Snapshot System

  • Safe WAL truncation: only truncate to 2nd-to-last CRC-verified snapshot
  • Snapshot fallback loading: try older snapshots when latest is corrupted
  • Remove stale checkpoint.meta when all snapshots fail to load
  • Use min(header_lsn) across tables for crash-safe replay
  • Capture commit_seq AFTER checkpoint to prevent data loss window
  • Clean up orphaned snapshot directories from dropped tables
  • CRC-aware snapshot cleanup: corrupt files don't count toward keep_count
  • WAL rotation after DDL/DML commits to prevent unbounded growth
  • Sort WAL files by embedded LSN instead of lexicographic order
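
The last item above matters because lexicographic order misplaces files once LSNs differ in digit count. The sketch below uses a hypothetical `wal-<lsn>.log` naming scheme purely for illustration; the actual WAL file names are not given in these notes.

```python
import re

WAL_NAME = re.compile(r"wal-(\d+)\.log")  # hypothetical naming scheme


def wal_lsn(name):
    """Extract the numeric LSN embedded in a WAL file name."""
    match = WAL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a WAL file name: {name!r}")
    return int(match.group(1))


files = ["wal-100.log", "wal-9.log", "wal-20.log"]
print(sorted(files))               # lexicographic: 100 sorts before 20 and 9
print(sorted(files, key=wal_lsn))  # replay order by embedded LSN
```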

Transaction Safety

  • Write WAL COMMIT marker before making changes visible in registry
  • Abort transaction on Phase 3 WAL write failure to prevent registry leak
  • Restore transaction on API commit failure so rollback remains possible
  • Fix file lock race: acquire lock before truncating PID file
  • Fix registry override_count underflow from mismatched fetch_sub
  • Fix race condition where transaction was briefly absent from both maps during commit
  • Fix abort_transaction to not resurrect already-committed transactions

ALTER TABLE

  • Implement RENAME COLUMN and MODIFY COLUMN with dual schema updates (version store + cached schema)
  • Column existence checks in pushdown rules to prevent invalid storage-level predicates
  • Full WAL and snapshot durability for ALTER TABLE operations

Performance Improvements

CompactArc Header (24 → 16 bytes)

  • Compile-time drop dispatch via CompactArcDrop trait with monomorphization
  • Eliminates stored function pointer, replaces indirect call with direct call

CowHashMap for Transaction Registry

  • O(1) snapshot cloning for lock-free iteration
  • Replace DashMap<i64, TxnState> with Mutex<CowHashMap<TxnState>>
  • Thread-local caching in is_directly_visible() minimizes lock contention

MVCC Memory Reduction

  • Remove create_time from ArenaRowMeta (8 bytes saved per row)
  • Committed transactions removed from map (implicit state)
  • Pack TxnState into 16 bytes with bit manipulation
  • Separate snapshot_seqs map for snapshot isolation commit sequences

Batch Index Operations

  • add_batch_slice and remove_batch_slice for single-lock operations (O(1) locks instead of O(N))
  • Two-phase commit (validate-then-modify) with rollback support
  • Peak memory usage reduced by ~10% (verified with DHAT)

PkIndex: Hybrid Primary Key Index

  • Hybrid bitset + I64Set with O(1) lookups
  • Speculative arena probe replaces row_arena_index HashMap (saves ~40 bytes/row)
  • CowBTree reverse iterators for O(limit) descending ORDER BY

Bug Fixes

  • Fix transaction-local visibility: replace txn_versions.get() with get_local_version() in 11 lookup paths
  • Fix hash collision bug in HashIndex where same hash was treated as same values
  • Fix conflict detection to properly catch UPDATE conflicts via get_latest_version_id()
  • Fix historical version arena_idx (must be None, slot reused by HEAD)
  • Fix next_txn_id recovery to not skip transaction IDs
  • Fix partial commit handling: commit_all_tables returns (bool, Option<Error>)
  • Fix record_commit error propagation (was silently swallowed)
  • Consolidate duplicate error variants for consistent error messages across 27 files
  • CAST(NULL AS type) now returns typed NULL per SQL standard
  • Add overflow guards: checked_neg/checked_abs for i64::MIN, checked_add in SUM
  • Fix CAST text→float→integer to reject inf/NaN/out-of-range
  • Fix ILIKE pattern matching to prevent panics on multi-byte UTF-8
  • Fix AM/PM format_timestamp sequential replacement interference
  • Fix expressions_equivalent for In, Like, FunctionCall, Window, Between

Other Changes

  • TRUNCATE TABLE with WAL persistence and transaction safety checks
  • Savepoint DDL rollback support (CREATE/DROP TABLE)
  • Named parameter support (:name) in PK fast path and DML fast path
  • EXPLAIN plan colorization in CLI with ANSI colors
  • Snapshot isolation guards on arena fast paths
  • i128 accumulator for SUM/AVG to prevent integer overflow
  • Comprehensive durability test suite
  • Expanded documentation for SQL features

Full Changelog: v0.2.4...v0.3.0

v0.2.4

24 Jan 07:28

What's New in v0.2.4

This release focuses on memory optimization and MVCC performance, reducing memory footprint by 33% for core data structures and introducing a copy-on-write B+ tree for O(1) MVCC snapshots.

Performance Improvements

CowBTree for O(1) MVCC Snapshots

  • Replace RwLock<BTreeMap> with copy-on-write B+ tree enabling structural sharing
  • Readers clone tree root (atomic increment) then iterate without holding locks
  • Dual refcount system for correct concurrent drop coordination
  • Rightmost split optimization for sequential inserts

Value Size Reduction (24 → 16 bytes, 33% smaller)

  • Redesigned SmartString with packed struct and tag byte for niche optimization
  • Extended CompactArc to support DSTs (str, [T]) with thin pointers (8 bytes)
  • Option<Value> also 16 bytes with no discriminant overhead

MVCC Memory Footprint Reduction

  • Remove row_id from RowVersion (8 bytes saved per version)
  • Remove chain_depth from VersionChainEntry (8 bytes saved)
  • Use NonZeroU64/NonZeroUsize for arena indices (RowIndex: 24 → 16 bytes)
  • V2 persistence format without redundant row_id field (backward compatible)

Subquery Caching (5.3x faster)

  • Cross-query subquery caching with table-based invalidation
  • Cache entries track referenced tables for selective invalidation on DML

Bug Fixes

  • Fix double-free in CowBTree internal node merge
  • Fix dirty read vulnerability: Database::clone() now creates independent executor
  • Fix join projection column ordering with ColumnSource enum
  • Fix ORDER BY + LIMIT with Hash index fallback (thanks to @nhansiromeo)
  • Fix FULL OUTER JOIN null row handling in nested loop join

Other Changes

  • Migrate Value-keyed maps to AHash for HashDoS resistance
  • Simplify Row storage from 3 variants to 2 (Shared/Owned)
  • Add LRU bounds to thread-local caches (scalar/IN subquery: 128, semi-join: 256)
  • Add comprehensive SAFETY comments to unsafe blocks
  • Panic safety fixes in CompactVec (clone, extend, from_iter)
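
An LRU bound like the one applied to the thread-local caches can be sketched with an ordered map that evicts the least recently used entry once capacity is exceeded. `LruCache` is a hypothetical class for illustration; the capacities 128 and 256 come from the list above, and the example uses a capacity of 2 to keep it short.

```python
from collections import OrderedDict


class LruCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._items)


cache = LruCache(capacity=2)
cache.put("q1", "result-1")
cache.put("q2", "result-2")
cache.get("q1")              # q1 becomes the most recently used entry
cache.put("q3", "result-3")  # evicts q2
print(cache.get("q2"), cache.get("q1"))
```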

Full Changelog: v0.2.3...v0.2.4

v0.2.3

17 Jan 10:52

What's New in v0.2.3

SmartString - Custom SSO String

  • Inline storage for strings ≤22 bytes (no heap allocation)
  • Owned (Box<str>) for computed values, Shared (Arc<str>) for cloned values
  • 24-byte size with O(1) clone for shared variant

I64Map - High-Performance Hashmap for i64 Keys

  • Uses i64::MIN as empty sentinel (row/txn IDs are always ≥0)
  • FxHash-based with linear probing and backward-shift deletion
  • ~45% faster lookups than FxHashMap<i64, V>

Value Interning

  • Interned: NULL (all 7 DataTypes), booleans, integers 0-1000
  • Reduces allocations for frequently used values in rows

Aggregation Fixes & Optimizations

  • Fixed hash collision bug in single-column GROUP BY (was using hash as key)
  • Added 2-column tuple optimization (30% faster than Vec<Value>)
  • Single-column primitive GROUP BY uses I64Map directly

SIMD Pattern Matching

  • memchr::memmem for LIKE '%pattern%' substring search
  • Pre-compiled Finder stored in CompiledPattern

O(1) COUNT(*) and COUNT(DISTINCT) Fast Paths

  • Committed row count atomic counter for O(1) COUNT(*)
  • get_distinct_count_excluding_null() in Index trait for O(1) COUNT(DISTINCT)
  • Compiled query cache for COUNT queries with schema epoch validation
  • Performance: COUNT(*) ~22.5µs → ~2.9µs (7.7x faster)

Background Cleanup and Memory Management

  • Background cleanup thread for periodic garbage collection
  • Configurable via CleanupConfig (interval, retention periods)
  • Arena slot reuse to prevent unbounded memory growth
  • Memory at program end: 665 KB → 153.6 KB (77% reduction)

Global Pool and Cache Cleanup

  • clear_version_map_pools() for TransactionVersions pools
  • clear_program_cache() for expression bytecode LRU cache
  • clear_classification_cache() for query classification LRU cache

Other Improvements

  • CompactVec: new insert(), retain(), drain() methods
  • All indexes migrated to I64Map and CompactVec
  • Index trait _into methods to avoid allocations
  • RowIdVec pooled vector for index lookups (256K max cached capacity)
  • BTree index optimization using Borrow trait for lookups
  • Thorough SAFETY documentation on unsafe code

Full Changelog: v0.2.2...v0.2.3