QuestDB Releases

Latest updates and improvements to QuestDB Open Source and Enterprise editions.

This release introduces per-column bloom filter configuration, allowing fine-grained control over bloom filter acceleration through CREATE TABLE and ALTER TABLE statements. It also includes bug fixes for a rare backup failure caused by empty partitions and a rare replica crash during graceful shutdown.
QuestDB 9.3.5 introduces lateral joins, SQL-standard UNNEST, and statistical window functions, and corrects SAMPLE BY timezone handling during DST transitions. It also delivers multiple join performance improvements and important Parquet export fixes.
Breaking Changes
This fix introduces a new timestamp_floor_utc function that floors timestamps in local time and converts back to UTC internally, replacing the previous approach of wrapping the query in an extra to_utc() conversion model. For sub-day strides, it uses the standard (non-DST) timezone offset to keep bucket widths uniform in UTC space, avoiding ambiguity during fall-back transitions. DST fall-back (clocks go back) now produces two rows instead of one for the repeated local hour; previously both passes through the repeated hour were merged into a single bucket. Output timestamps can be non-monotonic in local time during fall-back, though the underlying UTC timestamps remain monotonic. Sub-day bucket boundaries are now uniform in UTC rather than in local time. These changes do not affect queries without TIME ZONE, queries with fixed-offset timezones, or super-day strides (day, week, month, year). The fix also corrects an offset-in-floor-anchor bug where DST-aware and fixed-offset paths produced different bucket assignments, fixes the materialized view refresh iterator for sub-day DST timezones with offset, and resolves a native memory leak in FillRangeRecordCursorFactory.
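For illustration, a minimal sketch of the new fall-back behavior, assuming a trades table with designated timestamp ts (America/New_York falls back on 2025-11-02, so the 01:00 local hour occurs twice):

```sql
SELECT ts, count()
FROM trades
WHERE ts IN '2025-11-02'
SAMPLE BY 1h ALIGN TO CALENDAR TIME ZONE 'America/New_York';
-- Previously both passes through the repeated 01:00 local hour were merged
-- into a single bucket; this release emits two rows, one per UTC hour.
```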
New Features
Users can now declare bloom filters as part of column metadata via the existing PARQUET() clause in CREATE TABLE and ALTER TABLE statements. Previously, bloom filter columns had to be specified each time a partition was converted to Parquet. The PARQUET() clause accepts an optional trailing BLOOM_FILTER keyword, which can be used as the sole argument, combined with encoding, or with both encoding and compression (e.g., PARQUET(DELTA_BINARY_PACKED, ZSTD(3), BLOOM_FILTER)). The bloom filter flag is stored in bit 25 of the existing 32-bit parquetEncodingConfig field, requiring no on-disk format change. When convertPartitionNativeToParquet runs without an explicit bloom_filter_columns override, the TableWriter scans per-column metadata for the bloom filter flag automatically. An explicit bloom_filter_columns in CONVERT PARTITION WITH(...) still overrides metadata flags. SHOW CREATE TABLE renders the flag in lowercase (bloom_filter). SET PARQUET(...) replaces the entire parquet config for the column, so users must re-specify BLOOM_FILTER when changing only the encoding. FPP (false positive probability) remains a global setting (partition.encoder.parquet.bloom.filter.fpp) and is not configurable per column. The vendored parquet-format-safe crate is patched to include bloom_filter_length on ColumnMetaData, improving compatibility with readers that rely on this field to locate bloom filter boundaries.
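A short sketch of the resulting DDL surface; table and column names are illustrative:

```sql
-- Declare bloom filters at table creation, alone or alongside encoding and compression.
CREATE TABLE trades (
    ts TIMESTAMP,
    order_id LONG PARQUET(DELTA_BINARY_PACKED, ZSTD(3), BLOOM_FILTER),
    sym SYMBOL PARQUET(BLOOM_FILTER)
) TIMESTAMP(ts) PARTITION BY DAY;

-- SET PARQUET(...) replaces the column's whole parquet config, so BLOOM_FILTER
-- must be re-specified when changing only the encoding.
ALTER TABLE trades ALTER COLUMN order_id SET PARQUET(PLAIN, BLOOM_FILTER);
```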
This feature introduces a full suite of statistical aggregate window functions. STDDEV_POP(), STDDEV_SAMP(), and STDDEV() compute standard deviation, while VAR_POP(), VAR_SAMP(), and VARIANCE() compute variance, reusing the standard deviation base via an isSqrt flag. COVAR_POP(), COVAR_SAMP(), and CORR() provide bivariate statistical analysis through a new bivariate abstract base class. All functions support all frame modes, including ROWS, RANGE, PARTITION BY, and unbounded/bounded frames, as well as two-pass whole-partition variants. Non-removable frames use Welford's online algorithm for numerical stability. STDDEV() is an alias for STDDEV_SAMP(), and VARIANCE() is an alias for VAR_SAMP(). A fix to FunctionParser was also included to correctly resolve group-by vs. window factory when an argument requires an implicit cast.
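A brief usage sketch of the new functions in window form, against a hypothetical trades table:

```sql
SELECT
    ts,
    price,
    stddev_samp(price) OVER (
        PARTITION BY symbol ORDER BY ts
        ROWS BETWEEN 100 PRECEDING AND CURRENT ROW
    ) AS rolling_stddev,
    corr(price, amount) OVER (
        PARTITION BY symbol ORDER BY ts
        ROWS BETWEEN 100 PRECEDING AND CURRENT ROW
    ) AS rolling_corr
FROM trades;
```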
This feature introduces a new server property, cairo.metadata.cache.snapshot.ordered (default: false), that, when enabled, causes all table-listing functions to return rows sorted alphabetically by table name. The sort is maintained incrementally inside a new CharSequenceObjSortedHashMap data structure backed by a CharSequenceSortedList, which inserts at the correct sorted position using binary search. This means lookup remains O(1) via hash and ordered iteration is O(n), with no post-query sort step required. A new CharSequenceObjMap interface allows callers to hold either the sorted or unsorted implementation transparently. The feature applies to SHOW TABLES, all_tables(), and related table-listing functions, and correctly handles drop-and-recreate as well as rename scenarios.
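Enabling the ordered snapshot is a one-line server configuration change:

```
# server.conf: keep the metadata cache snapshot sorted by table name
cairo.metadata.cache.snapshot.ordered=true
```

With the property set, SHOW TABLES and all_tables() return rows in alphabetical order.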
Tick expressions like [2024-01, 2024-02]T09:30@America/New_York#workday;6h29m now work without requiring explicit day ranges ([2024-01-[01..31]]). The compiler detects month-level and year-level date elements with time override suffixes and expands them to individual days before applying the time. This also works with bracket ranges (2024-[01..02]T09:30), time list brackets (T[09:00,14:00]), bare expressions without brackets (2024-01T09:30), year-level dates ([2024]T09:30), and duration plus day filter combinations ([2024-01]#workday;6h29m). Month validation was added before getDaysPerMonth to prevent ArrayIndexOutOfBoundsException on invalid month values. A parseMonthLevelDate helper was extracted to deduplicate YYYY-MM parsing and validation. Heap allocation of long[] dayStarts was eliminated by compacting day starts to the front of the existing LongList and reusing positions after them for per-day parsing.
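A hedged usage sketch: as with the '2026-01;-3d' example later in these notes, tick expressions are used inside interval filters, and the table name here is illustrative:

```sql
-- January and February 2024, workdays only, 09:30 New York time plus 6h29m,
-- with no explicit day range required.
SELECT * FROM trades
WHERE ts IN '[2024-01, 2024-02]T09:30@America/New_York#workday;6h29m';
```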
This feature introduces UNNEST as a FROM-clause operator supporting all typed array column types, multiple arrays with NULL padding, and WITH ORDINALITY. For JSON arrays, a COLUMNS(name TYPE, ...) syntax declares output columns from VARCHAR-stored JSON arrays. Supported JSON column types include DOUBLE, LONG, INT, SHORT, BOOLEAN, VARCHAR, and TIMESTAMP. UnnestRecordCursorFactory wraps the base factory and emits one output row per array or JSON element. The UnnestSource interface abstracts the difference between typed arrays (ArrayUnnestSource) and JSON arrays (JsonUnnestSource), keeping cursor and record logic shared. JSON field extraction uses simdjson via JSON Pointer queries, and a native truncated flag detects values exceeding the 4KB extraction limit, throwing an error instead of silently truncating. Example usage: SELECT u.price, u.name FROM events e, UNNEST(e.payload COLUMNS(price DOUBLE, name VARCHAR)) u. Typed array and JSON sources can be mixed in the same UNNEST call.
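A sketch of the typed-array form; the table is hypothetical and the exact output column naming is an assumption:

```sql
-- One output row per array element; WITH ORDINALITY adds the element's position.
SELECT s.ts, u.*
FROM sensors s, UNNEST(s.readings) WITH ORDINALITY u;
```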
This feature adds lateral join support, allowing subqueries in the FROM clause to reference columns from preceding tables. Correlated lateral subqueries are decorrelated in the optimizer into standard joins, enabling set-based execution instead of per-row nested-loop evaluation. The decorrelation technique is based on the Neumann and Kemper "Unnesting Arbitrary Queries" approach. The rewriter (LateralJoinRewriter) runs three passes during SQL optimization: correlation analysis that tags literals referencing outer tables, decorrelation that builds deduplicated outer-reference subqueries and rewrites correlated references, and an elimination pass that attempts to remove outer-reference join models when all correlations resolve to equalities. Operator-specific compensation preserves semantics for GROUP BY, SAMPLE BY, window functions, DISTINCT, LATEST BY, LIMIT, and set operations. Supported syntax includes JOIN LATERAL for inner lateral joins, LEFT JOIN LATERAL for left lateral joins with NULL fill, and standalone LATERAL for implicit cross joins. Example: SELECT o.id, t.total FROM orders o JOIN LATERAL (SELECT sum(qty) AS total FROM trades WHERE order_id = o.id) t.

ALTER TABLE ADD COLUMN and ALTER TABLE DROP COLUMN now work correctly on tables whose partitions have been converted to Parquet format. Previously, schema changes on Parquet-stored partitions could produce corrupted files or query errors because the Parquet column layout was assumed to match the current table schema. Added columns read as all-NULL at read time, while dropped columns' data remains in the file until an O3 merge rewrites the affected row groups. During O3 merge, rewritten row groups reflect the current schema: added columns get all-NULL chunks, and dropped columns are omitted. Bitmap indexes are rebuilt when converting between native and Parquet formats if the index files are missing or stale. The WAL writer correctly marks new symbol columns as nullable when uncommitted rows exist at the time of ADD COLUMN. A sketch of the now-supported sequence appears below.
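The sketch uses an illustrative table and the WHERE-based partition conversion syntax:

```sql
ALTER TABLE trades CONVERT PARTITION TO PARQUET WHERE ts < '2026-01-01';
-- The new column reads as all-NULL in Parquet partitions until data arrives.
ALTER TABLE trades ADD COLUMN venue SYMBOL;
-- The dropped column's data stays in the file until an O3 merge rewrites the row groups.
ALTER TABLE trades DROP COLUMN side;
```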
This feature extends HORIZON JOIN to accept multiple right-hand-side (slave) tables in a single query, enabling users to aggregate columns from several time-series sources against a common master table and offset grid in one statement. Both single-threaded and parallel (page-frame-based) execution paths are supported for multi-slave HORIZON JOIN, including keyed (ON symbol) and non-keyed (timestamp-only ASOF) variants, as well as mixed keyed/non-keyed slaves within the same query. The last HORIZON JOIN in the chain carries the RANGE/LIST and AS clauses; preceding HORIZON JOIN clauses omit them. For example: SELECT avg(b.bid) AS avg_bid, avg(a.ask) AS avg_ask FROM trades AS t HORIZON JOIN bids AS b ON (t.sym = b.sym) HORIZON JOIN asks AS a ON (t.sym = a.sym) LIST (-2s, 0, 2s) AS h GROUP BY h.offset. Internal refactoring replaces LongList offsets with long[] to avoid on-heap allocations on the hot path, and replaces the always-forward scan strategy with an adaptive backward/forward approach controlled by configurable thresholds (cairo.sql.horizon.join.bwd.scan.* properties).
This feature adds RLE dictionary decoding for STRING columns in the Parquet reader by wiring the existing BaseVarDictDecoder, RleDictionarySlicer, and StringColumnSink together in decode_byte_array_dispatch. The default writer encoding for VARCHAR columns has been changed from RleDictionary to DeltaLengthByteArray, aligning it with the STRING and Binary defaults.
This feature introduces the arg_max(varchar, key) aggregate function for key types timestamp, double, long, and int. The function returns the varchar value corresponding to the row where the key column reaches its maximum. Each variant supports parallel group-by execution (Async Group By) with per-worker function cloning and pointer-based merge, using StableAwareUtf8StringHolder for efficient off-heap varchar storage that avoids copying on every new max. NULL key rows are skipped and do not affect the result, while a NULL varchar value at the max key is correctly returned as NULL.
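A usage sketch in the same shape as the arg_max/arg_min examples later in these notes; names are illustrative:

```sql
-- For each device, the status string from the row with the greatest created_at.
SELECT device_id, arg_max(status, created_at) AS latest_status
FROM events
GROUP BY device_id;
```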
Improvements
This improvement extends the convertSymbolJoinKeysToInt optimization, previously used only by AsOf/LT joins, to hash joins. Static SYMBOL-to-SYMBOL key pairs are now compared as integers rather than strings, with a SymbolTranslatingRecord wrapping the build-side record to translate build-side symbol IDs into the probe-side encoding. Translation runs only on the build side, so the probe side stays on a straight getInt() path with zero added overhead. All six hash join variants (inner/left/right/full outer × light/full-fat) are covered.
Queries on wide tables that reference only a subset of columns now open partitions faster. TableReader tracks which columns the current query needs via a BitSet and skips memory-mapping inactive columns when opening partitions, reducing mmap/munmap syscall overhead proportionally to the number of unreferenced columns. For a query touching 1 column out of 100, roughly 99% of mmap calls are avoided per partition open. The active column set is deduplicated and falls back to mapping all columns when the list is null, empty, or covers every column. When the active set broadens, already-open partitions map any newly-needed columns. On pool return, goPassive() clears the active column state so subsequent opens map all columns. When no active columns are set (Parquet export, direct callers), the reader maps all columns as before.
This improvement eliminates eager opening of all partitions at query start for ASOF JOIN, HORIZON JOIN, and WINDOW JOIN queries on large slave tables. Time frame cursors now pre-compute exact page frame boundaries from table metadata (column tops, row counts, partition formats) without opening partitions. Actual partition opening happens lazily on first access via ensurePartitionOpened(), so queries targeting a narrow time range skip I/O for untouched partitions. Both single-threaded (TimeFrameCursorImpl) and concurrent (ConcurrentTimeFrameState + ConcurrentTimeFrameCursorImpl) cursors follow a two-phase approach: an upfront phase that pre-computes page frame boundaries from partition metadata stored as UninitializedPageFrame entries, and a lazy phase that opens partitions and patches zero-address entries with real mmap addresses. Fallback paths that bypass lazy opening include partitions already open in the table reader, cursors with interval filters, and Parquet partitions. The concurrent path uses double-checked locking with AtomicIntegerArray for safe partition opening across worker cursors. Queries accessing a subset of partitions avoid opening irrelevant partitions entirely, with zero GC on the hot path.
Previously, calculateInsertTransactionBlock() excluded the last loaded INSERT from the transaction block, forcing it to be processed alone. This improvement includes it in the block, allowing consecutive INSERT operations to be committed together via processWalCommitBlock() instead of one-by-one via processWalCommit(). The previous behavior caused unnecessary LAG usage on empty tables (e.g., post-TRUNCATE with few rows), creating artificial 0-row partitions that could race with backup compression. The root cause was a backward loop that assigned FORCE_FULL_COMMIT to the last loaded INSERT, while the block calculation loop broke before incrementing blockSize, excluding the transaction. The fix increments blockSize before the break, which is safe because the backward loop propagates FORCE_FULL_COMMIT from structural changes (ALTER, TRUNCATE) to the INSERT before them, so the block loop never reaches a non-data transaction.

ASOF JOIN and LT JOIN factories with multi-key symbol columns now compare join keys as integers instead of converting symbol IDs to strings. SymbolTranslatingRecord translates master symbol IDs to slave symbol IDs via a cached IntIntHashMap, enabling integer-based map lookups and memeq() comparisons instead of variable-length string hashing and comparison. This optimization applies to AsOfJoinFastRecordCursorFactory, AsOfJoinDenseRecordCursorFactory, AsOfJoinLightRecordCursorFactory, FilteredAsOfJoinFastRecordCursorFactory, and LtJoinLightRecordCursorFactory. Single-symbol joins retain their existing specialized paths. SymbolTranslatingRecord gains a hadNonExistentKey() flag set during getInt() calls, eliminating a separate hasNonExistentKey() pre-scan. Benchmarks on a 100M-row master / 50M-row slave dataset show multi-key ASOF JOIN improving from 8.11s to 5.2s.
This improvement speeds up constant-index element access on 1D and 2D double array columns (e.g., arr[1], arr[3], arr[2,1]) by bypassing full ArrayView construction. The hot path now reads directly from AUX/data pages via Unsafe, skipping the IntList operations, stride computation, and BorrowedFlatArrayView setup that BorrowedArray.of() performs per row. A new Record.getArrayDouble1d2d() method with an optimized override in PageFrameMemoryRecord takes two zero-based indices and dispatches internally between the 1D and 2D paths with a single Unsafe.getDouble return. DoubleArrayAccessFunctionFactory detects constant positive indices on 1D/2D column functions at compile time and routes through the fast path. Negative indices and 3D+ arrays fall through to the existing getArray path. Benchmarks on a 10M-row table show approximately 2.4x speedup for queries like SELECT sum(arr[1]) FROM t.

This improvement adds a table selector to the table details drawer header when there is no table details history, reusing the same selector as the metrics table selector. It also includes styling updates for the metrics header (table name, title, actions) and layout updates to the metrics dashboard.
Bug Fixes
After DROP COLUMN, a column's reader index (dense position in the live column list) diverges from its writer index (permanent ID stored as field_id in parquet files). Several code paths confused the two, producing type mismatch errors, corrupt bitmap indexes, or suspended WAL tables. This fix addresses five distinct issues: bitmap index rebuild using the wrong parquet column during checkpoint/backup recovery; a doubled parquet file path that silently skipped bitmap index rebuilds; a race condition where parallel parquet bitmap rebuild tasks read stale metadata when recovering multiple tables; COPY TO parquet export flagging the wrong column as designated timestamp after DROP COLUMN; and ALTER TABLE ALTER COLUMN TYPE crashing on tables with parquet partitions, which is resolved by converting parquet partitions back to native before the column type conversion starts.
This fix allows signed duration segments when parsing and compiling tick expressions (e.g., timestamp in '2026-01;-3d'). Intervals are now normalized when durations move backwards so that static and compiled paths produce matching results. The compiled path in emitSingleVar previously produced inverted intervals for non-day-level variables like $now;-1h. Additionally, addYears day-of-month clamping was corrected in Micros, Nanos, and Dates so that Feb 29 + 1y now correctly produces Feb 28 instead of Mar 1. Calendar drift in repeating month/year intervals is fixed by computing each interval from the base timestamp instead of iterating. Non-positive counts in repeating interval syntax are now rejected with a clear error message.
The projection self-reference check in doReplaceLiteral0() only inspected the first join model, missing columns from other join models. This caused "Invalid column" errors when a function wraps a column from a joined table (e.g., coalesce(c, 0) where c comes from a secondary join model) and table-prefixed columns make the translating model non-redundant. This fix replaces the single-model check with columnNotExistsInJoinModels(), which scans all join models.
This fix addresses three related issues. First, CopyExportRequestTask incorrectly treated var-size columns as fully NULL when their data was entirely inlined into the aux vector, because the colTop heuristic checked the data page address (which is legitimately 0 for inlined varchars). It now checks the aux page address for var-size columns. Second, the streaming Parquet reader now supports multiple dictionary pages per column chunk by allocating a fresh buffer per varchar-slice dict page, preventing aux entries decoded against earlier dict pages from being corrupted when later ones are decompressed. Third, the ASCII flag is now normalized when serializing VARCHAR keys in SingleRecordSink and OrderedMap, so two equal varchars whose ASCII-flag provenance differs hash and compare as the same key. Without this normalization, GROUP BY, DISTINCT, and hash-join operations could treat visually identical values as distinct groups.
This fix corrects a bug in convertSymbolJoinKeysToInt() where writeSymbolAsString bits were unset while iterating over join key columns. When master and slave symbol columns had crossing indices (e.g., master.symbol1 = col2, slave.symbol2 = col2), the first pair's unset operation cleared the bits needed by the second pair, leaving keyTypes as STRING while the RecordSink generated getInt() code. This type mismatch produced garbled map keys and missed matches.
When DirectLongLongHashMap.restoreInitialCapacity() or rehash() encountered an allocation failure (OOM or RSS memory limit exceeded), the map was left in an inconsistent state where capacity > 0 but ptr = 0. A subsequent FastGroupByAllocator._close() call would iterate over the capacity and dereference the null pointer, causing a SIGSEGV. This fix moves the capacity, mask, and ptr updates to after the allocation succeeds, so a failed malloc leaves the map in its prior consistent state.
The ALTER TABLE ALTER COLUMN TYPE statement now supports converting between DECIMAL and VARCHAR/STRING types. The type conversion validation matrix in SqlCompilerImpl was updated to include these entries. Four new conversion methods handle the data transformation: DECIMAL to VARCHAR/STRING reads each value through a loader, sets the scale on a thread-local Decimal256, and writes the string representation; VARCHAR/STRING to DECIMAL parses each string through Decimal256.ofString() and stores the result. NULL values round-trip correctly in both directions. All six DECIMAL storage sizes (DECIMAL8 through DECIMAL256) are supported for both conversion directions. The fix also corrects VARCHAR aux-vector sizing in the shared fixed-to-VARCHAR conversion path.
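A sketch of the newly allowed conversions; the table name is illustrative, and the DECIMAL precision/scale shown is an assumption:

```sql
-- DECIMAL -> VARCHAR, then back again; NULLs round-trip in both directions.
ALTER TABLE trades ALTER COLUMN price TYPE VARCHAR;
ALTER TABLE trades ALTER COLUMN price TYPE DECIMAL(18, 2);
```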
This fix resolves an issue where clients using the PostgreSQL extended query protocol (such as postgres.js) to iterate cursors in batches via unnamed portals would receive a "spurious execute message" error on the second Execute. The root cause was that after a Flush, the suspended cursor was freed for unnamed portals, the factory was moved into the cache, and the pipeline entry was released to the object pool, leaving pipelineCurrentEntry null when the next Execute arrived. The fix retains suspended unnamed portal pipeline entries across Flush/Sync so the next Execute can resume iteration, adds a stateSuspended flag to track portal suspension state accurately, and frees suspended cursors on abandonment (new Parse/Bind) or explicit Close. Named portals (used by JDBC's setFetchSize) are unaffected, as they already survived via the namedPortals map.
Go's JSON marshaller omits the fractional part entirely when a time value has zero sub-second precision, producing values like 2026-03-31T09:02:28Z instead of 2026-03-31T09:02:28.000000000Z. This broke QuestDB timestamp patterns such as .U+ and .N+, which previously required a dot and at least one digit. This fix normalizes the fractional component into optional fraction opcodes in the micros and nanos timestamp compilers, so .U+ and .N+ accept either fractional digits or no fractional part at all, while still rejecting a bare dot. Both the generic and ASM parser paths share the relaxed behavior.
This fix resolves an issue where copyOrRebuildColumnIndexes() skipped columns only when colTop == -1, but rebuildPartitionIndexFiles() also skips when colTop >= partitionRowCount (column has no data). When a symbol column was added after a Parquet partition existed, O3 merge set colTop == partitionRowCount. The Parquet-to-native conversion correctly skipped building index files, but the subsequent native-to-Parquet conversion tried to hard-link the non-existent .k/.v files, suspending the table. The skip condition in copyOrRebuildColumnIndexes() was aligned with rebuildPartitionIndexFiles() by also skipping when colTop >= partitionRowCount.
When a backup checkpoint was in progress, the backup read partition data and column files from the live database directory. If partition directories or column version files were deleted while the checkpoint was snapshotting metadata, the backup failed because it tried to read files that no longer existed. This fix adds checkpoint guards to all synchronous and asynchronous file removal paths to prevent deletion while a checkpoint is active. TableWriter.processPartitionRemoveCandidates0 now checks isInProgress() and defers old partition directory removal to async purge when a checkpoint is active, while rollback orphans bypass the guard since they are not in any committed snapshot. TableWriter.finishColumnPurge forces async-only column version purge during checkpoint to prevent synchronous deletion of .d/.i files. The same checkpoint guard is applied for column purge triggered by ALTER COLUMN CONVERT and UPDATE operations. The async purge job (O3PartitionPurgeJob and ColumnPurgeOperator) also skips file deletion during checkpoint, closing the race where the purge job runs before the checkpoint pins a table's scoreboard.
This fix resolves an issue where CopyExportFactory.getCursor() enqueues export work onto a ring queue that a worker thread picks up, but BaseParquetExporter.of() constructed the worker's SqlExecutionContext with a null BindVariableService. This caused a "bind variable service is not provided" error when bind variables were used in the COPY subquery (e.g., COPY (SELECT * FROM t WHERE ts <= $1) TO ...). The fix threads the caller's BindVariableService through CopyExportRequestTask so the worker's execution context receives it. Bind variable values are deep-copied via BindVariableServiceImpl.snapshot() before enqueueing the task, since the PostgreSQL Wire Protocol thread may clear or repopulate the BindVariableService for the next query before the worker starts. The HTTP export path keeps the direct reference since it runs synchronously on the same thread.
This fix resolves a NullPointerException that occurred when WalPurgeJob attempted to fetch metadata for a table that was concurrently dropped. A concurrent drop could close the metadata pool tenant's txFile during refresh, causing a NullPointerException rather than a CairoException. The fix updates WalPurgeJob.fetchSequencerPairs() to catch Throwable (not just CairoException) when getTableMetadata() fails, and similarly updates AbstractMultiTenantPool.get0() in the newTenant path so the pool slot is always released on failure.
This fix corrects an off-by-one error in O3PartitionJob.mergeRowGroup() that could produce row groups exceeding the 1.5x maxRowGroupSize limit by one row. The numChunks formula used 3 * maxRowGroupSize as the divisor, representing 2 * (1.5 * maxRowGroupSize) in real arithmetic. When maxRowGroupSize is odd, integer truncation causes a mismatch: 3 * 551 = 1653 vs. 2 * (551 + 551/2) = 2 * 826 = 1652. This difference meant that for certain merge totals, the formula computed fewer chunks than needed, and the even-distribution logic gave the first chunk more rows than the 1.5x limit allowed. The fix computes numChunks from the integer-exact maxChunkTarget (maxRowGroupSize + maxRowGroupSize / 2), ensuring no chunk exceeds the 1.5x bound after remainder distribution.
The validateWindowJoins function only recursed into nestedModel and unionModel, missing WINDOW JOIN definitions inside join model subqueries. This left WindowJoinContext uninitialized (lo/hi not computed), causing aggregates to return erroneous results when a WINDOW JOIN appeared under another join, such as inside a subquery used as a JOIN source.
The propagateTopDownColumns0() method emits join-condition columns to each join model, but the expression nodes carried the outer query's table aliases. When the join model was a view, those aliases could not be resolved within the view's own scope, so addTopDownColumn silently dropped them. The optimizer then pruned the column from the view's output, causing an InvalidColumnException at code-generation time when the join tried to look up the missing column. This fix switches to using plain column names without table-alias prefixes, which can be resolved in the join model's alias map.

This fix replaces header click-to-insert with per-column copy buttons and tracks grid selection state to disable "move column to front" when nothing is selected. It resolves the Monaco context menu being clipped by overflow containers, consolidates DDL queries to strip redundant blank lines, and adds truncation for long DDL in the table details drawer. Format shortcuts are rebound to Alt+Shift+F / Alt+F, toolbar and result bar spacing is tightened, long materialized view DDLs are truncated, and the LiteEditor icon now switches based on diff vs. regular mode.
This release introduces new SQL capabilities including the COPY PERMISSIONS statement for cloning access profiles between principals, per-column Parquet encoding and compression controls, and dynamic window ranges in WINDOW JOIN queries. Performance improvements span faster ORDER BY on TCA queries, expanded vectorized GROUP BY coverage, and improved Parquet I/O throughput. The release also includes several critical bug fixes, addressing data corruption in DECIMAL columns, JVM crashes during backup coordination, and WAL replication race conditions.
New Features
This feature introduces three new permission constants: CONVERT PARTITION TO PARQUET, CONVERT PARTITION TO NATIVE, and SET PARQUET ENCODING. Authorization checks are wired into all enterprise SecurityContext implementations, including EntSecurityContextBase, AdminSecurityContext, and AbstractReplicaSecurityContext. The PermissionParser was updated to handle multi-word permission names containing SQL keywords (such as TO in CONVERT PARTITION TO PARQUET) by adding isPermissionPrefix() to disambiguate keywords from permission name continuations. All three operations are denied on replica security contexts. Previously, these ALTER TABLE operations piggybacked on authorizeAlterTableAlterColumnType and could not be granted or revoked independently.
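A sketch of granting the new permissions independently, following the Enterprise GRANT syntax; the table and user names are illustrative:

```sql
GRANT CONVERT PARTITION TO PARQUET ON trades TO analyst;
GRANT SET PARQUET ENCODING ON trades TO analyst;
```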
Bug Fixes
The WAL uploader could enter an infinite retry loop when it encountered a segment whose upload.pending marker file was never created. This occurred when WalWriter.openNewSegment() advanced minSegmentLocked even when createSegmentDir() threw due to a transient disk error. When the distressed WalWriter was later closed, a segmentClosed event fired for a segment that never had its upload.pending file created, causing the Rust uploader to retry deleting the non-existent file indefinitely. This fix makes clear_segment_pending_file in the Rust uploader treat ENOENT as success, since the desired end state (upload.pending absent) is already achieved. This is safe with respect to the WAL cleaner because phantom segments are harmless: object stores return success when deleting non-existent keys, and the cleaner's compaction watermark has gap protection that prevents it from skipping real segments.
In QuestDB Enterprise, exporting to Parquet via COPY could produce spurious "cancelled by user" errors. The SecurityCheckFactory was inserted as a RecordCursorFactory wrapper for all SELECT queries, even though it only mattered for UPDATE queries, where it revalidated column-level permissions. This broke an assumption that the outermost factory for a SELECT is always QueryProgress, causing the code to fail to unwrap the factory chain correctly. QueryProgress then interacted badly with the COPY job, triggering the spurious errors. This fix moves authorization revalidation directly into UpdateOperation, eliminating the need for the SecurityCheckFactory wrapper.
After a server restart, collect_missed_segments walks WAL directories to find segments with upload.pending files left over from a previous run. Because the walk is processed asynchronously, it could encounter WAL directories created after the bounce and queue them as "closed" with a stale last_txn set to the sequencer's startup transaction. This caused the upload.pending clearing loop to remove the file prematurely, after which WalPurgeJob deleted the unprotected segment. The uploader, still needing the segment, entered an infinite retry loop and the table's replication became permanently stuck. This fix passes current_wal_id into collect_missed_segments and skips any segment where wal_id > current_wal_id. Since current_wal_id is read from _wal_index.d before ingestion starts, it reliably represents the last pre-bounce WAL ID. Post-bounce segments are handled through the normal SegmentClosed event flow and do not need recovery via collect_missed_segments.
Without this fix, revoking a user's ALTER or UPDATE permission did not take effect for already-cached prepared statements in the PostgreSQL Wire Protocol, allowing the user to continue executing those operations until the connection was closed. Previously, ALTER statements were not cached before the PostgreSQL Wire Protocol layer rewrite, so authorizing them at compile time only was sufficient. Since the PostgreSQL Wire Protocol layer can now cache these commands, this fix ensures that authorization happens at execution time as well.
QuestDB 9.3.4 delivers dynamic windows in WINDOW JOIN, Parquet row group pruning with bloom filters, new array functions, and significant performance improvements across ORDER BY, joins, and Parquet I/O.
Breaking Changes
This change improves query execution performance by folding constant sub-expressions at compile time rather than re-evaluating them on every row. For example, x < '2026-01-01'::timestamp previously kept the cast as an unevaluated function that executed per row, but now evaluates it once at compile time. Additionally, constant reassociation regroups constants of the same associative operator into a single subtree (e.g., col + 1 + 4 becomes col + 5). As a breaking change, float and double Infinity and -Infinity values are now treated as NULL at compile time. Previously, expressions like cast('Infinity' as float) remained unevaluated at runtime and returned Float.POSITIVE_INFINITY, but with constant folding these are now evaluated through FloatConstant.newInstance() / DoubleConstant.newInstance(), which collapse Infinity, -Infinity, and NaN to NULL per QuestDB's convention. CASE expressions can no longer branch on Infinity or -Infinity as distinct float/double values.

This change aligns STRING and SYMBOL ordering with the SQL-standard byte-order collation and enables range filter pushdown for string columns in Parquet row group pruning. Any code relying on the previous UTF-16 code unit comparison order for string comparisons may observe different ordering results.
This change improves varchar decoding performance from Parquet files by introducing a new internal column type: varchar slice. VarcharSlice aux entries store (length, pointer) pairs pointing directly into mmapped Parquet pages or per-page decompression buffers, eliminating byte copies on the read path. The default encoding for Varchar has been changed from Delta Length Byte Array to RLE Dictionary. Benchmarks show decode-only performance improving from ~4.25ms to ~0.75ms for short strings (8 bytes) at 500K rows with cardinality 256.
New Features
This feature adds minTimestamp and maxTimestamp TIMESTAMP columns to sys.telemetry_wal to capture the data timestamp range per WAL transaction event. WAL telemetry is now enabled by default regardless of the main telemetry setting, reducing the dependency on logs to investigate data writing shape. WalWriter commit log messages have been downgraded from info to debug level unless the commit has a replace range. Schema migration support was also added: when the column count mismatches the expected schema, the table is dropped and recreated, which is safe given the 1-week TTL.
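A quick way to inspect the new columns (LIMIT -10 returns the last ten rows):

```sql
-- Data timestamp range carried by the most recent WAL transaction events.
SELECT minTimestamp, maxTimestamp FROM sys.telemetry_wal LIMIT -10;
```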
This feature introduces array_sort(DOUBLE[]) and array_reverse(DOUBLE[]) scalar functions that operate on double arrays of any dimensionality. array_sort sorts each innermost-dimension slice independently, preserving the array's shape, and accepts optional boolean arguments for descending order and nulls-first placement. array_reverse reverses each innermost-dimension slice. Both functions handle NULL arrays, empty arrays, NaN values, and multidimensional inputs, and support both contiguous unit-stride and non-vanilla array layouts via separate code paths. The internal sort buffer grows on demand and stays at peak size for the cursor's lifetime to avoid allocation churn on the hot path.
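A usage sketch; the second argument to array_sort is assumed to be the descending-order flag mentioned above:

```sql
SELECT
    array_sort(ARRAY[3.0, 1.0, 2.0])       AS asc_sorted,  -- [1.0, 2.0, 3.0]
    array_sort(ARRAY[3.0, 1.0, 2.0], true) AS desc_sorted, -- [3.0, 2.0, 1.0]
    array_reverse(ARRAY[3.0, 1.0, 2.0])    AS reversed;    -- [2.0, 1.0, 3.0]
```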
This feature introduces four new SQL functions that operate element-wise across DOUBLE[] arrays. Each function works in two modes: variadic (two or more array arguments, per row) and aggregate (single array column, GROUP BY/SAMPLE BY). The functions support full N-dimensional arrays with automatic shape broadcasting, where the output shape is the per-dimension maximum of all inputs. NULL arrays and NaN elements are skipped, and positions that receive no finite values yield null. array_elem_sum() and array_elem_avg() use Kahan compensated summation for floating-point accuracy. The GROUP BY average variant uses a uniform/variable dual-mode count tracker to avoid per-element count allocation in the common case. Parallel GROUP BY is supported via merge().
Queries on Parquet partitions can now skip entire row groups that contain no matching rows. Row group pruning uses three strategies: min/max statistics (row groups whose per-column value range does not overlap the filter values are skipped), bloom filters (row groups whose bloom filter reports no match are skipped, opt-in via the bloom_filter_columns option), and null count statistics (for IS NULL and IS NOT NULL filters). Pruning applies to all Parquet read paths, including forward scan, backward scan, read_parquet(), and parallel page frame execution. Supported filter operations include equality, IN list, comparison operators (<, <=, >, >=), BETWEEN, IS NULL, IS NOT NULL, and OR-connected equalities on the same column. Supported column types include BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, TIMESTAMP, DATE, IPv4, UUID, LONG128, STRING, SYMBOL, VARCHAR, and all DECIMAL widths. Bloom filter columns and false positive probability (FPP) can be specified via ALTER TABLE ... CONVERT PARTITION TO PARQUET WITH (bloom_filter_columns = 'col1,col2', fpp = 0.01), COPY TO export, or the /exp HTTP endpoint. New configuration properties include cairo.sql.parquet.row.group.pruning.enabled (default true), cairo.partition.encoder.parquet.bloom.filter.fpp (default 0.01), cairo.parquet.export.bloom.filter.fpp (default 0.01), and cairo.parquet.export.statistics.enabled (default true).
This feature adds arg_max() and arg_min() function variants that accept a CHAR type for the value argument. Supported signatures include arg_max(char, timestamp), arg_max(char, long), arg_max(char, double), and the corresponding arg_min() variants. Null keys are ignored during aggregation, while null values are returned when the corresponding key is the max/min. All variants support parallel execution. Example usage: SELECT arg_max(status, created_at), arg_min(status, created_at) FROM events;
This feature allows WINDOW JOIN to accept column references and expressions as RANGE BETWEEN boundaries, in addition to static constants, enabling each left-hand-side row to define its own window size based on its data. For example: SELECT t.ts, sum(d.val) AS agg FROM fx_trades t WINDOW JOIN market_data d RANGE BETWEEN t.lookback minutes PRECEDING AND t.lookahead minutes FOLLOWING. Either or both of the lo/hi boundaries can be dynamic while the other remains a static constant. Boundary expressions must evaluate to an integer and must only reference left-hand-side table columns. Negative values are clamped to zero (equivalent to CURRENT ROW), and NULL values produce NULL aggregates where the row is skipped. Each boundary can optionally include a time unit suffix (seconds, minutes, hours, etc.), and when present the value is scaled to the table's timestamp resolution at runtime. Dynamic windows disable the fast symbol-keyed and vectorized execution paths; queries with an ON key equality clause fall back to the general path with a join filter instead. This feature also makes parallel HORIZON JOIN and WINDOW JOIN queries more responsive to query cancellation by adding circuit breaker checks at the top of every master-row iteration loop.
O3 commits into Parquet partitions previously replaced row groups in place, leaving orphaned bytes in the file. Repeated merges caused file sizes to grow to 2-3x their useful data. This improvement adds a rewrite mode that periodically writes all data to a fresh file, eliminating dead space. An unused_bytes counter tracked in Parquet metadata drives the rewrite decision. A rewrite triggers when the file has a single row group, when unused_bytes / file_size exceeds a configurable ratio (default 0.5), or when absolute unused bytes exceed a threshold (default 1 GB). In rewrite mode, untouched row groups are raw-copied with adjusted thrift offsets, without decode/re-encode. A new O3ParquetMergeStrategy class computes merge/copy/split actions up front using min/max timestamp overlap detection, replacing the old iterative merge loop. Row groups that exceed the configured size are split into multiple output groups. Small row groups (< 4096 rows) adjacent to a gap absorb the gap's O3 data to avoid a proliferation of tiny row groups. WAL tables can now convert their last (active) partition to Parquet, with the TableWriter routing all WAL data through the O3 merge path. New configuration properties: cairo.partition.encoder.parquet.o3.rewrite.unused.ratio (default 0.5) and cairo.partition.encoder.parquet.o3.rewrite.unused.max.bytes (default 1g).
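A server.conf sketch making the rewrite thresholds explicit (the values shown are the documented defaults):

```
# Rewrite the Parquet file once half of its bytes are dead, or 1 GB is wasted.
cairo.partition.encoder.parquet.o3.rewrite.unused.ratio=0.5
cairo.partition.encoder.parquet.o3.rewrite.unused.max.bytes=1g
```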
Users can now specify Parquet encoding and compression on a per-column basis using CREATE TABLE and ALTER TABLE SQL syntax. The syntax is PARQUET(encoding [, compression[(level)]]), where both encoding and compression are optional; use default for the encoding when specifying compression only. When omitted entirely, the column uses the global defaults. For example: CREATE TABLE sensors (ts TIMESTAMP, temperature DOUBLE PARQUET(rle_dictionary, zstd(3)), device_id VARCHAR PARQUET(default, lz4_raw)) TIMESTAMP(ts) PARTITION BY DAY;. Existing tables can be modified with ALTER TABLE sensors ALTER COLUMN temperature SET PARQUET(rle_dictionary, zstd(3)); or reset with ALTER TABLE sensors ALTER COLUMN temperature DROP PARQUET;. Per-column config appears in SHOW CREATE TABLE output. Supported encodings include plain, rle_dictionary, delta_length_byte_array, and delta_binary_packed, with type-specific restrictions. Supported compression codecs include uncompressed, snappy, gzip (0-9), brotli (0-11), zstd (1-22), and lz4_raw. RLE dictionary encoding is now supported for all column types except Boolean and Array. A new cairo.partition.encoder.parquet.min.compression.ratio configuration property (default 1.2) controls whether compressed pages are worth keeping: when a compressed column chunk fails to meet the ratio threshold, the encoder discards the compressed output and stores it uncompressed instead. Varchar dictionary encoding performance was significantly improved by switching from the default hasher to RapidHashMap and storing indices directly in a Vec<u32>, achieving ~87 Melem/s throughput compared to the original ~13 Melem/s.
Improvements
WalWriter previously hardcoded POSIX_MADV_RANDOM for memory-mapped column files, which hurts most workloads. This improvement makes the madvise hint configurable via the cairo.wal.writer.madvise.mode property, with valid values none (default, no hint), sequential, and random. The random mode is beneficial when ingesting into many tables with many columns, as it prevents the OS from speculatively reading adjacent pages under memory pressure.
This improvement replaces linear scanning with binary search for initial frame positioning in ASOF JOIN, LT JOIN, and WINDOW JOIN operations. The openSlaveFrame() method in AbstractAsOfJoinFastRecordCursor and the findRowLo()/findRowLoWithPrevailing() methods in WindowJoinTimeFrameHelper now call seekEstimate() on the first lookup to binary-search directly to the target partition instead of linearly scanning all preceding frames. This reduces the initial positioning cost from O(N) in the number of frames to O(log P), where P is the number of partitions, mirroring the optimization already present in HORIZON JOIN.
This improvement brings significant performance gains to Parquet decoding, with a median improvement of approximately 87% across column type and encoding combinations. Key optimizations include removing unnecessary allocation and zeroing in the decompression path, skipping definition decoding when null_count = 0, improved Plain, DeltaBinaryPacked, Rle, and RleDictionary decoder implementations, a lookup table for boolean unpacking, specialized UUID byte swapping, batch decoding for DeltaBinaryPacked, batched nullable bitmap processing, and bulk copying for PlainPrimitiveDecoder. This improvement also fixes a bug with incorrect null sentinel values.
Vectorized execution is now applied to more non-keyed GROUP BY queries, such as SELECT first(price), last(price) FROM trades. Internally, the GroupByFunction#computeBatch() API is used for all queries without a filter. This API was previously used only in WINDOW JOIN.
This improvement introduces an encoded sort path for ORDER BY queries. SortKeyEncoder encodes column values into fixed-width, order-preserving binary keys (8/16/24/32 bytes), enabling comparisons via native uint64 operations instead of per-column type dispatch. The native sort is a three-layer hybrid: vergesort for detecting natural runs (O(n) for pre-sorted time-series data), MSD radix sort (American Flag Sort) with parallel bucket sorting across available cores via atomic work-stealing for large arrays, and pdqsort for small partitions with a heapsort fallback guaranteeing O(n log n) worst case. The entire native sort path is allocation-free, with all data structures stack-allocated. It replaces the R-B tree (RecordTreeChain) with flat-array collect plus radix sort for the non-random-access path, and guarantees stable sort by falling back to rowId ordering when all sort key columns are equal. Key size is capped at 32 bytes; ORDER BY clauses exceeding this fall back to the existing R-B tree sort. Benchmarks show a 5-column ORDER BY over 3M rows improving from 6.98s to 1.26s, and a 2-column ORDER BY over 1B rows improving from over 15 minutes to 63.6s with parallel sorting.
This improvement replaces string comparison (getSym() + Chars.compare) with rank-based comparison (getInt() + rank map lookup) when sorting SYMBOL columns that have static symbol tables. At cursor open, a rank map maps each symbol key to its alphabetical rank, and the comparator then compares ranks instead of strings. It also eliminates the temporary DirectIntList used when building symbol rank maps by replacing it with in-place permutation inversion after quicksort, reducing extra space from O(N) to O(1), where N is the symbol count, mainly benefiting high-cardinality symbol columns. This optimization applies to all comparator-based sort paths: tree-based sort cursors, top-K cursors, window function tree comparators, and window function internal comparators (rank(), percent_rank()). Non-static symbol columns (e.g., CAST(str AS SYMBOL)) fall back to the original string comparison path.
This improvement introduces an adaptive scan strategy for HORIZON JOIN's keyed ASOF lookup. The lookup now starts in backward-only mode, which is cheap for low-cardinality key spaces, and adaptively switches to forward scan mode within a frame when backward scan cost becomes excessive, for example with high-cardinality symbols or rare/infrequent keys that cause deep backward scans. The switch uses two criteria: a relative threshold where backward scan cost at a position must exceed 8x the gap to trigger (with a minimum gap of 1,024 rows), and an absolute threshold of 131,072 rows to handle cross-partition boundaries where the relative check cannot trigger. Once switched, the frame stays in forward mode for its remainder. This also re-enables the ASOF row cache in HorizonJoinTimeFrameHelper and resets bookmarks on toTop() to eliminate jitter from out-of-order frame processing. Performance improvements range from 13-24% for dense and equity scenarios to 96% for sparse data distributions.
Bug Fixes
A LATEST BY ALL query on a large table could fill an OrderedMap until keyCapacity reached 2^30. When rehash() doubled to newKeyCapacity = 1L << 31, the overflow guard allowed it through because MAX_SAFE_INT_POW_2 was incorrectly set to 1L << 31. The subsequent (int) cast produced Integer.MIN_VALUE, and clear() computed ~18.4 EB for the native memset, causing a SIGSEGV. This fix corrects Numbers.MAX_SAFE_INT_POW_2 from 1L << 31 to 1L << 30, so the guard rejects the overflow and throws a clean CairoException ("map capacity overflow") instead of crashing the JVM. The constant was also deduplicated from Unordered4Map, Unordered8Map, and UnorderedVarcharMap, each of which had a private copy with the same bug.
The cairo.partition.encoder.parquet.statistics.enabled configuration allows users to disable Parquet statistics, but the read path (ParquetTimestampFinder, TableWriter) and the O3 merge path (O3PartitionJob.processParquetPartition) hard-depended on timestamp column statistics. When statistics were absent, getMinValueLong would hit an assertion crash with -ea, or read garbage memory causing silent data corruption without -ea. This fix adds rowGroupMinTimestamp and rowGroupMaxTimestamp methods to PartitionDecoder that try Parquet column statistics first at zero cost, then fall back to decoding the first/last row from actual data pages when statistics are absent. findRowGroupByTimestamp also falls back to decoding instead of reading garbage memory. O3PartitionJob, ParquetTimestampFinder, and TableWriter have been migrated to use the new methods.
This fix resolves a crash that occurred when using SAMPLE BY with FILL(NULL), FILL(value), or FILL(PREV) in queries containing array column aggregates such as last(arr). The fill record types did not implement getArray(), and the null/value fill factories did not handle array column types when constructing fill constants. The fix adds getArray() overrides to SampleByFillRecord and FillRangeRecordCursorFactory.FillRangeRecord, and updates SampleByFillNullRecordCursorFactory and SampleByFillValueRecordCursorFactory to handle array types by yielding NullConstant.NULL, since arrays cannot be filled with scalar values.
When a JOIN query had a WHERE clause containing both a column-referencing condition and a non-column constant expression (e.g., NOW() = NOW()), QuestDB crashed with an internal AssertionError. The SQL optimizer splits the WHERE clause into separate buckets, one for column-referencing conditions and one for constant expressions. The code generator then applied each bucket as a separate FilteredRecordCursorFactory wrapper, but nesting these factories violated an internal assertion. This fix detects when the base factory is already a FilteredRecordCursorFactory, extracts the existing filter, and combines both filters with AND into a single factory instead of nesting them.

ALTER TABLE ... ALTER COLUMN and ALTER MATERIALIZED VIEW ... ALTER COLUMN failed when the column name was quoted (e.g., "MY_COL"). The token from the lexer was used as-is for the metadata lookup, so the quoted form was looked up instead of the unquoted name, resulting in a "column does not exist" error. This fix wraps the token with unquote() before the column name lookup in both compileAlterTable and compileAlterMatView.
The non-vectorized fast cursor (WindowJoinWithPrevailingFastRecordCursor) compared the first window match against the window end instead of the window start, causing the prevailing row to effectively never be included when the window already contained matches. The vectorized variant already used the correct comparison. This fix aligns the non-vectorized path to compare against the window start timestamp. The bug only affected queries using non-vectorizable aggregates (e.g., max(concat(...))) with symbol-keyed WINDOW JOIN and INCLUDE PREVAILING.
The read_parquet() function crashed with SIGSEGV when reading parquet files containing SYMBOL columns encoded by QuestDB's PartitionEncoder. The canProjectMetadata() method passed the actual column type (SYMBOL) to the Rust decoder instead of the expected type (VARCHAR), causing the Rust decoder to write INT32 symbol keys that Java then read as VARCHAR pointers. This fix passes the expected type (VARCHAR) for SYMBOL-to-VARCHAR conversions so the Rust decoder resolves dictionary entries to UTF-8 strings.
This fix addresses two bugs in WINDOW JOIN with INCLUDE PREVAILING. The first bug affected four sync cursor variants that call findRowLo(lo, hi, true): WindowJoinWithPrevailingAndJoinFilterRecordCursor, WindowJoinWithPrevailingAndJoinFilterFastRecordCursor, WindowJoinWithPrevailingFastRecordCursor, and WindowJoinFastVectRecordCursor. When the bookmarked-frame optimization kicked in and the prevailing candidate sat in a prior partition, the bookmarked path skipped initialization of prevailingFrameIndex/prevailingRowIndex, leaving them at -1/Long.MIN_VALUE. Downstream backward scans saw -1 and returned immediately, silently omitting the prevailing row from results. The fix seeds prevailingFrameIndex/prevailingRowIndex when entering via the bookmark path. The second bug was a comparison target error in WindowJoinWithPrevailingFastRecordCursor, where the condition that triggers prevailing inclusion checked whether the first row's timestamp exceeded the window high boundary (masterTimestampHi) instead of the window low boundary (slaveTimestampLo).
This fix addresses several correctness, resource leak, and crash issues in the SQL engine. A missing break after the DECIMAL inner switch in generateCastFunctions caused fall-through to the BINARY case, adding a spurious BinColumn to the cast function list and shifting subsequent column indices in multi-column UNION queries, resulting in UnsupportedOperationException at runtime. The moveClauses method never incremented its position counter, so when swapJoinOrder0 needed to move multiple join clauses, only the first was moved; the rest stayed in the wrong join context, creating circular dependencies that broke topological sort on 3+ table cross joins with multi-column WHERE conditions. The hasGroupByFunc/hasOrderedGroupByFunc methods skipped children of non-aggregate functions, making nested aggregates like abs(sum(x)) invisible and causing PIVOT to reject valid aggregate expressions. Resource leaks were fixed in generateJoins, where const filters were leaked on error paths, and in generateSampleBy, where timezoneNameFunc, offsetFunc, sampleFromFunc, and sampleToFunc were never freed on error paths. A NullPointerException in CREATE MATERIALIZED VIEW was fixed: when WITH BASE specifies a nonexistent table not referenced in the query, getTableTokenIfExists() returned null and .isView() threw an NPE instead of a descriptive SqlException. Additional latent bugs were fixed in generateFill (always reading the first fill value instead of iterating) and in GeoHash-to-VARCHAR cast functions (using the wrong column type parameter).
This fix corrects a data corruption issue where set_memory_vanilla_vec() delegated to run_vec_bulk, which used TVec::size() as the loop increment. For Vec8uq this returned 8 (the number of uint64_t lanes), but one 512-bit store only covers 4 long_128bit elements or 2 long_256bit elements. The mismatch caused the bulk path to advance past unwritten elements, leaving holes filled with stale data. The fix replaces the broken run_vec_bulk call with a dedicated loop that computes the correct per-store element count from sizeof(TVec) / sizeof(T), guarded by a static_assert.
This fix defaults cairo.mat.view.parallel.sql.enabled to false on machines with fewer than 4 available processors. All other parallel SQL features (filter, group by, top-k, horizon join, window join, Parquet read) remain enabled regardless of core count. Users can still explicitly enable materialized view parallel SQL via configuration on low-core machines.
This fix resolves an issue where read_parquet failed with "encoding not supported" errors when reading Parquet files whose embedded QuestDB metadata schema had a different number of columns than the actual Parquet data. This happens when an external tool (DuckDB, Spark, PyArrow) rewrites a QuestDB-exported Parquet file, for example dropping partition columns for hive-style directory layouts, but preserves the original key-value metadata. The decoder now compares the QuestDB metadata schema length against the Parquet column count and discards the metadata when they differ, falling back to physical type inference.
This release introduces automatic object store WAL cleanup on the primary node, TLS certificate expiration metrics for Prometheus alerting, and new array_sort() and array_reverse() functions. Performance improvements include faster ASOF and WINDOW JOINs through binary search–based frame positioning, along with a configurable WAL writer madvise mode. Several critical bug fixes address crashes in LATEST BY ALL queries, Parquet reads with missing statistics, and backup restore edge cases, while the ACL permission system has been expanded to support up to 256 permissions.
New Features
This feature introduces a WAL cleaner that runs on the primary node and automatically deletes replicated WAL data from object storage once it is no longer needed by any replica or backup. It determines what is safe to delete by consulting two sources of cleanup history, enterprise backup manifests and checkpoint history records, and always retains enough data to support the most recent N backups or checkpoints. The cleaner is conservative by default: it won't delete anything until sufficient history exists, and it picks the most conservative boundary when multiple sources or cluster nodes are involved. Key components include a checkpoint history tracker that records per-table transaction state to the shared replication object store on each CHECKPOINT RELEASE, a backup instance name registry for coordinating cleanup boundaries across multiple nodes, rate limiting and throttling for object store delete operations with auto-tuned defaults per cloud provider (S3, GCS, Azure Blob, R2, etc.), and crash recovery with periodic progress persistence so cleanup resumes where it left off after a restart. Dropped tables are cleaned up after a cooloff period (default 1h) to guard against clock skew. Key configuration properties include replication.primary.cleaner.enabled (default true), replication.primary.cleaner.interval (default 10m), replication.primary.cleaner.backup.window.count (default 5), replication.primary.cleaner.delete.concurrency (auto-tuned 4-12), replication.primary.cleaner.max.requests.per.second (service-dependent), and checkpoint.history.enabled (default true when replication is enabled).
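A server.conf sketch of the cleaner knobs listed above, with the documented defaults made explicit:

```
replication.primary.cleaner.enabled=true
replication.primary.cleaner.interval=10m
# Retain enough WAL data to restore the five most recent backups/checkpoints.
replication.primary.cleaner.backup.window.count=5
checkpoint.history.enabled=true
```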
This feature adds Prometheus gauge metrics for TLS certificate time-to-live (TTL) across all four TLS-enabled endpoints: questdb_tls_cert_ttl_seconds_http, questdb_tls_cert_ttl_seconds_http_min, questdb_tls_cert_ttl_seconds_line, and questdb_tls_cert_ttl_seconds_pg. Each gauge reports seconds until the active certificate expires. Values greater than 0 indicate seconds remaining, 0 means expired, and -1 means the certificate has not been loaded or could not be parsed. Gauges are only registered for endpoints where TLS is enabled. The TTL is computed from the certificate's notAfter field, which is extracted via a JNI call into a minimal DER/X.509 parser on the Rust side. The expiration epoch is cached and updated on reload_tls(), so the metric always reflects the active in-memory certificate, not the one on disk.
This feature adds
array_sort(DOUBLE[]) and array_reverse(DOUBLE[]) scalar functions that operate on double arrays of any dimensionality. array_sort() sorts each innermost-dimension slice independently, preserving the array's shape, and accepts optional boolean arguments for descending order and nulls-first placement. array_reverse() reverses each innermost-dimension slice. Both functions handle NULL arrays, empty arrays, NaN values, and multidimensional inputs. They support both contiguous unit-stride and non-vanilla array layouts via separate code paths. The sort buffer grows on demand and stays at peak size for the cursor's lifetime to avoid allocation churn on the hot path.
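A minimal sketch of both functions, assuming QuestDB's ARRAY literal syntax; the positional boolean flag for descending order follows the description above:
SELECT array_sort(ARRAY[3.0, 1.0, 2.0]) AS sorted_asc,
       array_sort(ARRAY[3.0, 1.0, 2.0], true) AS sorted_desc,
       array_reverse(ARRAY[3.0, 1.0, 2.0]) AS reversed;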
This feature adds minTimestamp and maxTimestamp TIMESTAMP columns to sys.telemetry_wal to capture the data timestamp range per WAL transaction event. WAL telemetry is now enabled by default regardless of the main telemetry setting, reducing the need to rely on logs when investigating the shape of incoming writes. The WalWriter commit log message has been downgraded from info to debug level unless the commit has a replace range. Schema migration support has been added so that when the column count mismatches the expected schema, the table is dropped and recreated, which is safe given the 1-week TTL.
Improvements
This improvement mirrors the optimization already present in
HORIZON JOIN. Without it, the first lookup linearly scans through all slave time frames preceding the master's first timestamp, which is O(N) in the number of frames. With the seekEstimate() optimization, the initial positioning is O(log P), where P is the number of partitions. Specifically, AbstractAsOfJoinFastRecordCursor.openSlaveFrame() now calls seekEstimate() on the first slave frame lookup to binary-search directly to the target partition instead of linearly scanning all preceding frames, benefiting all ASOF JOIN and LT JOIN fast-path factories. WindowJoinTimeFrameHelper.findRowLo() and findRowLoWithPrevailing() also now call seekEstimate() on the first lookup with the same partition-skipping behavior, benefiting both sync and async WINDOW JOIN factories.
WalWriter previously hardcoded POSIX_MADV_RANDOM for memory-mapped column files, which hurts most workloads. This improvement makes the madvise hint opt-in via a new configuration property cairo.wal.writer.madvise.mode with valid values: none (default, no hint), sequential, and random. The random mode is beneficial when ingesting into many tables with many columns, as it prevents the OS from speculatively reading adjacent pages under memory pressure. Example configuration:
cairo.wal.writer.madvise.mode=random
This improvement refactors the ACL permission system to support more than 64 permissions by migrating from 64-bit bitmasks to an exponent-based representation with 256-bit aggregate masks. Permission constants changed from long bitmasks to int exponents, and a new PermissionMask class provides 256-bit storage (4 longs) for aggregate permission sets. The permission column type in the database schema changed from long to short (storing exponents instead of bitmasks), reducing storage overhead while supporting up to 256 distinct permissions. PermissionMask.ZERO is now immutable and throws on mutation attempts, and the sentinel value handling for ALL_PERMISSIONS is properly supported across has, set, and clear operations.
Bug Fixes
This fix corrects
Numbers.MAX_SAFE_INT_POW_2 from 1L << 31 to 1L << 30. The old value (2^31) does not fit in a signed 32-bit int, so the rehash overflow guard let exactly 2^31 through. The subsequent (int) cast produced Integer.MIN_VALUE, and clear() fed approximately 18 EB to native memset, causing a SIGSEGV. The crash chain occurred when a LATEST BY ALL query on a large table filled an OrderedMap until keyCapacity reached 2^30, then rehash() doubled to newKeyCapacity = 1L << 31, which truncated to a negative value and passed an enormous size to native memset. The fix makes the guard reject newKeyCapacity = 2^31, throwing a clean CairoException("map capacity overflow") instead of crashing the JVM. The constant was also deduplicated from Unordered4Map, Unordered8Map, and UnorderedVarcharMap, each of which had a private copy with the same bug.
The cairo.partition.encoder.parquet.statistics.enabled configuration allows users to disable Parquet statistics, but the read path (ParquetTimestampFinder, TableWriter) and the O3 merge path (O3PartitionJob.processParquetPartition) depended unconditionally on timestamp column statistics. When statistics were absent, getMinValueLong would hit an assertion crash with -ea, or read garbage memory and silently corrupt data without -ea. This fix removes that hard dependency by adding rowGroupMinTimestamp and rowGroupMaxTimestamp methods to PartitionDecoder that try Parquet column statistics first at zero cost, then fall back to decoding the first/last row from actual data pages when statistics are absent. The findRowGroupByTimestamp method also falls back to decoding instead of reading garbage memory, and O3PartitionJob, ParquetTimestampFinder, and TableWriter have been migrated to use the new methods.
During backup, partitions with row_count=0 may not produce meta.msgpack. The restore process previously always attempted to download this file when hash verification was enabled, causing restores to fail with "no partition metadata found" for empty partitions. This fix skips downloading meta.msgpack for empty partitions during restore and skips hash verification in that case, while still requiring metadata for non-empty partitions.
A primary instance restored from a backup could encounter a race condition between the WalPurgeJob and dropped-table request processing, where the uploader could not open the _txnlog because it had already been deleted. This fix ensures that the WalPurgeJob does not delete state that the uploader still needs. The bug was unique to backups, since backups do not restore the .pending files used to control this workflow. At start-up, the adjusted WalUploader replication logic now recreates the appropriate .pending file when it is missing, which also patches instances that were already restored with older versions of QuestDB Enterprise.
This fix introduces three new ACL permissions: ALTER SYMBOL CAPACITY (column-level), SET REFRESH LIMIT (table-level), and SET REFRESH TYPE (table-level). ALTER SYMBOL CAPACITY replaces the incorrect reuse of ALTER COLUMN TYPE for symbol capacity changes. A startup migration automatically grants ALTER SYMBOL CAPACITY to every entity that previously had ALTER COLUMN TYPE, at the same scope (database, table, or column level), preserving grant options. SET REFRESH LIMIT and SET REFRESH TYPE gate the previously unprotected ALTER MATERIALIZED VIEW ... SET REFRESH LIMIT/IMMEDIATE/MANUAL/EVERY/PERIOD operations. This fix also wires in the previously commented-out authorization check for ALTER TABLE SET PARAM and adds a retry loop with recompile in access list reloading to handle table-reference-out-of-date exceptions during ACL reload.
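As an illustration, granting the new column-level permission might look like this (hypothetical trades table, sym column, and analyst user; the scope syntax follows the existing GRANT conventions):
GRANT ALTER SYMBOL CAPACITY ON trades(sym) TO analyst;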
QuestDB 9.3.3 is a feature-rich release introducing HORIZON JOIN for markout analysis, a new twap() aggregate, SQL-standard WINDOW definitions, JIT compilation on ARM64, and file-based secrets for Kubernetes deployments. It also brings significant performance improvements across Parquet I/O, parallel GROUP BY, UNION queries, and ORDER BY on computed expressions.
New Features
Sensitive configuration options can now be loaded from files using the
_FILE suffix convention, enabling seamless integration with Kubernetes Secrets, Docker Secrets, HashiCorp Vault, and other file-based secrets management systems. For example, setting QDB_PG_PASSWORD_FILE=/run/secrets/pg_password as an environment variable or pg.password.file=/run/secrets/pg_password in server.conf will load the password from the specified file. This feature works with all sensitive properties (pg.password, pg.readonly.password, http.password). File contents are automatically trimmed of whitespace. SHOW PARAMETERS displays value_source = 'file' for secrets loaded from files, and secrets are reloaded when file contents change via SELECT reload_config().
This feature allows underscores as number separators in TICK date expressions (e.g.,
$now-10_000T) and duration suffixes (e.g., ;1_500T), consistent with Numbers.parseInt(), which already supports this. Error position reporting in date expression evaluation has also been improved to point at the specific offending location instead of the start of the expression. Invalid underscore placement (leading, trailing, consecutive) is properly detected and reported.
This feature enables QuestDB's JIT code generation to run natively on ARM64 systems in addition to x86. The changes introduce ARM64-specific implementations for register handling, caches, and code generation, while maintaining full compatibility with the existing x86 logic. Conditional compilation is used throughout to select the appropriate architecture at build time. ARM64-specific versions of the Function and CountOnlyFunction structs were implemented, including methods for code generation, register setup, and loop logic that mirror the x86 implementations but use ARM64 instructions and register types.
This feature introduces the WINDOW clause, allowing users to define named window specifications that can be reused across multiple window functions within a single query. Window inheritance following the SQL standard is also supported, where a named window can reference another named window as its base, inheriting PARTITION BY, ORDER BY, and frame clauses. Chained inheritance is supported (e.g., w3 references w2, which references w1). Merge rules follow the SQL standard: PARTITION BY is always inherited from the base (the child cannot specify its own), ORDER BY in the child takes precedence if specified, and frame clauses in the child take precedence if non-default. Validation ensures that base windows are defined earlier in the WINDOW clause, that PARTITION BY in a child window referencing a base is rejected, and that circular and self-references are prevented.
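A short sketch of a named window being reused (hypothetical trades table):
SELECT sym,
       avg(price) OVER w AS avg_price,
       sum(amount) OVER w AS total_amount
FROM trades
WINDOW w AS (PARTITION BY sym ORDER BY timestamp);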
This feature introduces the twap(price, timestamp) aggregate function that computes the time-weighted average price using step-function integration, where each price is held constant until the next observation and the TWAP is the area under this step function divided by the total time span. The function supports parallel GROUP BY execution via per-worker native buffers that are merge-sorted during the merge phase, since workers process non-adjacent page frames via work-stealing and an incremental weighted-sum approach would incorrectly bridge gaps between frames. It falls back to a simple arithmetic mean when all observations share the same timestamp. SAMPLE BY with FILL modes is also supported.
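For example, an hourly TWAP per symbol (hypothetical trades table with a designated timestamp):
SELECT timestamp, sym, twap(price, timestamp) AS hourly_twap
FROM trades
SAMPLE BY 1h;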
This feature adds support for all six decimal types (DECIMAL8, DECIMAL16, DECIMAL32, DECIMAL64, DECIMAL128, and DECIMAL256) in the Parquet format. Write support stores decimals as fixed-length byte arrays in big-endian format per the Parquet specification. Read support includes a WordSwapDecimalColumnSink for DECIMAL128 and DECIMAL256 that correctly converts from Parquet's big-endian format by reversing each 8-byte word independently. The supported Parquet physical types are INT32 for small decimals (DECIMAL8/16/32), INT64 for DECIMAL64, and FixedLenByteArray for all decimal types.
This feature introduces array_build(nDims, size, filler1, ...), which creates DOUBLE[] or DOUBLE[][] arrays with controlled shape and fill values. The size parameter accepts a scalar integer or a DOUBLE[] (using its cardinality). Each filler can be a scalar (repeated for all elements) or a DOUBLE[] (copied element-by-element, NaN-padded if shorter, truncated if longer).
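A sketch of the call shape, following the signature above (output values are assumed from the description, not verified):
-- one dimension, three elements, all filled with 0.0
SELECT array_build(1, 3, 0.0);
-- filler copied element-by-element from an array
SELECT array_build(1, 3, ARRAY[1.0, 2.0, 3.0]);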
This feature introduces HORIZON JOIN, a specialized time-series join designed for markout analysis, a common financial analytics pattern where you need to analyze how prices or metrics evolve at specific time offsets relative to events such as trades or orders. For each row in the left-hand table and each offset in the horizon, the join computes left_timestamp + offset and performs an ASOF match against the right-hand table. When join keys are provided via ON, only right-hand rows matching the keys are considered. The horizon is defined using either a RANGE FROM <from> TO <to> STEP <step> clause for uniform offsets, or a LIST (<offset>, ...) clause for explicit non-uniform offsets, both aliased with AS. The pseudo-table exposes .offset (LONG) and .timestamp (TIMESTAMP) columns. Both RANGE and LIST use the same interval expression syntax as SAMPLE BY with supported units: U (microseconds), T (milliseconds), s (seconds), m (minutes), h (hours), d (days). Use cases include markout P&L analysis, event impact studies, and time-series correlation at different lags. Current limitations include no combination with other joins at the same query level, no right-hand-side WHERE filters, both tables requiring a designated timestamp, a positive STEP with FROM ≤ TO for RANGE, and monotonically increasing offsets for LIST.
SELECT h.offset / 1000000 AS horizon_sec, t.sym, avg(m.mid) AS avg_mid
FROM trades AS t
HORIZON JOIN mid_prices AS m ON (t.sym = m.sym)
RANGE FROM 1s TO 60s STEP 1s AS h
ORDER BY t.sym, horizon_sec
This feature adds a Table Details Drawer to the Web Console for tracking the health and details of a table or materialized view. It simplifies the table schema explanation flow by removing flow-specific schema and adding streaming support, along with a new flow for table health issues. An adaptive polling mechanism checks request latency proactively and sets the interval dynamically. WAL tables polling has been removed except for the suspension dialog. The feature also includes monitoring support in the AI Assistant docs, updated tooltip behavior and styling throughout the application, query truncation support for LiteEditor, history and navigation among right-hand-side bar drawers, and a mechanism to copy table and column names using Ctrl/Cmd+C.
This feature adds import and export support for editor tabs, makes tabs scrollable with an increased minimum width, adds a rename button that appears on tab hover, and updates the editor tab styling.
Improvements
This improvement introduces three optimizations for Parquet partition reads. Late materialization decodes only filter columns first to identify matching rows, then decodes remaining columns only for those rows, significantly reducing decoding overhead for low-selectivity queries. Zero-copy mmap page reading uses a new
SlicePageReader that reads Parquet pages directly from the memory-mapped byte slice, bypassing the previous approach of copying page data into an intermediate buffer. Raw array encoding is now enabled by default for partition-to-Parquet conversion, avoiding the overhead of Parquet's nested LIST decoding. In benchmarks, an OHLC aggregation query on an 8-day Parquet partition improved from 600ms to 250ms.
This improvement speeds up Parquet export through two optimizations. SIMD-accelerated encoding uses portable SIMD intrinsics for nullable INT/LONG columns, processing 64 values per iteration for vectorized definition-level encoding, yielding up to 16.5% faster throughput and 20% less CPU usage with LZ4_RAW compression. A new streaming mode for
TableReader applies MADV_SEQUENTIAL on mmap for aggressive read-ahead prefetching and MADV_DONTNEED before munmap for immediate page cache release. Streaming mode bypasses the mmap cache so each partition mapping is independent and fully releasable. Parquet exporters automatically enable streaming mode. Under memory pressure, the combination of both madvise hints recovers 94% of baseline performance (387 MB/s vs 82 MB/s without hints).
This improvement pushes designated-timestamp filters from outside a UNION / UNION ALL / EXCEPT / INTERSECT subquery into each branch of the set operation, enabling per-branch partition pruning on time-series tables. The optimizer only pushes filters that exclusively reference the designated timestamp column, deliberately excluding non-timestamp filters to avoid type mismatches and column resolution issues across branches with different schemas. The outer filter always stays on the wrapper model, so correctness is unchanged. For composite filters like WHERE ts IN '2025-12-01T01;2h' AND x > 5, the optimizer splits the top-level AND into separate conjuncts and processes each independently: it pushes the timestamp conjunct into branches, while x > 5 stays at the parent level only. Safety mechanisms include per-branch semantic guards that skip branches with LATEST BY, LIMIT, or non-pushable SAMPLE BY, column existence checks, and deep cloning to prevent cross-branch mutation.
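As an illustration (hypothetical tables t1 and t2 sharing a designated timestamp ts), the timestamp conjunct below is pushed into both branches for partition pruning, while x > 5 stays on the wrapper:
SELECT * FROM (
    SELECT ts, x FROM t1
    UNION ALL
    SELECT ts, x FROM t2
) WHERE ts IN '2025-12-01T01;2h' AND x > 5;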
This improvement adds an unordered mode to PageFrameSequence that uses SOUnboundedCountDownLatch instead of ordered collection, eliminating head-of-line blocking for reducers that don't need ordered results. Keyed GROUP BY, non-keyed GROUP BY, and Top K factories now use this unordered mode, replacing boilerplate ordered collect loops with a single dispatchAllAndAwait() call. Error field copying in storeError() was also fixed to use a dedicated CairoException instance, preventing thread-local exception recycling from corrupting stored error data. At concurrency 8, per-iteration spread (a direct proxy for head-of-line blocking) was reduced by 53–75% across query types, tail latency (p99) improved by 30–44%, and latency predictability (p99/p50 ratio) improved from 2.5–3.6x down to 1.4–1.8x. Throughput at concurrency 8 improved by 6–9% for most query types, and throughput now scales monotonically from concurrency 1 to 2 instead of regressing as in the previous implementation.
CASE expressions on symbol columns with static symbol tables now resolve string constants to integer symbol keys at initialization time and compare by integer at runtime, avoiding per-row string comparisons. Three picker specializations are used based on branch count: single-branch (one equality check), dual-branch (two equality comparisons), and multi-branch (integer-object hash map lookup). When a WHEN value is not found in the symbol table, the branch is silently skipped. Non-static symbol tables (e.g., from casts) fall back to the existing character-sequence-keyed comparison.
This improvement replaces the Dragon4 double-to-string algorithm with Ryu (Ulf Adams, 2018), which produces minimal-length, correctly rounded decimal representations with better performance. A direct double-to-decimal conversion path (Numbers.doubleToDecimal) bypasses string formatting entirely, removing the double→string→decimal roundtrip in CastDoubleToDecimalFunctionFactory. The dependency on jdk.internal.math.FDBigInteger has been eliminated, removing the --add-exports java.base/jdk.internal.math compiler flag.
When sorting by computed expressions (e.g., ORDER BY a + b * c), QuestDB's SortedLightRecordCursor previously re-evaluated the expression on every comparison during tree insertion and output, meaning sort key functions were called O(N log N) times during sorting plus O(N) during output. This improvement adds a sort key materialization layer that pre-computes expensive sort key values into off-heap buffers before sorting, reducing function evaluations from O(N log N) to O(N). Each function gets a complexity score that propagates through the expression tree. If any sort key column exceeds the configured complexity threshold (default 3, configurable via cairo.sql.sort.key.materialization.threshold), that column's values are pre-computed and stored in per-column memory buffers. A materializing record intercepts reads for materialized columns from the buffer and delegates to the base record for everything else. The feature supports two-pass operation where subsequent passes after toTop() look up existing ordinals without re-computing values. This optimization applies to fixed-size types only (no string/varchar/binary materialization) and operates on the general sort path only (no Top-K modification).
This improvement replaces the digit-by-digit loop in doubleToDecimal() with a single wide multiply via a new ofDigitsAndPower(long digits, int power) method on the Decimal interface. The old code extracted up to 17 individual digits from the Ryu significand via long division, then performed 17 wide additions (128-bit or 256-bit). The new code multiplies the significand by a single power of 10. Benchmarks show Decimal64 is 1.8x faster, Decimal128 is 2.2x faster, and Decimal256 is 2.8x faster. The three types converge to roughly the same cost, since the single multiply dominates and the word-width difference matters less for one operation than for 17.
This improvement introduces a hybrid export mode for Parquet export (COPY ... TO and HTTP /exp) that passes through raw page-frame-backed columns zero-copy and materializes only computed columns into native buffers, row by row per frame. Queries with no page-frame backing (e.g., cross joins) now use a cursor-based export that materializes all columns row by row, also avoiding the temporary-table detour. Only queries requiring re-partitioning (PARTITION BY override) or containing a computed BINARY column still use the temporary table path. The export mode is determined by inspecting the compiled RecordCursorFactory before constructing any temporary table: DIRECT_PAGE_FRAME for factories supporting PageFrameCursor directly (zero-copy), PAGE_FRAME_BACKED for virtual record cursor factories whose base supports page frames with no computed BINARY columns, CURSOR_BASED for no page-frame backing or descending ORDER BY with computed columns, and TEMP_TABLE for PARTITION BY overrides or computed BINARY columns. The HybridColumnMaterializer handles both PAGE_FRAME_BACKED and CURSOR_BASED modes, converting computed SYMBOL columns to STRING in adjusted metadata while dispatching via source type for correct record accessor usage. A buffer pool recycles computed-column native buffers rather than freeing them after each row-group flush. This improvement also fixes several resource lifecycle issues on error paths, including page-frame cursor leaks, temp directory leaks on moveFile() failure, and use-after-free during error-path cleanup. The exporter hierarchy was refactored so that BaseParquetExporter holds shared state while HTTPSerialParquetExporter and SQLSerialParquetExporter extend it independently, removing the prior dependency where the SQL exporter extended the HTTP exporter.
Bug Fixes
When QuestDB fails to start due to configuration errors, a race condition between the JVM shutdown hook and the startup script could produce confusing
cat and rm error messages for the hello.txt file. This fix redirects stderr to /dev/null for those commands and adds proper quoting around file paths for robustness with paths containing spaces. The actual startup failure details remain available in the stdout-*.txt log files.
Concurrent Parquet exports could produce corrupted or truncated files when multiple clients exported simultaneously through the same HTTP worker. The issue occurred because
ExportQueryProcessor stored per-connection state in per-processor fields. When a worker parked one connection via PeerIsSlowToReadException and began serving another, these fields were overwritten, causing the onWrite() callback to write Parquet chunks to the wrong response on resume. This fix moves the affected fields to the per-connection ExportQueryProcessorState.
This fix corrects greedy nanosecond parsing (a single
N followed by a non-digit) to be consistent with milliseconds and microseconds. Previously, .SSSUUUN with input .1234567 produced .123456007Z (7ns) instead of the correct .123456700Z (700ns).
This fix improves the reliability of memory-mapped file handling for columns on Windows platforms. A fallback mechanism for mapping files on Windows has been introduced, which gracefully handles transient permission errors ("Access Denied" even after a successful file open) by falling back to an anonymous memory map populated by reading the file contents directly. On non-Windows platforms, the original mapping logic is retained.
This fix addresses an issue where
DynamicPropServerConfiguration.reload() accumulated stale entries in changedKeys across reload cycles when only a secret file changed while the properties file remained unchanged. This caused watchers to be spuriously notified about properties that did not actually change. The root cause was that changedKeys.clear() only lived inside updateSupportedProperties(), which is skipped when the properties file hasn't changed. The fix moves changedKeys.clear() to the top of reload(), inside the synchronized block, so every reload cycle starts fresh.
This fix resolves an issue where the
ALTER TABLE t ADD COLUMN IF NOT EXISTS col <type> path did not fully resolve parameterized or compound column types before comparing with existing metadata. For DECIMAL and GEOHASH, the path used ColumnType.typeOf(), which returns the base type constant, but metadata stores the fully encoded type (with precision/scale/bits). For array types (e.g., DOUBLE[]), the path did not parse array dimensionality brackets, causing type mismatch errors and unconsumed tokens. Unsupported array element types (e.g., INT[]) now correctly report "unsupported array element type" instead of falling through to a misleading error, and unmatched ] brackets after a type name are now detected with a helpful error pointing at the bracket.
This fix adds post-compilation validation in the SQL compiler to reject unconsumed tokens after a valid statement. Previously, trailing content was silently ignored, which could mask user errors. PostgreSQL Wire Protocol compatibility no-op handlers (
RESET, CLOSE, UNLISTEN, DISCARD) now require their expected arguments instead of silently accepting bare keywords. BEGIN, COMMIT, and ROLLBACK now properly consume the optional TRANSACTION keyword. The SET statement syntax is now validated to conform to SET [SESSION | LOCAL] name { = | TO } value [, value]*, rejecting malformed forms such as a missing name, invalid operator, missing value, or dangling comma.
This fix resolves two related bugs in
SqlOptimiser.propagateTopDownColumns0() that caused AssertionError in SqlCodeGenerator.checkIfSetCastIsRequired when UNION sibling models ended up with different topDownColumns counts. The first bug occurred when WHERE clause and timestamp column literals were emitted to union model branches by name resolution; since UNION matches columns by position rather than name, the same alias could resolve to different column indices in different branches, causing one branch to receive extra top-down columns. The fix removes the name-based emission loops in favor of the existing index-based propagation. The second bug occurred when a GROUP BY or SAMPLE BY model as the first UNION member added non-aggregate key columns to its own topDownColumns without propagating them to union siblings. The fix ensures these columns are propagated to all union siblings whose topDownColumns are already populated.
This fix addresses performance degradation when exporting tables with 1M+ distinct symbols to Parquet. Previously, the large default batch size of 1M rows from the general
CREATE TABLE AS SELECT setting prevented frequent batch commits, causing symbol index re-scaling to be deferred and degrading performance as the symbol table grew without capacity adjustments. A new configuration property cairo.parquet.export.batch.size with a default of 100K rows is now used specifically for Parquet exports.
When
CREATE TABLE ... AS (SELECT ...) fails during data population with a non-Cairo exception, the partially created table and its name lock were not cleaned up. This also affects COPY Parquet export, which uses temp tables internally. This fix adds fallback cleanup in COPY Parquet export to match the existing HTTP export pattern.
This fix reads the
len byte in DecimalBinaryFormatParser as unsigned (& 0xFF) instead of signed. A malformed Influx Line Protocol message with a high len byte (e.g. 0x80) was interpreted as negative, which skipped the VALUES parsing state and left unscaledValues empty. The subsequent load() call then hit an ArrayIndexOutOfBoundsException accessing index 0 of the empty list.
When a view name is quoted (single or double quotes),
SqlParser passed the quoted token directly to getTableTokenIfExists(), which failed to find the view. This affected three code paths: SELECT ... FROM 'my_view', SELECT ... JOIN 'my_view' ON ..., and COMPILE VIEW 'my_view'. This fix adds unquote() to all three getTableTokenIfExists() calls so the view is recognized regardless of quoting.
When a
CASE/WHEN expression contains a window function (e.g., lag() OVER (...)), column references outside the window function (in THEN/ELSE branches) were not emitted to the translating model. This caused "invalid column" errors when using the original column name alongside a SELECT alias (e.g., price AS p together with THEN price). This fix introduces replaceWindowFunctionOrLiteral(), which chains window function replacement with literal emission in a single tree traversal pass, so all column references are propagated through the model chain.
This fix introduces
CompiledTickExpression, which pre-parses tick expressions containing date variables ($now, $today, $yesterday, $tomorrow) into a single long[] intermediate representation at compile time. Runtime evaluation performs only long arithmetic with no string parsing or allocations. A new DateVariableExpr encodes and evaluates $variable ± offset expressions supporting all time units plus business days. Expressions containing date variables now produce a dynamic CompiledTickExpression instead of re-parsing the string on every query execution. The evaluation algorithm walks elements to emit intervals, applies day-filter bitmasks in local time, handles timezone conversion (numeric subtraction or DST-aware toUTC), applies exchange schedule filtering with optional duration, and sorts and merges overlapping intervals. Additionally, Character.isWhitespace and Character.isDigit calls were replaced with Chars.isAsciiWhitespace and Chars.isAsciiDigit throughout IntervalUtils to avoid locale-dependent behavior.
This fix moves the circuit breaker check in
DatabaseCheckpointAgent.checkpointCreate() from after ff.sync() to before it. The POSIX sync() system call flushes all dirty filesystem buffers system-wide, not just QuestDB's files. On busy hosts with heavy I/O from other processes, sync() can block for well over the 60-second query timeout. Previously, the circuit breaker was only checked after sync() returned, so a completed checkpoint was discarded when the timeout had been exceeded during the blocking call. With this fix, if the timeout is already exceeded from the table loop or lock acquisition, the operation fails fast without entering a potentially long-blocking sync(). If the timeout has not been exceeded, sync() runs to completion and the checkpoint succeeds regardless of how long sync() takes.
FILL(LINEAR) was disregarding FROM, meaning it would return calendar-aligned results instead of FROM-aligned results. This fix ensures FROM anchors the initial timestamp and TO applies as a bound. There is no intent for this to support fill expansion (i.e., filling before and after the dataset); it is unclear at this stage how pre- and post-filling should operate with linear fills.
Tick expressions containing date variables (
$now, $today, etc.) used as filter predicates, e.g. WHERE now() IN '$now-100s..$now', resolved $now once at compile time and cached the resulting interval as static timestamps. On repeated execution of a cached SQL statement, the interval never updated, causing the filter to return zero rows once now() moved past the originally computed range. The interval model path was already fixed to compile such expressions into IR via CompiledTickExpression, but the filter function path in InTimestampTimestampFunctionFactory was not updated and still used the static function. This fix detects date variables in constant tick expression strings and routes them through CompiledTickExpression, which re-evaluates the interval from IR on each execution via init().
This fix addresses a vulnerability where
ChunkedContentParser.parseChunkLength() had no overflow guard on the hex chunk-size parsing loop. A crafted HTTP request with 16+ hex digits in the chunk size would overflow the long chunkSize to a negative value, corrupting the internal buffer pointer. On the next loop iteration, isEol() would dereference the corrupted pointer, causing the JVM to crash with SIGSEGV. This was reachable over the network with an unauthenticated POST request using Transfer-Encoding: chunked. The fix adds an overflow guard before each chunkSize * 16 accumulation step: if chunkSize > Long.MAX_VALUE >>> 4, the next multiply would overflow a positive long, so the input is rejected as a protocol violation. The server logs the violation and disconnects the client cleanly.
This fix resolves a crash where
WINDOW JOIN and HORIZON JOIN would throw a NullPointerException when the right-hand side (slave) is a subquery that applies a timestamp interval filter or reorders columns. The root cause was that SelectedRecordCursorFactory and ExtraNullColumnCursorFactory did not implement newTimeFrameCursor(), which is required by the concurrent time frame cursor infrastructure used in these joins. Additionally, SelectedPageFrameCursor did not implement TablePageFrameCursor, causing a ClassCastException. After fixing those, a subtler double column remapping bug remained: both SelectedConcurrentTimeFrameCursor and SelectedPageFrameCursor were independently remapping column indices through columnCrossIndex, causing symbol table lookups to hit wrong columns and the timestamp column to be read from the wrong position in the address cache. This fix extracts ConcurrentTimeFrameCursor into an interface, adds newTimeFrameCursor() to the affected factories, and simplifies SelectedConcurrentTimeFrameCursor to delegate column remapping entirely to SelectedPageFrameCursor, avoiding double remapping.
This release introduces Kubernetes secrets file support for enhanced security in containerized deployments, adds TICK exchange calendar support for financial market applications, and delivers significant Parquet read performance improvements through advanced optimization techniques. The update also addresses several critical bug fixes including a rare WAL file error on Windows replicas, backup log message accumulation issues, and Parquet export corruption under concurrent connections.
New Features
This feature enables reading sensitive configuration values from files using the
_FILE suffix convention, allowing native Kubernetes secret file mounts without requiring shell scripts or init containers. It works with both environment variables (QDB_ACL_ADMIN_PASSWORD_FILE) and properties (acl.admin.password.file). All properties marked as sensitive=true are supported, including acl.admin.password, acl.oidc.tls.keystore.password, replication.object.store, cold.storage.object.store, backup.object.store*, and authentication passwords. The _FILE variant takes precedence over direct values, with fallback to direct values when no _FILE variant exists.
This feature adds enterprise exchange calendar support, enabling TICK expressions to filter time intervals by stock exchange trading schedules using syntax like
2025-01-24#XNYS for NYSE trading hours. The implementation includes an EntExchangeCalendarService that loads trading schedules for 67 global exchanges from a bundled Parquet file, supports custom schedule overrides via a _exchange_calendars_custom table, and provides lazy loading with concurrent access protection. It includes a reload_exchange_calendars() function for managing custom schedules and an exchange_calendars() table function for querying effective calendar data. The feature handles multi-session trading days, lunch breaks, and case-insensitive exchange codes.
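For example, restricting a query to NYSE trading hours on a given day (hypothetical trades table):
SELECT * FROM trades WHERE timestamp IN '2025-01-24#XNYS';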
This feature implements late materialization optimization for Parquet partitions in parallel query execution with selective filters. Instead of decoding all columns upfront, it first decodes only filter columns to identify matching rows, then decodes other columns only for rows that passed the filter, significantly reducing decoding overhead for low-selectivity queries. The feature introduces SlicePageReader for zero-copy memory-mapped page reading, bypassing intermediate Vec<u8> copies. Additionally, it defaults partitionEncoderParquetRawArrayEncoding to true for better performance by using raw array encoding instead of nested LIST decoding.
Bug Fixes
This fix resolves several race condition issues related to replication. Previously, WAL downloader tasks could leak when cancelled, causing conflicts with
WalPurgeJob over file locks. The fix replaces JoinSet::shutdown with a CancellationToken to properly notify and wait for all tasks to complete, preventing file handle leaks on Windows. Additionally, this fix replaces the file-based lock mechanism with a centralized semaphore-based system using WALSegmentLockManager to coordinate access between WalPurgeJob, WalWriter, and the WAL downloader.
This fix addresses a race condition where QuestDB startup failures (such as invalid configuration) would cause confusing
cat and rm error messages when the hello.txt file is removed by deleteOnExit() before print-hello.sh can process it. The fix redirects stderr to /dev/null for these commands and adds quotes around $HELLO_FILE for robustness with paths containing spaces. The actual startup failure details remain available in stdout-*.txt logs.
This fix resolves an issue where concurrent Parquet exports could produce corrupted or truncated files when multiple clients exported simultaneously through the same HTTP worker. The problem occurred because the
ExportQueryProcessor stored per-connection state in per-processor fields, causing connection contexts to be overwritten when workers switched between connections. This resulted in Parquet chunks being written to the wrong response, producing invalid data that failed to open in downstream tools.
QuestDB Enterprise 3.2.1 brings the long-awaited scheduled backups feature. It is easy to configure: provide an object storage configuration, a schedule, and the number of backups to retain, and the database will be backed up incrementally for you. Enterprise 3.2.1 also brings improvements from the latest OSS 9.3.2 release, improving query performance and latency, adding new aggregate and window functions, and introducing our new TICK syntax, a convenient DSL for expressing complex time intervals.
New Features
This feature introduces comprehensive backup functionality backed by object stores with support for incremental backups, automatic scheduling via cron expressions, and SQL commands for backup management. The implementation includes
BACKUP START and BACKUP ABORT commands, and backups() and backup_schedule() functions for monitoring backup status and progress. Backups can be configured to run automatically on a schedule with configurable retention policies and garbage collection. The feature supports both WAL and non-WAL tables, handles partition-level incremental updates, and includes data integrity verification using Blake3 checksums. Backup operations run asynchronously and can be monitored through SQL functions that report progress as a percentage of uploaded partitions.
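For example, starting a backup and then checking its progress:
BACKUP START;
SELECT * FROM backups();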
This release introduces significant SQL enhancements including new aggregate functions (arg_min, arg_max, bool_and, bool_or, bit operations, geomean), window functions (VWEMA, EMA, percent_rank), and geospatial functions (within_box, within_radius). The update also features the new TICK (Temporal Interval Calendar Kit) for interval literals and improved timestamp predicate handling. Additionally, the release includes important bug fixes for ILP writes, window joins, parquet operations, and performance optimizations for concurrent queries and parquet decoding.
New Features
This feature adds Volume-Weighted Exponential Moving Average (VWEMA) as a new window function with the signature
avg(price, kind, param, volume). It supports three smoothing modes: 'alpha' for a direct smoothing factor (0 < α ≤ 1), 'period' for an EMA-style period where α = 2/(period+1), and time units ('second', 'minute', 'hour', 'day') for time-weighted decay with a tau parameter. The VWEMA recurrence is numerator = α × price × volume + (1-α) × prev_numerator, denominator = α × volume + (1-α) × prev_denominator, and VWEMA = numerator / denominator. For time-weighted mode, α = 1 - exp(-Δt / τ).
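A sketch of the call shape as a window function (hypothetical trades table; 0.2 is an arbitrary smoothing factor):
SELECT timestamp, sym,
       avg(price, 'alpha', 0.2, volume)
           OVER (PARTITION BY sym ORDER BY timestamp) AS vwema
FROM trades;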
This feature implements arg_min(value, key) and arg_max(value, key) aggregate functions that return the value of the first argument at the minimum/maximum value of the second argument. These functions support 18 type combinations, including double, timestamp, long, and uuid types for both value and key parameters. The functions include full parallel execution support with proper merge logic, correct null handling for both keys and values, and UUID comparison using unsigned long comparison. Null keys are ignored, and the functions return null if the value at the min/max key is null.
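For example, fetching each symbol's price as of its latest timestamp (hypothetical trades table):
SELECT sym, arg_max(price, timestamp) AS last_price
FROM trades;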
This feature implements Exponential Moving Average (EMA) as a window function accessible via avg(value, kind, param) syntax. It supports three modes: 'alpha' for a direct smoothing factor (0 < alpha ≤ 1), 'period' for an N-period EMA where alpha = 2 / (N + 1), and time units ('second', 'minute', 'hour', 'day', 'week') for time-weighted decay using alpha = 1 - exp(-Δt / τ). The function works with both microsecond and nanosecond timestamp precision via TimestampDriver and supports PARTITION BY for computing separate EMAs per group. NULL values are skipped while preserving the previous EMA value.
This feature adds
bool_and(T), an aggregate function that returns true if all values are true, and bool_or(T), an aggregate function that returns true if any value is true. Both functions support parallel execution via a merge method, work with GROUP BY and SAMPLE BY clauses, and accept boolean expressions as arguments. The functions include constant folding optimization that returns results directly without row-by-row computation when the argument is constant.
These functions perform bitwise operations on all non-null values in a column, supporting
byte, short, int, and long data types. The BIT_AND() function returns the bitwise AND of all values, BIT_OR() returns the bitwise OR, and BIT_XOR() returns the bitwise XOR. All functions include proper null handling (nulls are skipped), constant folding optimization, and parallel execution support. They return null for empty tables or all-null inputs and can be used with GROUP BY clauses.
This function computes the geometric mean of positive numbers using the formula
exp(avg(ln(x))) to avoid overflow with large products. It accepts double values (other numeric types convert implicitly) and includes full parallel execution support. The function returns null for negative values, zero values, or empty groups, following DuckDB semantics. Constant folding optimization is included, where geomean(c) = c for any positive constant, avoiding aggregate machinery overhead.
This feature enables the query optimizer to recognize
DATEADD() calls in time intrinsics and transform them for better performance. When a query contains WHERE dateadd('m', 15, timestamp) = '2022-03-08T18:30:00.000Z', the optimizer transforms it to WHERE timestamp = dateadd('m', -15, '2022-03-08T18:30:00.000Z'), allowing existing interval-based partition pruning to work effectively. The implementation adds AST rewriting to WhereClauseParser and handles comparison operators and BETWEEN expressions.
This feature removes the restriction that prevented window functions from being used as arguments to other functions, enabling queries like
SELECT abs(row_number() OVER ()) FROM t and nested window functions such as SELECT sum(row_number() OVER ()) OVER () FROM t. The implementation includes O(1) hash-based deduplication that computes identical window functions only once, significantly improving performance for queries with duplicate window function calls. This feature adds a nesting depth limit of 8 levels to prevent excessive recursion and provides clear error messages when window functions are incorrectly used in PARTITION BY or ORDER BY clauses of window specifications.
This feature adds four new SQL functions optimized for spatial filtering, particularly useful for indexing lidar scans and local area queries. The
within_box() function checks whether a point is within a rectangular bounding box using Cartesian coordinates, while within_radius() performs circular radius checks. For geographic coordinates, geo_within_radius_latlon() and geo_distance_meters() use equirectangular projection for fast local distance calculations with ~0.1% accuracy for distances under 100km. The implementation includes branchless bit manipulation for optimal vectorized execution, constant folding optimization for compile-time precomputation when reference points are constants, and comprehensive coordinate validation for latitude (-90 to 90) and longitude (-180 to 180) ranges.
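A sketch of a bounding-box filter (hypothetical points table with x and y columns; the parameter order shown is an assumption, not confirmed by this note):
SELECT * FROM points
WHERE within_box(x, y, 0.0, 0.0, 100.0, 100.0);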
This feature enables the query optimizer to recognize OR combinations of timestamp IN predicates as intrinsic interval filters, allowing efficient interval forward scans instead of falling back to row-by-row JIT filtering. Previously, queries like WHERE timestamp IN '2018-01-01' OR timestamp IN '2018-01-02' would use slow JIT filters, but now they utilize the same efficient interval scanning as single IN predicates. The implementation extends the WhereClauseParser to extract timestamp intrinsics from OR trees and adds interval union functionality to combine multiple date ranges into optimized scan operations.
This feature adds the
LENGTH_BYTES() SQL function that returns the number of bytes in a varchar argument. The implementation includes performance optimizations for MIN(varchar) and MAX(varchar) group-by functions by using prefix-first comparison with 6-byte prefixes stored in aux memory, and 8-bytes-at-a-time comparison using longAt() instead of byte-by-byte access when prefixes match.
This feature introduces TICK, a powerful syntax for expressing complex temporal intervals in QuestDB. It enables concise specification of multiple disjoint time intervals with timezone awareness in a single expression. Key capabilities include bracket expansion for dates and times, range expansion with inclusive ranges, timezone support with DST awareness, day-of-week filtering, multi-unit duration specifications, ISO week date format support, dynamic date variables with arithmetic, and Cartesian product combinations. TICK supports features like workday filtering, per-element timezones, business day arithmetic, and complex scheduling patterns.
This feature enables pushing timestamp predicates through virtual models where the timestamp column is derived from a
dateadd() function, allowing partition pruning to work even when queries transform the timestamp using dateadd. The implementation detects dateadd(unit, constant, timestamp) patterns in SELECT clauses and annotates models with offset info, wraps pushed predicates in and_offset(predicate, unit, offset) during optimization, and supports chained dateadd through multiple nested subquery levels. Auto-detection of timestamp columns from dateadd expressions works without requiring explicit timestamp(ts) clauses.
This feature implements the SQL standard
PERCENT_RANK() window function that returns the relative rank of the current row, calculated as (rank - 1) / (total_rows - 1). The function returns 0 if there is only one row in the partition. The implementation includes three specialized classes to handle different use cases: no ORDER BY clause (returns 0 for all rows), ORDER BY without PARTITION BY (uses two passes to compute ranks), and both PARTITION BY and ORDER BY (tracks per-partition row counts).
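For example, ranking prices within each symbol (hypothetical trades table):
SELECT sym, price,
       percent_rank() OVER (PARTITION BY sym ORDER BY price) AS pct
FROM trades;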
This feature extends the TICK (Temporal Interval Calendar Kit) date variable arithmetic to support additional time units beyond days and business days, including years (y), months (M), weeks (w), hours (h), minutes (m), seconds (s), milliseconds (T), microseconds (u), and nanoseconds (n). Date variable expressions can now be written with or without brackets, making the syntax more flexible. Calendar-aware units properly handle varying month lengths and leap years, while sub-day units preserve full microsecond/nanosecond precision when used with $now. This enables precise timestamp arithmetic in interval expressions for high-frequency data analysis.
Improvements
This improvement reduces contention across concurrent queries by avoiding unnecessary partition closures when partitions are in standby for lazy open and by performing
mmap operations outside of synchronized blocks in MmapCache. These changes eliminate bottlenecks that could impact query performance under high-concurrency scenarios.
This improvement optimizes queries like
SELECT dateadd('m', -15, max(ts)) FROM t, where the MAX() aggregate function is used as a function argument. The optimization moves rewriteSingleFirstLastGroupBy after rewriteSelectClause to enable better query execution plans for timestamp aggregation functions.
This improvement optimizes Parquet partition read performance through four key enhancements: it reduces memory allocations during page decoding by eliminating intermediate copies for fixed-width columns and batching memory operations; switches the default compression from ZSTD(level=9) to LZ4_RAW for faster decompression at the cost of slightly larger files; uses REQUIRED repetition for non-null Symbol columns to skip definition-level decoding; and skips redundant decode operations on ParquetBuffers cache hits, which particularly benefits ASOF JOIN scenarios. Combined, these changes deliver approximately a 6x read performance improvement compared to the previous implementation, with query times improving from ~17 seconds to ~3.56 seconds in benchmark tests.
This feature adds streaming outputs support and unifies generate/fix/chat/schema flows in a single interface. The enhancement includes a retry mechanism for failed requests, fixes rerendering issues in the chat window, adds abort signals to model calls, and provides a
get_table_details tool for investigating tables() query results. It also upgrades Monaco Editor components and resolves content display issues in LiteEditor.
Bug Fixes
This fix resolves an issue where Influx Line Protocol data would be written to the wrong table when a table was renamed while keeping the connection open. The problem occurred because table references were cached in the state of each HTTP connection. The fix introduces a global version counter for renames that triggers cache flushing, WAL rollback, and error return for retry when a rename is detected, preserving request atomicity.
This fix resolves a crash when using window joins with large slave tables. The issue occurred because
AsyncWindowJoinRecordCursor.close() was freeing slaveTimeFrameAddressCache before waiting for worker threads to finish, causing workers to access freed memory and trigger segmentation faults. The fix reorders operations to await workers before freeing shared resources and wraps cleanup in try-finally blocks to ensure resources are always freed.
This fix prevents segmentation faults that could occur when a timeout happened during vectorized parallel
GROUP BY queries. The issue was caused by frame memory pools used by worker threads being released without properly waiting for all tasks to be processed, creating a race condition where workers could access freed memory concurrently with the aggregation process.
This fix resolves an issue where SQL line comments containing unbalanced single quotes (e.g.,
-- magic ') would cause the lexer to incorrectly enter string-parsing mode and consume tokens past the newline, resulting in subsequent SQL tokens being lost from the query. The fix improves the line comment parsing logic to properly detect newlines within tokens that span multiple lines and repositions the lexer correctly to reset its state.
This fix resolves failures in
GROUP BY queries with many aggregate functions and includes significant performance optimizations for SQL compilation. It adds SimpleGroupByFunctionUpdater as a loop-based fallback that avoids JVM bytecode limits when the aggregate function count exceeds 32, optimizes alias generation from O(n²) to O(1) amortized complexity using LowerCaseCharSequenceIntHashMap to track sequence numbers, and improves duplicate aggregate detection from O(n²) to O(n) using hash-based comparison with ExpressionNode.deepHashCode(). Query compilation time for queries with 6K aggregate functions improved from 4 seconds to around 300ms.
This fix resolves an issue where symbols defined with
DECLARE statements were not properly recognized when used as function arguments during SQL parsing.
This fix resolves Parquet read failures that occurred in window chain join queries. The issue was caused by chained window joins incorrectly using
GenericRecordMetadata.copyOf() instead of the required GenericRecordMetadata.deepCopyOf() method.
This fix resolves a bug where
WINDOW JOIN queries using aliased columns in expressions with aggregates would fail with an 'Invalid column' error. The issue occurred when the same column was both aliased and used in an expression with aggregates. The fix ensures proper column resolution by passing the correct model to aggregate processing and handling the special case where the translating and group-by models are the same in window joins.
This fix resolves an issue where executing multiple PostgreSQL Wire Protocol commands with the same SQL but different parameters caused the second command to return corrupted data. The problem occurred because
msgBindCopySelectFormatCodes() incorrectly used factory != null to determine whether format codes should be set, but when a prepared statement was reused via copyIfExecuted(), the new entry had factory=null, causing format codes to be skipped and clients to misinterpret binary/text format. The fix replaces the factory check with hasResultSet(), which uses sqlType for proper determination.
This fix resolves a bug where
ORDER BY <position> failed with an "Invalid column" error when used with a CTE containing aggregation and a window function expression. The issue occurred because the position-to-name resolution logic in SqlOptimiser.rewriteOrderByPosition() only checked for GROUP_BY models when determining which model's columns to use, but with window functions the model structure includes a WINDOW model between the VIRTUAL and GROUP_BY models. The fix extends the condition to also check for the SELECT_MODEL_WINDOW type.
This fix resolves Parquet export compatibility issues with strict readers like PyArrow and Trino for SYMBOL columns spanning multiple row groups. The fix generates a single dictionary page per column chunk instead of incorrectly writing multiple pages, and corrects RLE/bitpack encoding padding to ensure proper format compliance.
QuestDB Enterprise 3.2.0 brings the new Views feature, as well as a host of performance upgrades, small features, and bug fixes in the QuestDB 9.3.x release family.
New Features
This feature provides the QuestDB Enterprise component for views functionality, enabling users to create virtual tables based on SQL queries.
Bug Fixes
This fix automatically detects native libraries in jlink-ed runtime images by using java.home to locate libs in $JAVA_HOME/lib/, eliminating the need for
-Dquestdb.libs.dir in standard distributions. Additionally, this fix provides actionable error messages when library loading fails, explaining solutions such as using the official distribution, setting -Dquestdb.libs.dir, or fixing noexec tmp issues.
QuestDB 9.3.1 follows the major 9.3.0 release, focusing on stability, correctness, and performance refinements based on early feedback and production usage. This release delivers important fixes across joins, views, and checkpointing, alongside continued performance improvements on hot SQL execution paths.
New Features
This feature implements
KSUM() as a window function using the Kahan summation algorithm for improved floating-point precision. This complements the existing KSUM() aggregate function by enabling its use in window contexts, and supports all standard window frame types, including ROWS, RANGE, partitioned, unbounded, and sliding windows. The function can be used for whole partitions, cumulative sums, sliding windows, and range-based windows.
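For instance, a Kahan-compensated running sum (hypothetical table t with designated timestamp ts and double column x):
SELECT ts, ksum(x) OVER (ORDER BY ts ROWS UNBOUNDED PRECEDING) AS running_sum
FROM t;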
This feature enables window functions to participate in arithmetic expressions and other operations. Previously, window functions could only appear as standalone SELECT columns. Window function parsing has been moved from SqlParser to ExpressionParser, allowing window functions to be parsed as part of the expression tree. The implementation maintains zero-GC compliance and includes proper nested window function detection to reject invalid contexts while allowing valid use cases like CASE expressions.
This feature exposes the actual data timestamp range in each table through new columns in the
This feature exposes the actual data timestamp range in each table through new columns in the tables() function. The table_min_timestamp and table_max_timestamp values are updated when WAL transactions are merged into tables and during startup hydration. Additionally, table_max_timestamp was renamed to table_last_write_timestamp for clarity, and dedup_row_count_since_start was renamed to wal_dedup_row_count_since_start for consistency.
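A quick way to inspect the new columns; the timestamp columns come from the note above, while table_name is an assumed column name:

```sql
-- Data timestamp range per table
SELECT table_name, table_min_timestamp, table_max_timestamp
FROM tables();
```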
Improvements
This improvement implements streaming Parquet export for HTTP queries, eliminating temporary tables/files and improving performance. The feature exports directly from PageFrames when supportsPageFrameCursor() is true, falling back to creating a temp table and then streaming the export when PageFrameCursor is not supported. Currently available for the HTTP /exp endpoint only, this change provides modest performance improvements and enables future parallelization opportunities for row group generation.
This improvement significantly reduces GC pressure and jitter in parallel queries through several key optimizations. The PageFrameAddressCache now uses flat DirectLongList structures instead of nested objects, reducing ~20K object allocations per 1000 frames to just 4 contiguous off-heap lists. Decimal flyweights in map values are lazily initialized to avoid allocations when unused. The GroupByAllocator replaces LongLongHashMap with an off-heap DirectLongLongHashMap to prevent per-worker allocations. Page frame size calculations are improved to avoid tiny trailing frames, and the work stealing strategy is tuned with a new cairo.sql.parallel.work.stealing.spin.timeout configuration (default 50µs). Benchmarks show 67% fewer GC events, 57% less memory usage, and significantly reduced GC pause times.
This improvement materializes functions within the same-level queryModel that are referenced by other projection columns, eliminating redundant expression evaluations and improving query performance when columns depend on other computed columns.
Bug Fixes
This fix resolves the "left side of time series join doesn't have ASC timestamp order" error that occurred when ORDER BY timestamp DESC was used in a WINDOW JOIN query.
This fix properly handles tables created after a checkpoint was taken during checkpoint restore. Previously, table directories created after the checkpoint would remain on disk, potentially causing inconsistencies or orphaned data. The solution iterates over all table directories in the database root after recovering checkpoint contents, compares each directory against the checkpoint contents, and removes any table directories that do not exist in the checkpoint.
This fix resolves two race conditions in view state management. The first issue occurred when multiple ViewCompilerJob tasks ran concurrently, allowing a task with an older timestamp to overwrite newer, correct view state. A timestamp check was added to updateViewState() to skip updates when the update timestamp is older than the current view state timestamp. The second issue involved views being created with empty metadata that was hydrated before asynchronous compilation completed, causing queries to see an empty designatedTimestamp in tables(). This fix passes the compiled RecordMetadata directly to createViewState() so proper column information is available immediately.
This fix resolves an issue where views became suspended after being altered due to missing metadata. The fix ensures metadata is properly restored after view alterations, preventing the suspended state that would make views unusable.
This fix addresses an edge case in materialized view column merging where fixed columns could be corrupted during merge operations. The fix ensures data is copied to the correct memory location (end of old data + column tops) rather than to the end of newly calculated data size, preventing potential data corruption in materialized views.
This fix resolves an issue where the dedup logic would leave behind unused partition directories from O3 merge preparation when appending data directly to existing partitions. The Partition Purge job would then incorrectly count these orphaned directories as the next valid partition version, leading to incorrect partition version tracking and subsequent file not found errors.
This fix resolves UnsupportedOperationException crashes in ASOF JOIN queries when using the Light join factory with multiple join keys including a symbol column. The single-symbol optimization was incorrectly applied when there were multiple join keys, creating a map that only supported INT keys while the record copier attempted to write all column types. In some cases this could also silently return incorrect results.
This fix resolves a race condition where jemalloc background threads were failing to start reliably (~20% failure rate) when QuestDB was launched via questdb.sh, preventing memory from being released back to the OS properly. The solution switches to QuestDB's own jemalloc fork and updates to the latest dev branch commit. Additionally, LD_PRELOAD is now used to load jemalloc only in the Java process rather than the bash script.
This fix removes unwanted scrolling behavior on wheel events from code blocks and assistant suggestions, adds an "Open in editor" action to all overflown LiteEditor instances, prevents unnecessary rerenders in the assistant response markdown preview, and resolves a bug with context badge clicks where highlighting would fail on removed or modified context queries.
This fix resolves a flickering issue when hovering over news thumbnail images in the right panel. The zoom magnification was rapidly appearing and disappearing due to hover state issues caused by the hoverTimeout variable resetting on every re-render. The fix uses useRef for hover timeout persistence across re-renders and tracks imageToZoom state with a ref to prevent dismissal when zoom is already visible. Additionally, click-to-dismiss functionality was added on both the overlay and the zoomed image for better user experience.
This fix addresses a regression where HTML entities (such as those for &, <, and >) displayed as literal escaped text instead of their corresponding characters in grid cells. An unescapeHtml function was added to decode HTML entities to their actual characters before setting textContent, restoring proper entity rendering while maintaining the XSS protection that was implemented in the previous security fix.
QuestDB 9.3.0 is now available, bringing a new wave of query expressiveness and usability improvements across the engine and the Web Console. This release introduces window joins for precise time-based analytics, database views for cleaner query composition, AI-assisted workflows in the web console, and the new PIVOT keyword for effortless wide-schema aggregations.
New Features
This feature introduces a new WINDOW JOIN syntax that enables time-based aggregations by joining each row from the left table with matching rows from the right table within a specified time window. The syntax supports both INCLUDE PREVAILING (default) and EXCLUDE PREVAILING clauses, where the former includes the latest right-hand table row before the interval. The implementation uses SIMD instructions for symbol-based joins and supports multi-threaded execution for large datasets. Window joins can be chained in the same query but cannot be mixed with other join types at the same query level.
This feature refactors the glob() function to use native glob matching instead of Java regex, providing compatibility with DuckDB's glob syntax. The implementation supports standard glob wildcards including * for any number of characters, ** for subdirectories, ? for single characters, [abc] for character sets, and [a-z] for character ranges.
This feature improves table drop performance by closing idle WAL writers faster, which enables quicker deletion of WAL and table files from disk. Previously, dropped tables would generate repeated logging about pending WAL segments, but now the cleanup process completes more efficiently.
This feature enables using VARCHAR array bind variables with symbol, varchar, and string columns in IN expressions through PostgreSQL Wire Protocol. This allows reusing prepared statements with different value lists, reducing query compilation overhead for applications that frequently execute similar queries with varying parameter sets.
This feature pushes column projection down to the Parquet decoder, reading only required columns instead of all columns. The optimization also shares Parquet metadata across threads for multi-threaded reads, parsing metadata once per file. The performance improvements are dramatic: ClickBench Q1 execution time improved from 712 seconds to 787ms with multithreading enabled.
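With projection pushdown, a query that touches a single column of a Parquet file now decodes only that column. A sketch (the file name is illustrative; read_parquet() appears elsewhere in these notes):

```sql
-- Only the AdvEngineID column is decoded; the rest of the file is skipped
SELECT count(*) FROM read_parquet('hits.parquet') WHERE AdvEngineID <> 0;
```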
This feature adds real-time table write statistics tracking through new columns in the tables() SQL function. New columns include rowCount (approximate row count), pendingRowCount (WAL rows not yet applied), dedupeRowCount (cumulative deduplicated rows), lastWriteTimestamp (last commit timestamp), writerTxn and sequencerTxn (transaction numbers), memoryPressureLevel (0-2 pressure indicator), and various histogram statistics for transaction sizes, write amplification, and merge throughput. The feature uses a new RecentWriteTracker with lock-free reads and bounded memory, supporting both WAL and non-WAL tables with different column semantics. Statistics are approximations updated when writers return to the pool, with WAL tracking being real-time and LRU eviction maintaining memory bounds.
This feature introduces the PIVOT keyword, which is a specialized GROUP BY query that helps group a selection of rows into columns, essentially pivoting from a narrow schema to a wide schema. The implementation supports single and multiple columns, per-column aliasing, FOR-IN expressions with single or multiple expressions, GROUP BY, ORDER BY, LIMIT, and dynamic IN lists using subqueries. The feature also includes UNPIVOT functionality to reverse simple pivot operations.
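A minimal sketch of the keyword (table and column names are illustrative, and exact clause shapes may vary):

```sql
-- Turn one row per (timestamp, side) into one column per side
SELECT * FROM trades
PIVOT (sum(amount) FOR side IN ('buy', 'sell'));
```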
This feature introduces support for database views, which are virtual tables defined by a SELECT statement. Views do not store data themselves; instead, their defining query is executed as a sub-query whenever the view is referenced. The implementation includes CREATE VIEW, DROP VIEW, ALTER VIEW, CREATE OR REPLACE VIEW, COMPILE VIEW, SHOW CREATE VIEW, and views() commands, as sketched below. Views automatically recompile when operations occur on their dependencies and support metadata persistence, state management, and integration with the Web Console.
This feature introduces a comprehensive AI Assistant with a conversational interface in the right sidebar, enabling users to interact with AI models (Anthropic Claude & OpenAI GPT) for SQL assistance. The assistant provides query explanations, automatic error fixing, schema explanations, and SQL generation from natural language. It includes Monaco Editor integration with glyph margin icons, an inline diff view for reviewing AI suggestions, and persistent chat history stored in IndexedDB. The feature supports multiple AI providers with configurable models and includes token usage tracking and conversation compaction for context management.
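A minimal sketch of the view lifecycle described in the views note above, using the commands it lists (names are illustrative):

```sql
-- Define, inspect, and drop a view over a hypothetical trades table
CREATE VIEW btc_trades AS (SELECT * FROM trades WHERE symbol = 'BTC-USDT');
SHOW CREATE VIEW btc_trades;
DROP VIEW btc_trades;
```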
Improvements
This improvement prevents LimitRecordCursorFactory.toPlan() from calling the expensive calculateSize() method during EXPLAIN queries, which previously executed the complete query. The optimization also avoids unnecessary calculateSize() calls in hasNext() when LIMIT bounds are non-negative and in size() when the size isn't cheaply available. Additionally, this improvement clarifies LIMIT semantics to align with documentation and fixes related bugs in cursor classes.
This improvement introduces comprehensive optimizations to JIT-compiled filter execution including short-circuit evaluation for scalar AND/OR predicate chains, automatic predicate reordering by estimated selectivity, and multiple caching mechanisms to reduce redundant memory loads. The optimization includes SIMD scatter short-circuiting that skips expensive operations when no matches are found in a batch. Performance improvements range from 23-69% faster execution depending on the query pattern, with the biggest gains seen in highly selective filters.
This improvement implements a fast-path for count-only queries that avoids materializing matching rows by incrementing a counter instead. The optimization eliminates row ID scatter logic, memory writes, and conditional branches in the hot loop, reducing loop instructions by ~70%. Performance improvements range from 60-92% for queries with many matching rows, with the biggest gains for NEQ predicates that match large result sets.
This improvement enables parallel processing of the top-K selection phase when GROUP BY results are sharded due to high cardinality. The optimization processes each shard in parallel using worker threads, then merges results. New configuration properties cairo.sql.parallel.groupby.topk.threshold (default 5M) and cairo.sql.parallel.groupby.topk.queue.capacity control when parallel execution is enabled. ClickBench queries show speedups ranging from 1.3x to 11.2x for the ORDER BY + LIMIT phase.
This improvement optimizes non-keyed GROUP BY queries with MIN() or MAX() functions over designated timestamp columns by using O(1) operations instead of SIMD native functions. The optimization takes the first/last value directly, resulting in significant performance gains for queries like SELECT MIN(EventTime), MAX(EventTime) FROM table.
Bug Fixes
This fix addresses a crash that occurred when calling touch() on a table with newly added columns. When a new column is added to a table, it is initially empty and uninitialised, causing frame.getPageAddress to return a null pointer address. The touch() function did not check for this possibility and would attempt to use the null pointer, resulting in a segmentation fault. This fix adds proper null pointer checks for both columns and indexes to prevent crashes.
This fix adds support for the SHOW default_transaction_read_only metadata query, which is required by npgsql when building data sources with multiple hosts. Without this query support, connection sources would fail to initialise when using PostgreSQL Wire Protocol clients that depend on this metadata.
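The query itself is trivial and can be issued directly:

```sql
SHOW default_transaction_read_only;
```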
This fix resolves an issue where the tables() function would display an incorrect designated timestamp column when a column type change occurred on any column positioned before the designated timestamp column in the table schema. The function now correctly identifies and displays the designated timestamp column regardless of schema modifications.
This fix resolves a limitation that prevented copying data between tables with thousands of columns (6000+) due to Java bytecode method size restrictions. The solution introduces multiple copy strategies: a looping variant for very wide tables, and a chunked variant that splits large copy operations into smaller sub-methods to maintain optimal performance. The chunked approach allows each sub-method to be individually optimized by the C2 compiler, significantly improving performance for wide table operations while maintaining compatibility with extremely wide schemas.
This fix resolves an issue where nested SAMPLE BY queries with an ORDER BY clause incorrectly failed with "ASC order over TIMESTAMP column is required but not provided". The problem occurred when SAMPLE BY pushed timestampRequired=true onto a context stack, which propagated to nested models even when an ORDER BY clause would re-sort the data anyway. This fix ensures that when an ORDER BY clause is present, timestampRequired=false is pushed to signal that nested models don't need to maintain timestamp ordering.
This fix resolves an issue where EXPLAIN UPDATE on WAL tables was throwing UnsupportedOperationException because the anonymous RecordCursorFactory created during WAL serialization did not implement getCursor(). The fix replaces the anonymous factory with EmptyTableRecordCursorFactory and improves the explain plan output to show table names when available.
This fix resolves issues when using quoted column names containing dots in JOIN operations, ensuring proper metadata handling for such column references.
This fix resolves a logic bug that caused each row inserted during Parquet export to be committed individually as a single-row commit, significantly impacting performance during data copying operations.
This fix improves native library loading by auto-detecting native libraries in jlink-ed runtime images. The improvement uses java.home to locate native library directories and only activates when running from the jrt: protocol (jlink runtime), eliminating the need for -Dquestdb.libs.dir in start scripts.
This fix addresses an overflow issue when incrementing quotient digits during decimal division. The increment loop in DecimalKnuthDivider.java was updated to use the actual length of the q array, preventing possible misses when incrementing quotient digits during rounding and improving the reliability and correctness of the decimal division logic.
This fix resolves an issue where VARCHAR columns larger than the PostgreSQL Wire Protocol send buffer would fail, making behavior consistent with array columns, which already supported fragmented sending. Additionally, this fix addresses a latent bug where the partial-send array offset wasn't being reset when a row couldn't fit in the buffer and had to be abandoned via resetIncompleteRecord(), which would create malformed packets on retry.
This fix corrects query parsing in the Web Console. Query keys are of the form ${queryText}@${startOffset}-${endOffset}; with DECLARE statements, the Web Console was parsing queries incorrectly, causing the AI chat window to open with wrong information and the editor to shift query keys incorrectly.
This fix prevents execution of foreign code inside the Web Console through malicious queries containing HTML iframe elements with embedded scripts.
This fix adds functionality to abort queries in progress while the query body is being consumed. When running a query, or running all queries in a tab, a new operation now aborts the ongoing process and starts the new one if it arrives before the old one is completed. It also prevents the editor from inferring '\t' as a valid query start character by treating it the same as ' '.
This fix adds exponential backoff retry with 5 attempts and delays of 1s, 2s, 4s, 8s, 16s (capped at 30s) when POSTing telemetry to fara.questdb.io. The hourly telemetry loop now continues even after max retries are exhausted, and if the /config endpoint fails during startup, the loop still starts and retries the next hour. Previously, a single failed POST would permanently stop the telemetry loop until the user refreshed the page, causing telemetry gaps during transient network issues or brief outages.
This release focuses primarily on bug fixes and performance improvements for replication functionality. Key fixes include resolving suspended table issues on replicas, optimizing segment collection at startup, and addressing metadata file corruption after materialized view recreation. Additional improvements were made to SQL query processing and session management stability.
Improvements
This improvement enhances startup performance by modifying the directory walking logic to skip partition traversal and moving the missed segment collection to run asynchronously as part of each table uploader's logic. The change eliminates startup blocking and enables parallel directory traversal across multiple table uploaders.
Bug Fixes
This fix resolves a race condition where the _event file was copied to the segment directory before the _event.i file during replication. When WalEventReader attempted to map the _event.i file using the max transactions value from the already-updated _event file, it could read beyond the file size via mmap, causing a SIGBUS signal and a subsequent java.lang.InternalError. The fix ensures that the _event.i file is copied before the _event file to maintain proper synchronization.
This fix ensures that outer LIMIT clauses are properly applied even when the subquery contains both a filter condition and its own LIMIT clause, preventing incorrect query result truncation.
This fix resolves use-after-free bugs in HttpSessionStore caused by OidcUserInfoHolder containing CharSequences backed by parser buffers with limited lifetime. The fix uses immutable String copies from CachedUserInfo instead of passing parser buffer references directly, preventing memory corruption when parser state is reused.
QuestDB 9.2.3 is a patch release bringing the usual swathe of fixes, performance enhancements, and a few new features. You'll see higher QPS, faster GROUP BY queries, and better performance for high-contention materialized views. There will be one more release before Christmas, bringing the new WINDOW JOIN syntax.
New Features
This feature provides users with actionable hints when histogram out-of-bounds errors occur during APPROX_MEDIAN() queries. The error messages now suggest using a lower precision value or normalizing values to fit within the allowed range, helping users resolve issues when inserted values exceed the histogram's default precision limits.
This feature prevents accidental data loss when future timestamps are inserted by using min(maxTimestamp, wallClockTime) for TTL enforcement instead of just maxTimestamp. A new configuration option, cairo.ttl.use.wall.clock (default: true), controls this behavior. The feature additionally optimizes WAL commit performance by reducing microsecond clock syscalls from 2-3 to 1 per commit.
Improvements
This improvement restricts Unordered4Map and Unordered8Map usage to single-column cases to avoid false sharing issues with jemalloc. The change removes Unordered2Map entirely as it showed no performance benefit. This optimization significantly reduces execution times for certain ClickBench queries when jemalloc is in use.
This improvement enhances materialized view performance by intelligently skipping transactions that are fully replaced by future commits or full refresh operations, reducing unnecessary write operations.
This improvement enables jemalloc by default in Linux x86_64 binary distributions, providing significant performance boosts for allocation-heavy queries such as parallel SAMPLE BY and GROUP BY operations. The improvement also fixes the built-in jemalloc binary, which was previously built with the je_ prefix in exported symbols, preventing proper memory allocator replacement. The built-in jemalloc binary is now limited to Linux only, and questdb.sh will return an error when QDB_JEMALLOC=true is set on other operating systems.
This improvement significantly reduces SQL expression parsing latency across various expression types. Performance improvements range from approximately 3x faster for simple expressions like a + b (from 2967ns to 991ns) to over 3x faster for complex expressions like window functions with OVER clauses (from 3467ns to 1232ns). Complex case statements, casting operations, and conditional expressions also see substantial performance gains.
This improvement removes redundant division operations from the AVG() function in parallel non-keyed GROUP BY queries. The native avg*Acc() functions were using avg += (v - avg) / c in each iteration while the outer Avg*VectorAggregateFunction Java function was multiplying the average and count before storing them in partial results. Benchmarks show performance improvements of approximately 18% for simple averaging queries.
Bug Fixes
This fix resolves a race condition in PartitionPurgeJob that could cause transient query failures with partition directory errors. The issue occurred when the purge job detected partition versions that were temporarily created by TableWriter for copy-on-write operations but then removed before commit when deduplication determined the operation wasn't needed. The fix ensures that the next partition version exists before deleting overwritten partition versions, preventing phantom partitions from being counted as valid.
This fix resolves segmentation faults in parallel ORDER BY with LIMIT queries that involve JIT-compiled filters. The crash was caused by missing bind variable memory initialization and occurred both with explicit bind variables and when filters involved non-existing symbols. As a workaround, users can set cairo.sql.parallel.topk.enabled=false to disable the problematic execution plan.
This fix removes the unix line-ending configuration that was incorrectly changing byte sequences in the Web Console package downloaded from NPM, which was causing broken assets. The exclusion workaround is no longer needed.
This fix resolves an issue where Parquet exports would fail when the SQL query contained the NOW() function.
This fix resolves an issue where queries with LIMIT clauses would incorrectly return empty result sets.
This fix resolves a crash that occurs when ordering by a symbol column that was added via ALTER TABLE ADD COLUMN.
This fix resolves inefficient refresh queries when materialized views use small sampling intervals (e.g., microsecond) on tables with sparse data distribution. The improved estimation ensures optimal batch sizes for incremental refresh.
This fix resolves a data corruption issue that could occur during range replace transactions on partitions with column tops. The bug affected the O3OpenColumnJob#merge*Column() methods, which could overwrite data in the source partition, corrupting it. The issue was specific to range replace operations combined with column tops, which only occurs in materialized view scenarios.
This fix resolves incorrect content length headers in HTTP JSON responses that could cause client parsing issues or connection problems.
This fix prevents connection contexts from being closed and pooled multiple times when TLS session initialization fails, which could cause issues with subsequent connections.
This fix prevents unexpected exceptions that could occur when the schema of a Parquet file changes between reads, ensuring more robust file processing.
This fix resolves an issue where using SAMPLE BY ... FILL(NULL) on tables containing decimal columns and then filtering for NULL values in the outer query would throw UnsupportedOperationException: Record.getDecimal128(). The issue occurred because the FillRangeRecordCursorFactory.FillRangeRecord and SelectedRecord classes were missing implementations for the getDecimal*() methods. This fix adds support for all decimal types in both classes, enabling proper gap filling with null values for decimal columns and correct delegation of decimal column access to the base record.
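A sketch of the previously failing shape, assuming a hypothetical table with a DECIMAL column and QuestDB's null-comparison syntax:

```sql
-- Gap-fill hourly buckets, then filter the filled rows in the outer query
SELECT * FROM (
  SELECT timestamp, first(px) AS px FROM prices SAMPLE BY 1h FILL(NULL)
) WHERE px = null;
```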
This fix resolves an issue where queries with multiple LIMIT clauses would ignore the outer LIMIT when one of them was in a subquery. The problem was tracked by a boolean flag on the model that could not handle multiple LIMIT clauses properly. This fix moves the tracking to the node expression itself, allowing each LIMIT to be tracked independently by the node that represents it in the model tree. The redundant followedLimitAdvice() method was also removed from RecordCursorFactory as it served the same purpose as implementsLimit().
This fix ensures that when adding a symbol column via ALTER TABLE ADD COLUMN to a table with existing data, the symbol table's nullValue flag is properly set to true, since all existing rows will have null values for the new column.
This fix resolves an issue where attempting to update a TIMESTAMP column using bind variables would fail with the error "bind variable cannot be used [contextType=unknown, index=1]". The UPDATE statement now correctly handles TIMESTAMP columns when bind variables are used.
This fix resolves an issue where queries with aggregate functions inside CASE expressions were failing with "Aggregate function cannot be passed as an argument". The root cause was that checkForChildAggregates() in SqlOptimiser only traversed the lhs and rhs children of expression nodes, missing aggregate functions in CASE expressions, which store arguments in node.args. The optimizer now properly handles aggregate functions nested inside expressions with 3 or more arguments.
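A sketch of the previously failing shape (table and threshold are illustrative):

```sql
-- Aggregate nested inside a CASE expression, which stores its arguments in node.args
SELECT symbol,
       CASE WHEN count() > 1000 THEN 'active' ELSE 'quiet' END AS activity
FROM trades;
```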
QuestDB Enterprise 3.1.1 combines the latest advancements from OSS releases 9.1 and 9.2, bringing powerful new data types, nanosecond-precision time-series support, and significant query performance improvements. This release introduces native DECIMAL support with up to 76-digit precision, nanosecond timestamps, and delivers substantial performance gains through optimized ASOF JOINs.
Breaking Changes
This change enables multi-threaded execution for SAMPLE BY queries that combine timestamp and aggregate functions in their selected expressions. The improvement also optimizes execution plan selection for SAMPLE BY and GROUP BY queries by avoiding duplicate group keys and reusing existing ones. However, this introduces a breaking change in how designated timestamps are handled when selected multiple times under different aliases: the query engine now picks the first ordered column, with preference given to columns used in ORDER BY clauses or unaliased columns matching the designated timestamp name.
This fix resolves data inconsistency issues in chained materialized views with period refresh, where dependent views could contain rows no longer present in their base views. The change includes a breaking modification where MANUAL PERIOD views are no longer automatically refreshed when periods end; they must now be refreshed explicitly using REFRESH MATERIALIZED VIEW SQL, consistent with other MANUAL views.
This change removes the transaction scoreboard v1 implementation. Deployments previously configured with cairo.txn.scoreboard.format=1 will continue to function, but the configuration option is now ignored and the server will automatically use scoreboard v2 instead.
New Features
This feature integrates the DECIMAL type into QuestDB, making it available through SQL, Influx Line Protocol, and PostgreSQL Wire Protocol. The DECIMAL type supports precision up to 76 digits using the syntax DECIMAL(precision, scale), where precision defines the maximum count of digits and scale defines the number of fractional digits. This feature includes basic arithmetic operations (+, -, /, *, %), comparison operators, casting functions, aggregate functions (min, max, avg, sum, count), and support for GROUP BY, ORDER BY, SAMPLE BY, and CSV import/export. Unlike traditional numeric types, operations that cannot provide accurate values will throw an error, except for division, which returns a HALF_DOWN rounded quotient.
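A minimal declaration sketch using the syntax described above (table and column names are illustrative):

```sql
-- 18 total digits, 4 of them fractional
CREATE TABLE prices (
  ts TIMESTAMP,
  px DECIMAL(18, 4)
) TIMESTAMP(ts) PARTITION BY DAY;
```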
This feature introduces nanosecond precision timestamp support through the TIMESTAMP_NS type. The implementation includes core nanosecond arithmetic and parser/formatter functionality, with the timestamp resolution carried by the data type, similar to GEOHASH types. This feature supports SAMPLE BY, ASOF JOIN, materialized views, PARTITION BY, O3 operations, Parquet file writing, CSV text import, and integration with Influx Line Protocol clients for Java, C, and Python. The partition metadata includes timestamp type information, with legacy code defaulting to microsecond precision.
This function generates high-cardinality symbols with skewed distribution patterns, making it useful for testing algorithms that work with realistic symbol distributions where some values appear much more frequently than others.
This function provides array aggregation capabilities by returning the last array value in a group. This feature utilizes the specialized allocator for aggregate functions introduced in previous versions.
This function returns the most frequent value in each group, eliminating the need for complex subqueries involving count() operations and sorting. The implementation supports BOOLEAN, LONG, VARCHAR, STRING, SYMBOL, and DOUBLE data types with parallel execution capabilities. It uses a map-of-maps structure where outer maps store pointers to inner maps that aggregate per-key counts, which are then merged and scanned for the most frequent value. This feature is useful for identifying patterns like the most frequently traded symbol by a trader or the most common sensor state.
This feature adds weighted_avg(), weighted_stddev_rel(), and weighted_stddev_freq() functions, with an alias weighted_stddev() for weighted_stddev_rel(). According to Wikipedia, there are two variants for the unbiased estimator of population variance: one for frequency weights and another for reliability weights. A frequency weight represents the number of occurrences of the sample in the dataset, while a reliability weight represents the "importance" or "trustworthiness" of the given sample. The implementation provides both functions, with weighted_stddev_rel for reliability weights and weighted_stddev_freq for frequency weights.
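A hedged sketch, assuming a (value, weight) argument order, with illustrative table and column names:

```sql
-- qty serves as a frequency weight for each observed price
SELECT weighted_avg(price, qty) AS wavg,
       weighted_stddev_freq(price, qty) AS wstd
FROM trades;
```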
This feature introduces Right Outer Join and Full Outer Join functionality, expanding the range of join operations supported. For HashLightInnerJoin and HashLightFullOuterJoin, the implementation now supports limited runtime reordering of the left and right cursors, allowing the join to dynamically choose the more efficient build/probe table combination in certain scenarios. Right Outer Join provides users with an alternative when Left Outer Join performance is unsatisfactory.
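A minimal sketch with hypothetical tables:

```sql
-- Rows without a match on either side are kept, with NULLs on the missing side
SELECT o.id, f.fill_ts
FROM orders o
FULL OUTER JOIN fills f ON o.id = f.order_id;
```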
Improvements
This improvement introduces the Dense ASOF JOIN algorithm, a variant of the Light algorithm that uses binary search to quickly jump to the timestamp matching the first left-hand row instead of starting the scan at the top. This difference is highly important when the right-hand table has history that predates the first left-hand row, as the entire history will be skipped except for the recent history needed to find matches for the initial few left-hand rows. The optimization also enhances symbol-to-symbol joins for the existing Light and Fast algorithms by copying just the symbol key instead of the symbol string through the RecordSink.
This improvement leverages the symbol index on the right-hand side of ASOF JOIN, avoiding the need to visit any rows that don't have the matching symbol. The major use case is markout analysis, where dealing with illiquid assets that posted their last price far in the past would previously require scanning all rows backwards until finding a match. The optimization makes only one symbol index lookup per partition instead of scanning the full historical data for all symbols.
This improvement optimizes keyed ASOF JOIN operations on non-indexed symbol columns by implementing a memoization strategy. When searching for matching rows in the right-hand side table, the algorithm now remembers the location of each symbol it encounters and reuses that knowledge for future searches. It also remembers where symbols were not found, allowing it to skip already-scanned regions. This optimization is particularly effective for scenarios with low-frequency symbols where the same symbol appears rarely in the right-hand side table, reducing redundant linear scans from taking indefinite time to completing in seconds.
This improvement refines the memoized scan algorithm for ASOF JOIN on symbol columns by making drive-by caching optional through a query hint. While match caching (remembering found symbol locations) remains active by default with minimal overhead, drive-by caching (storing all symbols encountered during searches) is now behind a query hint due to its hash lookup overhead. This change improves performance in typical cases where drive-by caching doesn't provide sufficient benefit to offset its computational cost.
This improvement introduces deferred merge in parallel COUNT_DISTINCT() functions for int, ipv4, long, uuid, and long256 types. Instead of merging hash sets immediately when merging GROUP BY hash tables, the merge is deferred until the getLong() call. This avoids redundant hash set merges in queries with ORDER BY and LIMIT clauses where not all groups need their distinct counts calculated. Additional optimizations include inlining single keys in MapValue for COUNT_DISTINCT(int) and COUNT_DISTINCT(long), and implementing a faster FxHasher-based hash function with a reduced load factor.
This improvement extends the fast-path optimization for ORDER BY with LIMIT queries to work with virtual columns, expressions, and subqueries that previously used VirtualRecordCursorFactory and SelectedRecordCursorFactory. The optimization now supports both LONG and TIMESTAMP columns, works with vectorized GROUP BY, and improves CROSS JOIN performance for small right-hand side tables. This allows queries with expressions, constants, and complex select clauses to benefit from the same performance optimizations as simple column-based ordering.
This improvement includes several changes to enhance query engine scalability on multi-core machines. It fixes MapFragment#totalFunctionCardinality not being updated in all cases in AsyncGroupAtom, which caused sharding stats to fluctuate between enabled and disabled states for queries with high-cardinality COUNT_DISTINCT(). Cardinality statistics in AsyncGroupAtom are now calculated as a sum of recorded group-by function cardinalities instead of a maximum value, providing better estimates of the merge work required. The configuration properties cairo.page.frame.reduce.queue.capacity and cairo.sql.parallel.groupby.merge.shard.queue.capacity are now set to 4 * cpu_cores with a cap of 256 to improve worker thread utilization for parallel GROUP BY and filter queries. A new cairo.sql.parallel.filter.dispatch.limit config option (set to max(cpu_cores, 32) by default) limits in-flight page frame tasks for LIMIT queries to improve latency. Additionally, padding was added to PerWorkerLocks to avoid false sharing.
This improvement disables column pre-touch in parallel filters by default, as it had negative effects on query execution time in many scenarios despite improving performance in certain cases. The cairo.sql.parallel.filter.pretouch.enabled configuration property is now deprecated and ignored by the server. Column pre-touch can now be enabled per-query using a new SQL hint: SELECT /*+ ENABLE_PRE_TOUCH(trades) */ symbol, side, timestamp FROM trades WHERE symbol = 'BTC-USDT' and timestamp in today();
This improvement introduces a new hash map implementation that maps CharSequence to int without relying on heap allocation or garbage collection. The new implementation replaces the existing CharSequenceIntHashMap used in WalWriter, which previously allocated heap-based String objects, causing GC churn during WAL transaction commits. Benchmarks show approximately 12% performance improvement with almost no memory allocations compared to the previous implementation, along with reduced jitter.
This improvement deprecates the useFastAsOfJoin configuration property and the existing SQL hints avoid_asof_binary_search and avoid_lt_binary_search. It introduces new positive selection hints: asof_linear_search for the original algorithm with no optimizations, and asof_index_search for the algorithm that leverages the symbol index. These hints apply to both ASOF and LT JOIN variants.
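A sketch of the new hint form; table names are illustrative, and the alias-based hint parameters are assumed to follow the same convention as the ENABLE_PRE_TOUCH hint shown earlier:

```sql
-- Request the symbol-index-driven ASOF algorithm for this query only
SELECT /*+ asof_index_search(t q) */ t.timestamp, t.price, q.bid
FROM trades t
ASOF JOIN quotes q ON (symbol);
```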
Bug Fixes
This fix resolves out-of-memory errors that occurred with queries like SELECT COUNT(*) FROM read_parquet('hits.parquet') WHERE AdvEngineID <> 0; when processing parquet files with large row groups (up to 600K rows). The issue was caused by keeping row group buffers around for each page frame reduce task, which, combined with the lack of projections, easily led to OOM errors.
This fix resolves an issue where replicated WAL segments exceeding 4GiB in size would fail consistency checks. The consistency header previously stored the segment size as a u32, causing failures when segments exceeded this limit. The consistency check now correctly casts the u64 length to a u32, effectively using modulo 2^32, allowing the check to pass for larger segments.
This fix addresses network freeze issues where hundreds of stuck connections could prevent replication progress. Replication uploads and downloads now include configurable timeouts based on a base timeout and minimum throughput parameters. The timeout is recalculated and extended whenever upload or download progress is made. The default configuration includes replication.requests.base.timeout=10s and replication.requests.min.throughput=262144 (256KiB/s). This improvement also resolves error handling when object store operations succeed but are considered failed due to timeouts.
This fix resolves authentication failures that occurred when replica instances reloaded access control lists. The issue occurred because transaction numbers were updated before ACL reload completed, causing other workers to believe ACLs were current while still reloading. Requests for newly applied ACL changes like new users or tokens could fail authentication during this window. The transaction number update now occurs after ACL reload completion, ensuring all requests are processed with fully refreshed ACL models.
This fix resolves a race condition where multiple Influx Line Protocol connections competing to create the same table could encounter access denied errors. The issue occurred when the first connection made a table visible to others before owner permissions were granted, creating a brief window where the creator lacked access. Table creation and permission granting are now performed as a single atomic operation, with permissions granted before exposing the table.
This fix resolves an issue where WHERE clause filters like sym_col1 = sym_col2 were mistakenly JIT compiled. The compiled filter function was using internal symbol codes for the equality check, which produced incorrect results returned from the query. This fix disables JIT compilation for such filters to ensure correct query results.
This fix addresses two nanosecond precision issues: incorrect nanosecond precision handling when calculating TOLERANCE values for ASOF joins, and automatic timestamp type selection during CSV imports when timestamp columns are explicitly typed.
The LEAD() window function's third parameter (the default placeholder value) was not initialized correctly when using bind variables in the position lead(value, 1, ?), causing null reference exceptions. This fix ensures proper initialization of bind variables in this context.
This fix resolves scenarios where unsafe memory access operations in compiled Java code would cause InternalError exceptions during table operations. The issue occurred when the O3PartitionPurgeJob attempted to read transaction data while tables were being processed under high system load, resulting in memory access faults in the TxReader component.
This fix resolves incorrect result set size display in the Web Console when the first partition of a query result exceeded the configured maximum page frame size. The improvement modifies size calculation in page frame cursors to properly account for the remaining size of backing partitions and fixes issues where some factories did not account for rows already consumed from cursors. It additionally resolves an issue in PgAttrDefFunctionFactory where cursors were resetting automatically without calling toTop().
This fix prevents TableReferenceOutOfDateException errors in CheckWalTransactionsJob when tables are dropped while background jobs are running. The issue occurred during concurrent operations where cached query plans became invalid due to table schema changes or table removal, causing async jobs to fail with metadata version mismatches.
Fixed a bug where queries with bound parameters executed in a batch would return incorrect results. When multiple SELECT statements with different parameter values were sent together (pipelining), all queries would incorrectly use the parameter values from the last query in the batch instead of their own values. This fix affects PostgreSQL Wire Protocol clients that use pipelining, such as Npgsql (.NET), where multiple queries are sent before waiting for responses. Each query now correctly uses its own bound parameter values.
Fixed a race condition where multiple clients trying to drop the same non-WAL table simultaneously could cause an internal error. This issue affected only legacy (non-WAL) tables and occurred because the table was unlocked too early, allowing another thread to acquire a lock for a non-existing table and attempt to delete an already-deleted directory.
Fixed protocol corruption that occurred when the server sent array data to PostgreSQL Wire Protocol clients. This fix resolves issues where queries returning array columns could cause communication errors between the server and client.
Enhanced the HTTP header parser to support legacy cookie Expires date formats, including HTTP 1.0 era 2-digit year formats and ANSI C asctime formats. This improvement adds fallback parsing patterns for deprecated but still-used date formats like Mon, 20-Oct-25 15:57:56 GMT, ensuring RFC 2616 compliance while maintaining backward compatibility. The enhancement includes automatic year adjustment for 2-digit years and minimal performance impact, as legacy patterns are only tried after standard formats fail.
QuestDB 9.2.2 is another stability release, primarily fixing some query snags. With this, we've packaged a performance improvement for vertically-scaled machines, and some new SQL functions.
Breaking Changes
This change removes transaction scoreboard v1. Deployments explicitly configured with cairo.txn.scoreboard.format=1 will continue to function, but the option will be ignored and the server will use scoreboard v2.
This fix resolves inconsistent materialized view data that could be observed on chained views with period refresh, where the dependent view may end up with rows that are no longer present in its base view. Additionally, MANUAL PERIOD views are no longer automatically refreshed when a period ends; the only way to refresh a MANUAL PERIOD view is now via REFRESH MATERIALIZED VIEW SQL.
New Features
These new functions provide weighted statistical calculations with support for both frequency weights (representing number of occurrences) and reliability weights (representing importance or trustworthiness). The WEIGHTED_STDDEV_REL() function calculates standard deviation for reliability weights, while WEIGHTED_STDDEV_FREQ() handles frequency weights. An alias WEIGHTED_STDDEV() is provided for WEIGHTED_STDDEV_REL(). These functions are particularly useful for financial calculations such as markout curve analysis.
Improvements
This improvement enhances query engine scalability through several optimizations: it fixes cardinality statistics calculation in AsyncGroupAtom to use sum instead of maximum values, sets queue capacities to 4 * cpu_cores with a cap of 256 for better worker thread utilization, introduces the cairo.sql.parallel.filter.dispatch.limit configuration option to improve LIMIT query latency, and adds padding to PerWorkerLocks to avoid false sharing. These changes significantly improve performance for parallel GROUP BY and filter queries on multi-core systems.
This improvement modernizes the Web Console login interface with an updated visual design and enhanced user experience.
Bug Fixes
This fix resolves a bug where queries using IN clauses with INT/LONG columns containing -1 could return incorrect results. The root cause was that the internal HashSet implementation used -1 as a special marker value, which conflicted with actual -1 values in the data, causing incorrect query results.
This fix resolves an issue where multiple clients trying to drop the same non-WAL table could result in an internal error. The problem occurred because CairoEngine unlocked the table too early, allowing another thread to acquire a lock for a non-existing table and attempt to delete a directory that no longer exists. This issue affects only legacy (non-WAL) tables.
This fix resolves protocol corruption that occurred when the server sent array data to PostgreSQL Wire Protocol clients. The issue was particularly noticeable when fetching large result sets containing array columns in batches.
This fix updates how null/default values for decimal types are handled in both TableWriter and WalWriter, switching from hardcoded primitive constants to centralized constants in the Decimals class. This improvement ensures that decimal columns default to null values when not provided in Influx Line Protocol data, improving consistency and maintainability in handling decimal nulls across the system.
This fix resolves internal errors that occurred when the database was too lenient with type checking, particularly affecting window functions with default values, array access operations, and random generation functions. The fix renames ColumnType.isAssignableFrom() to isConvertibleFrom() for clarity and replaces problematic invocations with isBuiltInWideningCast() or isSameOrBuiltInWideningCast(). These changes ensure that type conversion errors now fail gracefully with sensible error messages instead of causing internal failures.
This fix resolves a bug where queries with bound parameters executed in a batch would return incorrect results. When multiple SELECT statements with different parameter values were sent together using PostgreSQL Wire Protocol pipelining, all queries would incorrectly use the parameter values from the last query in the batch instead of their own values. This particularly affected PostgreSQL clients that use pipelining, such as Npgsql (.NET); each query now correctly uses its own bound parameter values.
This fix ensures that images are properly prefetched when handling connection error retry scenarios in the Web Console, preventing display issues during reconnection attempts.
This fix resolves issues with asset loading when QuestDB is deployed under a custom base path, ensuring that CSS, JavaScript, and other static resources are correctly served relative to the configured base URL.
QuestDB Enterprise 3.1.0 combines the latest advancements from OSS releases 9.1 and 9.2, bringing powerful new data types, nanosecond-precision time-series support, and significant query performance improvements. This release introduces native DECIMAL support with up to 76-digit precision, nanosecond timestamps, and delivers substantial performance gains through optimized ASOF JOINs.
Breaking Changes
This improvement enables multi-threaded execution for SAMPLE BY queries that combine timestamp and aggregate functions in selected expressions. The optimization improves execution plan selection for SAMPLE BY and GROUP BY queries by avoiding duplicate group keys and reusing existing ones. When the designated timestamp is selected multiple times under different aliases, the query engine now picks the first ordered column, with preference given to columns used in ORDER BY clauses or unaliased columns matching the designated timestamp name.
New Features
This feature integrates the DECIMAL type into the database, making it available through SQL, Influx Line Protocol, and Parquet. The implementation supports decimals with precision up to 76 digits using the syntax DECIMAL(precision, scale), where precision defines the maximum count of digits and scale defines the number of fractional digits. The feature includes basic arithmetic operations (+, -, /, *, %), comparison operators, implicit casting from integer types, and support for aggregate functions like min, max, avg, sum, and count. Unlike traditional floating-point types, operations that cannot provide accurate values will throw an error, except for division, which returns a HALF_DOWN rounded quotient. The type is fully integrated across all QuestDB interfaces including CSV import/export, PostgreSQL Wire Protocol, and the Web Console.
This feature introduces nanosecond precision timestamp support through the TIMESTAMP_NS data type. The implementation includes core nanosecond arithmetic and parser/formatter functionality, with the timestamp resolution carried by the TIMESTAMP data type, similar to GEOHASH types. The feature supports all major QuestDB operations including SAMPLE BY, ASOF JOIN, materialized views, O3 ingestion, partition management, and Parquet file writing. The timestamp type information is stored in table metadata and properly handled across CSV imports, Influx Line Protocol clients (Java, C, and Python), and partition purge operations.
This function generates high-cardinality symbols with skewed distribution patterns, making it particularly useful for testing algorithms that need to handle realistic symbol distributions with varying frequencies.
This function extends array aggregation capabilities by returning the last array value in a group. The implementation utilizes the specialized allocator for aggregate functions introduced in previous versions, providing efficient memory management for array operations.
This function returns the most frequent value in each group, eliminating the need for complex subqueries involving count() operations and sorting. The implementation supports BOOLEAN, LONG, VARCHAR, STRING, SYMBOL, and DOUBLE data types with parallel execution capabilities. It uses a map-of-maps architecture where outer maps store pointers to inner maps that aggregate per-key counts, which are then merged and scanned for the most frequent value. This is useful for scenarios like finding the most frequently traded symbol by a trader or identifying the most common sensor state.
This feature introduces support for RIGHT OUTER JOIN and FULL OUTER JOIN operations, expanding the range of join operations available. For HashLightInnerJoin and HashLightFullOuterJoin, the implementation now supports limited runtime reordering of the left and right cursors, allowing the join to dynamically choose the more efficient build/probe table combination when the right table is significantly larger than the left table. This provides users with alternative join strategies when LEFT OUTER JOIN performance is unsatisfactory.
Improvements
This improvement introduces the Dense ASOF JOIN algorithm, a variant of the Light algorithm that uses binary search to quickly jump to the timestamp matching the first left-hand row, rather than starting the scan from the top of the right-hand table. This optimization is particularly effective when the right-hand table contains extensive history that predates the first left-hand row, as the entire history is skipped except for recent data needed to match initial left-hand rows. The implementation also optimizes symbol-to-symbol joins through the RecordSink by copying symbol keys instead of symbol strings, and includes specialized implementations for critical performance scenarios like markout analysis.
This improvement leverages the symbol index on the right-hand side of ASOF JOIN operations, avoiding the need to visit rows that don't have matching symbols. This optimization is particularly beneficial for markout analysis scenarios where illiquid assets may have their last price posted far in the past. Instead of scanning backwards through all rows until finding a match, the implementation performs only one symbol index lookup per partition, dramatically reducing the search space for rare symbols.
This improvement optimizes keyed ASOF JOIN operations on non-indexed symbol columns by implementing memoization for symbol locations. When searching for matching right-hand side rows, the implementation now remembers the location of each symbol encountered during the search and reuses that knowledge for future lookups of the same symbol. It also remembers where symbols were not found, allowing it to skip already-scanned regions of the right-hand side table. This optimization is particularly effective for tables with rare symbols that occur infrequently, eliminating redundant scans when the same low-frequency symbol appears in multiple left-hand side rows.
This improvement introduces selective caching for the Memoized Scan algorithm in ASOF JOIN operations on symbol columns. The optimization keeps Match Caching (caching symbol locations after finding matches) active while placing Drive-By Caching (caching all symbols encountered during searches) behind a query hint. This change reduces hash lookup overhead in typical cases where few symbols occur in the deep past of the right-hand side table, resulting in better performance for most ASOF JOIN queries.
This improvement introduces a deferred merge strategy for parallel COUNT_DISTINCT() functions supporting int, ipv4, long, uuid and long256 types. Hash set merging now occurs during getLong() calls instead of when merging GROUP BY hash tables, avoiding redundant merges in queries with ORDER BY and LIMIT clauses. The optimization includes single-key inlining for COUNT_DISTINCT(int) and COUNT_DISTINCT(long) functions, and implements a faster FxHasher-based hash function with a reduced load factor for better performance.
ORDER BY+LIMITqueries to work with virtual columns, expressions, andVirtualRecordCursorFactorycases that were previously excluded. The optimization now supportsTIMESTAMPcolumns in addition toLONGcolumns, includes fast-path support for vectorizedGROUP BYoperations, and improvesCROSS JOINperformance for small right-hand side tables. The enhancement enables fast-path execution for queries with virtual columns, subqueries, and complex expressions in the select list.This improvement disables column pre-touch in parallel filters by default to improve cache efficiency and query performance in most scenarios. The
cairo.sql.parallel.filter.pretouch.enabledconfiguration property is now deprecated and ignored by the server. Column pre-touch can now be enabled per-query using theENABLE_PRE_TOUCHSQL hint, providing more granular control over when this optimization is applied.This improvement introduces a new hash map implementation that maps
This improvement introduces a new hash map implementation that maps CharSequence to int without relying on heap allocation or garbage collection. The WalWriter implementation is updated to use this new hash map, replacing the existing CharSequenceIntHashMap, which allocated heap-based String objects and caused GC churn during WAL transaction commits. Benchmarks show almost no allocations in the new implementation compared to the previous version, with a 12% performance improvement and reduced jitter.
This improvement deprecates the useFastAsOfJoin configuration property and the existing hints avoid_asof_binary_search and avoid_lt_binary_search. The new hints positively select algorithms: asof_linear_search selects the original algorithm with no binary search or index scan optimizations, while asof_index_search requests the algorithm that leverages the symbol index. These hints apply to both ASOF and LT JOIN variants.
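As a hedged sketch of positive algorithm selection (the tables and columns are hypothetical, and it is an assumption that these hints take the join aliases as arguments, like other QuestDB join hints):

    -- request the symbol-index-driven algorithm
    SELECT /*+ asof_index_search(t p) */ t.ts, t.sym, p.price
    FROM trades t
    ASOF JOIN prices p ON (sym);

    -- force the original linear-scan algorithm
    SELECT /*+ asof_linear_search(t p) */ t.ts, t.sym, p.price
    FROM trades t
    ASOF JOIN prices p ON (sym);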
Bug Fixes
This fix resolves out-of-memory errors that occurred with queries on large Parquet files with substantial row groups. Previously, row group buffers were kept around for each page frame reduce task, causing memory exhaustion when processing files with large row groups (up to 600K rows) combined with a lack of projections.
This fix resolves an issue where replicated WAL segments exceeding 4GiB in size failed consistency checks. The consistency header stored the segment size as a u32, causing failures when segments grew beyond this limit. The consistency check now correctly casts the u64 length to a u32, effectively using modulo 2^32, allowing the check to pass for larger segments.
This improvement addresses network freeze issues where stuck connections could prevent progress for the entire QuestDB process. Uploads and downloads now support configurable timeouts driven by a base timeout and minimum throughput parameters (replication.requests.base.timeout=10s and replication.requests.min.throughput=262144 by default). The timeout is recalculated and extended whenever progress is made during an upload or download, allowing connections to be closed and re-established when they become unresponsive.
This fix resolves authentication failures that occurred when replica instances reloaded access control lists. The issue was that the ACL reload updated transaction numbers before the reload actually completed, causing other workers to believe ACLs were up-to-date while they were still reloading. This resulted in requests that depended on newly applied ACL changes failing authentication during the reload window. The transaction number update has been moved to the end of the ACL reload to ensure all requests are processed after the ACL model is fully refreshed.
This fix resolves a race condition where multiple Influx Line Protocol connections competing to create the same table could encounter access denied errors. When the first connection created a table, it made the table visible to other connections before owner permissions were granted, leaving a brief window where the creator lacked access. This improvement makes table creation and permission granting a single atomic operation by granting permissions before exposing the table.
This fix resolves an issue where WHERE clause filters like sym_col1 = sym_col2 were mistakenly JIT compiled. The compiled filter function used internal symbol codes for the equality check, which produced incorrect query results. This patch disables JIT compilation for such filters to ensure correct behavior.
This fix resolves two nanosecond precision issues: incorrect nanosecond precision handling when calculating TOLERANCE values for ASOF joins, and automatic timestamp type selection during CSV imports when timestamp columns are explicitly typed.
This fix resolves an issue where a bind variable used as the third parameter (the default value) of the LEAD() window function, as in lead(value, 1, ?), was not initialized correctly, leading to null reference exceptions.
This fix resolves scenarios where unsafe memory access operations in compiled Java code caused InternalError exceptions during table operations, particularly affecting transaction reading and partition purge operations under high system stress.
This fix resolves incorrect result set size display in the Web Console when the first partition of a query's result exceeded the configured maximum page frame size. The improvement modifies size calculation in page frame cursors to account for the remaining partition size and fixes issues where some factories did not account for rows already consumed from the cursor.
This fix resolves TableReferenceOutOfDateException errors that occurred when async jobs attempted to access tables that had been dropped or recreated during concurrent operations. The issue manifested in the CheckWalTransactionsJob when cached query plans referenced outdated table metadata, causing the job to fail when verifying table tokens against tables that no longer existed or had different metadata versions.
This fix adds support for parsing legacy HTTP cookie Expires values that use deprecated date formats from the HTTP 1.0 era and ANSI C asctime formats. The parser now handles 2-digit year formats like Mon, 20-Oct-25 15:57:56 GMT and ANSI C formats like Sun Nov 6 08:49:37 1994. These patterns are added as fallback formats after the standard HTTP formats, ensuring backward compatibility while maintaining RFC 2616 compliance for all three recommended HTTP date formats.
This release is a stability and performance update in the QuestDB 9.x line. It focuses on tightening SQL correctness, improving latency, and hardening edge cases around the ingestion protocol and materialized views, with a few small quality-of-life features.
New Features
This feature extends the FIRST_NOT_NULL() function to support array column types, providing the ability to find the first non-null array value in a dataset. The implementation follows the same pattern as other first/last aggregate functions while optimizing memory usage by including the boolean column type only where strictly needed.
These functions provide file system access and pattern matching capabilities. The files(s) function performs recursive directory scanning, glob(s) accepts glob patterns for file matching, and glob(Ss) applies glob pattern matching to strings as an alternative to regex and LIKE matching. Example usage: SELECT glob('file1.txt', 'fi?e*.txt') or files('/import/trades') WHERE glob(path, '*') LIMIT 3.
This feature introduces new API endpoints /api/v1/sql/validate and /api/v1/sql/execute for SQL validation and execution. The existing /exec API remains supported but is now deprecated. The implementation includes enhanced fuzz testing infrastructure and fixes for the JSON processor, the CSV/Parquet export processor, and the HTTP client.
This feature introduces new PERIOD (SAMPLE BY INTERVAL) syntax for period materialized views, providing an easier way to write PERIOD (LENGTH 1d TIME ZONE 'Europe/London'). Period materialized views now support the second unit ('s') for length and delay intervals, in addition to the existing minutes ('m'), hours ('h'), and days ('d') units. The syntax cannot be combined with SAMPLE BY ... WITH OFFSET ....
This feature allows charts to automatically scale the Y-axis based on actual data values instead of forcing zero as the minimum. This improvement makes small variations in large baseline values clearly visible, which is particularly useful for monitoring scenarios like temperature or pressure readings where the variations are small but significant. The chart now dynamically adjusts to show the actual data range, eliminating wasted chart space and providing better data visualization.
Improvements
This improvement reduces allocation rate when parsing HTTP headers by extending the internal map with a method to store keys without creating defensive copies. Previously, the header parser would pool HTTP headers but still create defensive copies of all keys when storing them in a map, generating unnecessary garbage.
This improvement eliminates unnecessary DirectIntList instance creation during OrderedMap resizing operations and removes redundant address calculations for offset and hash code operations. The optimization reduces garbage collection pressure and improves performance across various map sizes, with particularly notable improvements in merge operations.
This improvement significantly reduces query latency spikes by moving expensive memory unmapping operations to background threads. Users running latency-sensitive workloads will experience more consistent query response times, particularly at p99 and above. The feature is opt-in and can be enabled by setting cairo.file.async.munmap.enabled=true in the server configuration. Benchmarks show substantial improvements in latency distribution, with average latency dropping from 46.46ms to 16.02ms and P99 latency improving from 359.68ms to 136.19ms.
This improvement introduces a specialized cursor factory that emits CROSS JOIN output directly in the required order without materializing the orders table. The optimization significantly improves performance for markout curve queries that generate sampling points over the markout horizon of orders. In benchmarks with 3 million orders and 121 sampling points each, query time improved from 135 seconds to 17 seconds while RAM usage dropped from 10.7 GB to 2.4 GB.
This improvement introduces cardinality statistics for count_distinct() functions and optimizes parallel keyed GROUP BY operations. The enhancement includes eager merging of parallel count_distinct() functions, lightweight cardinality reporting, and automatic switching to map sharding for high-cardinality operations. The number of map shards is now fixed at 256 for better CPU cache locality, and the default cairo.sql.parallel.groupby.sharding.threshold is reduced to 10k for more frequent parallel merge operations.
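A minimal sketch of the query shape these GROUP BY changes target (the events table and its columns are hypothetical):

    -- parallel keyed GROUP BY with count_distinct(); the engine switches to
    -- map sharding automatically for high-cardinality groups
    SELECT sym, count_distinct(user_id) AS uniques
    FROM events
    GROUP BY sym;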
Bug Fixes
This fix addresses a compatibility issue where Sender implementations were encoding ColumnType.NULL when sending null array values to the server. Since ColumnType.NULL changes each time a new column type is added, this created backwards compatibility problems. Senders now omit field values for null arrays, excluding the column from the Influx Line Protocol message entirely, consistent with how other column types handle null values.
This fix adds support for UUID binding variables within JIT-compiled filters. Previously, PostgreSQL Wire Protocol queries eligible for JIT compilation that used UUID parameters would fail with an "unsupported bind variable type: UUID" error. The fix eliminates the explicit casting workarounds that were previously required to disable JIT compilation.
This fix resolves a bug in the optimizeOrderBy pass that was incorrectly removing ORDER BY clauses in sub-queries of aggregations. Functions like first(), last(), first_not_null(), and last_not_null() require rows to be received in the expected order to work properly. The optimizer now correctly preserves ordering requirements for these aggregate functions.
This fix resolves a compilation error that occurred when using aggregate functions with unary minus operators. The optimizer code previously assumed binary - and + operations and failed to perform null checks for unary cases. The fix adds proper null checking and validation to ensure the operation is binary when expected.
This fix corrects the LimitRecordCursorFactory.size() method, which was returning incorrect results for LIMIT x, y queries when the underlying base cursor had fewer rows than y - x. The fix ensures proper size calculation by calling countRows() when the base cursor size is unknown, following the existing pattern used in other branches for consistency.
This fix resolves a SQL parsing issue for materialized view ADD INDEX statements when the SQL ends with a semicolon.
This fix ensures that when a SAMPLE BY query specifies only one fill option, the logic for reorganizing fill values based on the positional relationship between TopDownColumns and BottomUpColumns remains consistent across all aggregation columns.
This fix handles TableReferenceOutOfDateException when ALTER TABLE is re-compiled inside the WAL apply process. Additionally, it removes error logging for wa*.lock file lock errors (now logged at debug level) and fixes incorrect writerTxn values in wal_tables queries.
This fix resolves an internal error that caused certain ASOF JOIN queries to fail unexpectedly.
This fix prevents query hints specified in the main SELECT statement from incorrectly being applied to Common Table Expressions (CTEs) within the same query.
This fix resolves conditions within the Limit factory that were causing incorrect row handling in subqueries combined with UNION operations.
This fix ensures that Parquet export properly handles empty tables and partitions by generating valid Parquet files containing only schema information, preventing export failures that could previously occur.
This fix resolves an issue where the questdb_json_queries_connections gauge was being decremented by sources like 'export' and 'other' that would not increment the gauge, causing negative values.
This fix moves the column width saving logic before the setColumnWidth call to prevent using stale column width references when arranging displayed array cell contents.
This release introduces significant new features including support for Decimal data types and symbol capacity auto-scaling enabled by default. Key improvements include critical fixes for storage corruption during deduplicate writes, enhanced ASOF JOIN performance for dense data interleaving, and better handling of async jobs under extreme system load. The release also transitions to using GraalVM CE JDK and includes various performance optimizations and infrastructure improvements.
New Features
This feature integrates the DECIMAL type into the database, making it available through SQL, Influx Line Protocol, and PostgreSQL Wire Protocol. The implementation supports decimals with a precision of 76 or less using the syntax DECIMAL(precision, scale), where precision defines the maximum count of digits and scale defines the number of fractional digits. The feature includes basic arithmetic operations (+, -, /, *, %), comparison operators, type casting functions, aggregate functions (min, max, avg, sum, count), and math functions (abs, round, sign, floor, ceil). Unlike traditional floating-point types, when an operation cannot produce an accurate value, the database throws an error; division is the exception and returns a HALF_DOWN-rounded quotient.
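A brief sketch of the new type in use, assuming a hypothetical fills table (the precision and scale values are illustrative; precision must be 76 or less):

    CREATE TABLE fills (
      ts TIMESTAMP,
      qty DECIMAL(18, 4),
      price DECIMAL(18, 8)
    ) TIMESTAMP(ts) PARTITION BY DAY;

    -- exact arithmetic and aggregation on DECIMAL columns
    SELECT sum(qty * price) AS notional FROM fills;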
This feature introduces a new hash map implementation that maps CharSequence to int without heap allocation or garbage collection, replacing the existing CharSequenceIntHashMap in WalWriter, which allocated heap-based String objects and caused GC churn during WAL transaction commits. Benchmarks show approximately 12% better performance with almost zero allocations (0.622 B/op vs 72000.708 B/op) and reduced jitter compared to the original implementation.
Improvements
This improvement automatically adjusts symbol table capacity as needed, eliminating the need for manual configuration in most use cases.
This improvement introduces the Dense ASOF JOIN algorithm, a variant of the Light algorithm that uses binary search to quickly jump to the timestamp matching the first left-hand row instead of starting the scan at the top. This optimization is most valuable when the right-hand table has history that predates the first left-hand row, as the entire history is skipped except for the recent history needed to match the initial left-hand rows. The improvement also optimizes symbol-to-symbol joins for the existing Light and Fast algorithms by copying just the symbol key instead of the symbol string. In benchmarks with 167.5 million trade rows and 1.01 billion price rows, this feature achieved 730 milliseconds of execution time compared to 9.7 seconds with the Light algorithm.
This improvement transitions QuestDB releases to the GraalVM Community Edition JDK instead of the previous JDK distribution. The change has been validated through release CI and Docker builds to ensure compatibility and functionality.
Bug Fixes
This fix prevents TableReferenceOutOfDateException errors that could occur when async jobs attempt to access tables that have been dropped or recreated during concurrent operations.
This fix prevents InternalError crashes that could occur in async jobs when the system is under extreme memory pressure, particularly affecting operations like O3 partition purging.
This fix resolves a critical storage corruption issue that could occur when deduplication operations resulted in identical data, particularly during materialized view refreshes. The bug could cause double memory close errors and incorrect directory deletions.
This fix resolves incorrect result set size display in the Web Console when the size of the first partition exceeded the configured maximum page frame size. The fix modifies size calculation in page frame cursors to account for the remaining size of the backing partition and addresses related bugs where some factories did not account for rows already consumed from the cursor.
This fix prevents out-of-memory errors when executing queries on large Parquet files with substantial row groups. The issue occurred because row group buffers were being retained for each page frame reduce task during parallel execution, leading to memory exhaustion when processing files with large row groups (up to 600K rows).
This fix resolves an issue where the query extraction logic incorrectly processed single quotes inside SQL comments, causing multi-statement queries to be treated as a single query. The improvement adds proper comment detection when toggling quote state parsing.
This release resolves a critical storage corruption bug introduced in v3.0.0 that could cause database crashes when writing to materialized views or tables with deduplication enabled. The issue occurred when transactions committed without changing data, such as re-inserting identical rows. We strongly recommend upgrading to v3.0.5 immediately.
Bug Fixes
This fix resolves a critical storage issue that was very likely to occur during materialized view refresh operations that produced the same data as was already present in the view. The bug could cause double memory close operations and deletion of incorrect directories on the system.
QuestDB 9.1.1 focuses on data interoperability, nanosecond precision, and analytical performance. This release introduces Parquet export, a faster ASOF JOIN for sparse symbol data, and several engine optimizations improving both query throughput and stability.
Breaking Changes
This feature introduces several nanosecond timestamp functions, including now_ns() for the current timestamp with nanosecond precision, rnd_timestamp_ns() for generating random timestamps with nanoseconds, and timestamp_sequence_ns() for creating timestamp sequences with nanosecond precision. The extract() function now supports extracting nanoseconds from timestamps. Additionally, this introduces MillsTimestampDriver for better consistency between the Date and Timestamp data types. However, rnd_timestamp() now only returns timestamp_micro precision, and timestamp_sequence() only returns timestamp_micro precision without considering the first argument's type for timestamp type determination.
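A small sketch of the nanosecond variants (it is an assumption here that timestamp_sequence_ns() mirrors the signature of the microsecond timestamp_sequence(start, step); the argument values are illustrative):

    -- current wall-clock time at nanosecond resolution
    SELECT now_ns();

    -- a generated series of nanosecond timestamps, one per row
    SELECT timestamp_sequence_ns(0, 1000) AS ts
    FROM long_sequence(5);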
This fix enables multi-threaded execution for SAMPLE BY queries with selected expressions mixing timestamp and aggregate functions. It also improves execution plan selection for SAMPLE BY and GROUP BY queries by avoiding duplicate group keys and reusing existing ones. When the designated timestamp is selected multiple times under different aliases, the query engine now picks the first ordered column. If a selected column is used as the first ORDER BY column, it takes preference regardless of its position in the SELECT clause. Unaliased columns matching the designated column name are preferred over aliased ones.
New Features
This function is useful for testing algorithms with high-cardinality symbols that have a skewed distribution of rows.
This feature adds support for exporting Parquet files via COPY, improving ergonomics and making it easier for users to extract data as Parquet files without blocking writes. The feature supports exporting entire tables or subquery results with configurable options, including compression codec, compression level, row group size, data page size, statistics, Parquet version, and raw array encoding. HTTP export is also supported via the /exp endpoint with the fmt=parquet parameter.
This feature introduces protocol v2, where timestamps passed to the client in nanoseconds are sent as nanoseconds on the wire, while microseconds are sent as microseconds. The unit used to send data on the wire now drives the column type when tables or columns are created via Influx Line Protocol. If data is sent in nanoseconds, the column type will be TIMESTAMP_NS; if sent in microseconds, it will be TIMESTAMP. The feature also includes overflow checking for TIMESTAMP_NS columns and fixes the reset() method on the HTTP sender.
This feature implements the last(array) aggregate function using the specialized allocator for aggregate functions. The function allows users to retrieve the last array value in a group or partition, extending QuestDB's array handling capabilities for time-series analysis.
This feature puts Drive-By Caching behind a query hint for the Memoized Scan algorithm in ASOF JOIN on symbol columns. The algorithm uses two types of caching: Match Caching (a single hash lookup per row) and Drive-By Caching (a hash lookup for every visited row). While Drive-By Caching benefits queries with many symbols in the deep past of the table, it introduces overhead in typical cases where fewer such symbols exist, making the query hint necessary for optimal performance.
Improvements
This improvement allows the read_parquet() function to automatically detect when Parquet data is already sorted by timestamp, enabling more efficient query processing. The optimization also includes updates to the compression library versions for zstd, snappy and other algorithms in parquet2.
This improvement disables column pre-touch in parallel filters by default, as it negatively impacted query execution time in many scenarios. The cairo.sql.parallel.filter.pretouch.enabled configuration property is now deprecated and ignored. Column pre-touch can now be enabled per-query using the SQL hint /*+ ENABLE_PRE_TOUCH(table_name) */ for specific use cases where it provides performance benefits.
This improvement extends the fast-path optimization for ORDER BY with LIMIT queries to virtual columns, expressions, and VirtualRecordCursorFactory cases that were previously excluded. The optimization now supports vectorized GROUP BY, applies to TIMESTAMP columns in addition to LONG columns, and improves CROSS JOIN performance for small right-hand side tables. This enhancement allows more complex queries with subqueries, expressions, and virtual columns to benefit from the fast path.
This improvement introduces deferred merge in parallel COUNT_DISTINCT() functions for the int, ipv4, long, uuid and long256 types. Hash set merging is now performed during getLong() calls instead of when merging GROUP BY hash tables, avoiding redundant hash set merges in queries with ordering and limits. The optimization includes inlining single keys in MapValue for COUNT_DISTINCT(int) and COUNT_DISTINCT(long), and introduces a faster FxHasher-based hash function with the load factor reduced from 0.75 to 0.5.
This improvement addresses performance issues when keyed ASOF JOIN operations encounter symbols that occur rarely in the right-hand side table. Previously, the linear scan would restart from scratch for each left-hand side row, causing repeated expensive searches for the same rare symbols. This optimization introduces memoization that remembers the location of each symbol encountered during the search and reuses that knowledge for subsequent scans. It also tracks regions where symbols were not found, allowing the scan to skip already-searched areas of the right-hand side table.
This improvement provides clearer and more helpful error messages when users reference invalid columns in JOIN queries, making it easier to identify and resolve query issues.
This improvement optimizes ASOF JOIN operations on single symbols by re-indexing symbols and joining on integer values rather than symbol text. The optimization is activated with the ASOF_LINEAR_SEARCH hint and represents a variant of the linear ASOF JOIN algorithm, providing significant performance gains for single-symbol join scenarios.
Bug Fixes
When a query was cancelled, GroupByVectorAggregateJob would log the error at critical level because it wasn't handling errors properly. This fix ensures proper error handling during query cancellation.
In the case of a ~1B-row table and daily sampling, estimateRowsPerBucket() could overflow and return a very small estimate, leading to an overly large interval size being chosen for the materialized view refresh.
This fix resolves connection leaks caused by broken operation id ordering in the Windows I/O dispatcher. The bug could manifest when opening a TCP connection but writing to it only after some time, allowing the heartbeat mechanism to kick in. This fix also moves the operation and event id generation code to AbstractIODispatcher for consistency across all I/O dispatcher implementations, and switches the OSX dispatcher to a long counter to prevent overflow issues.
This fix prevents custom Web Console configurations from being overridden during QuestDB upgrades, preserving user customizations.
This fix resolves an overflow issue in SampleByIntervalIterator when calculating high boundaries on large SAMPLE BY intervals with relatively large step values. A new configuration property, cairo.mat.view.max.refresh.step (default 1y), was introduced to define the maximum step interval, after which refresh falls back to a single SAMPLE BY interval to prevent overflow issues.
This fix ensures that interval constants are properly recognized and applied as time intrinsics in SQL expressions.
This fix addresses two issues with the SAMPLE BY to parallel GROUP BY rewrite optimization. First, queries mixing timestamp and aggregate function expressions were producing unordered and incorrect results due to improper rewrite handling; the rewrite is now disabled for such queries. Second, the rewrite could incorrectly omit columns from the result set when expressions involved the designated timestamp column.
This fix resolves a null pointer exception that could occur during SQL query optimization in the ORDER BY rewrite process.
This fix resolves an issue where the lead() window function's third parameter (the default value) was not initialized correctly when a bind variable was used in that position, as in lead(value, 1, ?), causing null reference exceptions during execution.
This fix resolves a critical but rare edge case where the server could fail to start with a NullPointerException during initialization. The issue was triggered by a prior execution of a SQL UPDATE on a table that was open for reading.
This fix resolves an issue where the indexer column top was not reset on truncate when an indexed symbol column was added to a non-partitioned table, causing errors on subsequent index rebuilds. The issue was specific to non-partitioned tables and was discovered through fuzz testing.
This fix resolves an issue where WHERE clause filters like sym_col1 = sym_col2 were incorrectly JIT compiled. The compiled filter function used internal symbol codes for the equality check, which produced incorrect query results. This fix disables JIT compilation for such symbol column equality filters to ensure accurate results.
This fix resolves an issue where Parquet export would fail when processing timestamps in descending order. The problem occurred because checking the timestamp index alone was insufficient to determine whether the timestamp was designated; the scan direction must also be considered for proper validation.
This fix resolves parsing errors for cookie Expires values using legacy date formats. The HTTP header parser now supports HTTP 1.0 era 2-digit year formats and ANSI C asctime formats as fallback options after the standard formats. The new patterns include E, d-MMM-yy HH:mm:ss Z for HTTP 1.0 and E MMM dd HH:mm:ss yyyy for ANSI C formats. This maintains RFC 2616 compliance while ensuring backward compatibility with legacy systems that still use deprecated date formats.
This fix resolves an issue on busy Linux systems where the server could fail to accept connections when many clients attempt to open new connections simultaneously. The problem manifested as client connection timeouts during high-load scenarios.
This fix resolves an issue where the query extraction logic was incorrectly processing single quotes inside SQL comments, causing multiple separate queries to be treated as a single query. The improvement adds proper comment detection when toggling quote parsing states.
This release adds Parquet export support and OAuth2 token authentication for PostgreSQL Wire Protocol, along with various bug fixes for replication, ILP, and SQL operations.
New Features
This feature introduces OAuth2 token authentication support for PostgreSQL Wire Protocol connections. It includes a new configuration key to enable the feature (disabled by default), and when enabled, passwords are treated as OAuth2 tokens if the username is '_sso' or no username is provided.
This feature adds a new symbolTableSize column to the SHOW COLUMNS output that displays the number of symbol dictionary codes present in the symbol table. This improvement also fixes table reader initialization for zero-column tables, which previously failed when all columns were dropped.
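For example (the table name is a stand-in), the new column appears alongside the existing SHOW COLUMNS metadata:

    -- lists each column with its type, index details and, for symbol
    -- columns, the new symbolTableSize value
    SHOW COLUMNS FROM trades;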
Bug Fixes
This fix resolves an issue where replica tables could be suspended due to interaction between WAL downloader and WAL apply processes. The problem occurred when the Fd Cache cached sequencer file descriptors while the downloader used file renames to move files into correct directories, causing structure change metadata to become unavailable during WAL application.
This fix corrects the handling of double column data when ingested into symbol columns through Influx Line Protocol, ensuring proper symbol value conversion and storage.
This fix prevents automatic connection closure when cast errors occur during Influx Line Protocol processing, such as when sending a float value to a string column. When line.tcp.disconnect.on.error is set to false, the connection remains open and errors are no longer logged at critical level.
This fix resolves compilation errors that occurred when casting symbol columns to symbol type within GROUP BY queries. The fix also removes string sinks from IPv4Function, as they were only used in UPDATE statements, which now generate wrapper cast functions like other fixed-size types.
This fix adds proper validation to prevent multiple DECLARE statements from being used within a single query block, which was previously causing parsing issues.
This fix corrects null-handling issues in RecordToRowCopier utilities where null values were not properly converted during implicit casting. Previously, a null LONG column would incorrectly retain the minimum long value instead of becoming Double.NaN when cast to DOUBLE.
This fix ensures consistent permission checking across all EXPLAIN commands by adding compile-time authorization for CREATE TABLE and CREATE MATERIALIZED VIEW statements. It also provides proper error messages for the unsupported EXPLAIN DROP, EXPLAIN ALTER, and EXPLAIN REFRESH commands.
This fix resolves overflow issues that could occur during materialized view refresh operations, ensuring data integrity and preventing calculation errors.
This fix resolves a crash that occurred when using bind variables with the LEAD() window function, ensuring stable query execution with parameterized queries.
This fix ensures that the actual write failure error is reported instead of a misleading table locked error, providing clearer diagnostics for write operations.
This fix resolves a rare condition where dropping an index could cause table operations to become suspended, ensuring continued table availability during index management operations.
This fix resolves an assertion error that could occur when executing SQL queries with symbol index operations after table metadata changes, particularly affecting GROUP BY queries with symbol filters.
This fix resolves assertion exceptions that occur in the writer when encountering operating system errors during write operations.
This fix resolves a file descriptor leak that occurred when the ff.length(keyFd) call throws an exception during index writer initialization. The issue was caused by the kFdUnassigned flag being set to false before the potentially failing operation, preventing proper cleanup of the file descriptor.
This fix resolves a null pointer exception that could occur during column purge operations.
QuestDB 9.1.0 is a feature-rich release bringing nanosecond timestamp data type, performance and stability improvements.
Breaking Changes
This change replaces the binary search algorithm for serving IN (longlist...) predicates with a hashset implementation that scales better as list size increases. A new configuration setting, cairo.sql.jit.max.in.list.size.threshold, specifies the maximum size of IN lists accepted by the JIT compiler. When this limit is exceeded, the JIT compiler aborts and a Java filter is compiled instead, as JIT compilation unrolls IN lists into equality comparisons chained by logical OR operators, which degrades performance for large lists. This change also fixes an existing bug where empty lists could cause incorrect behavior.
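An illustrative sketch (the table, column, and threshold value are hypothetical): a long literal list is now served by the hashset, and lists larger than the configured threshold skip JIT compilation in favour of a Java filter:

    -- server.conf, illustrative value:
    --   cairo.sql.jit.max.in.list.size.threshold=100
    SELECT * FROM trades
    WHERE order_id IN (101, 102, 103, 104, 105, 106, 107, 108);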
New Features
This feature adds new window functions: first_value(timestamp), first_value(long), last_value(timestamp), last_value(long), max(timestamp), max(long), min(timestamp), and min(long).
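A short sketch of one of the new overloads in a window context (the trades table and its columns are hypothetical):

    -- running maximum volume per symbol, ordered by time
    SELECT ts, sym, max(volume) OVER (PARTITION BY sym ORDER BY ts) AS running_max
    FROM trades;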
This feature adds a new symbolTableSize column to the SHOW COLUMNS output that contains the number of symbol dictionary codes present in the symbol table. This improvement also fixes table reader initialization in cases where tables have zero columns after all columns are dropped.
This feature introduces comprehensive nanosecond timestamp support across QuestDB, including core arithmetic operations, parser/formatter functionality, SAMPLE BY operations, ASOF joins, materialized views, O3 ingestion, partition operations, Parquet export, CSV import, and Influx Line Protocol clients. The TIMESTAMP data type now carries resolution information similar to GEOHASH data types, with partition operations accompanied by timestampType metadata.
This feature integrates async-profiler 4.1 with two profiling modes: continuous profiling that starts with the agent, and on-demand profiling that attaches to running instances. The implementation supports a configurable loop duration (default 30 minutes), JFR-format output with timestamped files, custom profiling parameters, and automatic profile directory creation. It is currently available for Linux x86-64 distributions, with the libasyncProfiler.so library and the asprof/jfrconv binaries included in the lib/ directory.
This feature leverages the symbol index on the right-hand side of ASOF JOIN operations, avoiding the need to visit rows that don't have the matching symbol. This optimization is particularly beneficial for markout analysis scenarios where you need to look up matching currency rates or other symbol-based data across large time ranges. Instead of scanning backwards through all rows until finding a match, the feature performs only one symbol index lookup per partition, dramatically improving performance for queries involving illiquid assets or sparse data.
This feature introduces Right Outer Join and Full Outer Join functionality, expanding the range of supported join operations. For HashLightInnerJoin and HashLightFullOuterJoin, the implementation now supports limited runtime reordering of the left and right cursors, allowing joins to dynamically choose a more efficient build/probe table combination when the right table is significantly larger than the left.
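A minimal sketch of the new join types (the orders and fills tables are hypothetical):

    -- keep every order and every fill, matched where possible
    SELECT o.order_id, o.ts, f.fill_price
    FROM orders o
    FULL OUTER JOIN fills f ON o.order_id = f.order_id;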
This feature enables the Java Influx Line Protocol client to handle multiple possible endpoints. On first connection, the client polls /settings to determine whether the instance is read-write or read-only. The sender automatically retries HTTP sends on 404 or 421 responses if the instance status changes after initialization.
This feature returns the most frequent column value for each group, eliminating the need for complex subqueries involving COUNT() operations and sorting. The function is useful for identifying patterns like the most frequently traded symbol by a trader or the most common state for a sensor. The implementation uses a map-of-maps structure where outer maps store pointers to inner maps that aggregate per-key counts, with parallel execution support for improved performance.
This feature deprecates the useFastAsOfJoin configuration property and introduces positive algorithm-selection hints for the ASOF and LT JOIN variants. The asof_linear_search hint selects the original algorithm without binary search or index scan optimizations, while asof_index_search requests the algorithm that leverages symbol indexes. The existing avoid_asof_binary_search and avoid_lt_binary_search hints are deprecated but still honored.
This feature allows the server to accept gzip-encoded Influx Line Protocol data, addressing bandwidth inefficiencies in the text-based protocol. While compression introduces CPU overhead that reduces raw throughput, it significantly reduces bandwidth usage, which can raise the maximum throughput ceiling when receiving text-heavy data from multiple sources. The feature is particularly beneficial for scenarios involving logs and documents, where the client-side encoding overhead is masked by improved network efficiency.
This feature improves PostgreSQL Wire Protocol query performance when using IN() predicates on multiple indexed columns. The optimizer now considers column cardinality when selecting which indexed column to scan, even in the presence of binding variables, choosing the column with higher cardinality for better selectivity. This optimization applies when the new column is different from the current key column, avoiding unsafe key merging operations.
This function generates a Zipf/power-law distribution of symbols from a provided list, where the probability of each symbol decays from left to right at a rate set by the 'alpha' parameter. It is accompanied by the rnd_symbol_weighted() function for more verbose weighted symbol generation with explicit probability values for each symbol.
This feature enables searching for column names within the schema filter, automatically expanding tables that contain matching columns. The implementation optimizes performance by removing redundant column requests on schema reload, fetching symbol column details on demand using SHOW COLUMNS FROM, and includes a 10-second focus listener timeout for schema refresh.
This feature provides visual feedback when executing multiple queries simultaneously, showing loading indicators and notifications for each individual query's progress and completion status.
Improvements
This improvement reintroduces a dedicated thread pool for materialized view refresh operations to prevent resource contention with WAL apply jobs. Heavy materialized view workloads can no longer monopolize the shared writer pool, eliminating potential delays or starvation of Apply WAL jobs and preventing database write performance degradation during intensive MV operations. When the dedicated MV pool is disabled or has 0 workers assigned, the MV refresh job will not run.
This improvement optimizes the AVG(short) function by using a more appropriate accumulator type instead of long128, which was overkill for the value range. The Java implementation now uses long instead of double as the accumulator, eliminating short-to-double conversion overhead and enabling faster addition instructions. It additionally reduces the memory footprint of SUM() aggregate functions and adds SumShortGroupByFunction for more efficient processing of short and byte values without null checks.
This improvement rewrites trivial expressions that operate on the same column within GROUP BY queries for better performance. The optimization also includes integer value support for CSV import of IPv4 columns and adds the yyyy-mm-dd date format to the default text loader formats. Performance testing shows significant improvements, with ClickBench Q35 hot execution times improving from 724ms to 388ms.
Bug Fixes
This fix resolves a memory and file descriptor leak that occurred when a partition removal was followed by a schema change. The issue happened because stale bitmap indexes remained in the bitmapIndexes list beyond the valid range after partition deletion. When a table schema change triggered createNewColumnList(), it created a new bitmap index list but only copied indexes within the valid range, leaving the stale indexes orphaned and their file descriptors never closed.
This fix corrects an issue where tables suspended due to corrupt WAL files showed outdated writerTxn numbers in wal_tables queries. The problem occurred during exception handling, when the Seq Table Tracker was incorrectly updated to the table txn number instead of the seqTxn number.
This fix resolves timeout errors when opening a TableReader after dropping and recreating non-WAL tables. The issue occurred when purge jobs pushed the max txn using TxnScoreboardPoolV2.isRangeAvailable(), preventing the reader from locking the latest transaction in the scoreboard. The fix makes TxnScoreboardPoolV2.isRangeAvailable() a read-only operation that doesn't modify the max value.
This improvement enhances the reliability of the HTTP Influx Line Protocol sender by adding automatic retry logic during initial connection setup when fetching the server configuration. Retries are attempted for up to the configured maxRetriesNanos duration and only occur on server errors (500, 503, 504, 507, 509, 523, 524, 529, 599) and network failures. Success is now defined as any 2xx status code instead of only 200 for the settings fetch.
This fix resolves spurious memory mapping errors that occurred during stress testing of WAL purge operations, particularly when generating many WAL segments while running WalPurgeJob frequently. The issue was discovered through fuzz testing and could manifest after several minutes of operation on Windows systems.
This fix removes the implicit contract that failure to extend memory will close the memory, preventing table readers from being left in an inconsistent state during error conditions.
This fix resolves incorrect symbol value conversion when using double column data to ingest into symbol columns through Influx Line Protocol. The fix also enhances exception messages for invalid UTF-8 to UTF-16 conversions.
This fix resolves a race condition that created misleading "cannot lock table" errors when the actual issue was a failure in the ApplyWal job, such as missing WAL metadata changes. The fix prevents the creation of duplicate TableWriter objects in WriterPool that resulted from these race conditions.
This fix resolves compilation errors when casting symbol columns to symbol type within GROUP BY queries. Additionally, string sinks were removed from IPv4Function, as they were only used in UPDATE operations; UPDATE statements will now generate wrapper cast functions similar to other fixed-size types.
This fix addresses a critical issue where column rename failures in TableWriter could leave metadata files in an inconsistent state. The solution reorders operations so commits happen before _todo file cleanup, adds table transaction information to the _todo file for validation, and ensures TableWriter is marked as distressed if any rename step fails. The fix also modifies TableWriter initialization to read transaction information before processing _todo files.
This fix prevents multiple DECLARE statements from being executed within a single query block, ensuring proper SQL syntax validation and preventing potential parsing conflicts.
This fix resolves two nanosecond-related issues: incorrect nanosecond precision handling when calculating TOLERANCE values for ASOF joins, and improper automatic timestamp type selection during CSV imports when timestamp columns are explicitly typed.
This fix prevents the Influx Line Protocol TCP connection and writer from being closed on cast errors when line.tcp.disconnect.on.error is set to false. Previously, protocol errors like attempting to send a float value to a string column would always close the connection regardless of the configuration setting. The error logging level has also been reduced from critical to a more appropriate level.
This fix resolves an assertion error that could occur in SQL queries after metadata changes, particularly in GROUP BY operations involving symbol index lookups. The error was discovered through fuzz testing and manifested as an AssertionError in TableReader.getBitmapIndexReader during query execution with symbol filtering operations.
This fix ensures that all EXPLAIN commands consistently check user permissions before execution. Previously, EXPLAIN CREATE TABLE and EXPLAIN CREATE MATERIALIZED VIEW did not perform authorization checks, while other EXPLAIN commands did. This improvement also adds proper error messages for the EXPLAIN DROP, EXPLAIN ALTER, and EXPLAIN REFRESH commands.
This fix resolves multiple issues with bind variables in function calls. Previously, bind variables were defined as typeless arrays requiring explicit casting, unsupported types would default to STRING or cause confusing errors, and functions with array arguments wouldn't define types properly for PostgreSQL drivers. The server now correctly rejects incompatible and unsupported types with clear error messages, and the function parser treats UNDEFINED types so as to match the best function signature, enabling proper bind variable usage with drivers like tokio-postgres.
This fix resolves a file descriptor leak that occurred when the ff.length(keyFd) call throws an exception during index writer initialization. The problem was that the kFdUnassigned flag was being set to false before the potentially throwing call, causing the keyFd to leak in error scenarios.
When importing CSV files in QuestDB Enterprise, the Web Console queries user group memberships with SHOW GROUPS username. This fix properly quotes the username parameter to handle spaces and dot characters that would otherwise break the SQL query.
This release focuses on stability improvements with fixes for rare replication issues, database shutdown crashes, and OSS synchronization. It includes new window functions and symbol table visibility features from the OSS codebase.
New Features
This feature adds new window functions: first_value(timestamp), first_value(long), last_value(timestamp), last_value(long), max(timestamp), max(long), min(timestamp), and min(long).
This feature adds a new symbolTableSize column to the SHOW COLUMNS output, which contains the number of symbol dictionary codes present in the symbol table. This improvement also fixes table reader initialization in the case of zero-column tables.
Improvements
This improvement reintroduces a dedicated thread pool for materialized view refresh operations to prevent resource contention with WAL apply jobs. Heavy materialized view workloads can no longer monopolize the shared writer pool, preventing delays or starvation of Apply WAL jobs and maintaining database write performance during intensive MV operations.
This improvement changes the algorithm for serving the IN (longlist...) predicate to use a hashset instead of a binary search, which scales better as list size increases. A new configuration setting, cairo.sql.jit.max.in.list.size.threshold, specifies the maximum size of IN list accepted by the JIT compiler. If this limit is exceeded, the JIT compiler aborts and a Java filter is compiled instead, since JIT compilation unrolls IN lists into equality comparisons chained by logical OR operators, which degrades performance for large lists. This improvement also fixes a bug where an empty list would incorrectly attempt to read a value.
Bug Fixes
This fix addresses a race condition where the ApplyWal2TableJob could fail and suspend when a segment directory was temporarily missing during the move operation from the downloaded to the target location. The issue occurred because the directory move was implemented as a two-step process that left a gap where no segment directory existed. This fix reverts to moving files individually when the directory already exists, while maintaining performance optimizations for fresh new segments. Additionally, this fix resolves a race condition in file lock acquisition on Linux and MacOS, where the lock file could be deleted after opening but before locking.
This fix prevents a fatal error that could occur during database shutdown when notifications were sent to the WalUploader after it had already been destroyed. The crash would manifest as a Java Runtime Environment fatal error during the shutdown sequence.
This fix ensures that group memberships and service account assignments are properly cleaned up when a user, group, or service account is dropped. Previously, these linked objects would remain orphaned after the parent entity was removed. This change also includes improved logging of user information messages from OAuth2 providers when required claims cannot be found.
This fix addresses a memory and file descriptor leak that could occur when a partition removal was followed by a schema change. The issue happened because stale bitmap indexes were left in the bitmapIndexes list beyond the valid range after partition deletion. When createNewColumnList() was triggered by a schema change, these stale indexes were not included in the new list, causing them to become orphaned and never properly closed.
After a table was suspended due to a corrupt WAL, the wal_tables query displayed an outdated writerTxn number. This fix ensures that during exception handling, the Seq Table Tracker is updated to the table txn number rather than the seqTxn.
When a non-WAL table was dropped and re-created, purge jobs could push the max txn using TxnScoreboardPoolV2.isRangeAvailable(), leading to a timeout when opening a TableReader, which could not lock the latest transaction in the scoreboard. This fix makes TxnScoreboardPoolV2.isRangeAvailable() a read-only scoreboard operation that does not modify the max value.
This fix resolves memory mapping errors that occurred during stress testing with frequent WAL segment generation and purge job execution, particularly affecting Windows environments during extended test runs.
This fix removes the implicit contract that failure to extend memory will close the memory, preventing table readers from being left in an inconsistent state during error conditions.
This release is a maintenance release that fixes lock acquisition race conditions in replication and includes compatibility updates with the OSS codebase, including disabling FD cache by default in Enterprise.
Bug Fixes
This fix establishes the correct order for creating segment paths and acquiring locks to prevent race conditions. The process now pre-creates the table directory before acquiring any locks, then creates and acquires the WAL lock while creating the WAL path if needed, followed by acquiring the segment lock and creating the segment path, and finally moving segment files into the WAL/segment directory while both locks are held.
QuestDB 9.0.3 is a stability and compatibility release that resolves critical SQL issues and upgrades the web console. The web console brings a unified search across all tabs.
New Features
This feature introduces a search mechanism that works across multiple tabs, including both active and archived ones, with search results displayed in a dedicated Search Panel on the left. The feature adds support for temporary buffers used when previewing search results from closed tabs and includes a new shortcut
Ctrl/Cmd + Shift + F to open the search panel.
Bug Fixes
This fix reworks the state management of the metrics component, ensuring that all updates go through the updateBuffer call with no redundant local state, no unnecessary rerenders, and no mutations on context values. The metrics component was getting the buffers object from Dexie and setting local state without properly syncing with storage, causing infinite loops where effects would set local state because the context value changed, then run again to set the context value once more.
This fix resolves compatibility issues where upgrading to database version 4 prevented opening the Web Console with older QuestDB versions. The improvement updates Dexie to 4.2.0, which eliminates the compatibility issues, and repositioning/reordering now uses bulk updates in the database to eliminate race conditions, while temporary tab tracking is done in component state.
This release includes performance improvements for replication, with optimizations allowing replicas to skip unnecessary downloads and improved downloader performance. It also includes various bug fixes and OSS synchronization.
Improvements
This improvement introduces an important performance optimization for QuestDB Enterprise replication. Previously, when a replica lagged behind, it would download all missing sequencer
_txnlog parts at each iteration, repeatedly downloading the same parts if the replica was behind by several thousand transactions. This optimization tracks downloaded transactions separately, allowing the replica's downloader to skip parts it already has. While this provides minor benefits for replicas with sufficient resources, it significantly reduces the work for replicas that have fallen far behind, such as primaries recreated from backups, replicas that were offline, or underspecced replica boxes struggling with spikes.
This improvement introduces various performance enhancements that decrease the disk IO caused by replication, reduce unnecessary downloads, and improve concurrency by performing downloads in parallel. The changes include configuring the sleep time after replication errors to use the poll interval instead of a hardcoded 200ms value, and fixing summary logic to avoid using filtered indexes. These optimizations reduce the time required for a replica to catch up when it has fallen behind.
Bug Fixes
This fix addresses lingering table metrics that would consume space in DataDog or other scraping middleware after tables were dropped. Once a table is dropped on the primary and replica, the prometheus metrics for that table are now cleared after a configurable number of requests. Two new configuration options are introduced:
replication.metrics.dropped.table.poll.count=10 removes dropped table metrics after they have been scraped N times, and replication.metrics.per.table=true allows completely disabling per-table replication metrics.
9.0.2 is a performance and stability release. It accelerates common query patterns, reduces resource usage, and adds quality-of-life features across SQL, the PostgreSQL wire protocol (PGWire), ingestion, and operations.
New Features
This feature allows negative array indices to select elements from the back of arrays, where -1 represents the last element. For example, arr[-1] returns the last element, arr[-2] returns the second-to-last element, and range operations like arr[1:-1] and arr[-2:5] are also supported.
These new SQL functions calculate the minimum, maximum, and standard deviation (both sample and population) of double arrays, supporting both flat and multi-dimensional arrays. This feature includes efficient recursive and flat-array handling with improved processing of special values like NaN and Infinity to ensure correct results in array aggregation operations.
This feature extends the TRUNCATE TABLE syntax to support the full format: TRUNCATE TABLE [IF EXISTS] table1[,table2...] [KEEP SYMBOL TABLES]. The IF EXISTS clause prevents errors when attempting to truncate non-existent tables, and multiple tables can be truncated in a single statement.
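For example, following the grammar above (trades and quotes are stand-in table names):

    -- empty both tables, keep their symbol dictionaries, and do not
    -- fail if either table is missing
    TRUNCATE TABLE IF EXISTS trades, quotes KEEP SYMBOL TABLES;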
This feature introduces FdCache and MmapCache classes to cache file descriptor and memory map operations, reducing redundant system calls and improving resource reuse. The implementation includes new caching layers and updates native file operations to use overlapped I/O on Windows.
This feature adds reshape() methods to enable runtime dimension changes, with variants for 1D, 2D, and 3D arrays, plus a clear() method to reset the append position for error recovery after sender.cancelRow() operations. Arrays can now be resized at runtime, and single array instances can be reused across different shapes.
This feature enables tables with array columns to be converted to and from Apache Parquet format using ALTER TABLE CONVERT PARTITION TO PARQUET (see the sketch after this list). Arrays are exported as lists of double values by default, following Parquet list requirements. An alternative native binary format is available via the cairo.partition.encoder.parquet.raw.array.encoding.enabled=true configuration property.
This feature enforces case-insensitive uniqueness validation for column names in the table schema dialog, preventing duplicate columns that differ only by case, such as 'Age' and 'age'. Previously, the Web Console incorrectly allowed such duplicate column names to be created.
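As a hedged sketch of the partition conversion mentioned above (the table name is hypothetical, and the WHERE form for selecting partitions is an assumption; only the ALTER TABLE ... CONVERT PARTITION TO PARQUET statement itself is named in the item):

    -- convert older partitions of a table with array columns to Parquet
    ALTER TABLE trades CONVERT PARTITION TO PARQUET
    WHERE timestamp < '2025-01-01';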
Improvements
This improvement adds a fast path to double-quote replacement in VarcharAdapter. Performance testing with the ClickHouse hits dataset showed parsing and WAL writes improving from 480 seconds to 410 seconds when data was sent to the POST /imp endpoint.
This improvement changes QuestDB's default threading strategy from a single shared pool to three specialized thread pools. The Network thread pool handles network IO for the HTTP, PostgreSQL Wire Protocol, and Influx Line Protocol servers. The Query thread pool manages parallel queries, including parallel filters and GROUP BY operations. The Writer thread pool runs WAL Apply jobs, Table Writer jobs, materialized view refresh, and various purge and housekeeping jobs.
This improvement replaces eager size calculation with lazy evaluation, using partition metadata to avoid unnecessarily opening partitions and applying approximate timestamp filtering before expensive partition reads. For hash joins, size computation is deferred until explicitly requested via calculateSize(), eliminating upfront hash map traversal. This optimization particularly benefits LIMIT queries on large datasets, time-filtered queries on partitioned tables, and Parquet partitions, where metadata access is cheaper than full partition access.
This improvement enables parallel execution of top-K queries by computing the top K records on each worker thread, then merging the results into a single tree chain (see the sketch below). The feature can be disabled using the cairo.sql.parallel.topk.enabled=false configuration property. This change also fixes query cache clean-up on HTTP and PostgreSQL Wire Protocol connection close, improving cache efficiency for short-lived connections.
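A top-K query is simply an ORDER BY combined with a LIMIT. A minimal sketch of a query shape that benefits from the parallel path, using a hypothetical trades table:
-- Each worker thread computes its own top 10, and the partial results are merged.
SELECT symbol, price FROM trades ORDER BY price DESC LIMIT 10;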
Bug Fixes
This fix resolves a race condition where SqlParser instances shared a single stateful visitor used to replace variables. Under high query load, state from one SQL query could bleed into an unrelated query, causing spurious 'undeclared variable' errors.
This fix removes the minimal refresh interval defined by the cairo.mat.view.min.refresh.interval property. The previous logic led to overly large refresh intervals being used for base table commits spanning small time ranges, such as single-timestamp commits. For example, with SAMPLE BY 1s, even a single-timestamp commit would result in a 100-second refresh interval.
This fix resolves incorrect alignment of fill values in SAMPLE BY queries when specific columns are projected in the SELECT clause.
This fix prevents tables from becoming suspended when a replace commit operation fails on a column with a column top (i.e., a column added after the table already contained data).
This fix resolves race conditions in window function factories that were incorrectly designed as stateful, storing execution context as instance fields. Since QuestDB uses shared instances of function factories, concurrent SQL compilation would overwrite parameters, causing queries to fail or produce incorrect results. The fix makes window function factories stateless by removing all mutable state.
This fix resolves a bug in YearTimestampSampler where rounding calculations were incorrect when the rounded year was a leap year but the input year was not. The issue affected materialized view refresh operations, where the low bound of the range replace could end up greater than the minimal timestamp of the transaction.
This fix changes how TableUpdateDetails determines whether a table is a WAL table, preventing exceptions when writing to non-WAL tables through Influx Line Protocol over TCP.
This fix implements transaction-aware initialization and reuse of bitmap index readers for improved stability under concurrent access. It also includes enhanced null-value handling in forward readers and more consistent partition lifecycle handling and resource cleanup.
This fix addresses a rare memory leak that occurred when queries executed while the same table was being altered concurrently. The issue arose during internal query retries, when the SQL compiler attempted to create a filter during query compilation. If filter creation failed due to a concurrent schema modification, the factory was not freed before the retry, so memory from failed attempts was never released. Queries still completed successfully through automatic retries, but this edge case affected systems running concurrent schema modifications and queries on the same table.
This fix resolves a file descriptor leak that occurred when a table was dropped while WAL (Write-Ahead Log) files were being purged concurrently with TableWriter operations. The leak happened due to double-closing of file descriptors when exceptions occurred during WAL file operations. When TableWriter attempted to open WAL files that were being deleted by the WAL Purge job, it would get an exception and close the open WAL files, then attempt to close them again when the TableWriter object was closed, leading to a file descriptor leak.
This major release introduces Google Cloud Storage replication support, OIDC improvements for faster connection acceptance, and full synchronization with QuestDB 9.0 OSS features including multidimensional arrays and enhanced materialized views.
Breaking Changes
If the database is already set up to use the OAuth2 provider's configuration URL (acl.oidc.configuration.url is set in the config), no action is needed. If OIDC is set up with the old-style configuration (separate host, port, and endpoint settings), the acl.oidc.public.keys.endpoint property should be set in addition to the existing OIDC config. The public keys endpoint is specific to each OIDC provider but publicly available. Examples:
Okta and JumpCloud: acl.oidc.public.keys.endpoint=/.well-known/jwks.json
Entra ID: acl.oidc.public.keys.endpoint=/{tenant id}/discovery/v2.0/keys
PingFederate (the QuestDB default, so there is no need to add this line to the config): acl.oidc.public.keys.endpoint=/pf/JWKS
Release 9.0.1 brings notable enhancements to SYMBOL indexes, along with several bug fixes. Materialized Views now support indexed symbols, improving performance for workloads that rely on large views with numerous symbols. We've significantly optimized index performance under high concurrency, delivering 10x–100x speedup for index-intensive operations.
New Features
This feature introduces a unique data identifier to distinguish separate clusters and prevent conflicts during replication and restore operations. The identifier is accessible through the CURRENT_DATA_ID() SQL function and consists of a 128-bit UUID stored in the database root as .data_id. This enables future conflict detection during restore operations involving replicas and backups.
This feature enables adding and dropping indexes on symbol columns within materialized views using ALTER MATERIALIZED VIEW syntax. SQL automatically calculates the correct index capacity based on the current data in the view, though the capacity can still be specified manually. Example usage: ALTER MATERIALIZED VIEW view_1 ALTER COLUMN sym ADD INDEX;
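As a follow-up sketch, querying the new data identifier and dropping the index again — the DROP variant here assumes it mirrors the ADD INDEX syntax shown above:
SELECT CURRENT_DATA_ID(); -- returns the cluster's 128-bit data identifier
ALTER MATERIALIZED VIEW view_1 ALTER COLUMN sym DROP INDEX;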
Improvements
This improvement significantly enhances index lookup performance for partitions that do not change from query to query, replacing unnecessary unmap/remap operations with version checking. The optimization applies fully when partitions are unchanged or only appended to between queries, and it also improves the performance of index lookups during out-of-order (O3) operations by reducing map/unmap calls.
Bug Fixes
This fix resolves an internal error that occurred when comparing NULL arrays of the same type, such as (null::double[] = null::double[]) in SQL queries.
This fix provides clearer, actionable error messages when attempting to create materialized views from SELECT statements that contain invalid column aliases.
This fix resolves an issue where DISTINCT queries using table alias expressions would incorrectly throw an 'Invalid table name or alias' error.
This fix ensures that column aliases keep their specified casing in query results. Previously, aliases would revert to the original column name's casing regardless of how they were specified in the query.
This fix addresses an edge case in the unquoting logic that did not properly handle escaped quotes in symbol filters during JIT compilation.