BM25 index on TimescaleDB hypertable: Invalid docid page magic on chunk scans #291
Description
Summary
Creating a BM25 index on a TimescaleDB hypertable and then inserting + querying data produces a consistent InternalServerError on every query that touches chunk indexes. The error occurs on freshly initialized databases with no prior data or schema changes, so this is not related to attnum drift or index corruption from upgrades.
Environment
- PostgreSQL 18.1
- TimescaleDB (PG18 PGDG package, latest available as of March 2026)
- pg_textsearch v0.6.1 (installed from GitHub releases .deb)
- pg_textsearch is listed in shared_preload_libraries
- Tested on both amd64 and arm64
Error
asyncpg.exceptions.InternalServerError: Invalid docid page magic at block 8: expected 0x54504944, found 0x00000000 - stopping recovery
The value 0x54504944 is ASCII "TPID", the expected magic in a BM25 docid page header. The value actually found, 0x00000000, suggests an uninitialized (zeroed) page in the chunk's BM25 index.
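A quick check of that decoding, as a standalone sketch that is not part of pg_textsearch: read big-endian, the expected magic spells "TPID", while the found value is four zero bytes. The little-endian line rests on an assumption about the on-disk layout (that the tag is stored as a native uint32). If that holds, a raw page dump on amd64 or arm64 would show the bytes reversed.

```python
import struct

# Values copied from the error message.
EXPECTED_MAGIC = 0x54504944
FOUND_MAGIC = 0x00000000


def magic_to_bytes(value: int, byteorder: str = ">") -> bytes:
    """Pack a 32-bit magic number into 4 bytes (">" big-endian, "<" little-endian)."""
    return struct.pack(byteorder + "I", value)


# Read big-endian, the expected magic spells the ASCII tag.
print(magic_to_bytes(EXPECTED_MAGIC))  # b'TPID'
# Assumption: if stored as a native uint32 on little-endian hardware,
# a raw page dump would show the same bytes in reverse order.
print(magic_to_bytes(EXPECTED_MAGIC, "<"))  # b'DIPT'
# The value actually found is four zero bytes.
print(magic_to_bytes(FOUND_MAGIC))  # b'\x00\x00\x00\x00'
```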
Reproduction
1. Schema setup
-- Extensions
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS pg_textsearch;
-- Table
CREATE TABLE public.episodes (
id integer GENERATED ALWAYS AS IDENTITY,
store_id integer NOT NULL,
content text NOT NULL,
summary text,
background text,
importance integer NOT NULL DEFAULT 50,
created_at timestamptz NOT NULL DEFAULT now(),
span tstzrange NOT NULL DEFAULT tstzrange(now(), now() + interval '5 minutes', '[)'),
trigram_text text GENERATED ALWAYS AS (
lower(concat_ws(' ', coalesce(content, ''), coalesce(summary, ''), coalesce(background, '')))
) STORED,
PRIMARY KEY (id, created_at)
);
-- Convert to hypertable with daily partitioning
SELECT create_hypertable('public.episodes', by_range('created_at', INTERVAL '1 day'));
-- Create BM25 index on the parent hypertable
CREATE INDEX idx_episodes_trigram_text_bm25
ON public.episodes USING bm25(trigram_text)
WITH (text_config='english', k1=1.2, b=0.75);
2. Insert some data
INSERT INTO public.episodes (store_id, content, summary, importance, created_at)
VALUES
(1, 'Alice met with Bob to discuss the quarterly report', 'Meeting about Q1 results', 70, now()),
(1, 'The server migration was completed ahead of schedule', 'Infrastructure update', 60, now()),
(1, 'New feature request from the design team for dark mode', 'Feature request logged', 45, now());
3. Query using BM25 (triggers the error)
SELECT
e.id,
-1.0 * (e.trigram_text <@> to_bm25query('quarterly report', 'idx_episodes_trigram_text_bm25'))::double precision AS bm25_score
FROM public.episodes e
WHERE
e.store_id = 1
AND e.trigram_text IS NOT NULL
AND btrim(e.trigram_text) <> ''
ORDER BY e.trigram_text <@> to_bm25query('quarterly report', 'idx_episodes_trigram_text_bm25') ASC
LIMIT 10;
This query consistently produces:
ERROR: Invalid docid page magic at block 8: expected 0x54504944, found 0x00000000 - stopping recovery
4. Direct chunk query works fine
Querying the chunk table directly (bypassing the parent) does not produce the error:
SELECT chunk_schema || '.' || chunk_name
FROM timescaledb_information.chunks
WHERE hypertable_name = 'episodes';
-- Then query that chunk directly with the same BM25 operator
What I think is happening
When TimescaleDB creates a new chunk, it propagates the parent table's indexes to the chunk. For standard index types (btree, GIN, GiST), this works correctly. For BM25 indexes, the chunk's index pages appear not to be initialized properly during propagation: the page the scan expects to be a docid page (block 8 in the error above) contains zeroed bytes instead of the expected "TPID" magic header.
This is consistent across fresh database initializations (completely new volumes, no prior data). No ALTER TABLE ... ADD/DROP COLUMN operations occur between index creation and the failing query, so attnum drift between parent and chunk is not involved.
The v0.5.0 release notes mention "Improvements for bm25 indexes on hypertables." The v0.6.0 release introduced a major rewrite (segment format V4, arena allocator, parallel page pool), which may have regressed hypertable chunk index propagation.
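One way to test the zeroed-page hypothesis is to fetch block 8 of a chunk's BM25 index with the pageinspect extension's get_raw_page(relname, blkno). That function returns the block as bytea, which asyncpg hands back as Python bytes. The block can then be checked for the never-initialized pattern. The sketch below rests on three assumptions: the default 8 kB block size, a little-endian host, and that pg_textsearch pages carry the standard PostgreSQL page header, which is not confirmed. The helper names are hypothetical, and the chunk index name is left to the reader.

```python
import struct

BLCKSZ = 8192         # PostgreSQL's default block size (assumed unchanged at build time)
PD_UPPER_OFFSET = 14  # pd_upper sits at byte 14 of the standard 24-byte page header


def page_is_new(page: bytes) -> bool:
    """Python port of PostgreSQL's PageIsNew(): true when pd_upper is 0.

    Assumes a little-endian host and that the BM25 index uses the
    standard PostgreSQL page header (not confirmed for pg_textsearch).
    """
    if len(page) != BLCKSZ:
        raise ValueError(f"expected {BLCKSZ} bytes, got {len(page)}")
    (pd_upper,) = struct.unpack_from("<H", page, PD_UPPER_OFFSET)
    return pd_upper == 0


def page_is_all_zero(page: bytes) -> bool:
    """True when every byte of the page is 0x00."""
    return not any(page)


# A block that was allocated but never written is all zeros.
zero_page = bytes(BLCKSZ)
print(page_is_new(zero_page), page_is_all_zero(zero_page))  # True True

# Illustrative initialized page: pd_upper set to 8176 (8192 minus 16 bytes of special space).
initialized = bytearray(BLCKSZ)
struct.pack_into("<H", initialized, PD_UPPER_OFFSET, 8176)
print(page_is_new(bytes(initialized)), page_is_all_zero(bytes(initialized)))  # False False
```

An all-zero block 8 would fit the theory that the chunk index was extended but never written. A non-zero page with the wrong magic would suggest a different failure mode.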
Workaround
We are currently falling back to vector-only search when BM25 fails. We are evaluating switching to native PostgreSQL tsvector + GIN indexes for the hypertable, since those handle chunk propagation correctly.
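The fallback can be sketched as a minimal synchronous wrapper, assuming the BM25 and vector searches are exposed as plain callables. The names search_with_fallback, bm25_search and vector_search are hypothetical, not pg_textsearch or asyncpg APIs. The local InternalServerError class only stands in for asyncpg.exceptions.InternalServerError so that the example runs without a database.

```python
from typing import Callable, List, Tuple, Type

# Stand-in for asyncpg.exceptions.InternalServerError (SQLSTATE XX000) so the
# sketch runs without a database; catch the real class in production code.
class InternalServerError(Exception):
    pass


Hit = Tuple[int, float]  # (episode id, score)


def search_with_fallback(
    query: str,
    bm25_search: Callable[[str], List[Hit]],
    vector_search: Callable[[str], List[Hit]],
    fallback_on: Tuple[Type[BaseException], ...] = (InternalServerError,),
) -> Tuple[str, List[Hit]]:
    """Try BM25 first; if it raises one of `fallback_on`, use vector-only search.

    Returns the name of the backend that answered, plus its hits.
    Errors not listed in `fallback_on` propagate unchanged.
    """
    try:
        return "bm25", bm25_search(query)
    except fallback_on:
        return "vector", vector_search(query)


# Usage with fake backends (both hypothetical).
def broken_bm25(query: str) -> List[Hit]:
    raise InternalServerError("Invalid docid page magic at block 8")


def fake_vector(query: str) -> List[Hit]:
    return [(1, 0.91)]


print(search_with_fallback("quarterly report", broken_bm25, fake_vector))
# ('vector', [(1, 0.91)])
```

Catching only the specific server error keeps unrelated failures, such as SQL syntax errors or dropped connections, visible instead of silently degrading every search to vector-only.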
Additional notes
- The BM25 index on a non-hypertable (threads.user_intent) in the same database works without issues.
- REINDEX INDEX idx_episodes_trigram_text_bm25 does not fix the problem, because new chunks created by subsequent inserts will have the same uninitialized pages.
- The shared_preload_libraries requirement from v0.6.0 is satisfied.