feat: widen segment offsets from uint32 to uint64 (V4 format) #220
Merged
The segment format used uint32 for all logical byte offsets, limiting segments to 4GB of data. With 138M documents, the merged L1 segment exceeds this at ~33GB, causing offset overflow and corrupted query results.

This widens all offsets to uint64 while preserving backward compatibility with V3 segments. Dual-struct strategy: V3 legacy structs (read-only) and V4 structs (read/write). On read, the version is detected from the header and V3 fields are widened to uint64. On write, V4 is always emitted. Natural compaction upgrades V3 segments to V4 over time.

Changes:
- Add V3 legacy structs (TpSegmentHeaderV3, TpDictEntryV3, TpSkipEntryV3)
- Update V4 structs: TpSegmentHeader (88→128 bytes), TpDictEntry (12→16 bytes), TpSkipEntry (16→20 bytes)
- Version-aware readers: tp_segment_read_dict_entry(), tp_segment_read_skip_entry() handle V3/V4 transparently
- Widen tp_segment_read/get_direct to uint64 logical_offset
- Widen pagemapper inline functions to uint64
- V4 write paths in segment.c, merge.c, and build_parallel.c
- Version-aware dump functions with PRIu64 format strings
- Use palloc_extended with MCXT_ALLOC_HUGE in docmap for >2GB allocations

Overhead: ~1.1% on a 33GB index (skip entries 16→20 bytes, dict entries 12→16 bytes).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
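The dual-struct read path might look roughly like this. The struct layouts and field names below are hypothetical stand-ins for the actual TpDictEntry definitions; only the widen-on-read behavior is taken from the description above:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk layouts: V3 keeps a 32-bit posting offset,
 * V4 widens it to 64 bits. Real field names differ. */
typedef struct { uint32_t term_offset; uint32_t posting_offset; uint32_t doc_count; } DictEntryV3;
typedef struct { uint32_t term_offset; uint32_t doc_count; uint64_t posting_offset; } DictEntryV4;

/* Version-aware reader: V3 entries are widened to the in-memory V4
 * representation, so callers only ever see uint64 offsets. */
static void read_dict_entry(const void *raw, uint32_t segment_version, DictEntryV4 *out)
{
    if (segment_version <= 3) {
        DictEntryV3 v3;
        memcpy(&v3, raw, sizeof(v3));
        out->term_offset = v3.term_offset;
        out->doc_count = v3.doc_count;
        out->posting_offset = (uint64_t) v3.posting_offset;  /* widen 32 -> 64 */
    } else {
        memcpy(out, raw, sizeof(*out));
    }
}
```

Centralizing the dispatch in one reader keeps the version check out of every call site, which matches the PR's stated goal of handling V3/V4 transparently.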
- Replace the WRITE_POSTING_BLOCKS macro in build_parallel.c with a static write_posting_blocks() function and always-chunked processing. This eliminates the two-path branching (bulk vs chunked) by always using chunked collection with a chunk size (8M postings) large enough that most terms complete in a single iteration. Removes the now-unused collect_buffer_term_postings() and estimate_buffer_term_postings() functions.
- Fix missing header.data_size assignment in build_parallel.c (pre-existing bug where data_size was never set, leaving it as 0).
- Change tp_segment_read_skip_entry() to accept uint64 skip_index_offset directly instead of TpDictEntry*. This removes the fragile tmp_dict workaround in merge.c, where a temporary TpDictEntry was constructed just to pass a single field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
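The always-chunked shape can be sketched as a single loop whose chunk size covers typical terms in one pass. Everything here except the 8M-posting chunk size is invented for illustration (the real write_posting_blocks() emits posting blocks, not callback invocations):

```c
#include <stdint.h>

/* Large enough that most terms finish in a single iteration, so the old
 * bulk fast path is unnecessary. */
#define CHUNK_POSTINGS (8u * 1024 * 1024)

/* Process a term's postings in fixed-size chunks via a caller-supplied
 * emit callback (may be NULL); returns the number of iterations. */
static uint32_t write_posting_blocks(uint64_t n_postings,
                                     void (*emit)(uint64_t start, uint64_t count))
{
    uint32_t iters = 0;

    for (uint64_t done = 0; done < n_postings; done += CHUNK_POSTINGS) {
        uint64_t count = n_postings - done;

        if (count > CHUNK_POSTINGS)
            count = CHUNK_POSTINGS;
        if (emit)
            emit(done, count);
        iters++;
    }
    return iters;
}
```

A term with a few thousand postings takes one iteration; only the rare very hot term loops, so collapsing the two paths costs essentially nothing in the common case.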
…it, and worker cap

The parallel index build on large datasets (138M MS MARCO v2 passages) consumed unbounded memory because: (1) leader-side palloc/pfree never returned pages to the OS, (2) DSA grew monotonically since dsa_free() only recycles internally, (3) no DSA size limit was set, and (4) writer.pages was leaked each merge cycle.

- Wrap each merge cycle in a dedicated MemoryContext so all leader allocations are truly freed when the context is deleted
- Call dsa_trim() after clearing worker buffers to release unused DSA segments back to the OS
- Set dsa_set_size_limit() based on worker count and spill threshold so workers get a clear ERROR instead of OOMing the machine
- Cap worker count to maintenance_work_mem / 32MB per worker
- Free writer.pages after tp_segment_writer_finish() (matching merge.c)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
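In PostgreSQL terms, the per-cycle context pattern looks roughly like this. This is a sketch against the server's MemoryContext and DSA APIs; merge_one_cycle() and state are hypothetical names:

```c
/* Sketch: give each merge cycle its own short-lived context, so individual
 * pfree bookkeeping inside the cycle stops mattering -- deleting the context
 * returns everything at once. */
MemoryContext merge_ctx = AllocSetContextCreate(CurrentMemoryContext,
                                                "merge cycle",
                                                ALLOCSET_DEFAULT_SIZES);
MemoryContext old_ctx = MemoryContextSwitchTo(merge_ctx);

merge_one_cycle(state);           /* all leader-side pallocs land in merge_ctx */

MemoryContextSwitchTo(old_ctx);
MemoryContextDelete(merge_ctx);   /* truly frees, unlike palloc/pfree churn */

dsa_trim(state->area);            /* return unused DSA segments to the OS */
```

MemoryContextDelete releases whole blocks back to the allocator rather than recycling chunks internally, which is why it fixes the "palloc/pfree never returned pages to the OS" symptom above.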
V4's wider uint64 offsets slightly increase segment overhead, causing the 500-row hybrid test to produce 8 spilled entries instead of 7.
Use plain palloc/repalloc instead of MCXT_ALLOC_HUGE so that MaxAllocSize trips visibly rather than silently allowing 1GB+ allocations. Add explicit casts for multilevel pointer conversions.
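For reference, the two allocation behaviors differ like this (PostgreSQL API; nbytes, vocab, and docmap are illustrative names):

```c
/* Plain palloc: ERRORs with "invalid memory alloc request size" once nbytes
 * exceeds MaxAllocSize (~1GB) -- the visible trip wire the merge vocabulary
 * arrays now rely on instead of silently growing past 1GB. */
char *vocab = palloc(nbytes);

/* Huge variant: allowed past MaxAllocSize; reserved for the one place where
 * >1GB is genuinely expected (docmap finalization on very large segments). */
char *docmap = palloc_extended(nbytes, MCXT_ALLOC_HUGE);
```

Keeping MCXT_ALLOC_HUGE out of paths that should never reach 1GB turns a silent memory blow-up into an immediate, diagnosable ERROR.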
The limit was too restrictive: workers ignore maintenance_work_mem, and each buffer can hold up to tp_memtable_spill_threshold postings (~384 MB), so with 4 workers (~1.5 GB of buffers) the 1 GB floor was easily exceeded. Properly bounding memory requires reworking the spill threshold to respect maintenance_work_mem, which is a larger follow-up.
Summary
- Widen all segment offsets from uint32 to uint64, removing the 4GB segment size limit that causes offset overflow with large indexes (e.g., 138M documents producing a ~33GB merged L1 segment)
- Bound parallel build memory with a per-merge MemoryContext, DSA trimming, and a DSA size limit, with worker count capped via maintenance_work_mem
- Use palloc_extended with MCXT_ALLOC_HUGE in docmap finalization for allocations exceeding MaxAllocSize (~1GB) on very large segments
- Remove MCXT_ALLOC_HUGE from merge vocabulary arrays (segment merge and parallel build buffer merge); use plain palloc/repalloc so MaxAllocSize trips visibly if vocabulary ever reaches ~1GB

Format changes
- TpSegmentHeader: 88 → 128 bytes
- TpDictEntry: 12 → 16 bytes
- TpSkipEntry: 16 → 20 bytes

Estimated overhead on a 33GB index: ~380MB (~1.1%).
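The 20-byte skip entry depends on packing. A hypothetical layout (field names invented; only the sizes come from the PR) shows why:

```c
#include <stdint.h>

/* Hypothetical V4 skip entry: one widened 64-bit offset plus three 32-bit
 * fields. With packing, no padding is inserted: 8 + 4 + 4 + 4 = 20 bytes. */
typedef struct __attribute__((packed)) {
    uint64_t posting_offset;   /* widened from uint32 in V3 */
    uint32_t last_docid;
    uint32_t block_len;
    uint32_t max_weight;
} SkipEntryV4Packed;

/* Same fields without packing: the uint64 member forces 8-byte struct
 * alignment, padding the 20 bytes of data out to 24 bytes. */
typedef struct {
    uint64_t posting_offset;
    uint32_t last_docid;
    uint32_t block_len;
    uint32_t max_weight;
} SkipEntryV4Aligned;

_Static_assert(sizeof(SkipEntryV4Packed) == 20, "packed skip entry is 20 bytes");
_Static_assert(sizeof(SkipEntryV4Aligned) == 24, "aligned variant pads to 24 bytes");
```

Across millions of skip entries, those 4 padding bytes per entry would add roughly 20% to the posting index overhead, which is why the struct stays packed.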
Benchmark: MS MARCO (8.8M passages)
The +38 MB (+3.2%) index size increase is expected from the wider offsets. Build time and query latency differences are within run-to-run variance.
Key design decisions
- tp_segment_read_dict_entry() and tp_segment_read_skip_entry() centralize V3→V4 dispatch, avoiding duplicated version logic across scan.c, merge.c, and dump.c.
- TpSkipEntry remains __attribute__((packed)) to minimize posting index overhead (20 bytes vs 24 if aligned).
- Each merge cycle in the parallel build runs in a dedicated MemoryContext so allocations are truly freed. DSA is trimmed after clearing worker buffers and has a size limit based on worker count.

Files changed
- src/segment/segment.h: V3 legacy structs, widened V4 structs
- src/segment/pagemapper.h: inline functions widened to uint64
- src/segment/segment.c: version-aware read_dict_entry, PRIu64 format strings
- src/segment/scan.c: version-aware dict/skip entry reads
- src/segment/merge.c: V4 write paths
- src/segment/docmap.c: palloc_extended with MCXT_ALLOC_HUGE for large docmap allocations
- src/am/build_parallel.c: V4 write paths, write_posting_blocks() rework
- src/query/bmw.c
- test/expected/bmw.out: 8 spilled entries instead of 7

Testing
- make compiles without warnings
- make format-check passes
- make installcheck passes