feat: leader-only merge for parallel index build #244
Merged
Replace the two-phase parallel build (worker compaction + work-stealing merge groups) with a simpler architecture: workers flush L0 segments to BufFiles without compaction, then the leader performs a single N-way merge of all segments directly to paged storage. This produces a single segment per index build, which is optimal for query performance (no multi-segment scanning needed). Removes ~800 lines of complexity: worker_maybe_compact_level, plan_merge_groups, compute_total_pages_needed, worker_execute_merge_group, write_temp_segment_to_index_parallel, and all Phase 2 worker code (COPY + work-steal).
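The single N-way merge described above can be sketched with a binary min-heap over the current head of each input. This is a minimal illustration, not the code in this PR: `Source`, `sift_down`, and `nway_merge` are hypothetical names, the inputs are plain sorted `int` arrays standing in for worker segments, and combining posting lists for a term that appears in several segments is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one sorted segment flushed by a worker. */
typedef struct {
    const int *keys; /* sorted ascending */
    size_t     len;  /* number of keys */
    size_t     pos;  /* index of the next unread key */
} Source;

static int source_key(const Source *s) { return s->keys[s->pos]; }

/* Restore the min-heap property below position i.
 * heap[] holds indexes into src[], ordered by each source's current key. */
static void sift_down(size_t *heap, size_t n, size_t i, const Source *src)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && source_key(&src[heap[l]]) < source_key(&src[heap[m]])) m = l;
        if (r < n && source_key(&src[heap[r]]) < source_key(&src[heap[m]])) m = r;
        if (m == i) return;
        size_t t = heap[i]; heap[i] = heap[m]; heap[m] = t;
        i = m;
    }
}

/* Merge nsrc sorted sources into out in one pass.
 * heap must have room for nsrc entries; out for the sum of all lengths.
 * Returns the number of keys written. */
static size_t nway_merge(Source *src, size_t nsrc, size_t *heap, int *out)
{
    size_t n = 0, written = 0;

    assert(src != NULL && heap != NULL && out != NULL);

    /* Seed the heap with every non-empty source, then heapify. */
    for (size_t i = 0; i < nsrc; i++)
        if (src[i].pos < src[i].len) heap[n++] = i;
    for (size_t i = n / 2; i-- > 0;)
        sift_down(heap, n, i, src);

    while (n > 0) {
        Source *s = &src[heap[0]];
        out[written++] = s->keys[s->pos++]; /* emit the smallest head */
        if (s->pos == s->len)
            heap[0] = heap[--n];            /* source exhausted: drop it */
        sift_down(heap, n, 0, src);
    }
    return written;
}
```

Each output key costs O(log N) heap work for N inputs, and every input is read exactly once, which is why one pass over all worker segments can replace a multi-level compaction scheme.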
- Move FlushRelationBuffers before metapage update for consistency with merge.c pattern (segment data durable before metadata)
- Use merge_source_close() instead of manual pfree for cleanup
- NULL readers[] slot when source takes ownership to prevent double-close
- Remove dead segment_count field from TpParallelWorkerResult
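The double-close fix in the third item follows a common ownership-transfer pattern: once a source takes over a reader, the caller's array slot is set to NULL so the later cleanup loop skips it. The sketch below illustrates that pattern only. `Reader`, `Source`, `reader_open`, `source_create`, `source_close`, and `close_all` are hypothetical names, not the functions touched by this PR, and `malloc`/`free` stand in for PostgreSQL's `palloc`/`pfree`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reader; close_calls lets a caller count how often
 * readers are closed. */
typedef struct { int *close_calls; } Reader;

/* Hypothetical merge source that owns its reader once created. */
typedef struct { Reader *reader; } Source;

static Reader *reader_open(int *close_calls)
{
    Reader *r = malloc(sizeof *r);
    if (r != NULL) r->close_calls = close_calls;
    return r;
}

static void reader_close(Reader *r)
{
    if (r == NULL) return; /* NULL slots are skipped */
    (*r->close_calls)++;
    free(r);
}

/* Take ownership of *slot and clear it, so the caller's cleanup loop
 * cannot close the same reader a second time. */
static Source *source_create(Reader **slot)
{
    assert(slot != NULL);
    Source *s = malloc(sizeof *s);
    if (s == NULL) return NULL; /* on failure the caller still owns *slot */
    s->reader = *slot;
    *slot = NULL;
    return s;
}

/* Closing the source closes the reader it owns. */
static void source_close(Source *s)
{
    if (s == NULL) return;
    reader_close(s->reader);
    free(s);
}

/* Caller-side cleanup: closes only the readers that were never handed off. */
static void close_all(Reader **readers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        reader_close(readers[i]);
        readers[i] = NULL;
    }
}
```

With this shape, each reader is closed exactly once regardless of whether it was handed to a source or left in the array.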
Remove code rendered dead by the leader-only merge refactor:
- Remove BufFile write path from merge sink (is_buffile branches, buffile_write_at, merge_sink_init_buffile, merge_sink_init_pages_parallel)
- Remove atomic page counter from TpSegmentWriter (page_counter field, tp_segment_writer_init_parallel, write_page_index_with_counter, tp_page_index_entries_per_page)
- Remove dead includes and extern from build_parallel.c
- Simplify TpMergeSink struct to pages-only fields

5 files changed, -209 lines net.
The "merging N segments from M workers" message is noisy and not useful for end users. The launched-workers message already indicates parallel build is active.
68177d8 to ee9c738
tjgreen42 added a commit that referenced this pull request on Mar 3, 2026
## Summary
- Update comparison page with results from benchmark run [22642807624](https://github.com/timescale/pg_textsearch/actions/runs/22642807624)
- Overall throughput improved from 2.8x to 3.2x faster than System X
- Build time gap narrowed from 2.0x to 1.6x (270s → 234s)
- Key improvements since Feb 9: SIMD bitpack decoding (#250), stack-allocated decode buffers (#253), BMW term state pointer indirection (#249), arena allocator rewrite (#231), leader-only merge (#244)

## Testing
- Numbers extracted from benchmark run on commit 1b09cc9
- gh-pages branch also needs updating (will push after merge)
Summary
Simplify parallel index build by replacing the complex two-phase architecture
(worker compaction + work-stealing merge groups) with a single leader-only
N-way merge.
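How it works now: workers flush L0 segments to BufFiles without compaction, then the leader performs a single N-way merge of all segments directly to paged storage, producing one segment per index build.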
What was removed (~800 lines):
- worker_maybe_compact_level
- plan_merge_groups, compute_total_pages_needed
- worker_execute_merge_group

Benchmark Results
GitHub CI Benchmarks
All validations PASSED.
Local MS-MARCO v2 (138M rows, 4 workers)
Test plan
make installcheck: all 48 SQL tests pass