Skip to content

fix: TOCTOU race in parallel build loses documents#240

Merged
tjgreen42 merged 1 commit intomainfrom
fix/parallel-build-toctou
Feb 25, 2026
Merged

fix: TOCTOU race in parallel build loses documents#240
tjgreen42 merged 1 commit intomainfrom
fix/parallel-build-toctou

Conversation

@tjgreen42
Copy link
Copy Markdown
Collaborator

@tjgreen42 tjgreen42 commented Feb 25, 2026

Summary

  • Fix a TOCTOU race condition in tp_leader_process_buffers() that can silently lose documents during parallel index builds. The leader collected ready buffers before checking workers_done, allowing a window where a worker marks its final buffer READY and increments workers_done between the two reads — causing the leader to exit without processing the final buffer.
  • Fix a latent deadlock where shared->nworkers was set to the requested worker count but never updated to the actually launched count. If Postgres launched fewer workers, workers_done could never reach nworkers.
  • Update expected test output for the build-completed NOTICE format change from feat: reduce build chatter for partitioned tables #214.

The race

Leader:                              Worker (finishing):
                                     1. Mark final buffer READY
                                     2. SpinLock → workers_done++
1. Collect buffers → num_ready=0
   (missed the READY buffer)
2. Check all_done → true
3. num_ready(0) > 0 → false → skip
4. while(!all_done) → EXIT
   *** FINAL BUFFER NEVER PROCESSED ***

The fix

Check workers_done first (under SpinLock, providing the memory barrier), then collect buffers. Workers set buffer READY before the SpinLock acquire around workers_done++, so after the leader's SpinLock release, all final buffers are guaranteed visible. The loop now exits only when all_done && num_ready == 0.

Impact

On MS-MARCO (8.8M passages), this race non-deterministically lost ~4M documents (~45% of the corpus). One CI run indexed only 4,834,444 documents while the same data loaded correctly in another run (8,841,770 documents). The root cause was discovered while investigating benchmark validation failures in #239.

Testing

Regression tests pass (47/47). The race is non-deterministic so it cannot be reliably reproduced in unit tests — validation should be done via the MS-MARCO benchmark pipeline which exercises parallel builds at scale.

The leader's buffer processing loop collected ready buffers BEFORE
checking workers_done. A worker marking its final buffer READY and
incrementing workers_done between these two reads caused the leader
to see all_done=true with num_ready=0, exiting without processing
the final buffer. With large datasets this could silently lose
millions of documents -- we observed ~4M of 8.8M documents lost on
MS-MARCO.

Fix by checking workers_done FIRST (under SpinLock, providing the
necessary memory barrier), THEN collecting buffers. Workers set
buffer READY before the SpinLock acquire around workers_done++, so
after the leader's SpinLock release, all final buffers are
guaranteed visible. The loop now exits only when all_done AND
num_ready == 0.

Also fix a latent deadlock: shared->nworkers was set to the
requested count but never updated to the actual launched count.
If Postgres launched fewer workers, workers_done could never
reach nworkers.
@tjgreen42 tjgreen42 force-pushed the fix/parallel-build-toctou branch from e225608 to d6b27ea Compare February 25, 2026 01:59
tjgreen42 added a commit that referenced this pull request Feb 25, 2026
Port applicable fixes from PR #240 to the rewritten parallel build:

1. Error out if no parallel workers could be launched, preventing a
   division-by-zero in the TID range block assignment.

2. Use nworkers_launched (actual count) instead of nworkers (requested
   count) in worker_execute_merge_group() and worker memory budget
   calculation. When Postgres launches fewer workers than requested,
   the old code over-divided the memory budget and iterated over
   non-existent worker results.

The TOCTOU race fix from PR #240 does not apply — the rewritten
two-phase barrier design has no leader buffer processing loop.
@tjgreen42 tjgreen42 marked this pull request as ready for review February 25, 2026 17:00
@tjgreen42 tjgreen42 merged commit b75e7e3 into main Feb 25, 2026
15 checks passed
@tjgreen42 tjgreen42 deleted the fix/parallel-build-toctou branch February 25, 2026 17:00
tjgreen42 added a commit that referenced this pull request Feb 26, 2026
Port applicable fixes from PR #240 to the rewritten parallel build:

1. Error out if no parallel workers could be launched, preventing a
   division-by-zero in the TID range block assignment.

2. Use nworkers_launched (actual count) instead of nworkers (requested
   count) in worker_execute_merge_group() and worker memory budget
   calculation. When Postgres launches fewer workers than requested,
   the old code over-divided the memory budget and iterated over
   non-existent worker results.

The TOCTOU race fix from PR #240 does not apply — the rewritten
two-phase barrier design has no leader buffer processing loop.
tjgreen42 added a commit that referenced this pull request Feb 26, 2026
Port applicable fixes from PR #240 to the rewritten parallel build:

1. Error out if no parallel workers could be launched, preventing a
   division-by-zero in the TID range block assignment.

2. Use nworkers_launched (actual count) instead of nworkers (requested
   count) in worker_execute_merge_group() and worker memory budget
   calculation. When Postgres launches fewer workers than requested,
   the old code over-divided the memory budget and iterated over
   non-existent worker results.

The TOCTOU race fix from PR #240 does not apply — the rewritten
two-phase barrier design has no leader buffer processing loop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant