fix: use min fieldnorm for BMW skip entries in parallel build #230

Merged
tjgreen42 merged 1 commit into main from fix/parallel-build-block-max-norm
Feb 18, 2026


tjgreen42 commented Feb 18, 2026

Summary

  • Fix write_posting_blocks() in parallel build to compute MIN fieldnorm (shortest doc) instead of MAX (longest doc) per block
  • The block_max_norm skip entry field must store the minimum fieldnorm so BMW computes valid score upper bounds; using maximum caused BMW to incorrectly skip blocks containing high-scoring short documents
  • The serial build (segment.c) and merge (merge.c) paths already used min_norm correctly — this aligns the parallel build path
  • Add parallel_bmw regression test that deterministically reproduces the bug: medium-length docs set the BMW threshold in early blocks, then mixed short+long doc blocks follow where the wrong upper bound causes BMW to skip them
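The per-block computation described above can be sketched as follows. This is a minimal illustration, not the project's write_posting_blocks(): the posting_t and skip_entry_t layouts, the block_max_tf field, and the summarize_block() helper are all hypothetical names introduced here. Only block_max_norm and its "store the minimum" rule come from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical posting layout for illustration; not the project's struct. */
typedef struct {
    uint32_t doc_id;
    uint16_t tf;        /* term frequency in this document */
    uint8_t  fieldnorm; /* quantized doc length: smaller means shorter doc */
} posting_t;

/* Hypothetical per-block skip entry. block_max_tf is an assumption;
 * block_max_norm is the field named in this PR. */
typedef struct {
    uint16_t block_max_tf;
    uint8_t  block_max_norm; /* holds the MIN fieldnorm in the block */
} skip_entry_t;

/* Summarize one block of postings for block-max pruning. */
skip_entry_t summarize_block(const posting_t *postings, size_t count)
{
    assert(count > 0);
    skip_entry_t entry = { postings[0].tf, postings[0].fieldnorm };
    for (size_t i = 1; i < count; i++) {
        if (postings[i].tf > entry.block_max_tf)
            entry.block_max_tf = postings[i].tf;
        /* The bug was the equivalent of '>' here: tracking the longest
         * document instead of the shortest one. */
        if (postings[i].fieldnorm < entry.block_max_norm)
            entry.block_max_norm = postings[i].fieldnorm;
    }
    return entry;
}
```

For a block holding fieldnorms 40, 7, and 200, this sketch records 7, so the bound is derived from the shortest document in the block.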

Test plan

  • parallel_bmw test fails deterministically without fix (0 short docs in top-10), passes with fix (10 short docs)
  • Verified 5/5 stable passes for bmw + parallel_build tests
  • Full regression suite passes, apart from the pre-existing binary_io failure
  • make format-check passes

The parallel build's write_posting_blocks() computed MAX fieldnorm per
block instead of MIN. The block_max_norm field stores the minimum
fieldnorm (shortest document) so BMW can compute a valid upper bound
on block scores. Using the maximum (longest document) produced
artificially low upper bounds, causing BMW to incorrectly skip blocks
containing high-scoring short documents.
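Why the minimum is the right choice can be seen from the shape of the scoring function. The derivation below assumes a BM25-style per-term score, which this PR does not spell out; the symbols (tf for term frequency, dl for document length, avgdl for average length, k_1 and b for the usual tuning parameters, B for a block) are introduced here for illustration.

```latex
s(\mathit{tf}, \mathit{dl}) =
  \mathrm{idf} \cdot
  \frac{\mathit{tf}\,(k_1 + 1)}
       {\mathit{tf} + k_1\left(1 - b + b\,\frac{\mathit{dl}}{\mathit{avgdl}}\right)}

% s is increasing in tf and decreasing in dl (for 0 < b \le 1), so
\max_{d \in B} s(\mathit{tf}_d, \mathit{dl}_d)
  \;\le\;
  s\!\left(\max_{d \in B} \mathit{tf}_d,\; \min_{d \in B} \mathit{dl}_d\right)

% whereas the buggy bound can fall below a real score in the block:
s\!\left(\max_{d \in B} \mathit{tf}_d,\; \max_{d \in B} \mathit{dl}_d\right)
  \;\le\;
  s\!\left(\max_{d \in B} \mathit{tf}_d,\; \min_{d \in B} \mathit{dl}_d\right)
```

Under that assumption, only the pairing of maximum term frequency with minimum length is guaranteed to dominate every document in the block. Since a smaller fieldnorm corresponds to a shorter document, the skip entry has to carry the minimum fieldnorm.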

The serial build (segment.c) and merge (merge.c) paths already used
min_norm correctly. This fix aligns the parallel build path.

Add a regression test (parallel_bmw) that deterministically reproduces
the bug using a 3-tier document design: medium-length docs establish
the BMW threshold in early blocks, then mixed short+long doc blocks
follow. With the wrong MAX fieldnorm, the upper bound for the mixed
blocks is based on the long docs and falls below the threshold, causing
BMW to skip them entirely and miss the short docs that should rank
highest.
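The failure mode exercised by the test can be reproduced on paper with a small sketch. It assumes BM25 scoring with idf fixed at 1, a single query term, and raw document lengths in place of encoded fieldnorms. The constants and the bm25_score() and can_skip_block() helpers are hypothetical, not taken from the project.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative BM25 parameters; assumptions, not the project's settings. */
#define K1 1.2
#define B  0.75

/* BM25 contribution of one term, with idf fixed at 1 for simplicity. */
double bm25_score(double tf, double doc_len, double avg_doc_len)
{
    assert(avg_doc_len > 0.0);
    double length_norm = 1.0 - B + B * (doc_len / avg_doc_len);
    return tf * (K1 + 1.0) / (tf + K1 * length_norm);
}

/* Block-max pruning: a block may be skipped only when even its best
 * possible score cannot beat the current top-k threshold. */
bool can_skip_block(double block_upper_bound, double threshold)
{
    return block_upper_bound < threshold;
}
```

Take an average length of 50 and documents of length 2 (short), 20 (medium), and 200 (long), each with tf = 1. The medium documents set the threshold. A bound computed from the long document falls below that threshold, so can_skip_block() returns true and the short documents in the same block are never scored. A bound computed from the short document stays above the threshold, so the block is evaluated.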

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tjgreen42 merged commit b37eaab into main Feb 18, 2026
15 checks passed
tjgreen42 deleted the fix/parallel-build-block-max-norm branch February 18, 2026 01:49