fix: VACUUM correctly removes dead index entries#267
Merged
Conversation
56e05e8 to
a4b7e1c
Compare
Reproduces the correctness issue where plain VACUUM does not remove dead index entries, leading to false positives when CTIDs are reused by new inserts.
Replaces the old no-op bulkdelete (which discarded callback results) with a correctness fix that: 1. Spills memtable to segments 2. Identifies segments containing dead CTIDs (O(segments) memory) 3. Rebuilds affected segments from heap via TpBuildContext 4. Updates metapage statistics This prevents false positives when Postgres reuses CTIDs after VACUUM frees heap line pointers.
All four test scenarios pass: - CTID reuse: alpha_matches=0 (fix confirmed) - All-deleted segment removal works - Multi-level vacuum preserves remaining docs - No-op vacuum path works Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The original test used WHERE <@> < 0 which triggers a seqscan that re-evaluates the operator on the heap tuple, masking the bug. The fix uses ORDER BY <@> LIMIT which triggers an index scan where xs_recheck=false causes Postgres to trust the stale index entries. Verified: test produces alpha_false_positives=1 without the fix, and alpha_false_positives=0 with the fix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DatumGetTextPP may detoast, allocating memory. By calling it inside per_doc_ctx, the detoasted datum is freed on each MemoryContextReset instead of accumulating in the outer context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the complex alpha_false_positives subquery pattern with a direct SELECT that expects 0 rows. The index scan returns nothing for deleted terms when the fix is working correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e3c7148 to
481b7d6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
entries, leading to false positives when Postgres reuses CTIDs
tp_bulkdelete:disjoint_sources=falseinvariant inmerge path
Limitations
with few deletes). VACUUM: add deleted-document bitmap for efficient dead tuple tracking #264 tracks the efficient bitmap approach.
Testing
vacuum_rebuildregression test covering: