-
Notifications
You must be signed in to change notification settings - Fork 94
VACUUM: add deleted-document bitmap for efficient dead tuple tracking #264
Copy link
Copy link
Open
Description
Summary
Currently, tp_bulkdelete iterates all indexed CTIDs and calls the VACUUM callback but discards the return value—dead entries are never removed from the index. This causes a correctness bug when Postgres reuses CTIDs after VACUUM: the index can return completely wrong rows for queries.
The CTID reuse problem
- Row A at CTID (42,5) is indexed with term "database"
- Row A is deleted, plain VACUUM runs
ambulkdeleteis supposed to remove dead index entries before VACUUM frees heap line pointers for reuse—but we don't- New row B gets CTID (42,5), contains "networking"
- Index says (42,5) matches "database"—Postgres fetches row B, visibility check passes (row B is live), wrong row returned
For ORDER BY ... LIMIT queries (our most common pattern), Postgres trusts the index ordering and doesn't re-evaluate the <@> operator against the heap tuple. This is a false positive, not just a stale score.
Immediate fix (separate PR)
Rebuild affected segments during ambulkdelete using the existing bulk-indexing codepath. Correct but expensive for large segments with few deletions.
This ticket: deleted-document bitmap
A more efficient long-term solution:
- During
ambulkdelete, record dead doc IDs in a per-segment bitmap (Roaring or simple bitset) - Store the bitmap alongside the segment (e.g., dedicated pages after the segment data)
- During scan/BMW iteration, check the bitmap to skip dead docs
- During segment merge/compaction, use the bitmap to omit dead entries from the output segment—this is the natural cleanup point
Benefits over segment rebuild
- O(dead_docs) work during VACUUM instead of O(all_docs_in_segment)
- No re-tokenization of live documents
- Fixes the LIMIT shortfall (over-fetch by bitmap cardinality to compensate for filtered docs)
- Improves query performance (skip dead docs during BMW iteration rather than scoring them)
Statistics considerations (not blocking)
total_docsand per-termdoc_freqin segment dictionaries are not updated after deletes, which affects BM25 IDF calculation- For small numbers of deletes this is negligible; mass deletion of documents with specific terms can skew ranking
- Fixing this properly requires either updating segment metadata during VACUUM or recomputing during merge
References
- OPTIMIZATION_ROADMAP.md "DELETE Handling" section and Phase 4.1 (Roaring Bitmaps)
- Lucene uses a similar approach: per-segment live-docs bitset, merged during segment compaction
- Tantivy:
DeleteBitSetper segment
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels