Skip to content

VACUUM: add deleted-document bitmap for efficient dead tuple tracking #264

@tjgreen42

Description

@tjgreen42

Summary

Currently, tp_bulkdelete iterates all indexed CTIDs and calls the VACUUM callback but discards the return value—dead entries are never removed from the index. This causes a correctness bug when Postgres reuses CTIDs after VACUUM: the index can return completely wrong rows for queries.

The CTID reuse problem

  1. Row A at CTID (42,5) is indexed with term "database"
  2. Row A is deleted, plain VACUUM runs
  3. ambulkdelete is supposed to remove dead index entries before VACUUM frees heap line pointers for reuse—but we don't
  4. New row B gets CTID (42,5), contains "networking"
  5. Index says (42,5) matches "database"—Postgres fetches row B, visibility check passes (row B is live), wrong row returned

For ORDER BY ... LIMIT queries (our most common pattern), Postgres trusts the index ordering and doesn't re-evaluate the <@> operator against the heap tuple. This is a false positive, not just a stale score.

Immediate fix (separate PR)

Rebuild affected segments during ambulkdelete using the existing bulk-indexing codepath. Correct but expensive for large segments with few deletions.

This ticket: deleted-document bitmap

A more efficient long-term solution:

  1. During ambulkdelete, record dead doc IDs in a per-segment bitmap (Roaring or simple bitset)
  2. Store the bitmap alongside the segment (e.g., dedicated pages after the segment data)
  3. During scan/BMW iteration, check the bitmap to skip dead docs
  4. During segment merge/compaction, use the bitmap to omit dead entries from the output segment—this is the natural cleanup point

Benefits over segment rebuild

  • O(dead_docs) work during VACUUM instead of O(all_docs_in_segment)
  • No re-tokenization of live documents
  • Fixes the LIMIT shortfall (over-fetch by bitmap cardinality to compensate for filtered docs)
  • Improves query performance (skip dead docs during BMW iteration rather than scoring them)

Statistics considerations (not blocking)

  • total_docs and per-term doc_freq in segment dictionaries are not updated after deletes, which affects BM25 IDF calculation
  • For small numbers of deletes this is negligible; mass deletion of documents with specific terms can skew ranking
  • Fixing this properly requires either updating segment metadata during VACUUM or recomputing during merge

References

  • OPTIMIZATION_ROADMAP.md "DELETE Handling" section and Phase 4.1 (Roaring Bitmaps)
  • Lucene uses a similar approach: per-segment live-docs bitset, merged during segment compaction
  • Tantivy: DeleteBitSet per segment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions