Skip to content

db/state: move refcnt to visibleFiles object#21397

Merged
taratorio merged 15 commits into
mainfrom
alex/think_on_begin_35
May 27, 2026
Merged

db/state: move refcnt to visibleFiles object#21397
taratorio merged 15 commits into
mainfrom
alex/think_on_begin_35

Conversation

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

Context

Aggregator.BeginFilesRo() was made lock-free in #20462/#20490, but physical
file deletion stayed gated by two per-FilesItem atomics (refcount + canDelete).
Two atomics guarding one destructive action (closeFilesAndRemove) is the TOCTOU
double-free behind #21384, and a per-file refcount taken after the snapshot
pointer is loaded can't protect the load→pin window by itself.

What this does

Replaces per-file refcount/canDelete with MVCC reclamation gated by a refcount
on the published bundle (aggregatorVisible) — the MDBX freelist model (a page
freed at txnid T is reclaimable once the oldest live reader's txnid > T),
realized in Go by reference-counting the generation object instead of each file.

  • Published bundles form an oldest→newest chain; a reader pins exactly one via
    refcnt. refcnt only grows while a bundle is current, only shrinks once superseded.
  • BeginFilesRo does validate-after-pin (one atomic add + re-check), closing the
    load→pin window. One add instead of dozens of per-file increments.
  • Files removed from dirtyFiles by a merge/prune are attached to the outgoing
    generation's retired set and physically deleted only once that generation
    (and every older one) drains — reclaimed oldest-first, single owner of
    closeFilesAndRemove, no per-file flag, no double-free.
  • DebugBeginDirtyFilesRo (BuildMissedAccessors) pins the generation the same way,
    so its captured dirty files — including unindexed ones absent from the visible
    set — are protected for the duration of the accessor build.

FilesItem.refcount/canDelete are now used only by the forkable subsystem
(out of scope here).

Design + file lifecycle: docs/plans/20260525-lockfree-file-reclamation-spec.md.

Status

WIP. Validated locally: db/state/... under -race (no data races), make lint,
make erigon integration.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors db/state snapshot-file deletion to avoid TOCTOU/double-free hazards by moving from per-FilesItem refcount/canDelete gating to generation-based (bundle) refcounting with oldest-first reclamation of retired files.

Changes:

  • Add generation-chained aggregatorVisible bundles with a single refcnt pin per reader, plus a retired set reclaimed when the oldest drained generation advances.
  • Update merge/cleanup paths to “retire” files (remove from dirtyFiles) and defer physical deletion to the bundle reclaimer.
  • Update debug pinning and GC tests to validate deferred deletion behavior under both normal readers and debug accessor-building readers.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
docs/plans/20260525-lockfree-file-reclamation-spec.md New design spec describing generation-based reclamation and invariants.
db/state/aggregator.go Implements bundle refcnt pinning, generation chain, retired-file reclamation, and updated merge cleanup plumbing.
db/state/dirty_files.go Replaces deleteMergeFile with retireMergeFiles and removes per-visible-file refcount increment/decrement helpers.
db/state/merge.go Threads “retired files” up from domain/history/II cleanup so aggregator can publish+defer deletion.
db/state/domain.go Removes per-file refcount pin/unpin in domain RO tx lifecycle (now covered by bundle pin).
db/state/history.go Removes per-file refcount pin/unpin in history RO tx lifecycle (now covered by bundle pin).
db/state/inverted_index.go Removes per-file refcount pin/unpin in II RO tx lifecycle (now covered by bundle pin).
db/state/aggregator_debug.go Debug dirty-files RO tx now pins the current bundle generation instead of per-file refcounting.
db/state/gc_test.go New tests verifying deferred deletion semantics for normal readers, debug pins, and concurrent reclaim.
db/state/aggregator_debug_test.go Updates test to assert debug Close is not a deleter (reclaimer is sole deleter).
db/state/squeeze.go Wraps recalcVisibleFiles(nil) with dirtyFilesLock where needed and updates signature calls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/state/aggregator_debug_test.go
Comment thread db/state/gc_test.go Outdated
TestDirtyFilesRoTx_CloseIsNotADeleter leaked the seg.Decompressor FD, so on
Windows t.TempDir cleanup could not unlink the still-open file. Close it via
defer (the Close() under test must not delete the file, only release the FD).
Also refresh a stale "via FilesItem.refcount" comment — the debug RoTx now pins
the bundle generation.
@AskAlexSharov AskAlexSharov changed the title [wip] db/state: bundle-refcount snapshot file reclamation [wip] db/state: move refcnt to visibleFiles object May 25, 2026
@AskAlexSharov AskAlexSharov changed the title [wip] db/state: move refcnt to visibleFiles object db/state: move refcnt to visibleFiles object May 27, 2026
@AskAlexSharov AskAlexSharov changed the title db/state: move refcnt to visibleFiles object [wip] db/state: move refcnt to visibleFiles object May 27, 2026
@AskAlexSharov AskAlexSharov changed the title [wip] db/state: move refcnt to visibleFiles object db/state: move refcnt to visibleFiles object May 27, 2026

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — single-owner reclamation is the right fix, and TestHistoryVerification_SimpleBlocks is clean under -race -count=5 locally. Three non-blocking nits:

  1. The refcount==0 && canDeletecloseFilesAndRemove() TOCTOU (this PR's whole point) still lives in the out-of-scope forkable subsystem — snap_repo.go:119/:123 and proto_forkable.go:332-333, plus the FilesItem.refcount/canDelete fields they keep alive. The double-free class survives there; worth a follow-up to fix/remove forkable or a comment asserting it's dead.

  2. checkForVisibility's canDelete guard (dirty_files.go:685) is now dead on the agg path (canDelete is never set after this PR) — the spec listed it for removal.

  3. openFolder() calls recalcVisibleFiles relying on the caller holding dirtyFilesLock, but unlike reclaimRetiredLocked nothing in its name/comment signals that — a future caller could regress it. A one-line comment or a Locked suffix would pin it down.

@yperbasis yperbasis requested a review from JkLondon May 27, 2026 09:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Comment thread db/state/aggregator.go
Comment thread db/state/aggregator.go
Comment thread db/state/merge.go
Comment thread db/state/merge.go
Comment thread docs/plans/20260525-lockfree-file-reclamation-spec.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants