db/state: move refcnt to visibleFiles object#21397
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors db/state snapshot-file deletion to avoid TOCTOU/double-free hazards by moving from per-FilesItem refcount/canDelete gating to generation-based (bundle) refcounting with oldest-first reclamation of retired files.
Changes:
- Add generation-chained
aggregatorVisiblebundles with a singlerefcntpin per reader, plus aretiredset reclaimed when the oldest drained generation advances. - Update merge/cleanup paths to “retire” files (remove from
dirtyFiles) and defer physical deletion to the bundle reclaimer. - Update debug pinning and GC tests to validate deferred deletion behavior under both normal readers and debug accessor-building readers.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/plans/20260525-lockfree-file-reclamation-spec.md | New design spec describing generation-based reclamation and invariants. |
| db/state/aggregator.go | Implements bundle refcnt pinning, generation chain, retired-file reclamation, and updated merge cleanup plumbing. |
| db/state/dirty_files.go | Replaces deleteMergeFile with retireMergeFiles and removes per-visible-file refcount increment/decrement helpers. |
| db/state/merge.go | Threads “retired files” up from domain/history/II cleanup so aggregator can publish+defer deletion. |
| db/state/domain.go | Removes per-file refcount pin/unpin in domain RO tx lifecycle (now covered by bundle pin). |
| db/state/history.go | Removes per-file refcount pin/unpin in history RO tx lifecycle (now covered by bundle pin). |
| db/state/inverted_index.go | Removes per-file refcount pin/unpin in II RO tx lifecycle (now covered by bundle pin). |
| db/state/aggregator_debug.go | Debug dirty-files RO tx now pins the current bundle generation instead of per-file refcounting. |
| db/state/gc_test.go | New tests verifying deferred deletion semantics for normal readers, debug pins, and concurrent reclaim. |
| db/state/aggregator_debug_test.go | Updates test to assert debug Close is not a deleter (reclaimer is sole deleter). |
| db/state/squeeze.go | Wraps recalcVisibleFiles(nil) with dirtyFilesLock where needed and updates signature calls. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
TestDirtyFilesRoTx_CloseIsNotADeleter leaked the seg.Decompressor FD, so on Windows t.TempDir cleanup could not unlink the still-open file. Close it via defer (the Close() under test must not delete the file, only release the FD). Also refresh a stale "via FilesItem.refcount" comment — the debug RoTx now pins the bundle generation.
refcnt to visibleFiles object
refcnt to visibleFiles objectrefcnt to visibleFiles object
refcnt to visibleFiles objectrefcnt to visibleFiles object
refcnt to visibleFiles objectrefcnt to visibleFiles object
yperbasis
left a comment
There was a problem hiding this comment.
Approving — single-owner reclamation is the right fix, and TestHistoryVerification_SimpleBlocks is clean under -race -count=5 locally. Three non-blocking nits:
-
The
refcount==0 && canDelete→closeFilesAndRemove()TOCTOU (this PR's whole point) still lives in the out-of-scope forkable subsystem —snap_repo.go:119/:123andproto_forkable.go:332-333, plus theFilesItem.refcount/canDeletefields they keep alive. The double-free class survives there; worth a follow-up to fix/remove forkable or a comment asserting it's dead. -
checkForVisibility'scanDeleteguard (dirty_files.go:685) is now dead on the agg path (canDeleteis never set after this PR) — the spec listed it for removal. -
openFolder()callsrecalcVisibleFilesrelying on the caller holdingdirtyFilesLock, but unlikereclaimRetiredLockednothing in its name/comment signals that — a future caller could regress it. A one-line comment or aLockedsuffix would pin it down.
Context
Aggregator.BeginFilesRo()was made lock-free in #20462/#20490, but physicalfile deletion stayed gated by two per-
FilesItematomics (refcount+canDelete).Two atomics guarding one destructive action (
closeFilesAndRemove) is the TOCTOUdouble-free behind #21384, and a per-file refcount taken after the snapshot
pointer is loaded can't protect the load→pin window by itself.
What this does
Replaces per-file
refcount/canDeletewith MVCC reclamation gated by a refcounton the published bundle (
aggregatorVisible) — the MDBX freelist model (a pagefreed at txnid
Tis reclaimable once the oldest live reader's txnid> T),realized in Go by reference-counting the generation object instead of each file.
refcnt.refcntonly grows while a bundle is current, only shrinks once superseded.BeginFilesRodoes validate-after-pin (one atomic add + re-check), closing theload→pin window. One add instead of dozens of per-file increments.
dirtyFilesby a merge/prune are attached to the outgoinggeneration's
retiredset and physically deleted only once that generation(and every older one) drains — reclaimed oldest-first, single owner of
closeFilesAndRemove, no per-file flag, no double-free.DebugBeginDirtyFilesRo(BuildMissedAccessors) pins the generation the same way,so its captured dirty files — including unindexed ones absent from the visible
set — are protected for the duration of the accessor build.
FilesItem.refcount/canDeleteare now used only by the forkable subsystem(out of scope here).
Design + file lifecycle:
docs/plans/20260525-lockfree-file-reclamation-spec.md.Status
WIP. Validated locally:
db/state/...under-race(no data races),make lint,make erigon integration.