[SharovBot] fix data race in FilesItem.closeFilesAndRemove#21384
Closed
erigon-copilot[bot] wants to merge 1 commit into
Use sync.Once to ensure closeFilesAndRemove executes at most once per FilesItem, so that the deleteMergeFile and refcntDecrement paths can no longer both close the same instance concurrently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>
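A minimal sketch of the at-most-once pattern the commit message describes. The `filesItem` type, its `closeOnce` and `closes` fields, and the `closeConcurrently` helper are hypothetical stand-ins for illustration; the real `FilesItem` and the body of `closeFilesAndRemove` are not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// filesItem is a hypothetical stand-in for FilesItem; only the fields
// needed for this sketch are modelled.
type filesItem struct {
	closeOnce sync.Once    // guards the body of closeFilesAndRemove
	closes    atomic.Int32 // counts how many times the body actually ran
}

// closeFilesAndRemove may be called from several goroutines;
// sync.Once guarantees the body runs at most once.
func (i *filesItem) closeFilesAndRemove() {
	i.closeOnce.Do(func() {
		// The real method would close file handles and remove files here
		// (assumption: details omitted).
		i.closes.Add(1)
	})
}

// closeConcurrently calls closeFilesAndRemove from n goroutines and
// reports how many times the guarded body ran.
func closeConcurrently(n int) int32 {
	item := &filesItem{}
	var wg sync.WaitGroup
	for k := 0; k < n; k++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			item.closeFilesAndRemove()
		}()
	}
	wg.Wait()
	return item.closes.Load()
}

func main() {
	fmt.Println(closeConcurrently(8)) // prints 1
}
```

Note that `sync.Once` only deduplicates the destructive call; it does not decide which caller is entitled to make it.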
AskAlexSharov
requested changes
May 24, 2026
Member
Closing in favour of PR #21397
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request on May 27, 2026
## Context

`Aggregator.BeginFilesRo()` was made lock-free in erigontech#20462/erigontech#20490, but physical file deletion stayed gated by two per-`FilesItem` atomics (`refcount` + `canDelete`). Two atomics guarding one destructive action (`closeFilesAndRemove`) is the TOCTOU double-free behind erigontech#21384, and a per-file refcount taken *after* the snapshot pointer is loaded can't protect the load→pin window by itself.

## What this does

Replaces per-file `refcount`/`canDelete` with MVCC reclamation gated by a refcount on the published bundle (`aggregatorVisible`). This is the MDBX freelist model (a page freed at txnid `T` is reclaimable once the oldest live reader's txnid `> T`), realized in Go by reference-counting the generation object instead of each file.

- Published bundles form an oldest→newest chain; a reader pins exactly one via `refcnt`. `refcnt` only grows while a bundle is current, only shrinks once superseded.
- `BeginFilesRo` does validate-after-pin (one atomic add + re-check), closing the load→pin window. One add instead of dozens of per-file increments.
- Files removed from `dirtyFiles` by a merge/prune are attached to the outgoing generation's `retired` set and physically deleted only once that generation (and every older one) drains: reclaimed oldest-first, single owner of `closeFilesAndRemove`, no per-file flag, no double-free.
- `DebugBeginDirtyFilesRo` (BuildMissedAccessors) pins the generation the same way, so its captured dirty files, including unindexed ones absent from the visible set, are protected for the duration of the accessor build.

`FilesItem.refcount`/`canDelete` are now used only by the forkable subsystem (out of scope here). Design + file lifecycle: `docs/plans/20260525-lockfree-file-reclamation-spec.md`.

## Status

WIP. Validated locally: `db/state/...` under `-race` (no data races), `make lint`, `make erigon integration`.

---------

Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
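The validate-after-pin step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the referenced commit: `registry`, `generation`, `publish`, `pin`, `unpin` and `reclaimable` are hypothetical names, the oldest→newest chain and the `retired` file sets are omitted, and a failed validation simply undoes its increment and retries.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// generation is a hypothetical stand-in for a published bundle.
type generation struct {
	id     int
	refcnt atomic.Int64 // number of readers pinning this generation
}

// registry holds the pointer to the currently published generation.
type registry struct {
	current atomic.Pointer[generation]
}

// publish makes g the current generation, superseding the previous one.
func (r *registry) publish(g *generation) { r.current.Store(g) }

// pin does validate-after-pin: load the pointer, add one reference,
// then re-check that the generation is still current.
// Assumes publish has been called at least once.
func (r *registry) pin() *generation {
	for {
		g := r.current.Load()
		g.refcnt.Add(1)
		if r.current.Load() == g {
			return g // still current: the pin is valid
		}
		// Superseded between load and add: undo and retry on the new one.
		g.refcnt.Add(-1)
	}
}

// unpin releases a reference taken by pin.
func (r *registry) unpin(g *generation) { g.refcnt.Add(-1) }

// reclaimable reports whether g is superseded and has no readers left.
func (r *registry) reclaimable(g *generation) bool {
	return r.current.Load() != g && g.refcnt.Load() == 0
}

func main() {
	r := &registry{}
	r.publish(&generation{id: 1})

	g := r.pin()                   // a reader pins generation 1
	r.publish(&generation{id: 2})  // a merge publishes generation 2
	fmt.Println(r.reclaimable(g))  // false: generation 1 is still pinned
	r.unpin(g)
	fmt.Println(r.reclaimable(g))  // true: superseded and drained
	fmt.Println(r.pin().id)        // 2
}
```

In this sketch a reader that loses the validation race never touches the files of the superseded generation, which is what makes the transient increment harmless here; how the real implementation handles that window is described in the spec the commit links to.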
## Summary

- Fix data race in `(*FilesItem).closeFilesAndRemove()` detected by `-race` in `TestHistoryVerification_SimpleBlocks`
- Race between `deleteMergeFile` (sets `canDelete=true`, checks `refcount.Load()==0`) and `refcntDecrement` (does `refcount.Add(-1)==0 && canDelete.Load()`), allowing both paths to concurrently enter `closeFilesAndRemove()` on the same `FilesItem`
- Add a `sync.Once` field to `FilesItem` and wrap the `closeFilesAndRemove()` body in `Once.Do()`, ensuring at-most-once execution regardless of concurrent callers

## Test plan

- `go test -race -run TestHistoryVerification_SimpleBlocks -count=5 ./execution/verify/...`: all 5 runs pass, no DATA RACE warnings
- `go build ./...` passes

🤖 Generated with Claude Code
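The interleaving described in the summary can be replayed deterministically in a single goroutine. The sketch below is an illustration only: the `item` type and `replayInterleaving` are hypothetical, and the two conditions are copied from the summary rather than from the real `deleteMergeFile` and `refcntDecrement` bodies.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// item is a hypothetical stand-in for FilesItem with the two atomics
// named in the summary.
type item struct {
	refcount  atomic.Int32
	canDelete atomic.Bool
	closes    int // how many times closeFilesAndRemove was entered
}

func (i *item) closeFilesAndRemove() { i.closes++ }

// replayInterleaving runs the steps of both paths in the problematic
// order and returns how many times closeFilesAndRemove was entered.
func replayInterleaving() int {
	it := &item{}
	it.refcount.Store(1) // one reader still holds the file

	// deleteMergeFile, step 1: mark the file as deletable.
	it.canDelete.Store(true)

	// refcntDecrement, whole path: the last reader releases the file
	// and observes canDelete already set.
	if it.refcount.Add(-1) == 0 && it.canDelete.Load() {
		it.closeFilesAndRemove()
	}

	// deleteMergeFile, step 2: the refcount check now also sees zero.
	if it.refcount.Load() == 0 {
		it.closeFilesAndRemove()
	}
	return it.closes
}

func main() {
	fmt.Println(replayInterleaving()) // prints 2: both paths entered the close
}
```

Each atomic operation is individually safe; the double entry comes from the two flags being checked in opposite orders by the two paths.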