RoSnapshots: lock-free .View method #20490
Merged
Pull request overview
This PR refactors db/snapshotsync.RoSnapshots to make View() (and other visible-segment reads) lock-free by publishing a single atomically-swapped “visible” snapshot, similar in spirit to PR #20462 but applied to rosnapshots.
Changes:
- Replace `visibleLock` + `visible []VisibleSegments` with `atomic.Pointer[snapshotVisible]` and publish recomputed visibility via atomic swap.
- Derive `SegmentsMax()` from the published visible snapshot instead of maintaining a separate atomic updated during segment opening.
- Update and extend tests to use the new atomic-visible access pattern and add regressions around `SegmentsMax` and view pinning across generations.
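
The publish-by-atomic-swap pattern in the list above can be illustrated with a minimal sketch. The type and method names below (`visibleSegment`, `roSnapshots`, `recalcVisible`, `view`, `segmentsMax`) are simplified stand-ins chosen for illustration, not the actual definitions in `db/snapshotsync`, and the meaning of `segmentsMax` (largest `to` among visible segments) is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// visibleSegment is a hypothetical, simplified segment descriptor.
type visibleSegment struct {
	from, to uint64
}

// snapshotVisible is treated as immutable once published, so readers can
// use it without taking any lock.
type snapshotVisible struct {
	segments    []visibleSegment
	segmentsMax uint64 // assumed semantics: largest `to` among segments
}

type roSnapshots struct {
	visible atomic.Pointer[snapshotVisible]
}

// recalcVisible builds a fresh snapshot, derives segmentsMax from it, and
// publishes both together with a single atomic store.
func (s *roSnapshots) recalcVisible(segs []visibleSegment) {
	next := &snapshotVisible{segments: append([]visibleSegment(nil), segs...)}
	for _, seg := range next.segments {
		if seg.to > next.segmentsMax {
			next.segmentsMax = seg.to
		}
	}
	s.visible.Store(next)
}

// view returns the currently published snapshot without locking.
func (s *roSnapshots) view() *snapshotVisible {
	if v := s.visible.Load(); v != nil {
		return v
	}
	return &snapshotVisible{}
}

// segmentsMax is read from the same published snapshot as the segments.
func (s *roSnapshots) segmentsMax() uint64 {
	return s.view().segmentsMax
}

func main() {
	s := &roSnapshots{}
	s.recalcVisible([]visibleSegment{{0, 1000}})
	pinned := s.view() // a reader holds the old generation
	s.recalcVisible([]visibleSegment{{0, 1000}, {1000, 2000}})
	fmt.Println(len(pinned.segments), pinned.segmentsMax, s.segmentsMax())
}
```

Because the segment list and `segmentsMax` live in the same immutable object, a reader holding an older snapshot keeps seeing a consistent pair even after a newer generation is published.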
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| db/snapshotsync/snapshots.go | Introduces atomically-published snapshotVisible, removes visibleLock, makes View() lock-free, and recalculates/publishes visibility + segmentsMax together. |
| db/snapshotsync/snapshots_test.go | Updates tests for atomic-visible access; adds regression tests for SegmentsMax visibility semantics and view pinning across recalcs. |
JkLondon approved these changes on Apr 14, 2026
This was referenced May 24, 2026
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request on May 27, 2026
## Context

`Aggregator.BeginFilesRo()` was made lock-free in erigontech#20462/erigontech#20490, but physical file deletion stayed gated by two per-`FilesItem` atomics (`refcount` + `canDelete`). Two atomics guarding one destructive action (`closeFilesAndRemove`) is the TOCTOU double-free behind erigontech#21384, and a per-file refcount taken *after* the snapshot pointer is loaded can't protect the load→pin window by itself.

## What this does

Replaces per-file `refcount`/`canDelete` with MVCC reclamation gated by a refcount on the published bundle (`aggregatorVisible`). This is the MDBX freelist model (a page freed at txnid `T` is reclaimable once the oldest live reader's txnid `> T`), realized in Go by reference-counting the generation object instead of each file.

- Published bundles form an oldest→newest chain; a reader pins exactly one via `refcnt`. `refcnt` only grows while a bundle is current, only shrinks once superseded.
- `BeginFilesRo` does validate-after-pin (one atomic add + re-check), closing the load→pin window. One add instead of dozens of per-file increments.
- Files removed from `dirtyFiles` by a merge/prune are attached to the outgoing generation's `retired` set and physically deleted only once that generation (and every older one) drains: reclaimed oldest-first, single owner of `closeFilesAndRemove`, no per-file flag, no double-free.
- `DebugBeginDirtyFilesRo` (BuildMissedAccessors) pins the generation the same way, so its captured dirty files (including unindexed ones absent from the visible set) are protected for the duration of the accessor build.

`FilesItem.refcount`/`canDelete` are now used only by the forkable subsystem (out of scope here).

Design + file lifecycle: `docs/plans/20260525-lockfree-file-reclamation-spec.md`.

## Status

WIP. Validated locally: `db/state/...` under `-race` (no data races), `make lint`, `make erigon integration`.

---------

Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
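
The validate-after-pin step described in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from that commit: `generation`, `aggregator`, `pin`, `unpin`, and `reclaimable` are hypothetical names, and the oldest-first chain and `retired` file sets are left out.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// generation is a hypothetical stand-in for a published bundle.
type generation struct {
	id     int
	refcnt atomic.Int64
}

type aggregator struct {
	current atomic.Pointer[generation]
}

// publish makes g the current generation. This sketch assumes publish is
// called at least once before any pin.
func (a *aggregator) publish(g *generation) { a.current.Store(g) }

// pin loads the current generation, increments its refcount, then re-checks
// that it is still current. If a publisher swapped the pointer in between,
// the increment is undone and the loop retries on the newer generation.
func (a *aggregator) pin() *generation {
	for {
		g := a.current.Load()
		g.refcnt.Add(1)
		if a.current.Load() == g {
			return g
		}
		g.refcnt.Add(-1) // lost the race with a publisher; never use g
	}
}

// unpin releases a reader's hold on g.
func (g *generation) unpin() { g.refcnt.Add(-1) }

// reclaimable reports whether g is superseded and has no readers left.
func (a *aggregator) reclaimable(g *generation) bool {
	return a.current.Load() != g && g.refcnt.Load() == 0
}

func main() {
	a := &aggregator{}
	g1 := &generation{id: 1}
	a.publish(g1)
	r := a.pin()
	a.publish(&generation{id: 2})
	fmt.Println(a.reclaimable(g1)) // reader r still holds g1
	r.unpin()
	fmt.Println(a.reclaimable(g1))
}
```

A reader whose re-check fails never touches the generation it briefly incremented, which is why a reclaimer that observed a zero count on a superseded generation can proceed safely in this sketch.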
yperbasis pushed a commit to Sahil-4555/erigon that referenced this pull request on Jun 1, 2026
…ntech#21545)

## Problem

- `dirtySegment.close()` (closes seg and idx) can happen on subsegments once some collation does `OpenFolder`, which uses `TypedSegments`, which closes the subsegments.
- `closeWhatNotInList`: merge calls this and can crash because of the earlier close.
- maybe user in erigontech#19930 observed this
- This started happening more after I tried to take snapshot merge off the build semaphore - erigontech#21526

```
panic: runtime error: invalid memory address or nil pointer dereference
seg.(*Decompressor).FilePath
snapshotsync.(*DirtySegment).closeAndRemoveFiles snapshots.go:420
snapshotsync.(*RoTx).Close snapshots.go:537
snapshotsync.(*View).Close
```

## Fix

In `closeWhatNotInList`, skip segments with `refcount > 0`: a live reader still references them, so closing now would invalidate that reader. They are reaped on a later pass once the reader releases them (`closeWhatNotInList` already runs on every `OpenFolder`). `View`/`BeginRo` stays lock-free (erigontech#20490); the fix is purely in the close path.

## Test

`TestCloseWhatNotInListVsLiveViewDoesNotCrash` reproduces the crash deterministically (pure `snapshotsync`, no merge machinery): it builds sub-segments, opens a `View` over them, drops a covering merged file on disk, reopens (so `NoOverlaps` removes the subs from the list), and asserts `View.Close` does not crash. It fails before this change and passes after.

Co-authored-by: Sudeep Kumar <sudeep.kumar@erigon.tech>
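
The fix described above (skip still-referenced segments and reap them on a later pass) can be sketched as follows. This is a single-threaded illustration under assumptions, not the actual `closeWhatNotInList`: the `dirtySegment` fields, the `closeUnlisted` helper, and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dirtySegment is a hypothetical, simplified segment with a reader refcount.
type dirtySegment struct {
	name     string
	refcount atomic.Int32
	closed   bool
}

// closeUnlisted closes every segment that is not in keep, except segments a
// live reader still references (refcount > 0). Those are left open and
// returned so that a later pass can reap them once the reader is done.
func closeUnlisted(segs []*dirtySegment, keep map[string]bool) []*dirtySegment {
	var remaining []*dirtySegment
	for _, seg := range segs {
		switch {
		case keep[seg.name]:
			remaining = append(remaining, seg) // still listed: keep open
		case seg.refcount.Load() > 0:
			remaining = append(remaining, seg) // in use: defer the close
		default:
			seg.closed = true // unlisted and unreferenced: safe to close
		}
	}
	return remaining
}

func main() {
	a := &dirtySegment{name: "a"}
	b := &dirtySegment{name: "b"}
	b.refcount.Add(1) // a live view still reads b
	keep := map[string]bool{"a": true}

	rest := closeUnlisted([]*dirtySegment{a, b}, keep)
	fmt.Println(len(rest), b.closed) // b survives the first pass

	b.refcount.Add(-1) // the view is closed
	rest = closeUnlisted(rest, keep)
	fmt.Println(len(rest), b.closed) // b is reaped on the next pass
}
```

The sketch ignores the concurrency between the refcount check and a reader acquiring the segment; it only shows the deferred-close decision itself.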
Like #20462, but for rosnapshots.