Skip to content

RoSnapshots: lock-free .View method#20490

Merged
AskAlexSharov merged 9 commits into
mainfrom
alex/lock_free_rosn_35
Apr 14, 2026
Merged

RoSnapshots: lock-free .View method#20490
AskAlexSharov merged 9 commits into
mainfrom
alex/lock_free_rosn_35

Conversation

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

like #20462 - but for rosnapshots

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors db/snapshotsync.RoSnapshots to make View() (and other visible-segment reads) lock-free by publishing a single atomically-swapped “visible” snapshot, similar in spirit to PR #20462 but applied to rosnapshots.

Changes:

  • Replace visibleLock + visible []VisibleSegments with atomic.Pointer[snapshotVisible] and publish recomputed visibility via atomic swap.
  • Derive SegmentsMax() from the published visible snapshot instead of maintaining a separate atomic updated during segment opening.
  • Update and extend tests to use the new atomic-visible access pattern and add regressions around SegmentsMax and view pinning across generations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
db/snapshotsync/snapshots.go Introduces atomically-published snapshotVisible, removes visibleLock, makes View() lock-free, and recalculates/publishes visibility + segmentsMax together.
db/snapshotsync/snapshots_test.go Updates tests for atomic-visible access; adds regression tests for SegmentsMax visibility semantics and view pinning across recalcs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/snapshotsync/snapshots.go
Comment thread db/snapshotsync/snapshots.go Outdated
@AskAlexSharov AskAlexSharov changed the title [wip] RoSnapshots: lock-free .View method RoSnapshots: lock-free .View method Apr 14, 2026
@AskAlexSharov AskAlexSharov requested a review from JkLondon April 14, 2026 05:07
@AskAlexSharov AskAlexSharov enabled auto-merge April 14, 2026 05:37
@AskAlexSharov AskAlexSharov added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit 05a1e3d Apr 14, 2026
35 checks passed
@AskAlexSharov AskAlexSharov deleted the alex/lock_free_rosn_35 branch April 14, 2026 20:21
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request May 27, 2026
## Context

`Aggregator.BeginFilesRo()` was made lock-free in erigontech#20462/erigontech#20490, but
physical
file deletion stayed gated by two per-`FilesItem` atomics (`refcount` +
`canDelete`).
Two atomics guarding one destructive action (`closeFilesAndRemove`) is
the TOCTOU
double-free behind erigontech#21384, and a per-file refcount taken *after* the
snapshot
pointer is loaded can't protect the load→pin window by itself.

## What this does

Replaces per-file `refcount`/`canDelete` with MVCC reclamation gated by
a refcount
on the published bundle (`aggregatorVisible`) — the MDBX freelist model
(a page
freed at txnid `T` is reclaimable once the oldest live reader's txnid `>
T`),
realized in Go by reference-counting the generation object instead of
each file.

- Published bundles form an oldest→newest chain; a reader pins exactly
one via
`refcnt`. `refcnt` only grows while a bundle is current, only shrinks
once superseded.
- `BeginFilesRo` does validate-after-pin (one atomic add + re-check),
closing the
  load→pin window. One add instead of dozens of per-file increments.
- Files removed from `dirtyFiles` by a merge/prune are attached to the
outgoing
generation's `retired` set and physically deleted only once that
generation
  (and every older one) drains — reclaimed oldest-first, single owner of
  `closeFilesAndRemove`, no per-file flag, no double-free.
- `DebugBeginDirtyFilesRo` (BuildMissedAccessors) pins the generation
the same way,
so its captured dirty files — including unindexed ones absent from the
visible
  set — are protected for the duration of the accessor build.

`FilesItem.refcount`/`canDelete` are now used only by the forkable
subsystem
(out of scope here).

Design + file lifecycle:
`docs/plans/20260525-lockfree-file-reclamation-spec.md`.

## Status

WIP. Validated locally: `db/state/...` under `-race` (no data races),
`make lint`,
`make erigon integration`.

---------

Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
yperbasis pushed a commit to Sahil-4555/erigon that referenced this pull request Jun 1, 2026
…ntech#21545)

## Problem

- `dirtySegment.close()` (closes seg and idx) can happen on subsegments
once some collation does `OpenFolder`, which uses `TypedSegments` which
closes the subsegments.
- `closeWhatNotInList`-- merge calls this and can crash because of close
earlier.
- maybe user in erigontech#19930 observed
this
- This started happening more after I tried to take snapshot merge off
the build semaphore - erigontech#21526

```
panic: runtime error: invalid memory address or nil pointer dereference
  seg.(*Decompressor).FilePath
  snapshotsync.(*DirtySegment).closeAndRemoveFiles   snapshots.go:420
  snapshotsync.(*RoTx).Close                         snapshots.go:537
  snapshotsync.(*View).Close
```

## Fix

In `closeWhatNotInList`, skip segments with `refcount > 0`: a live
reader still references them, so closing now would invalidate that
reader. They are reaped on a later pass once the reader releases them
(`closeWhatNotInList` already runs on every `OpenFolder`).

`View`/`BeginRo` stays lock-free (erigontech#20490) — the fix is purely in the
close path.

## Test

`TestCloseWhatNotInListVsLiveViewDoesNotCrash` reproduces the crash
deterministically (pure `snapshotsync`, no merge machinery): it builds
sub-segments, opens a `View` over them, drops a covering merged file on
disk, reopens (so `NoOverlaps` removes the subs from the list), and
asserts `View.Close` does not crash. It fails before this change and
passes after.

Co-authored-by: Sudeep Kumar <sudeep.kumar@erigon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants