Skip to content

db/snapshotsync: make index building and opening segments safe with other files building events#21571

Merged
sudeepdino008 merged 4 commits into
mainfrom
sudeep/merge-inflight-range-registry
Jun 9, 2026
Merged

db/snapshotsync: make index building and opening segments safe with other files building events#21571
sudeepdino008 merged 4 commits into
mainfrom
sudeep/merge-inflight-range-registry

Conversation

@sudeepdino008

@sudeepdino008 sudeepdino008 commented Jun 2, 2026

Copy link
Copy Markdown
Member

Problem

Fix

A per-(type, from, to) in-flight registry on RoSnapshots:

  • TryAcquireRange / ReleaseRange — atomic claim (LoadOrStore); false ⇒ another builder owns it.
  • isInProgress — read-only check.

Wired in:

  • merge claims every (type,range) it produces before writing; a range it can't fully claim is being built elsewhere → skipped this pass; released after integrate.
  • dump (dumpRange) claims its (type,range) across .seg+.idx build.
  • BuildMissedIndices TryAcquires per segment, skips if held — so it never rebuilds an in-flight file and concurrent recovery calls can't double-build.
  • openSegments skips in-progress ranges (read-only), so it never creates a duplicate DirtySegment for a half-built file. Once the owning builder is done processing, it can readd to dirtyFiles and do OpenFolder, so we don't miss the new file.

@sudeepdino008 sudeepdino008 changed the title db/snapshotsync: guard in-flight file builds with a (type,range) registry db/snapshotsync: make index building and opening segments safe with other files building events Jun 2, 2026
@sudeepdino008 sudeepdino008 changed the base branch from sudeep/block-merge-off-build-semaphore to main June 2, 2026 05:46
@sudeepdino008 sudeepdino008 marked this pull request as ready for review June 2, 2026 05:49
…d-index)

Running the block merge off the build semaphore exposed a race: the merge
writes a merged .seg to its final path before building its index, so a
concurrent OpenFolder picks up the bare .seg and BuildMissedIndices rebuilds
the same index on the build semaphore — colliding with the merge's own
buildIdx (the -to-block.idx.tmp rename fails) and stalling state collation
for the duration.

Add a per-(type,from,to) in-flight registry on RoSnapshots: TryAcquireRange/
ReleaseRange (atomic claim) and isInProgress (read check). Producers (merge,
dump) and missed-index recovery claim the files they build; openSegments
skips in-progress ranges and BuildMissedIndices skips/serializes on them, so
a given index always has a single builder. Also makes concurrent
BuildMissedIndices calls safe.
@sudeepdino008 sudeepdino008 force-pushed the sudeep/merge-inflight-range-registry branch from ac3018d to d96b16d Compare June 2, 2026 07:55
Comment thread db/snapshotsync/freezeblocks/block_snapshots.go Outdated
Comment thread db/snapshotsync/snapshots.go Outdated
Sudeep Kumar added 2 commits June 5, 2026 04:55
…tric acquire/release keys

- dumpRange: a claimed (type,range) is now an explicit error instead of a
  silent unprotected build (review: a function creating a fresh file must
  own it)
- BuildMissedIndices: acquire and release now use the same (t, from, to)
  values instead of keying release off segment.FileInfo
…mp error

A claimed range during dump is benign (another builder owns it), but the
plain error reached RetireBlocksInBackground's Error log on every retire
attempt. Follow the ErrHeimdallDataIsNotReady pattern: wrap a sentinel,
log at Debug and retry on the next retire cycle.

@awskii awskii left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registry logic is correct and both earlier comments are addressed. Two notes:

  • Land this before #21526 — it's the guard that makes the off-semaphore merge safe, and they're siblings off main, not stacked.
  • Merge's claim/skip/rollback (merger.go:181-199) has no test (no caller with a concurrent claimant) — that's the path #21526 relies on. Worth a test that pre-claims a range and asserts Merge skips it and leaves the claim intact.

Pre-claims one type of a merge range and asserts Merge skips the whole
range, rolls back its partial claims, keeps the pre-claim intact, and
still merges (and releases) the unclaimed range. Verified the test fails
against the pre-registry Merge.
@sudeepdino008 sudeepdino008 added this pull request to the merge queue Jun 9, 2026
Merged via the queue into main with commit 8c0d8bd Jun 9, 2026
169 of 171 checks passed
@sudeepdino008 sudeepdino008 deleted the sudeep/merge-inflight-range-registry branch June 9, 2026 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants