revert "db/state: fix stale slot resurrection after unwind and deletion (#20483)"#20509
Merged
Conversation
sudeepdino008
commented
Apr 13, 2026
Member
- Reverts a91d498 while we investigate regressions
| // then re-written within the same step, unwinding the re-write must restore | ||
| // the deletion marker so that getLatestFromFiles doesn't return the stale | ||
| // pre-unwind value from frozen files. | ||
| func TestDomain_UnwindRestoresDeletionMarker(t *testing.T) { |
Collaborator
There was a problem hiding this comment.
does TestDomain_UnwindRestoresDeletionMarker catching valid problem?
| // cursor iterates rows newest-first, so an interrupted prune can delete the | ||
| // newest DB row for a key while leaving an older tombstone. getLatest must | ||
| // cross-check files to avoid returning the stale tombstone as authoritative. | ||
| func TestDomain_LargeValuesInterruptedPruneDoesNotResurrectTombstone(t *testing.T) { |
Collaborator
There was a problem hiding this comment.
does TestDomain_LargeValuesInterruptedPruneDoesNotResurrectTombstone catching valid problem?
yperbasis
previously approved these changes
Apr 13, 2026
Member
|
Whats found regression? |
awskii
approved these changes
Apr 13, 2026
mh0lt
added a commit
that referenced
this pull request
Apr 17, 2026
TemporalMemBatch tracks per-key overlay writes as []dataWithTxNum with
each write stamped by txNum. getLatest returns dataWithTxNums[len-1]
— the most recently appended entry — without comparing it against
sd.unwindToTxNum. Unwind() only recorded unwindToTxNum and an
unwindChangeset; it never touched sd.domains / sd.storage. Since
unwindChangeset is consulted only when the overlay misses, any key
still present in the overlay would keep returning a pre-unwind
write made inside the unwound txNum range.
Observed on post-Fusaka mainnet catch-up: after a forkchoice-driven
unwind, the first re-executed block reads a storage slot that was
first-written inside the unwound range. The overlay returns the
post-target write, flipping the SSTORE cost from SET (20000) to
RESET (2900) — exactly a 17100-gas shortfall per affected slot.
On mainnet block 24899403 this compounded to diff=-73829 across
several slots; block 24898955 showed diff=-34200.
Fix: on Unwind, walk sd.domains and sd.storage and drop any
dataWithTxNum whose txNum > unwindToTxNum. If a key's slice
empties out, delete the key so the unwindChangeset fallback (or
the underlying tx) can supply the pre-unwind answer.
Runs under sd.latestStateLock so the transition is visible
atomically to concurrent reads.
A regression test in db/state/execctx/domain_shared_test.go
(TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey) writes
a first-time storage value at txNum=100, calls Unwind(50), and
asserts the overlay no longer returns the post-target write.
Test fails on the pre-fix code and passes with this change.
Note: DB-layer siblings of this bug exist separately (missing
tombstone on domain unwind for []byte{} diffs; getLatestFromDb
fallthrough past in-range deletion markers, previously addressed
by #20483 and reverted by #20509). Those need their own surgical
fixes with dedicated regression guards and are intentionally
out-of-scope here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
mh0lt
added a commit
that referenced
this pull request
Apr 17, 2026
…estFromDb
Two linked bugs in the DB-level domain unwind and read paths caused stale
data resurrection after an unwind that reverted a first-time write or a
deletion inside the unwound range. Production symptom: post-Fusaka mainnet
execution returned less gas than the header on the first re-executed block
after a forkchoice-driven unwind (observed diff = multiples of 17100 =
SSTORE_SET - SSTORE_RESET, plus 2600 cold-access flips when receipt cache
storage was affected).
1. unwind() conflated two DomainEntryDiff shapes under `if len(value) > 0`:
- nil → different step, skip restore (legacy V0 changeset shape)
- []byte{} → no previous value, should write an empty tombstone so the
key appears absent again after the unwind completes
Both were being skipped, so first-time writes in the unwound range left
no tombstone. This is corrected to `if value != nil`, matching the
diff's documented contract.
2. getLatestFromDb discarded empty-value entries whose step was within the
frozen file range via the step-age guard, letting the caller fall through
to getLatestFromFiles. Frozen files have no concept of deletions, so the
file returns the pre-deletion value — exactly the resurrection the
deletion was meant to prevent. Empty entries are now returned as
authoritative regardless of step age; the step-age guard still applies
to non-empty entries.
This is a re-landing of the narrow core of #20483, which was reverted by
#20509 while regressions were investigated. The LargeValues cross-check
from that PR is intentionally NOT included here — it handled an
interrupted-prune edge case specific to CodeDomain / RCacheDomain that
was the most likely regression source. If that case proves necessary we
will add it as a separate, independently revertible change with its own
dedicated test.
Tests:
- TestDomain_UnwindRestoresDeletionMarker (DupSort + LargeValues) — writes
a key, deletes it, re-writes within the same step, builds files, then
unwinds the re-write. Fails on pre-fix code (getLatest returns the
stale post-unwind value2 from the frozen file); passes with the fix.
Exercises the write-side bug.
- TestDomain_DeletedKeyNotResurrectedByFiles (DupSort + LargeValues) —
documents the read-side contract by writing a key and deleting it at
a step that falls within file range. Passes on current main even
without the fix (the file-build + prune semantics evolved since #20483),
but is retained as a forward regression guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 17, 2026
…20625) ## Summary Fix a post-unwind stale-read in the in-memory domain overlay that causes gas-used mismatches on post-Fusaka mainnet catch-up. `TemporalMemBatch` stores per-key overlay writes as `[]dataWithTxNum`, each entry stamped with its write `txNum`. `getLatest` returns `dataWithTxNums[len-1]` — the most recently appended entry — without comparing it against `sd.unwindToTxNum`. `Unwind()` only recorded `unwindToTxNum` + an `unwindChangeset`; it never touched `sd.domains` / `sd.storage`. Since `unwindChangeset` is consulted only when the overlay misses, any key still present in the overlay kept returning a pre-unwind write made *inside* the unwound `txNum` range. ## Observed symptom Post-Fusaka mainnet catch-up, after a forkchoice-driven unwind. The first re-executed block reads a storage slot that was first-written inside the unwound range. The overlay returns the post-target write, flipping the SSTORE cost from `SET (20000)` to `RESET (2900)` — **exactly a 17100-gas shortfall per affected slot**. - Block 24,898,955: `diff=-34200` (2 slots × 17100) - Block 24,899,403: `diff=-73829` (compound — several slots affected) Live trace instrumentation (not shipped) captured 3,082 `SD_STALE_READ` events between an unwind at `txNum=3454259398` and the resulting mismatch at block 24,899,403. ## Fix On `Unwind`, walk `sd.domains` and `sd.storage` and drop any `dataWithTxNum` whose `txNum > unwindToTxNum`. If a key's slice empties out, delete the key so the `unwindChangeset` fallback (or the underlying tx) supplies the pre-unwind answer. Runs under `sd.latestStateLock` so the transition is atomic to concurrent reads. Storage-btree mutations are staged after `Scan` to respect btree iterator rules. ## Regression test `TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey` in `db/state/execctx/domain_shared_test.go`: - writes a first-time storage value at `txNum=100` - calls `Unwind(50)` - asserts the overlay no longer returns the post-target write Test fails on pre-fix code with the exact error that mirrors the mainnet symptom; passes with this change. ## Test plan - [x] `go test -short ./db/state/...` — all pass - [x] `make lint` — 0 issues - [x] `make erigon` — builds clean - [ ] Manual sync verification: post-Fusaka mainnet with `chaindata/` wiped and `snapshots/` retained (same repro that produced block-24,899,403 mismatch) — sync progresses past the catch-up / first-forkchoice-unwind window without a gas mismatch. ## Known adjacent issues, out of scope DB-layer siblings of this bug exist separately and are *not* addressed here: 1. `db/state/domain.go:1317` — on-disk unwind currently conflates `nil` ("different step, skip") and `[]byte{}` ("key was absent, write tombstone") via `if len(value) > 0`, so first-time writes in the unwound range leave no restoring tombstone. 2. `db/state/domain.go:1665` — `getLatestFromDb` discards deletion markers at a step within file range, so the caller falls through to `getLatestFromFiles`, which has no concept of deletions and returns stale pre-deletion data. Both were previously addressed by #20483 and reverted by #20509 while regressions were investigated. They need their own narrower fixes with dedicated regression guards and should be staged as separate PRs so they're independently revertible. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mh0lt
added a commit
that referenced
this pull request
Apr 17, 2026
Fix the DB-level unwind path so first-time-write reversals leave a proper
empty-value tombstone in the domain values table, rather than deleting the
current-step entry and silently falling back to files (which have no
concept of deletions and therefore return stale pre-unwind data).
`DomainEntryDiff.Value` carries three distinct shapes that must be handled
differently during unwind (per the comment in db/kv/helpers.go):
- nil — "different step": prev value lives at another step, skip
restore (only produced by legacy V0 changesets where
valueLen==0 deserializes as nil)
- []byte{} — "no previous value": key was absent before this step; write
an empty tombstone at the unwind step so subsequent reads
see the key as absent again
- non-empty — restore the actual previous value
The old guard `if len(value) > 0` conflated `nil` and `[]byte{}` and
skipped both, leaving no tombstone after unwinding a first-time write.
Corrected to `if value != nil`, matching the diff's documented contract.
The written tombstone lands at max(step, currentFilesEndStep), which by
construction is beyond the frozen file range, so the existing step-age
guard in getLatestFromDb already accepts it without needing to change the
read path.
This is a deliberately narrow re-land of #20483 (reverted by #20509). The
PR's other two pieces — the `len(v)==0` empty-authority bypass in
getLatestFromDb and the LargeValues cross-check in getLatest — are NOT
included here. Regressions surfaced in kurtosis/assertoor_glamsterdam
(which runs with --experimental.bal and the storagerefundtx spammer,
generating a deluge of tombstones) suggest the empty-authority bypass
has downstream interactions with the BAL / post-Amsterdam paths that
need separate narrowing. The interrupted-prune LargeValues case is the
original regression suspect from #20509 and remains out-of-scope.
Tests in db/state/domain_test.go:
- TestDomain_UnwindRestoresDeletionMarker (DupSort + LargeValues) — write
a key, delete it, re-write within the same step, build files, then unwind
the re-write. Fails on pre-fix code (getLatest returns the stale post-
unwind value from the frozen file); passes with this fix. Directly
exercises the write-side bug.
- TestDomain_DeletedKeyNotResurrectedByFiles (DupSort + LargeValues) —
retained as a forward regression guard for the read-side contract,
even though it passes on both pre-fix and post-fix code on current main
(file-build + prune semantics have evolved since #20483 such that the
original scenario no longer triggers the read-side bug on its own).
Tested: go test -short ./db/state/... all pass; make lint 0 issues;
make erigon builds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AskAlexSharov
pushed a commit
that referenced
this pull request
Apr 18, 2026
Fix the DB-level unwind path so first-time-write reversals leave a proper
empty-value tombstone in the domain values table, rather than deleting the
current-step entry and silently falling back to files (which have no
concept of deletions and therefore return stale pre-unwind data).
`DomainEntryDiff.Value` carries three distinct shapes that must be handled
differently during unwind (per the comment in db/kv/helpers.go):
- nil — "different step": prev value lives at another step, skip
restore (only produced by legacy V0 changesets where
valueLen==0 deserializes as nil)
- []byte{} — "no previous value": key was absent before this step; write
an empty tombstone at the unwind step so subsequent reads
see the key as absent again
- non-empty — restore the actual previous value
The old guard `if len(value) > 0` conflated `nil` and `[]byte{}` and
skipped both, leaving no tombstone after unwinding a first-time write.
Corrected to `if value != nil`, matching the diff's documented contract.
The written tombstone lands at max(step, currentFilesEndStep), which by
construction is beyond the frozen file range, so the existing step-age
guard in getLatestFromDb already accepts it without needing to change the
read path.
This is a deliberately narrow re-land of #20483 (reverted by #20509). The
PR's other two pieces — the `len(v)==0` empty-authority bypass in
getLatestFromDb and the LargeValues cross-check in getLatest — are NOT
included here. Regressions surfaced in kurtosis/assertoor_glamsterdam
(which runs with --experimental.bal and the storagerefundtx spammer,
generating a deluge of tombstones) suggest the empty-authority bypass
has downstream interactions with the BAL / post-Amsterdam paths that
need separate narrowing. The interrupted-prune LargeValues case is the
original regression suspect from #20509 and remains out-of-scope.
Tests in db/state/domain_test.go:
- TestDomain_UnwindRestoresDeletionMarker (DupSort + LargeValues) — write
a key, delete it, re-write within the same step, build files, then unwind
the re-write. Fails on pre-fix code (getLatest returns the stale post-
unwind value from the frozen file); passes with this fix. Directly
exercises the write-side bug.
- TestDomain_DeletedKeyNotResurrectedByFiles (DupSort + LargeValues) —
retained as a forward regression guard for the read-side contract,
even though it passes on both pre-fix and post-fix code on current main
(file-build + prune semantics have evolved since #20483 such that the
original scenario no longer triggers the read-side bug on its own).
Tested: go test -short ./db/state/... all pass; make lint 0 issues;
make erigon builds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AskAlexSharov
pushed a commit
that referenced
this pull request
Apr 18, 2026
Fix the DB-level unwind path so first-time-write reversals leave a proper
empty-value tombstone in the domain values table, rather than deleting the
current-step entry and silently falling back to files (which have no
concept of deletions and therefore return stale pre-unwind data).
`DomainEntryDiff.Value` carries three distinct shapes that must be handled
differently during unwind (per the comment in db/kv/helpers.go):
- nil — "different step": prev value lives at another step, skip
restore (only produced by legacy V0 changesets where
valueLen==0 deserializes as nil)
- []byte{} — "no previous value": key was absent before this step; write
an empty tombstone at the unwind step so subsequent reads
see the key as absent again
- non-empty — restore the actual previous value
The old guard `if len(value) > 0` conflated `nil` and `[]byte{}` and
skipped both, leaving no tombstone after unwinding a first-time write.
Corrected to `if value != nil`, matching the diff's documented contract.
The written tombstone lands at max(step, currentFilesEndStep), which by
construction is beyond the frozen file range, so the existing step-age
guard in getLatestFromDb already accepts it without needing to change the
read path.
This is a deliberately narrow re-land of #20483 (reverted by #20509). The
PR's other two pieces — the `len(v)==0` empty-authority bypass in
getLatestFromDb and the LargeValues cross-check in getLatest — are NOT
included here. Regressions surfaced in kurtosis/assertoor_glamsterdam
(which runs with --experimental.bal and the storagerefundtx spammer,
generating a deluge of tombstones) suggest the empty-authority bypass
has downstream interactions with the BAL / post-Amsterdam paths that
need separate narrowing. The interrupted-prune LargeValues case is the
original regression suspect from #20509 and remains out-of-scope.
Tests in db/state/domain_test.go:
- TestDomain_UnwindRestoresDeletionMarker (DupSort + LargeValues) — write
a key, delete it, re-write within the same step, build files, then unwind
the re-write. Fails on pre-fix code (getLatest returns the stale post-
unwind value from the frozen file); passes with this fix. Directly
exercises the write-side bug.
- TestDomain_DeletedKeyNotResurrectedByFiles (DupSort + LargeValues) —
retained as a forward regression guard for the read-side contract,
even though it passes on both pre-fix and post-fix code on current main
(file-build + prune semantics have evolved since #20483 such that the
original scenario no longer triggers the read-side bug on its own).
Tested: go test -short ./db/state/... all pass; make lint 0 issues;
make erigon builds.
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 18, 2026
…estFromDb (#20627) ## Summary Re-land the narrow core of #20483 (reverted by #20509), addressing the DB-layer siblings of the post-unwind stale-read bug. Complements #20625 which addressed the in-memory overlay side. Two linked bugs in the DB-level domain unwind and read paths caused stale data resurrection after an unwind that reverted a first-time write or a deletion inside the unwound range. ### Symptom observed on mainnet Post-Fusaka mainnet catch-up sync with a chaindata/ wipe (snapshots/ retained). On the first re-executed block after a forkchoice-driven unwind, execution returned less gas than the header — diffs observed of `-34200` (block 24,898,955), `-73829` (block 24,899,403), `-118872` (block 24,899,594). The diffs break down into multiples of `SSTORE_SET - SSTORE_RESET = 17100` plus cold-access flips of `2600`. Previous PR #20625 cleared the first two by pruning the in-memory overlay on Unwind. Block 24,899,594 still failed because the overlay was already flushed to DB at Unwind time — the stale-read path now was purely DB-layer, addressed here. ## Fixes ### 1. `unwind()` must restore empty tombstones — `db/state/domain.go:1317` (both DupSort and LargeValues paths) `DomainEntryDiff.Value` has three shapes, documented in `db/kv/helpers.go:247`: - `nil` — "different step": prev value lives at another step, skip restore (legacy V0 changeset shape) - `[]byte{}` — "no previous value": key was absent before this step; write an empty tombstone so the key appears absent again after the unwind completes - non-empty — restore the actual previous value The old guard `if len(value) > 0` skipped *both* `nil` and `[]byte{}`, leaving no tombstone after unwinding a first-time write. Corrected to `if value != nil`. ### 2. `getLatestFromDb` must treat empty values as authoritative — `db/state/domain.go:1665` Empty-value entries are deletion tombstones. The step-age guard previously discarded them when their step fell within the frozen file range, causing the caller to fall through to `getLatestFromFiles`. Frozen files encode deletions as absence, so the file returns the pre-deletion value — the exact resurrection the deletion was meant to prevent. Empty entries are now returned as `found=true` regardless of step age; the step-age guard still applies to non-empty entries. ## Relationship to #20483 / #20509 This is a deliberate re-land of the narrow core of #20483. Key differences: - **Excluded**: the LargeValues cross-check in `getLatest` (PR #20483 lines 1697–1737). That handled an interrupted-`PruneSmallBatches` edge case specific to `CodeDomain` / `RCacheDomain` and was the most likely source of the regressions that motivated #20509's revert. If it proves necessary, it can be added later as a separate PR with its own dedicated test. - **Included**: both matched fixes (write-side tombstone + read-side authoritativeness). They are a pair — neither is useful alone; staging them separately risks merging half and shipping a version that's still broken. ## Tests - `TestDomain_UnwindRestoresDeletionMarker` (DupSort + LargeValues subtests) — writes a key, deletes it, re-writes within the same step, builds files, then unwinds the re-write. **Fails on pre-fix code** (getLatest returns the stale post-unwind `value2` from the frozen file); passes with the fix. Exercises the write-side bug directly. - `TestDomain_DeletedKeyNotResurrectedByFiles` (DupSort + LargeValues subtests) — documents the read-side contract by writing a key and deleting it at a step that falls within file range. Passes on current `main` even without the fix (the file-build + prune semantics evolved since #20483 and no longer hit the specific stale-read in this exact test scenario), but retained as a forward regression guard and as documentation of the invariant. ## Test plan - [x] `go test -short ./db/state/...` — all pass - [x] `make lint` — 0 issues - [x] `make erigon` — builds clean - [x] Manual repro of the production symptom (mainnet sync from snapshots-only) in combination with #20625 — sync progresses past the catch-up / first-forkchoice-unwind window without a gas mismatch. (Re-verification run in progress alongside this PR.) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lemenkov
pushed a commit
to fedora-ethereum/erigon
that referenced
this pull request
May 30, 2026
…rigontech#20625) ## Summary Fix a post-unwind stale-read in the in-memory domain overlay that causes gas-used mismatches on post-Fusaka mainnet catch-up. `TemporalMemBatch` stores per-key overlay writes as `[]dataWithTxNum`, each entry stamped with its write `txNum`. `getLatest` returns `dataWithTxNums[len-1]` — the most recently appended entry — without comparing it against `sd.unwindToTxNum`. `Unwind()` only recorded `unwindToTxNum` + an `unwindChangeset`; it never touched `sd.domains` / `sd.storage`. Since `unwindChangeset` is consulted only when the overlay misses, any key still present in the overlay kept returning a pre-unwind write made *inside* the unwound `txNum` range. ## Observed symptom Post-Fusaka mainnet catch-up, after a forkchoice-driven unwind. The first re-executed block reads a storage slot that was first-written inside the unwound range. The overlay returns the post-target write, flipping the SSTORE cost from `SET (20000)` to `RESET (2900)` — **exactly a 17100-gas shortfall per affected slot**. - Block 24,898,955: `diff=-34200` (2 slots × 17100) - Block 24,899,403: `diff=-73829` (compound — several slots affected) Live trace instrumentation (not shipped) captured 3,082 `SD_STALE_READ` events between an unwind at `txNum=3454259398` and the resulting mismatch at block 24,899,403. ## Fix On `Unwind`, walk `sd.domains` and `sd.storage` and drop any `dataWithTxNum` whose `txNum > unwindToTxNum`. If a key's slice empties out, delete the key so the `unwindChangeset` fallback (or the underlying tx) supplies the pre-unwind answer. Runs under `sd.latestStateLock` so the transition is atomic to concurrent reads. Storage-btree mutations are staged after `Scan` to respect btree iterator rules. ## Regression test `TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey` in `db/state/execctx/domain_shared_test.go`: - writes a first-time storage value at `txNum=100` - calls `Unwind(50)` - asserts the overlay no longer returns the post-target write Test fails on pre-fix code with the exact error that mirrors the mainnet symptom; passes with this change. ## Test plan - [x] `go test -short ./db/state/...` — all pass - [x] `make lint` — 0 issues - [x] `make erigon` — builds clean - [ ] Manual sync verification: post-Fusaka mainnet with `chaindata/` wiped and `snapshots/` retained (same repro that produced block-24,899,403 mismatch) — sync progresses past the catch-up / first-forkchoice-unwind window without a gas mismatch. ## Known adjacent issues, out of scope DB-layer siblings of this bug exist separately and are *not* addressed here: 1. `db/state/domain.go:1317` — on-disk unwind currently conflates `nil` ("different step, skip") and `[]byte{}` ("key was absent, write tombstone") via `if len(value) > 0`, so first-time writes in the unwound range leave no restoring tombstone. 2. `db/state/domain.go:1665` — `getLatestFromDb` discards deletion markers at a step within file range, so the caller falls through to `getLatestFromFiles`, which has no concept of deletions and returns stale pre-deletion data. Both were previously addressed by erigontech#20483 and reverted by erigontech#20509 while regressions were investigated. They need their own narrower fixes with dedicated regression guards and should be staged as separate PRs so they're independently revertible. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.