[r3.4] db/state: prune TemporalMemBatch overlay entries past unwindToTxNum (#20625) #21538
Merged
…TxNum (#20625)

## Summary

Fix a post-unwind stale read in the in-memory domain overlay that causes gas-used mismatches on post-Fusaka mainnet catch-up.

`TemporalMemBatch` stores per-key overlay writes as `[]dataWithTxNum`, each entry stamped with its write `txNum`. `getLatest` returns `dataWithTxNums[len-1]`, the most recently appended entry, without comparing it against `sd.unwindToTxNum`.

`Unwind()` only recorded `unwindToTxNum` + an `unwindChangeset`; it never touched `sd.domains` / `sd.storage`. Since `unwindChangeset` is consulted only when the overlay misses, any key still present in the overlay kept returning a pre-unwind write made *inside* the unwound `txNum` range.

## Observed symptom

Post-Fusaka mainnet catch-up, after a forkchoice-driven unwind. The first re-executed block reads a storage slot that was first-written inside the unwound range. The overlay returns the post-target write, flipping the SSTORE cost from `SET (20000)` to `RESET (2900)`: **exactly a 17100-gas shortfall per affected slot**.

- Block 24,898,955: `diff=-34200` (2 slots × 17100)
- Block 24,899,403: `diff=-73829` (compound; several slots affected)

Live trace instrumentation (not shipped) captured 3,082 `SD_STALE_READ` events between an unwind at `txNum=3454259398` and the resulting mismatch at block 24,899,403.

## Fix

On `Unwind`, walk `sd.domains` and `sd.storage` and drop any `dataWithTxNum` whose `txNum > unwindToTxNum`. If a key's slice empties out, delete the key so the `unwindChangeset` fallback (or the underlying tx) supplies the pre-unwind answer. Runs under `sd.latestStateLock` so the transition is atomic to concurrent reads. Storage-btree mutations are staged after `Scan` to respect btree iterator rules.
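The pruning step described under Fix can be sketched in isolation. This is a simplified model, not the Erigon implementation: the `overlay` map type, the `data` field, and the `pruneAfter` name are hypothetical, and the sketch omits the `sd.latestStateLock` locking and the btree-backed `sd.storage`. Only `dataWithTxNum`, `txNum`, `getLatest`, and the "drop entries past `unwindToTxNum`, delete emptied keys" rule come from the description above.

```go
package main

import "fmt"

// dataWithTxNum models one overlay write stamped with the txNum that made it.
// The field name "data" is an assumption for this sketch.
type dataWithTxNum struct {
	data  []byte
	txNum uint64
}

// overlay is a hypothetical stand-in for the per-key in-memory overlay.
type overlay map[string][]dataWithTxNum

// getLatest returns the most recently appended entry for key, with no txNum
// check, which is the read behaviour the summary describes.
func (o overlay) getLatest(key string) ([]byte, bool) {
	entries := o[key]
	if len(entries) == 0 {
		return nil, false
	}
	return entries[len(entries)-1].data, true
}

// pruneAfter drops every entry written after unwindToTxNum and deletes keys
// whose entry slice becomes empty, so a later read misses the overlay and can
// fall through to another source.
func (o overlay) pruneAfter(unwindToTxNum uint64) {
	for key, entries := range o {
		kept := entries[:0] // filter in place, reusing the backing array
		for _, e := range entries {
			if e.txNum <= unwindToTxNum {
				kept = append(kept, e)
			}
		}
		if len(kept) == 0 {
			delete(o, key) // deleting during range is permitted for Go maps
			continue
		}
		o[key] = kept
	}
}

func main() {
	o := overlay{
		"slotA": {{data: []byte{1}, txNum: 40}, {data: []byte{2}, txNum: 100}},
		"slotB": {{data: []byte{9}, txNum: 100}}, // first written inside the unwound range
	}
	o.pruneAfter(50)

	v, ok := o.getLatest("slotA")
	fmt.Println(v, ok) // [1] true
	_, ok = o.getLatest("slotB")
	fmt.Println(ok) // false
}
```

Without the `pruneAfter(50)` call, `getLatest("slotB")` would still return the value written at `txNum=100`, which corresponds to the stale read described in the summary.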
## Regression test

`TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey` in `db/state/execctx/domain_shared_test.go`:

- writes a first-time storage value at `txNum=100`
- calls `Unwind(50)`
- asserts the overlay no longer returns the post-target write

Test fails on pre-fix code with the exact error that mirrors the mainnet symptom; passes with this change.

## Test plan

- [x] `go test -short ./db/state/...`: all pass
- [x] `make lint`: 0 issues
- [x] `make erigon`: builds clean
- [ ] Manual sync verification: post-Fusaka mainnet with `chaindata/` wiped and `snapshots/` retained (same repro that produced block-24,899,403 mismatch). Sync progresses past the catch-up / first-forkchoice-unwind window without a gas mismatch.

## Known adjacent issues, out of scope

DB-layer siblings of this bug exist separately and are *not* addressed here:

1. `db/state/domain.go:1317`: on-disk unwind currently conflates `nil` ("different step, skip") and `[]byte{}` ("key was absent, write tombstone") via `if len(value) > 0`, so first-time writes in the unwound range leave no restoring tombstone.
2. `db/state/domain.go:1665`: `getLatestFromDb` discards deletion markers at a step within file range, so the caller falls through to `getLatestFromFiles`, which has no concept of deletions and returns stale pre-deletion data.

Both were previously addressed by #20483 and reverted by #20509 while regressions were investigated. They need their own narrower fixes with dedicated regression guards and should be staged as separate PRs so they're independently revertible.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

(cherry picked from commit 2698b38)
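The first adjacent issue above hinges on a Go detail: `len(value) > 0` is false for both a `nil` slice and an empty non-nil slice, so a single length check cannot tell the two cases apart. The sketch below illustrates that with two hypothetical helpers (`classifyByLen`, `classifyByNil`); it is not the code at `db/state/domain.go:1317`, and the returned labels are only shorthand for the three cases the issue describes.

```go
package main

import "fmt"

// classifyByLen mimics a check of the form `if len(value) > 0`.
// nil and []byte{} both take the same branch.
func classifyByLen(value []byte) string {
	if len(value) > 0 {
		return "restore"
	}
	return "skip"
}

// classifyByNil separates the cases by testing for nil before testing length.
func classifyByNil(value []byte) string {
	switch {
	case value == nil:
		return "skip" // "different step, skip"
	case len(value) == 0:
		return "tombstone" // "key was absent, write tombstone"
	default:
		return "restore"
	}
}

func main() {
	fmt.Println(classifyByLen(nil), classifyByLen([]byte{})) // skip skip
	fmt.Println(classifyByNil(nil), classifyByNil([]byte{})) // skip tombstone
}
```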
AskAlexSharov approved these changes May 31, 2026
yperbasis added a commit that referenced this pull request Jun 1, 2026
Adds the **v3.4.3** section to `ChangeLog.md`, covering the user-facing changes merged to `release/3.4` since v3.4.2, and sets the v3.4.2 header to its release date (2026-05-22).

**Bugfixes**

- #21538: second fix for the post-reorg `gas used mismatch` / state-leak still hitting v3.4.2 users
- #21507: `debug_getModifiedAccountsByHash` / `ByNumber` now match Geth semantics
- #21389: `--rpc.logs.maxresults` (documented in 3.4.0) now takes effect via the CLI

**Improvements**

- #21502: fail-fast on oversized `engine_newPayload` backward download (less per-slot log spam / wasted fetches)

Docs-only / internal PRs (#21451, #21408) are intentionally omitted. Version bump tracked separately in #21547.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 11, 2026
Cherry-pick of #20625 to `release/3.4`.

## Why

Addresses #21515: the `gas used mismatch` / state-leak after a tip reorg that recurs on v3.4.2. #21157 (already on `release/3.4`) fixed only the diffset-lookup-by-wrong-hash part of the bug.

This is the second of the three reported sub-bugs: a stale read in the `TemporalMemBatch` in-memory overlay. `Unwind()` recorded `unwindToTxNum` + an `unwindChangeset` but never pruned `sd.domains` / `sd.storage`, so `getLatest` kept returning a write made inside the unwound `txNum` range, flipping an `SSTORE` from `SET (20000)` to `RESET (2900)`, i.e. the ~17100-gas-per-slot shortfall users reported (diffs 71016 / 73638).

The third sub-bug (#20710) is already on `release/3.4` as #20716 (`104e6d1a97`).

## Adaptations

None: clean cherry-pick, no `release/3.4`-specific changes were needed.

## Verification on release/3.4 + this patch

- `go build ./db/state/...`: clean
- `go vet ./db/state/...`: clean
- `go tool golangci-lint run ./db/state/...`: 0 issues
- `TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey`: passes
- `go test -short ./db/state/...`: all pass
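The per-slot shortfall cited in the description is the difference between the two SSTORE costs it names. A minimal arithmetic sketch follows; the constant and function names are hypothetical and not taken from Erigon, and it models only the simple case where every affected slot is undercharged by the same amount.

```go
package main

import "fmt"

const (
	sstoreSetGas   = 20000 // cost named as SET in the description
	sstoreResetGas = 2900  // cost named as RESET in the description
)

// shortfall returns the gas undercharged when `slots` first-time writes are
// each billed as RESET instead of SET.
func shortfall(slots uint64) uint64 {
	return slots * (sstoreSetGas - sstoreResetGas)
}

func main() {
	fmt.Println(shortfall(1)) // 17100
	fmt.Println(shortfall(2)) // 34200
}
```

The two-slot result matches the `diff=-34200` reported for block 24,898,955 in the cherry-picked commit message.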