[r34] db/state: preserve per-step unwind entries so Flush reverts every orphan#20716
Merged
Conversation
…han (#20710) ## Summary `TemporalMemBatch.Unwind` stored the incoming changeset in `sd.unwindChangeset`, a map keyed by `change.Key[:len(change.Key)-8]` — the real key with the 8-byte step suffix stripped. That shape is what the `getLatest` fallback needs (one pre-unwind value per real key), but `Flush` was re-exporting the same collapsed map to `tx.Unwind` → `AggregatorRoTx.Unwind` → `DomainRoTx.unwind`. Because only one entry per real key survived the collapse, per-step `DeleteCurrent` calls fired for only one step per key and orphan domain-values entries were left behind at every other step. On mainnet this manifested as the commitment domain drifting one step ahead of every other domain after a sequence of forkchoice-driven unwinds. `SeekCommitment` then returned a stale `blockNum` above `TxNums.Last()`, execution started past the truly-executed tip, and the next block was rejected with `"nonce too high"`. Fix: keep a separate `unwindChangesetRaw` (step-preserving) alongside the collapsed map. `Flush` replays the raw list against `tx.Unwind`; `getLatest` continues to consult the collapsed map. `Merge` carries (and `ClearRam` clears) both copies. ## Why On a mainnet node, `integration print_stages` showed: ``` accounts progress(txnum) 3460929701 step 8859 storage progress(txnum) 3460929699 step 8859 code progress(txnum) 3460929625 step 8859 commitment progress(txnum) 3460937500 step 8860 <-- ahead by one full step receipt progress(txnum) 3460929700 step 8859 ``` A direct read of the commitment values table turned up 22,038 orphan step-8860 entries plus the `"state"` key pointing at `blockNum=24923992` after execution had been unwound to 24923939. The observable symptom was an endless `invalid block, txnIdx=2, nonce too high: address 0x13c12Fe9…, tx: 14 state: 13` retry loop every ~12s. ## Repro New test `TestSharedDomain_RepeatedUnwindAcrossStepBoundary` writes accounts+commitment across three steps, runs three (execute, flush, unwind-to-earlier-step, flush) cycles and asserts that - the commitment `"state"` key's decoded `blockNum ≤ unwindTarget`, and - the commitment values table has no entries with `step > unwindTarget/stepSize`. On main this fails with `step=2 bn=29 / step=1 bn=19` dups still present after every unwind; with the fix the only remaining dups are at step 0 (the unwind-target step) and the latest points at the restored block. `TestSharedDomain_UnwindAcrossStepBoundary` is a simpler single-unwind regression guard for the same invariant. ## Notes - The `Merge` path now merges `unwindChangesetRaw` with the same newer/older directionality as `unwindChangeset`. It uses `changeset.MergeDiffSets`, which already dedupes by full `(key, step)` — so this doesn't reintroduce the collapse bug. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it> (cherry picked from commit 2813e8f)
TestSharedDomain_RepeatedUnwindAcrossStepBoundary depends on the overlay-pruning semantics introduced by PR #20625 (prune TemporalMemBatch overlay entries past unwindToTxNum), which is only present on main. release/3.4 uses a different approach (stagedsync.ClearLatestCache + flush unwind changeset to DB before re-execution), so the test cannot run unmodified. Skip it; the single-unwind variants still guard the per-step orphan-entry invariant that #20710 fixes. Also drop the ctx argument from the sd1.Merge call — release/3.4's SharedDomains.Merge signature does not take a context.
AskAlexSharov
approved these changes
Apr 21, 2026
AskAlexSharov
pushed a commit
that referenced
this pull request
May 31, 2026
…TxNum (#20625) (#21538) Cherry-pick of #20625 to `release/3.4`. ## Why Addresses #21515 — the `gas used mismatch` / state-leak after a tip reorg that recurs on v3.4.2. #21157 (already on `release/3.4`) fixed only the diffset-lookup-by-wrong-hash part of the bug. This is the second of the three reported sub-bugs: a stale read in the `TemporalMemBatch` in-memory overlay. `Unwind()` recorded `unwindToTxNum` + an `unwindChangeset` but never pruned `sd.domains` / `sd.storage`, so `getLatest` kept returning a write made *inside* the unwound `txNum` range — flipping an `SSTORE` from `SET (20000)` to `RESET (2900)`, i.e. the ~17100-gas-per-slot shortfall users reported (diffs 71016 / 73638). The third sub-bug (#20710) is already on `release/3.4` as #20716 (`104e6d1a97`). ## Adaptations None — clean cherry-pick, no `release/3.4`-specific changes were needed. ## Verification on release/3.4 + this patch - `go build ./db/state/...` — clean - `go vet ./db/state/...` — clean - `go tool golangci-lint run ./db/state/...` — 0 issues - regression test `TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey` — passes - `go test -short ./db/state/...` — all pass Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cherry-pick of #20710 to release/3.4.
r3.4-specific adaptations
sd1.Mergecall in the new test droppedctx— release/3.4's signature isMerge(uint64, *SharedDomains, uint64).TestSharedDomain_RepeatedUnwindAcrossStepBoundaryis skipped: it depends on the overlay-pruning semantics from PR db/state: prune TemporalMemBatch overlay entries past unwindToTxNum #20625 (not present in release/3.4). The single-unwind variants (TestSharedDomain_UnwindAcrossStepBoundary,TestSharedDomain_MergeUnwindAcrossStepBoundary,TestSharedDomain_UnwindWithDeleteAcrossStepBoundary) still guard the per-step orphan-entry invariant this PR fixes.