db/state: preserve per-step unwind entries so Flush reverts every orphan#20710
Merged
Conversation
TemporalMemBatch.Unwind stored the incoming changeset in sd.unwindChangeset, a map keyed by change.Key[:len(change.Key)-8] — i.e. the real key with the step suffix stripped. That's the shape getLatest needs for its fallback lookup (one pre-unwind value per real key), but Flush was re-exporting the same collapsed map to tx.Unwind and into DomainRoTx.unwind — so only one entry per real key survived, no matter how many (key, step) pairs the original diff carried. Each entry drives a DupSort DeleteCurrent at a specific step. When a key was touched inside more than one step of the unwound range (e.g. a commitment branch written to step N and N+1 during forward execution), only one step's dup got deleted and the rest remained as orphans. For the commitment domain this left the "state" key pointing at a blockNum beyond the unwind target, and on the next open SeekCommitment returned that stale block — so execution started past TxNums.Last() and rejected the next block with "nonce too high" (observed on mainnet after three forkchoice-driven ~54-block unwinds: 22k orphan step-8860 entries plus a state key frozen at blockNum=24923992). Fix: keep a separate unwindChangesetRaw alongside the collapsed map. Flush replays the raw (step-preserving) list against tx.Unwind; getLatest still consults the collapsed map. Merge and ClearRam carry and clear the raw copy too. New tests cover single-unwind across a step boundary and three successive unwind cycles; both now verify the commitment "state" blockNum and the values-table step tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes an unwind/flush corruption case where per-step domain-value entries were being collapsed to a single entry per “real key”, causing Flush() to miss deleting orphan entries at other steps (observed as the commitment domain drifting ahead of other domains after repeated FCU-driven unwinds).
Changes:
- Preserve step-specific unwind diffs by storing a step-preserving
unwindChangesetRawalongside the collapsedunwindChangeset. - Update
Flush()to replay unwinds using the raw (step-preserving) diffs so every orphan(key, step)entry is reverted. - Add regression tests covering single and repeated unwinds across a step boundary to ensure commitment state/values don’t remain ahead of the unwind target.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| db/state/temporal_mem_batch.go | Adds unwindChangesetRaw, merges/clears it appropriately, and uses it in Flush() to unwind all step entries. |
| db/state/execctx/domain_shared_test.go | Adds new regression tests for commitment/domain unwind correctness across step boundaries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Both post-unwind commitment-state checks were wrapped in `if len(stateVal)
>= 16 { ... }`. If a future bug ever deleted or truncated the state key,
the test would silently pass — exactly the failure mode these tests try
to catch. Switch to `require.GreaterOrEqual(len, 16)` so the assertion
fails loudly instead of being skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dd Merge-path test - Flush: pass sd.unwindChangesetRaw directly instead of dereferencing into a local (sort is in-place on the shared backing array either way). - Merge: document that MergeDiffSets requires both raw changesets sorted by Key, and note where the invariant is established upstream. - Add TestSharedDomain_MergeUnwindAcrossStepBoundary, a regression test that splits a step-boundary-crossing unwind between two SharedDomains and calls SharedDomains.Merge before Flush. Verified to fail against the pre-fix Flush path (commitment state decoded blockNum=9 against unwindTarget=4) and pass with the raw-list replay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mh0lt
reviewed
Apr 21, 2026
mh0lt
left a comment
Contributor
There was a problem hiding this comment.
Its for a cross step unwind of a live value ? It would probably be worth doing this also for a delete (nil). As these have slightly different behaviour.
…ent-unwind-divergence
mh0lt
approved these changes
Apr 21, 2026
Addresses PR feedback that the raw-changeset Flush fix was only covered for live-value writes; deletes use the same DomainUpdate path but emit a tombstone dup (^step || nil, 8 bytes) whose side effects on the DupSort sort order are easy to reason about incorrectly. Scenario: write addr in step 0, overwrite in step 0, delete in step 1, then unwind to a block inside step 0 via doms.Unwind() + doms.Flush(). The collapsed-map Flush would replay only the step-0 restore diff and leave the step-1 tombstone in MDBX — and since ^1 sorts before ^0 in DupSort, SeekExact would return the tombstone, getLatestFromDb would interpret the 8-byte value as "absent", and the account would look deleted instead of restored to its block-0 value. Verified: FAILS against a reverted Flush (uses the collapsed map) with "post-unwind: addr must be restored (got empty slice — step-1 tombstone was not cleaned up)", PASSES against the raw-list Flush. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AskAlexSharov
pushed a commit
that referenced
this pull request
Apr 21, 2026
…ry orphan (#20716) Cherry-pick of #20710 to release/3.4. ## r3.4-specific adaptations - `sd1.Merge` call in the new test dropped `ctx` — release/3.4's signature is `Merge(uint64, *SharedDomains, uint64)`. - `TestSharedDomain_RepeatedUnwindAcrossStepBoundary` is skipped: it depends on the overlay-pruning semantics from PR #20625 (not present in release/3.4). The single-unwind variants (`TestSharedDomain_UnwindAcrossStepBoundary`, `TestSharedDomain_MergeUnwindAcrossStepBoundary`, `TestSharedDomain_UnwindWithDeleteAcrossStepBoundary`) still guard the per-step orphan-entry invariant this PR fixes. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
AskAlexSharov
pushed a commit
that referenced
this pull request
May 31, 2026
…TxNum (#20625) (#21538) Cherry-pick of #20625 to `release/3.4`. ## Why Addresses #21515 — the `gas used mismatch` / state-leak after a tip reorg that recurs on v3.4.2. #21157 (already on `release/3.4`) fixed only the diffset-lookup-by-wrong-hash part of the bug. This is the second of the three reported sub-bugs: a stale read in the `TemporalMemBatch` in-memory overlay. `Unwind()` recorded `unwindToTxNum` + an `unwindChangeset` but never pruned `sd.domains` / `sd.storage`, so `getLatest` kept returning a write made *inside* the unwound `txNum` range — flipping an `SSTORE` from `SET (20000)` to `RESET (2900)`, i.e. the ~17100-gas-per-slot shortfall users reported (diffs 71016 / 73638). The third sub-bug (#20710) is already on `release/3.4` as #20716 (`104e6d1a97`). ## Adaptations None — clean cherry-pick, no `release/3.4`-specific changes were needed. ## Verification on release/3.4 + this patch - `go build ./db/state/...` — clean - `go vet ./db/state/...` — clean - `go tool golangci-lint run ./db/state/...` — 0 issues - regression test `TestSharedDomain_UnwindDoesNotRestoreOverlayForNewKey` — passes - `go test -short ./db/state/...` — all pass Co-authored-by: Mark Holt <135143369+mh0lt@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TemporalMemBatch.Unwindstored the incoming changeset insd.unwindChangeset, a map keyed bychange.Key[:len(change.Key)-8]— the real key with the 8-byte step suffix stripped. That shape is what thegetLatestfallback needs (one pre-unwind value per real key), butFlushwas re-exporting the same collapsed map totx.Unwind→AggregatorRoTx.Unwind→DomainRoTx.unwind. Because only one entry per real key survived the collapse, per-stepDeleteCurrentcalls fired for only one step per key and orphan domain-values entries were left behind at every other step.On mainnet this manifested as the commitment domain drifting one step ahead of every other domain after a sequence of forkchoice-driven unwinds.
SeekCommitmentthen returned a staleblockNumaboveTxNums.Last(), execution started past the truly-executed tip, and the next block was rejected with"nonce too high".Fix: keep a separate
unwindChangesetRaw(step-preserving) alongside the collapsed map.Flushreplays the raw list againsttx.Unwind;getLatestcontinues to consult the collapsed map.Mergecarries (andClearRamclears) both copies.Why
On a mainnet node,
integration print_stagesshowed:A direct read of the commitment values table turned up 22,038 orphan step-8860 entries plus the
"state"key pointing atblockNum=24923992after execution had been unwound to 24923939. The observable symptom was an endlessinvalid block, txnIdx=2, nonce too high: address 0x13c12Fe9…, tx: 14 state: 13retry loop every ~12s.Repro
New test
TestSharedDomain_RepeatedUnwindAcrossStepBoundarywrites accounts+commitment across three steps, runs three (execute, flush, unwind-to-earlier-step, flush) cycles and asserts that"state"key's decodedblockNum ≤ unwindTarget, andstep > unwindTarget/stepSize.On main this fails with
step=2 bn=29 / step=1 bn=19dups still present after every unwind; with the fix the only remaining dups are at step 0 (the unwind-target step) and the latest points at the restored block.TestSharedDomain_UnwindAcrossStepBoundaryis a simpler single-unwind regression guard for the same invariant.Notes
Mergepath now mergesunwindChangesetRawwith the same newer/older directionality asunwindChangeset. It useschangeset.MergeDiffSets, which already dedupes by full(key, step)— so this doesn't reintroduce the collapse bug.