db/state: fix collation/pruning race and unwind visibility in BuildFilesInBackground#20445
Conversation
When a storage slot is zeroed (SSTORE→0), the DB stores an empty-value entry. If snapshot files later advance past that entry's step, getLatestFromDb discarded it (returned found=false) because of the lastTxNumOfStep < files.EndTxNum() guard. The caller then fell through to getLatestFromFiles, which scans files newest-to-oldest. Since snapshot files represent deletions as absent keys (no tombstone), the scan reached an older file containing the pre-deletion value and returned stale data. This caused EIP-2200 SSTORE gas mispricing: GetCommittedState returned a non-zero "original" value for a deleted slot, taking the SSTORE_RESET path (2,900 gas) instead of SSTORE_SET (20,000 gas), producing gasUsed deficits that fail block validation. Fix: treat DB deletion entries (len(v)==0) as authoritative regardless of step age. The DB returns the most-recent entry per key (inverted-step ordering), so an empty value means the key was deleted and no newer write exists. Fixes #20169 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return v ([]byte{}) instead of nil to preserve the empty-vs-nil
distinction. DeleteWithPrev and PutWithPrev(key, []byte{}, ...) are
indistinguishable at the DB level — both store empty values. Returning
nil broke TestDomain_GetAfterAggregation which writes legitimate empty
values and expects to read them back as []byte{}.
The critical part of the fix remains: found=true prevents the
fallthrough to getLatestFromFiles. For storage gas pricing, []byte{} is
correctly treated as zero by ReadAccountStorage (checks len(enc) > 0).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…otstrapping Two additional fixes for the gas mismatch issue (#20169): 1. **Preserve deletion entries during domain pruning** (db/state/domain.go): The background file collation (`buildFilesInBackground`) runs concurrently with block execution. When it opens its MDBX read-transaction before the execution's CommitCycle flushes a deletion entry, the resulting domain file silently misses that deletion. Pruning then removes the DB entry because the file "covers" that step — leaving no record of the deletion in either the DB or the files. Subsequent reads fall through to an older file and return a stale pre-deletion value, causing EIP-2200 SSTORE gas mispricing. Fix: route Prune() through oldPrune() and skip deletion entries (empty values) during pruning. Combined with the getLatestFromDb fix from the previous commit, this ensures deletion evidence is never lost regardless of collation/execution timing. 2. **Tolerate domain-ahead-of-blocks during InsertBlocks** (inserters.go): When pre-verified state snapshots extend past the block snapshot boundary, NewSharedDomains returns ErrBehindCommitment. InsertBlocks only needs to accumulate block headers/bodies/TDs in the overlay — it does not need a valid commitment state. Allow it to proceed so the ForkChoice stage can run execution and close the TxNums gap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
955c24c to
e1475aa
Compare
The background file builder (buildFilesInBackground) runs concurrently with block execution. Its collation opens a read-transaction that can snapshot the DB before the execution's CommitCycle flushes all writes for a step. The resulting file silently misses those entries. Pruning then removes them from the DB (the file "covers" the step), and the values are lost — causing both stale reads for deleted keys and missing reads for existing keys. See #20169. Fix: the execution layer publishes its last committed txNum via Aggregator.SetLastCommittedTxNum after each CommitCycle flush+commit. The file builder checks this before collating each step: if lastCommittedTxNum < firstTxNum(step+1), the step may have uncommitted data and collation is deferred. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously initialCycle switched to false only when a stage-loop iteration completed in under 5 minutes. This meant the switch happened very close to the chain tip, leaving only ~50 blocks with changesets. Any reorg request beyond that depth failed with "bad forkchoice" because the node had no changesets to unwind with. Add an explicit check: if the execution stage has reached within MAX_REORG_DEPTH (default 96) blocks of the header-chain tip, switch initialCycle to false immediately. This ensures at least reorgBlockDepth blocks always have changesets, regardless of iteration timing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
326bef9 to
5d257a0
Compare
OldPrune() is never called in production code. The deletion-entry preservation for the active prune path is handled via the IsDeletionEntry callback in TableScanningPrune. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirror the serial executor fix: when the parallel executor is within MaxReorgDepth of the batch end, switch initialCycle to false so changesets are generated. Without this the parallel executor can reach the tip with no changesets, making reorgs impossible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TestDomain_DeletionSurvivesPruning: verifies that deletion entries (empty values) in the DB are preserved by the scanning-prune path's IsDeletionEntry callback, even after the step is collated into a file. - TestDomain_NonDeletionEntrySurvivesPruning: verifies that regular (non-empty) values ARE pruned from the DB after collation into a file, ensuring the deletion-preservation logic doesn't block normal pruning. - TestAggregator_LastCommittedTxNumGuard: verifies the guard logic that prevents buildFilesInBackground from collating a step whose data may not be fully committed (lastCommittedTxNum < firstTxNum(step+1)). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This comment was marked as low quality.
This comment was marked as low quality.
Address review feedback (AskAlexSharov): remove `lastCommittedTxNum` state from the Aggregator. Instead, the background file builder reads the latest committed txNum directly from the commitment domain state in the DB via `GetLatest(kv.CommitmentDomain, KeyCommitmentState)`. This avoids adding statefulness to the Aggregator and uses the existing SharedDomains.ComputeCommitment write as the synchronization point. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lesInBackground Port of #20445 to release/3.4. Fixes two related bugs causing gas mismatches during execution (#20169). Bug 1: Collation/pruning race — BuildFilesInBackground opened a new read-transaction per domain/index, so execution could commit step S+1 data between reads. The collated file for step S then contained wrong values; after pruning, GetLatest returned stale file values. Fix: restructure buildFiles into two phases — Phase 1 collates all domains and indices using a single MDBX read-transaction; Phase 2 builds files in parallel from the collations (no DB access). Also adds a committedTxNum guard: don't collate step S until ComputeCommitment has confirmed all data through end of step S is flushed. Bug 2: Unwind entry visibility — after a reorg, restored entries were tagged with the natural step, which may already be covered by files. getLatestFromDb discarded those entries and fell through to stale file values. Fix: pass Aggregator.EndTxNumMinimax() into unwind and tag restored entries with max(naturalStep, currentFilesEndStep). Also: export DecodeTxBlockNums; fix minUnwindale typo; short-value guard.
Commit 835dd9b in ethpandaops/ethereum-package introduced GpuConfig, a Starlark built-in that requires Kurtosis CLI >=1.18.1. CI installs an older version so Starlark eval fails. Pin to e07503d (the commit just before the regression) until CI is upgraded to Kurtosis 1.18.1.
Commit 835dd9b on main introduced GpuConfig, a Starlark built-in that requires Kurtosis CLI >=1.18.1. CI installs an older version so Starlark eval fails at startup. Pin to the 6.1.0 release tag which has GLOAS support but not the breaking GpuConfig. Unpin once CI is on 1.18.1.
Tracing results: deterministic gas mismatch at block 24,887,755Setup
Per-tx gas comparison
Root causeTraced tx[9] (first diverging tx) on the real node. Found the specific corrupt value:
The stale value was confirmed by SSTORE gas cost: canonical charges 20,000 (SET: slot was zero → non-zero), our node charges 2,900 (RESET: slot was already non-zero). No earlier tx (0–8) in the block touches this slot. The stale value comes directly from Why
|
|
|
|
@AskAlexSharov OK. I'm reverting the unwind fix and keeping only the collation fix. Let's merge the collation fix and do further fixes in separate PRs. |
| // tx-scoped snapshot. BuildFilesInBackground may have filed new steps | ||
| // since this AggregatorRoTx was opened, and getLatestFromDb will use | ||
| // the current view — so the unwind must tag entries beyond it. | ||
| currentFilesEndStep := at.a.EndTxNumMinimax() / at.StepSize() |
|
current unwind fix looks ok: https://github.com/erigontech/erigon/pull/20445/changes#r3091413460 |
…BuildFilesInBackground (#20594) Port of #20445 to release/3.4. ## Summary Fixes two related bugs in the domain state layer that cause gas mismatches during execution (#20169). ### Bug 1: Collation/pruning race `BuildFilesInBackground` collates domain files by reading values from the DB via per-worker read-transactions opened at collation time. Between the moment a step is deemed ready and the collation reads, execution can commit new batches that overwrite step S values with step S+1 data. The collated file for step S then contains wrong values. After pruning removes the DB entries, `GetLatest` returns the stale file values, causing SSTORE gas mispricing. **Fix:** Restructure `buildFiles` into two phases: 1. **Phase 1 (sequential, single read-tx):** Collate all domains and indices using one MDBX read-transaction. All collations see the exact same DB snapshot — zero race window. 2. **Phase 2 (parallel, no DB access):** Build files from collations. This is the expensive part and remains fully parallel. Additionally, add a committed txNum guard: don't collate step S until `ComputeCommitment` has confirmed all data through the end of step S is flushed. ### Bug 2: Unwind entry visibility after filing After a reorg, the unwind restores domain entries tagged with a step derived from the unwind-target txNum. If `BuildFilesInBackground` has filed that step, `getLatestFromDb` discards the restored entry (step covered by files) and falls through to `getLatestFromFiles`, returning the stale end-of-step value instead of the changeset-restored value. **Fix:** Pass `Aggregator.EndTxNumMinimax()` into the unwind and tag restored entries with `max(naturalStep, currentFilesEndStep)`. ### Changes - `db/state/aggregator.go`: single-tx collation + parallel file building; committed txNum guard; `stepFullyCommitted` helper; pass current file boundary to unwind - `db/state/domain.go`: bump unwind step tag past current filed range - `db/state/aggregator_test.go`: `TestAggregator_CommittedTxNumGuard` - `db/state/domain_test.go`: `TestDomain_CollationIsolatedFromLaterSteps`, `TestDomain_UnwindRestoredEntryVisibility` - `execution/commitment/commitmentdb/commitment_context.go`: export `DecodeTxBlockNums`; fix `minUnwindale` typo; short-value length guard
Summary
Fixes two related bugs in the domain state layer that cause gas mismatches during execution (#20169).
Bug 1: Collation/pruning race
BuildFilesInBackgroundcollates domain files by reading values from the DB via per-worker read-transactions opened at collation time. Between the moment a step is deemed ready and the collation reads, execution can commit new batches that overwrite step S values with step S+1 data. The collated file for step S then contains wrong values. After pruning removes the DB entries,GetLatestreturns the stale file values, causing SSTORE gas mispricing (20,000 vs 2,900 = 17,100 gas per affected slot).Fix: Restructure
buildFilesinto two phases:Additionally, add a committed txNum guard: don't collate step S until
ComputeCommitmenthas confirmed all data through the end of step S is flushed.Bug 2: Unwind entry visibility after filing
After a reorg, the unwind restores domain entries tagged with a step derived from the unwind-target txNum. If
BuildFilesInBackgroundhas filed that step,getLatestFromDbdiscards the restored entry (step covered by files) and falls through togetLatestFromFiles, returning the stale end-of-step value instead of the changeset-restored value.The fix must use the current (atomically published) file boundary from
Aggregator.EndTxNumMinimax(), not the tx-scopedDomainRoTx.FirstStepNotInFiles()— becauseBuildFilesInBackgroundmay have filed new steps since the tx was opened, andgetLatestFromDbwill see those via the current view.Fix: Pass
Aggregator.EndTxNumMinimax()into the unwind and tag restored entries withmax(naturalStep, currentFilesEndStep).Changes
db/state/aggregator.go: single-tx collation + parallel file building; committed txNum guard;stepFullyCommittedhelper; pass current file boundary to unwinddb/state/domain.go: bump unwind step tag past current filed rangedb/state/aggregator_test.go:TestAggregator_CommittedTxNumGuarddb/state/domain_test.go:TestDomain_CollationIsolatedFromLaterSteps,TestDomain_UnwindRestoredEntryVisibilityexecution/commitment/commitmentdb/commitment_context.go: exportDecodeTxBlockNums; fixminUnwindaletypo; short-value length guardTest plan
make lintpassesTestDomain_CollationIsolatedFromLaterSteps— step S collation ignores step S+1 dataTestDomain_UnwindRestoredEntryVisibility— unwind-restored entries survivegetLatestFromDbafter filing (fails without fix)TestAggregator_CommittedTxNumGuard— guard blocks premature file building🤖 Generated with Claude Code