[pull] main from erigontech:main#9
Merged
## Summary

Parallel commitment calculation now runs in its own goroutine, in parallel with execution and apply. The previous architecture serialized commitment behind `ApplyStateWrites` in the apply goroutine; this PR reinstates the three-stage pipeline. Everything else in the PR is supplementary infrastructure to make this work correctly under load.

## Headline change: parallel commitment

**The calculator is a third concurrent stage** consuming the same `applyResult` stream as the apply goroutine via fan-out:

| Stage | Goroutine | Owns |
|---|---|---|
| **execLoop / workers** | producer | `ApplyStateWrites` via `BlockStateCache`, finalize via IBS, Flush before sending |
| **Apply goroutine** | index-only consumer | `ApplyTxIndexes`, accumulator, receipt + postValidator, `ProcessBAL`, changeset |
| **Commitment calculator** | parallel consumer | Commitment-domain writes in `sd.mem` via own roTx + `asOfStateReader`; publishes on `rootResults` |

End-of-batch trigger: `triggerBatchCommitment(ctx)` fires from execLoop at three sites (`sizeEst > batchLimit`, `blockNum >= maxBlockNum`, `StopAfterBlock`); apply goroutine consumes `rootResults` for root check + `lastCommittedBlockNum` bump.

**BAL hash and state root are independent** - `ProcessBAL` runs concurrent with commitment, not gated on it.

Shutdown: execLoop closes `commitResults` first, `applyResults` second; apply goroutine drains `rootResults` on `!ok`.

**Apply loop has no `ctx.Done` case** by design - execLoop owns shutdown sequencing.
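The fan-out in the table above can be illustrated with a minimal sketch. It is not the PR's code: `applyResult` here is a one-field stand-in, `runPipeline` is a hypothetical driver, and the real `sendResult` signature is not shown in this description, so the one below is an assumption. It only demonstrates that two consumers with their own channels each see every result in order.

```go
package main

import (
	"fmt"
	"sync"
)

// applyResult is a hypothetical one-field stand-in for the PR's result type.
type applyResult struct {
	blockNum uint64
}

// sendResult fans one result out to both consumers. The signature is an
// assumption; the description only says fan-out happens "via sendResult".
func sendResult(r applyResult, applyResults, commitResults chan<- applyResult) {
	commitResults <- r
	applyResults <- r
}

// runPipeline (hypothetical) pushes n blocks through the fan-out and returns
// the block numbers each consumer observed.
func runPipeline(n uint64) (applied, committed []uint64) {
	applyResults := make(chan applyResult, 4)
	commitResults := make(chan applyResult, 4)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // stand-in for the index-only apply goroutine
		defer wg.Done()
		for r := range applyResults {
			applied = append(applied, r.blockNum)
		}
	}()
	go func() { // stand-in for the commitment calculator
		defer wg.Done()
		for r := range commitResults {
			committed = append(committed, r.blockNum)
		}
	}()

	for b := uint64(1); b <= n; b++ {
		sendResult(applyResult{blockNum: b}, applyResults, commitResults)
	}
	// Producer owns shutdown: commitResults first, applyResults second.
	close(commitResults)
	close(applyResults)
	wg.Wait() // both slices are safe to read after Wait returns
	return applied, committed
}

func main() {
	applied, committed := runPipeline(3)
	fmt.Println(applied, committed)
}
```

Because each consumer ranges over its own channel, neither stage can starve the other of results; the producer only blocks when a consumer's buffer is full.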
## Changeset

### Headline (commitment calculator)

- **41c3b64a3b** `stagedsync: port commitment calculator architecture forward onto merged branch`
  Restores pre-merge `exec3_parallel.go` + `exec3.go` (1803 line delta in exec3_parallel.go) - calculator goroutine, `commitResults` + `rootResults` channels, `triggerBatchCommitment` driver, fan-out via `sendResult`, four domain-flag toggles when calculator is authoritative (`SetDisableInlineTouchKey`, `SetInMemHistoryReads`, `EnableTrieWarmup=false`, `SetSkipStepBoundaryCommitment`), per-block `BlockStateCache` allocation, overlay-backed `blockTx` for executeBlocks, step-alignment check + `lastFrozenTxNum` gating. Adapters: `BlockGasUsed` -> `BlockRegularGasUsed` (main split for EIP-8037), re-added `VersionMap.StorageKeys()` for self-destruct delete emission.

### Supplementary (openTxs=1 invariant for clean MDBX GC)

For commitment to run cleanly in parallel with apply, MDBX needs `openTxs=1` at commit time so freelist pages get reclaimed. These fixes close every observed concurrent-reader path:

- **640bce87ec** `execmodule/forkchoice: release bgRoTx before RW commit to drop openTxs 2->1`
- **63cb4e4e99** `execmodule: two more openTxs=2 eliminations - ValidateChain + CommitCycle`
- **bb7bd2f8a3** `db/kv: GatedRoDB wrapper + BlockRetire commit-gate wiring`
  Closes the residual ~5% openTxs=2 events that came from snapshot retirement reads. New `kv.NewGatedRoDB(inner, gate)` wraps an RoDB so each `db.View()` acquires `gate.RLock()`. `BlockRetire.SetCommitGate(gate)` plumbs in the Aggregator's existing CommitGate, transparently gating all retirement reads (DumpBlocks -> DumpHeaders/Bodies/Txs -> BigChunks -> db.View).

### At-tip MDBX growth fixes (the bug that made this PR feel incomplete until validated)

- **16ab0346c1** `execmodule, db/kv/mdbx: track lastFlushedCommitmentTxNum on FCU + gate per-commit log`
  The aggregator's collation safety cap was only being updated from `ProcessFrozenBlocks` (initial-cycle path). During normal FCU at-tip operation the cap stayed at its initial value forever, contributing to the at-tip MDBX growth issue. Fixed by reading `KeyCommitmentState` from the just-committed RW tx and calling `SetLastFlushedCommitmentTxNum` after every FCU commit. Also gated the per-commit `[mdbx] commit` Info log behind `MDBX_TRACE_TX` (was producing 525+ log lines per 27h at tip).
- **071e9f98b9** `execmodule, db/state: kick CollateAndPruneIfNeeded on FCU + adaptive prune budget`
  `CollateAndPruneIfNeeded` (which kicks `BuildFilesInBackground`) was only invoked from `StageLoopIteration`. At chain tip, blocks flow through the FCU path so files were never built, prune had nothing to mark stale, and MDBX accumulated indefinitely. Run 13 demonstrated this: 27h at tip, file count stuck at 7 per domain, CommitmentVals 11.79 GB, total live data 21.84 GB on a 26 GB chaindata. Fixed by calling `agg.CollateAndPruneIfNeeded(...)` from `runForkchoicePrune`. Also adaptive prune budget: base = SecondsPerSlot/3 (= 4s on mainnet), +200ms per 100 prunable steps, capped at 2/3 of slot.

### Diagnostics

- **0683cde973** `db/kv/mdbx: tx-lifecycle tracer (OPEN/COMMIT/ROLLBACK with traceID + stack)`
- **22f65cf71b** `db/kv/mdbx: env-gated tx-lifecycle tracer + concurrent-tx dump`
  Permanent diagnostic gated behind `MDBX_TRACE_TX=true` env var. Zero overhead when disabled. When openTxs>1 at commit, dumps the stacks of all other live txs so any new concurrent-reader callsite is identified immediately.

### Cleanup

- **b48c16eb23** `stagedsync, commitment, db/state, execution/protocol: remove leftover debug Printfs`
  Removed 67 unconditional debug Printfs from investigation work (FINALIZE_CHECK, REQUESTS dumps, FLUSH_CHECK, SEEK_CHECK, CALC_*, FLOW, LIFECYCLE, COINBASE_*, etc). Net -683 lines. Kept env-gated MDBX_TRACE_TX paths and conditional `if dbg.TraceXxx` infrastructure.
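The adaptive prune budget described under **071e9f98b9** can be sketched as a small pure function. This is an illustration, not the code from the commit: `pruneBudget` and `mainnetSlot` are hypothetical names, and treating "+200ms per 100 prunable steps" as whole hundreds (integer division) is an assumption, since the description does not say how partial hundreds are handled.

```go
package main

import (
	"fmt"
	"time"
)

// mainnetSlot is the 12s mainnet slot implied by "SecondsPerSlot/3 (= 4s)".
const mainnetSlot = 12 * time.Second

// pruneBudget (hypothetical name) computes:
//   base  = slot/3
//   extra = 200ms per full 100 prunable steps (rounding is an assumption)
//   cap   = 2/3 of the slot
func pruneBudget(slot time.Duration, prunableSteps uint64) time.Duration {
	const perHundred = 200 * time.Millisecond
	limit := 2 * slot / 3

	hundreds := prunableSteps / 100
	// If the extra alone would exceed the cap, return the cap directly.
	// This also avoids overflowing time.Duration for very large step counts.
	if hundreds > uint64(limit/perHundred) {
		return limit
	}
	budget := slot/3 + time.Duration(hundreds)*perHundred
	if budget > limit {
		budget = limit
	}
	return budget
}

func main() {
	for _, steps := range []uint64{0, 250, 2000, 100000} {
		fmt.Printf("steps=%d budget=%s\n", steps, pruneBudget(mainnetSlot, steps))
	}
}
```

Under these assumptions the budget on mainnet starts at 4s and reaches the 8s cap at 2000 prunable steps, leaving at least a third of the slot for everything else.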
### Plus the prior branch history

~80 earlier commits on the architecture, BlockOverlay integration, channel ownership fixes, prune scheduling, etc. - supporting work that the commitment-calculator path depends on.

## Validation

### Run 12 - calculator architecture, no retirement gate, no FCU-collation fix

- 7h35m at chain tip, 37 calculator publish cycles, 146 commits
- 95.2% openTxs=1, 4.8% openTxs=2 (residual snapshot-retirement, addressed by `bb7bd2f8a3`)
- Zero panics / wrong trie root / FATAL throughout

### Run 13 - calculator + retirement gate + tracer enabled (uncovered the at-tip growth bug)

- 27h at tip; demonstrated the file-count-stuck-at-7 / CommitmentVals=11.79 GB / live=21.84 GB issue
- Diagnosed root cause: collation never invoked from FCU path

### Run 14 - existing 26 GB chaindata + both at-tip-growth fixes

- 11m: files 7 -> 10 per domain (collation kicked in immediately)
- 33m: file count merged 10 -> 8, CommitmentVals 11.79 -> 9.38 GB
- 95m: stable. Live data 21.84 -> 12.04 GB. Reclaimable 3.5 -> 14 GB.
- Confirmed both fixes work against accumulated state

### Run 15 - PRISTINE chaindata baseline (the verdict run)

- +15m: DB on disk **5.52 GB**. CommitmentVals 2.24 GB. Live 4.20 GB.
- +77m: DB **6.59 GB** (~1 GB/h growth in steady state). CommitmentVals **2.45 GB** (plateaued).
- New file `v2.0-X.8880-8888.kv` built across all 4 domains - file build at tip proven working.
- File count merged 8 -> 6.
- Within the 5-7 GB target band; no runaway accumulation.

## Scope and untested cases

Validated by runs 12 + 13 + 14 + 15: chain-tip operation (small per-batch sizes), moderate initial sync (~1300-block batches), at-tip steady-state DB-size behavior. Calculator publishes correct roots at every batch boundary observed.

**Not yet exercised**: bulk replay with batch sizes >2000 blocks (e.g. resync from far-behind). The "Bulk replay: re-execute 10k blocks" item in the test plan covers this. No reason to believe it's broken; just no live evidence yet.
## Architecture invariants (preserve these)

For reviewers + future agents touching this code:

1. Worker-side `ApplyStateWrites` (NOT in apply goroutine)
2. Flush in execLoop before blockResult sent
3. Apply goroutine: NO `ctx.Done` case
4. Apply goroutine: NO inline `ComputeCommitment`
5. Apply goroutine: NO `blockApplied` send
6. Apply goroutine `*blockResult` handler: BAL + changeset + postValidator + RecentReceipts (concurrent with commitment)
7. Apply goroutine `rootResults` handler: root check + `lastCommittedBlockNum` bump (ONLY place these happen)
8. ExecLoop defer: close `commitResults` first, `applyResults` second
9. Apply goroutine drain-`rootResults` pattern on `!ok`
10. `triggerBatchCommitment` at three end-condition sites in execLoop, `return nil` after each
11. `BlockStateCache` allocated per-block in `executeBlocks`, passed via `TxTask`
12. Overlay-backed `blockTx` built once at `executeBlocks` entry
13. Step alignment check + `lastFrozenTxNum` at `executeBlocks` entry

## Test plan

- [x] Run 12: 7h35m tip-chase soak - clean
- [x] Run 13: 27h soak - exposed at-tip growth issue (root cause: FCU never kicked CollateAndPruneIfNeeded)
- [x] Run 14: existing chaindata + fixes - shrunk live data 21.84 -> 12.04 GB, file step advanced 8878 -> 8887
- [x] Run 15: pristine chaindata - DB stays at 5.5-6.6 GB at steady state, file build at tip proven (8888+ reached)
- [ ] Bulk replay: re-execute 10k blocks from a known-good baseline (the only untested batch-size regime)
- [ ] SIGINT shutdown test: clean exit at every batch boundary
- [ ] Race detector on parallel path
- [ ] `exec3_finalize_test.go` revival

---------

Co-authored-by: Mark Holt <erigon@dev-bm-e3-ethmainnet-n4.erigon.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
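Invariants 3, 7, 8 and 9 in the list above (no `ctx.Done` case, root handling only in the `rootResults` handler, close order, and the drain on `!ok`) can be sketched together. This is a simplified model under stated assumptions, not the PR's implementation: `blockResult` and `rootResult` are stand-in types, the calculator here publishes one root per block rather than per batch, and it is assumed that the calculator closes `rootResults` once `commitResults` is closed, which the description does not state.

```go
package main

import "fmt"

// Stand-in types; the real structs carry far more than a block number.
type blockResult struct{ blockNum uint64 }
type rootResult struct{ blockNum uint64 }

// calculator publishes a root per block (the real one publishes per batch).
// ASSUMPTION: it closes rootResults after commitResults is closed.
func calculator(commitResults <-chan blockResult, rootResults chan<- rootResult) {
	defer close(rootResults)
	for r := range commitResults {
		rootResults <- rootResult{blockNum: r.blockNum}
	}
}

// applyLoop has no ctx.Done case: it exits only when applyResults is closed,
// and then drains rootResults so no published root is lost.
func applyLoop(applyResults <-chan blockResult, rootResults <-chan rootResult) (lastCommitted uint64) {
	for {
		select {
		case _, ok := <-applyResults:
			if !ok {
				if rootResults != nil { // nil means it was already seen closed
					for r := range rootResults {
						lastCommitted = r.blockNum
					}
				}
				return lastCommitted
			}
			// Index-only work (BAL, changeset, receipts) would go here.
		case r, ok := <-rootResults:
			if !ok {
				rootResults = nil // disable this case; keep serving applyResults
				continue
			}
			lastCommitted = r.blockNum // only place the bump happens
		}
	}
}

// run (hypothetical driver) plays the execLoop role for n blocks.
func run(n uint64) uint64 {
	applyResults := make(chan blockResult)
	commitResults := make(chan blockResult)
	rootResults := make(chan rootResult)
	done := make(chan uint64)

	go calculator(commitResults, rootResults)
	go func() { done <- applyLoop(applyResults, rootResults) }()

	func() {
		// Defers run last-in first-out: commitResults closes first.
		defer close(applyResults)
		defer close(commitResults)
		for b := uint64(1); b <= n; b++ {
			commitResults <- blockResult{blockNum: b}
			applyResults <- blockResult{blockNum: b}
		}
	}()
	return <-done
}

func main() {
	fmt.Println("last committed block:", run(5))
}
```

Without the drain, a root published just before shutdown could be dropped and `lastCommittedBlockNum` would lag the calculator; with it, the apply side always observes the final root before returning.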
`WarmupCache.enabled` was always `true`: `Enable()`/`IsEnabled()` were never called in production code. Every hot-path method was paying an atomic load for nothing; all call sites already guard with `if cache != nil`.

- Remove `enabled atomic.Bool` field and `Enable`/`IsEnabled` methods
- Remove the `c.enabled.Load()` guard from every `Put*`/`Get*`/`Evict*` method
- Remove the corresponding test (`TestWarmupCache_Enable`)
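The shape of the cache after this cleanup can be sketched as follows. The type below is a deliberately simplified, hypothetical stand-in (`warmupCache`, `PutAccount`, `GetAccount` and `readAccount` are illustrative names, not Erigon's API); the point is only that a nil pointer at the call site already expresses "disabled", so the methods need no per-call flag check.

```go
package main

import (
	"fmt"
	"sync"
)

// warmupCache is a hypothetical, simplified stand-in for the real cache.
// Note there is no `enabled atomic.Bool` field.
type warmupCache struct {
	mu       sync.RWMutex
	accounts map[string][]byte
}

func newWarmupCache() *warmupCache {
	return &warmupCache{accounts: make(map[string][]byte)}
}

// PutAccount stores a value; no enabled-flag load on the hot path.
func (c *warmupCache) PutAccount(addr string, v []byte) {
	c.mu.Lock()
	c.accounts[addr] = v
	c.mu.Unlock()
}

// GetAccount returns the cached value, if any.
func (c *warmupCache) GetAccount(addr string) ([]byte, bool) {
	c.mu.RLock()
	v, ok := c.accounts[addr]
	c.mu.RUnlock()
	return v, ok
}

// readAccount shows the call-site guard: a nil cache means "no cache".
func readAccount(cache *warmupCache, addr string, load func(string) []byte) []byte {
	if cache != nil {
		if v, ok := cache.GetAccount(addr); ok {
			return v
		}
	}
	v := load(addr)
	if cache != nil {
		cache.PutAccount(addr, v)
	}
	return v
}

func main() {
	loads := 0
	load := func(string) []byte { loads++; return []byte{0x01} }

	cache := newWarmupCache()
	readAccount(cache, "0xabc", load)
	readAccount(cache, "0xabc", load) // served from cache
	readAccount(nil, "0xabc", load)   // no cache: falls through to load
	fmt.Println("backend loads:", loads)
}
```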
…s and eth_call (#20949)

This PR fixes two inconsistencies in RPC behavior.

First, in `eth_estimateGas`, the validation for blob fee caps was not working properly when a `blobBaseFee` block override was provided. This happened because estimate mode was skipping that validation when `NoBaseFee` was enabled. The fix ensures that even in estimate mode, blob fee caps are still properly validated when the transaction includes blobs.

Second, in `eth_call`, the gas price calculation for EIP-1559 was using the original block header's base fee instead of the overridden `BaseFeePerGas`. This could lead to incorrect `GASPRICE` and related fee calculations when block overrides were used. The fix makes `eth_call` construct its message and block context using the overridden header whenever header-based fee fields are changed.

In simple terms, if a caller overrides block fee settings, the RPC now consistently uses those overridden values everywhere.

The regression tests added cover:

* Proper gas limit bounding when overridden in `eth_estimateGas`
* Rejection of invalid blob base fee overrides
* Correct impact of base fee overrides on `GASPRICE` calculations

---------

Co-authored-by: lupin012 <58134934+lupin012@users.noreply.github.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
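To see why the base fee source matters for `GASPRICE`, recall that EIP-1559 defines the effective gas price as `min(maxFeePerGas, baseFee + maxPriorityFeePerGas)`. The sketch below is illustrative only: `effectiveGasPrice`, `pickBaseFee` and `gwei` are hypothetical helpers, not functions from the Erigon codebase, and `math/big` is used for simplicity.

```go
package main

import (
	"fmt"
	"math/big"
)

// gwei is a small helper for readable test values (hypothetical).
func gwei(n int64) *big.Int {
	return new(big.Int).Mul(big.NewInt(n), big.NewInt(1_000_000_000))
}

// effectiveGasPrice implements the EIP-1559 rule:
// min(feeCap, baseFee + tipCap).
func effectiveGasPrice(feeCap, tipCap, baseFee *big.Int) *big.Int {
	price := new(big.Int).Add(baseFee, tipCap)
	if price.Cmp(feeCap) > 0 {
		price.Set(feeCap)
	}
	return price
}

// pickBaseFee (hypothetical) returns the override when one is supplied and
// the header value otherwise, mirroring the behavior the fix describes.
func pickBaseFee(headerBaseFee, override *big.Int) *big.Int {
	if override != nil {
		return override
	}
	return headerBaseFee
}

func main() {
	headerBaseFee := gwei(30)
	override := gwei(50)
	feeCap, tipCap := gwei(100), gwei(2)

	before := effectiveGasPrice(feeCap, tipCap, headerBaseFee)
	after := effectiveGasPrice(feeCap, tipCap, pickBaseFee(headerBaseFee, override))
	fmt.Println("using header base fee:    ", before)
	fmt.Println("using overridden base fee:", after)
}
```

With these example numbers the two results differ by the gap between the header and overridden base fees, which is the kind of discrepancy a contract reading `GASPRICE` would have observed before the fix.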
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)