
[pull] main from erigontech:main #9

Merged: pull[bot] merged 3 commits into Dustin4444:main from erigontech:main on May 5, 2026


pull[bot] commented on May 5, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)




mh0lt and others added 3 commits May 5, 2026 08:24
## Summary

Parallel commitment calculation now runs in its own goroutine, in
parallel with execution and apply. The previous architecture serialized
commitment behind `ApplyStateWrites` in the apply goroutine; this PR
reinstates the three-stage pipeline.

Everything else in the PR is supplementary infrastructure to make this
work correctly under load.

## Headline change: parallel commitment

**The calculator is a third concurrent stage** consuming the same
`applyResult` stream as the apply goroutine via fan-out:

| Stage | Goroutine | Owns |
|---|---|---|
| **execLoop / workers** | producer | `ApplyStateWrites` via `BlockStateCache`, finalize via IBS, Flush before sending |
| **Apply goroutine** | index-only consumer | `ApplyTxIndexes`, accumulator, receipt + postValidator, `ProcessBAL`, changeset |
| **Commitment calculator** | parallel consumer | Commitment-domain writes in `sd.mem` via own roTx + `asOfStateReader`; publishes on `rootResults` |
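
Below is a minimal Go sketch of the fan-out. It assumes a simplified `applyResult` that carries only a block number. `runPipeline` is a hypothetical driver, and the channel names come from the text above rather than from `exec3_parallel.go`. Each consumer reads from its own channel, so the apply goroutine and the calculator make progress independently. The only coupling is that a full buffer on either channel back-pressures the producer.

```go
package main

import (
	"fmt"
	"sync"
)

// applyResult is a hypothetical stand-in for the per-block result produced
// by execLoop; the real type carries far more state.
type applyResult struct {
	BlockNum uint64
}

// sendResult fans one result out to both consumers: the index-only apply
// goroutine and the commitment calculator.
func sendResult(res applyResult, applyResults, commitResults chan<- applyResult) {
	applyResults <- res
	commitResults <- res
}

// runPipeline pushes n blocks through the fan-out and returns the block
// numbers each consumer observed, in order.
func runPipeline(n uint64) (applied, committed []uint64) {
	applyResults := make(chan applyResult, 16)
	commitResults := make(chan applyResult, 16)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // apply goroutine: indexes, receipts, BAL, changeset
		defer wg.Done()
		for r := range applyResults {
			applied = append(applied, r.BlockNum)
		}
	}()
	go func() { // commitment calculator: commitment-domain writes, roots
		defer wg.Done()
		for r := range commitResults {
			committed = append(committed, r.BlockNum)
		}
	}()

	// Producer (execLoop): one send per block, fanned out to both stages.
	for b := uint64(1); b <= n; b++ {
		sendResult(applyResult{BlockNum: b}, applyResults, commitResults)
	}
	close(commitResults)
	close(applyResults)
	wg.Wait()
	return applied, committed
}

func main() {
	applied, committed := runPipeline(3)
	fmt.Println(applied, committed)
}
```
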

End-of-batch trigger: `triggerBatchCommitment(ctx)` fires from execLoop
at three sites (`sizeEst > batchLimit`, `blockNum >= maxBlockNum`,
`StopAfterBlock`); the apply goroutine consumes `rootResults` for the root
check and the `lastCommittedBlockNum` bump. **BAL hash and state root are
independent**: `ProcessBAL` runs concurrently with commitment and is not
gated on it.

Shutdown: execLoop closes `commitResults` first and `applyResults` second;
the apply goroutine drains `rootResults` on `!ok`. **The apply loop has no
`ctx.Done` case** by design: execLoop owns shutdown sequencing.

## Changeset

### Headline (commitment calculator)
- **41c3b64a3b** `stagedsync: port commitment calculator architecture
forward onto merged branch`
Restores pre-merge `exec3_parallel.go` + `exec3.go` (1803-line delta in
exec3_parallel.go) - calculator goroutine, `commitResults` +
`rootResults` channels, `triggerBatchCommitment` driver, fan-out via
`sendResult`, four domain-flag toggles when calculator is authoritative
(`SetDisableInlineTouchKey`, `SetInMemHistoryReads`,
`EnableTrieWarmup=false`, `SetSkipStepBoundaryCommitment`), per-block
`BlockStateCache` allocation, overlay-backed `blockTx` for
executeBlocks, step-alignment check + `lastFrozenTxNum` gating.
Adapters: `BlockGasUsed` -> `BlockRegularGasUsed` (main split for
EIP-8037), re-added `VersionMap.StorageKeys()` for self-destruct delete
emission.

### Supplementary (openTxs=1 invariant for clean MDBX GC)

For commitment to run cleanly in parallel with apply, MDBX needs
`openTxs=1` at commit time so freelist pages get reclaimed. These fixes
close every observed concurrent-reader path:

- **640bce87ec** `execmodule/forkchoice: release bgRoTx before RW commit
to drop openTxs 2->1`
- **63cb4e4e99** `execmodule: two more openTxs=2 eliminations -
ValidateChain + CommitCycle`
- **bb7bd2f8a3** `db/kv: GatedRoDB wrapper + BlockRetire commit-gate
wiring`
Closes the residual ~5% openTxs=2 events that came from snapshot
retirement reads. New `kv.NewGatedRoDB(inner, gate)` wraps an RoDB so
each `db.View()` acquires `gate.RLock()`.
`BlockRetire.SetCommitGate(gate)` plumbs in the Aggregator's existing
CommitGate, transparently gating all retirement reads (DumpBlocks ->
DumpHeaders/Bodies/Txs -> BigChunks -> db.View).

### At-tip MDBX growth fixes (the bug that made this PR feel incomplete until validated)

- **16ab0346c1** `execmodule, db/kv/mdbx: track
lastFlushedCommitmentTxNum on FCU + gate per-commit log`
The aggregator's collation safety cap was only being updated from
`ProcessFrozenBlocks` (initial-cycle path). During normal FCU at-tip
operation the cap stayed at its initial value forever, contributing to
the at-tip MDBX growth issue. Fixed by reading `KeyCommitmentState` from
the just-committed RW tx and calling `SetLastFlushedCommitmentTxNum`
after every FCU commit. Also gated the per-commit `[mdbx] commit` Info
log behind `MDBX_TRACE_TX` (it was producing 525+ log lines over 27h at
tip).
- **071e9f98b9** `execmodule, db/state: kick CollateAndPruneIfNeeded on
FCU + adaptive prune budget`
`CollateAndPruneIfNeeded` (which kicks `BuildFilesInBackground`) was
only invoked from `StageLoopIteration`. At chain tip, blocks flow
through the FCU path so files were never built, prune had nothing to
mark stale, and MDBX accumulated indefinitely. Run 13 demonstrated this:
27h at tip, file count stuck at 7 per domain, CommitmentVals 11.79 GB,
total live data 21.84 GB on a 26 GB chaindata. Fixed by calling
`agg.CollateAndPruneIfNeeded(...)` from `runForkchoicePrune`. Also adds an
adaptive prune budget: base = SecondsPerSlot/3 (= 4s on mainnet), +200ms
per 100 prunable steps, capped at 2/3 of a slot (= 8s on mainnet).

### Diagnostics
- **0683cde973** `db/kv/mdbx: tx-lifecycle tracer (OPEN/COMMIT/ROLLBACK
with traceID + stack)`
- **22f65cf71b** `db/kv/mdbx: env-gated tx-lifecycle tracer +
concurrent-tx dump`
A permanent diagnostic gated behind the `MDBX_TRACE_TX=true` env var, with
zero overhead when disabled. When openTxs>1 at commit, it dumps the stacks
of all other live txs so any new concurrent-reader callsite is identified
immediately.

### Cleanup
- **b48c16eb23** `stagedsync, commitment, db/state, execution/protocol:
remove leftover debug Printfs`
Removed 67 unconditional debug Printfs from investigation work
(FINALIZE_CHECK, REQUESTS dumps, FLUSH_CHECK, SEEK_CHECK, CALC_*, FLOW,
LIFECYCLE, COINBASE_*, etc.). Net -683 lines. Kept the env-gated
MDBX_TRACE_TX paths and the conditional `if dbg.TraceXxx` infrastructure.

### Plus the prior branch history
~80 earlier commits on the architecture, BlockOverlay integration,
channel ownership fixes, prune scheduling, etc. - supporting work that
the commitment-calculator path depends on.

## Validation

### Run 12 - calculator architecture, no retirement gate, no FCU-collation fix
- 7h35m at chain tip, 37 calculator publish cycles, 146 commits
- 95.2% openTxs=1, 4.8% openTxs=2 (residual snapshot-retirement reads,
addressed by `bb7bd2f8a3`)
- No panics, wrong trie roots, or FATAL errors throughout

### Run 13 - calculator + retirement gate + tracer enabled (uncovered the at-tip growth bug)
- 27h at tip; demonstrated the file-count-stuck-at-7 /
CommitmentVals=11.79 GB / live=21.84 GB issue
- Diagnosed root cause: collation never invoked from FCU path

### Run 14 - existing 26 GB chaindata + both at-tip-growth fixes
- 11m: files 7 -> 10 per domain (collation kicked in immediately)
- 33m: file count merged 10 -> 8, CommitmentVals 11.79 -> 9.38 GB
- 95m: stable. Live data 21.84 -> 12.04 GB. Reclaimable 3.5 -> 14 GB.
- Confirmed both fixes work against accumulated state

### Run 15 - PRISTINE chaindata baseline (the verdict run)
- +15m: DB on disk **5.52 GB**. CommitmentVals 2.24 GB. Live 4.20 GB.
- +77m: DB **6.59 GB** (~1 GB/h growth in steady state). CommitmentVals
**2.45 GB** (plateaued).
- New file `v2.0-X.8880-8888.kv` built across all 4 domains - file build
at tip proven working.
- File count merged 8 -> 6.
- Within the 5-7 GB target band; no runaway accumulation.
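
The ~1 GB/h steady-state figure follows from the two Run 15 samples, taken 62 minutes apart:

```math
\frac{6.59\ \text{GB} - 5.52\ \text{GB}}{(77 - 15)\ \text{min}}
= \frac{1.07\ \text{GB}}{62/60\ \text{h}}
\approx 1.04\ \text{GB/h}
```
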

## Scope and untested cases

Validated by runs 12 + 13 + 14 + 15: chain-tip operation (small
per-batch sizes), moderate initial sync (~1300-block batches), at-tip
steady-state DB-size behavior. The calculator published correct roots at
every observed batch boundary.

**Not yet exercised**: bulk replay with batch sizes >2000 blocks (e.g.
resyncing from far behind). The "Bulk replay: re-execute 10k blocks" item
in the test plan covers this. There is no reason to believe it is broken,
but there is no live evidence yet.

## Architecture invariants (preserve these)

For reviewers + future agents touching this code:
1. Worker-side `ApplyStateWrites` (NOT in apply goroutine)
2. Flush in execLoop before blockResult sent
3. Apply goroutine: NO `ctx.Done` case
4. Apply goroutine: NO inline `ComputeCommitment`
5. Apply goroutine: NO `blockApplied` send
6. Apply goroutine `*blockResult` handler: BAL + changeset +
postValidator + RecentReceipts (concurrent with commitment)
7. Apply goroutine `rootResults` handler: root check +
`lastCommittedBlockNum` bump (ONLY place these happen)
8. ExecLoop defer: close `commitResults` first, `applyResults` second
9. Apply goroutine drain-`rootResults` pattern on `!ok`
10. `triggerBatchCommitment` at three end-condition sites in execLoop,
`return nil` after each
11. `BlockStateCache` allocated per-block in `executeBlocks`, passed via
`TxTask`
12. Overlay-backed `blockTx` built once at `executeBlocks` entry
13. Step alignment check + `lastFrozenTxNum` at `executeBlocks` entry

## Test plan

- [x] Run 12: 7h35m tip-chase soak - clean
- [x] Run 13: 27h soak - exposed at-tip growth issue (root cause: FCU
never kicked CollateAndPruneIfNeeded)
- [x] Run 14: existing chaindata + fixes - shrunk live data 21.84 ->
12.04 GB, file step advanced 8878 -> 8887
- [x] Run 15: pristine chaindata - DB stays at 5.5-6.6 GB at steady
state, file build at tip proven (8888+ reached)
- [ ] Bulk replay: re-execute 10k blocks from a known-good baseline (the
only untested batch-size regime)
- [ ] SIGINT shutdown test: clean exit at every batch boundary
- [ ] Race detector on parallel path
- [ ] `exec3_finalize_test.go` revival

---------

Co-authored-by: Mark Holt <erigon@dev-bm-e3-ethmainnet-n4.erigon.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
`WarmupCache.enabled` was always `true` — `Enable()`/`IsEnabled()` were
never called in production code. Every hot-path method was paying an
atomic load for nothing; all call sites already guard with `if cache !=
nil`.

- Remove `enabled atomic.Bool` field and `Enable`/`IsEnabled` methods
- Remove the `c.enabled.Load()` guard from every `Put*`/`Get*`/`Evict*`
method
- Remove the corresponding test (`TestWarmupCache_Enable`)
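
Below is a hypothetical, simplified sketch of the post-change shape. It has no `enabled` flag, so hot-path methods do no extra atomic load, and a nil `*WarmupCache` at the call site means caching is off. The map-plus-mutex internals and the `readThrough` helper are illustrative assumptions and do not reproduce the real cache.

```go
package main

import (
	"fmt"
	"sync"
)

// WarmupCache: simplified stand-in with no `enabled` field.
type WarmupCache struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func NewWarmupCache() *WarmupCache {
	return &WarmupCache{data: make(map[string][]byte)}
}

func (c *WarmupCache) Put(key string, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = v
}

func (c *WarmupCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *WarmupCache) Evict(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.data, key)
}

// readThrough shows the call-site guard: a nil cache simply means "no cache",
// which is what made a separate enabled flag redundant.
func readThrough(cache *WarmupCache, key string, load func(string) []byte) []byte {
	if cache != nil {
		if v, ok := cache.Get(key); ok {
			return v
		}
	}
	v := load(key)
	if cache != nil {
		cache.Put(key, v)
	}
	return v
}

func main() {
	loads := 0
	load := func(k string) []byte { loads++; return []byte("value-of-" + k) }
	c := NewWarmupCache()
	readThrough(c, "a", load)
	readThrough(c, "a", load) // served from cache
	readThrough(nil, "a", load)
	fmt.Println("loads:", loads)
}
```
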
…s and eth_call (#20949)

This PR fixes two inconsistencies in RPC behavior.

First, in `eth_estimateGas`, the validation for blob fee caps was not
working properly when a `blobBaseFee` block override was provided. This
happened because estimate mode was skipping that validation when
`NoBaseFee` was enabled. The fix ensures that even in estimate mode,
blob fee caps are still properly validated when the transaction includes
blobs.

Second, in `eth_call`, the gas price calculation for EIP-1559 was using
the original block header’s base fee instead of the overridden
`BaseFeePerGas`. This could lead to incorrect `GASPRICE` and related fee
calculations when block overrides were used. The fix makes `eth_call`
construct its message and block context using the overridden header
whenever header-based fee fields are changed.

In simple terms, if a caller overrides block fee settings, the RPC now
consistently uses those overridden values everywhere.

The regression tests added cover:

* Proper gas limit bounding when overridden in `eth_estimateGas`
* Rejection of invalid blob base fee overrides
* Correct impact of base fee overrides on `GASPRICE` calculations

---------

Co-authored-by: lupin012 <58134934+lupin012@users.noreply.github.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
- pull[bot] locked and limited conversation to collaborators on May 5, 2026
- pull[bot] added the ⤵️ pull label on May 5, 2026
- pull[bot] merged commit efd3a36 into Dustin4444:main on May 5, 2026 (1 check was pending)