execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks#21240
Conversation
…into worktree-opt-point-eval
|
do not merge yet, verifying with benchmarkoor on the real suite runs first |
|
@yperbasis @mh0lt results from running these changes on bal-devnet-3 benchmarkoor tests in question:
from #21241 (comment) |
There was a problem hiding this comment.
Pull request overview
Fixes a parallel-execution retry-storm bug in the BAL (EIP-7928) read validator: validateRead was using BalancePath as a proxy for "account created by a prior tx", but BalancePath also fires for ordinary gas-paying balance updates and BAL pre-population — causing every same-sender tx to be falsely invalidated. The fix swaps the cross-check to IncarnationPath, which is written only by CreateAccount / SelfDestruct. Also adds a KZG warmup at node start and per-request profiling plumbing to engine_x runner.
Changes:
- Replace
BalancePathcross-check withIncarnationPathcross-check in theAddressPath/MVReadResultNonearm ofvalidateRead, with three new unit tests (BAL no-conflict, no-BAL conflict still detected, prior-creation still detected). - Eagerly initialise KZG crypto context in a goroutine during node
New()so the first block doesn't pay the init cost. - Add
EngineXTestRunnerOption/RequestProfileHooktoengineapitesterand wire CPU + fgprof per-request profile flags into theevm engine_xCLI.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| execution/state/versionmap.go | Swap BalancePath for IncarnationPath cross-check on AddressPath storage-fallback reads. |
| execution/state/versionmap_test.go | Three new tests covering BAL no-conflict, no-BAL conflict detection, and prior-account-creation invalidation via IncarnationPath. |
| node/eth/backend.go | Asynchronously warm up KZG context at startup. |
| execution/engineapi/engineapitester/engine_x_test_runner.go | Add functional options and RequestProfileHook plumbing around processNewPayload / processFcu. |
| cmd/evm/enginexrunner.go | Add --pprof.cpu / --fgprof flags (requiring --workers=1) and per-request profile hook implementation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
yperbasis
left a comment
There was a problem hiding this comment.
LGTM on the core fix — replacing the BalancePath cross-check with IncarnationPath is the right narrowing, and the benchmark wins on bal-devnet-3 corroborate it. A few non-blocking notes:
-
PR body mentions a bench file that's not in the diff.
execution/engineapi/amsterdam_bal_point_eval_bench_test.gois described as added (gated byEXEC3_PARALLEL=true), but it's not in the file list. Either it was forgotten or the description is stale. -
Stale test docstring.
versionmap_test.go:449says "Status: red until validateRead grows an IncarnationPath cross-check…". Since the check is added in this same PR, the test is green — worth rewording to describe the invariant being asserted rather than the red-state history. -
The "AddressPath filtered out of the per-tx flush" rationale is slightly off. The new comment at
versionmap.go:441-453(and the matching test comments) say "under HasBAL the worker's AddressPath write is filtered out of the per-tx flush (BAL doesn't list AddressPath, so normalize drops it)". Tracing the flow:IntraBlockState.VersionedWrites(false)keeps AddressPath for non-SD accounts, andnormalizeWriteSet(which does drop AddressPath, atexec3_parallel.go:3406) runs at finalize — not on the per-tx flush to the VersionMap. The fix is still correct (IncarnationPath is strictly more specific than BalancePath, and the SD case is already caught by the SelfDestructPath cross-check earlier in the arm), but the stated reason for needing the cross-check at all could be tightened. -
TestValidateRead_PriorAccountCreation_DetectedViaIncarnationPathsimulates a hypothetical state. It sets up IncarnationPath in the VM without AddressPath. In production for non-SD CreateAccount, the worker also flushes AddressPath, so the MVReadResultDone arm (already covered byTestValidateRead_HasBAL_NoBypassForAddressPath) catches that case. The new test is still useful as a guard on the cross-check semantics, but the framing as "the gap PR #19628 was trying to plug" is a bit hypothetical given the actual flush behavior.
default **Previous run** `1779109056_a101d4ed_erigon-bal-full` — `numWorkers = runtime.NumCPU() / 2 = 8` (on this 16-core box). **New run** `1779246695_cc99a1f0_erigon-bal-full` — `numWorkers = runtime.NumCPU() = 16`. Both runs use the same `erigon-local:traced` image with the KZG async warmup + BAL `tx.Apply` read fix; only the worker count changed. Sorted by speedup desc. | Test | NumCPU/2 (MGas/s) | NumCPU (MGas/s) | Speedup | |---|---:|---:|---:| | `test_mod_arithmetic-MULMOD-mod191-120M` | 386.8 | 593.3 | **1.53×** | | `test_mod-SMOD-mod255-120M` | 538.6 | 794.3 | 1.47× | | `test_mod-SMOD-mod63-120M` | 639.6 | 929.2 | 1.45× | | `test_point_evaluation-120M` | 238.5 | 343.2 | 1.44× | | `test_mod-MOD-mod127-120M` | 532.9 | 711.1 | 1.33× | | `test_mod-MOD-mod191-120M` | 631.7 | 821.4 | 1.30× | | `test_mod_arithmetic-ADDMOD-mod191-120M` | 602.4 | 782.5 | 1.30× | | `test_mod-SMOD-mod127-120M` | 521.6 | 668.8 | 1.28× | | `test_mod-MOD-mod255-120M` | 639.2 | 803.9 | 1.26× | | `test_ecrecover-120M` | 362.0 | 455.0 | 1.26× | | `test_blake2f_benchmark-rounds6-120M` | 282.2 | 352.8 | 1.25× | | `test_mod-SMOD-mod191-120M` | 584.2 | 720.8 | 1.23× | | `test_blake2f_benchmark-rounds24-120M` | 305.6 | 356.1 | 1.17× | | `test_p256verify-120M` | 478.0 | 556.2 | 1.16× | | `test_blake2f_benchmark-rounds1-120M` | 293.2 | 336.3 | 1.15× | | `test_mod-MOD-mod63-120M` | 837.2 | 872.6 | 1.04× | | `test_blake2f_benchmark-rounds12-120M` | 318.0 | 313.9 | 0.99× | | **average MGas/s** | **481.8** | **612.4** | **1.27×** | Doubling the default worker count gives **~27% throughput improvement on average**, with the biggest wins on heavier mod-arithmetic and SMOD tests (45–53%) and the smallest impact on blake2f_rounds12 (basically noise).
|
another gain: #21296 Benchmarkoor: NumCPU/2 (previous) vs NumCPU (new) for
|
| Test | NumCPU/2 (MGas/s) | NumCPU (MGas/s) | Speedup |
|---|---|---|---|
test_mod_arithmetic-MULMOD-mod191-120M |
386.8 | 593.3 | 1.53× |
test_mod-SMOD-mod255-120M |
538.6 | 794.3 | 1.47× |
test_mod-SMOD-mod63-120M |
639.6 | 929.2 | 1.45× |
test_point_evaluation-120M |
238.5 | 343.2 | 1.44× |
test_mod-MOD-mod127-120M |
532.9 | 711.1 | 1.33× |
test_mod-MOD-mod191-120M |
631.7 | 821.4 | 1.30× |
test_mod_arithmetic-ADDMOD-mod191-120M |
602.4 | 782.5 | 1.30× |
test_mod-SMOD-mod127-120M |
521.6 | 668.8 | 1.28× |
test_mod-MOD-mod255-120M |
639.2 | 803.9 | 1.26× |
test_ecrecover-120M |
362.0 | 455.0 | 1.26× |
test_blake2f_benchmark-rounds6-120M |
282.2 | 352.8 | 1.25× |
test_mod-SMOD-mod191-120M |
584.2 | 720.8 | 1.23× |
test_blake2f_benchmark-rounds24-120M |
305.6 | 356.1 | 1.17× |
test_p256verify-120M |
478.0 | 556.2 | 1.16× |
test_blake2f_benchmark-rounds1-120M |
293.2 | 336.3 | 1.15× |
test_mod-MOD-mod63-120M |
837.2 | 872.6 | 1.04× |
test_blake2f_benchmark-rounds12-120M |
318.0 | 313.9 | 0.99× |
| average MGas/s | 481.8 | 612.4 | 1.27× |
Doubling the default worker count gives ~27% throughput improvement on average, with the biggest wins on heavier mod-arithmetic and SMOD tests (45–53%) and the smallest impact on blake2f_rounds12 (basically noise).
Notable adaptation: the merge brought in PR #21240 ("optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks"), which replaced the BalancePath cross-check in validateRead's AddressPath / MVReadResultNone arm with an IncarnationPath cross-check to avoid a same-sender BAL retry storm. moksha has already removed IncarnationPath as part of the incarnation cleanup (#12440). The substitute on this branch is CreateContractPath — written only by the CREATE / CREATE2 branch of IBS.CreateAccount, never by UpdateAccountData and never by BAL pre-population, so it preserves PR #21240's performance fix: - BAL same-sender retry storm: still gone (CreateContractPath isn't pre-populated by BAL, isn't emitted by routine balance/nonce updates). - PR #19628's regression (stale AddressPath storage-fallback after prior-tx contract creation) is still caught. Gap vs upstream: EOA-creation-via-CALL+value (IBS.CreateAccount with contractCreation=false) emits BalancePath only, no CreateContractPath on moksha — so this cross-check doesn't fire for that scenario. Downstream value-tiebreaker validation on BalancePath/NoncePath catches real EOA-balance staleness, and the practical impact is limited (EOAs have no storage, so a stale AddressPath nil read of an EOA can't produce divergent slot reads). Renamed TestValidateRead_PriorAccountCreation_DetectedViaIncarnationPath → TestValidateRead_PriorContractCreation_DetectedViaCreateContractPath to reflect the moksha signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
for #14522 |
Point-evaluation precompile: BAL parallel-exec optimisations
Performance
Same-sender Amsterdam BAL parallel
newPayloadvalidate, EEST 150 M point_evaluation workload (linear projection from a 9 × 5 M synthesised in-process run):invalid=0retries on the user-tx block (wasinvalid=8, ~47 % retry rate).Issue & root cause
validateReadinexecution/state/versionmap.goinvalidated every same-sender tx under BAL pre-pop, forcing serial re-execution and collapsing parallel throughput.The
AddressPath/MVReadResultNonearm usedBalancePathas a proxy for "a prior tx created this account":BalancePathis written by three events, only one of which is a creation:CreateAccount(the real bug PR Fix bal-devnet-2 system address filter and AccessedAddresses propagation #19628 was trying to catch).UpdateAccountDataon a pre-existing account — fires for every gas-paying same-sender tx.Under same-sender BAL workloads (the EEST point_evaluation cachable benchmark), the BAL pre-populates
BalancePathatTxIdx = 0..N-1for the shared sender. Every tx after tx 0 then false-invalidated itsAddressPathstorage-fallback read, producing the retry storm.Fix
Replace the
BalancePathcross-check with anIncarnationPathcross-check.IncarnationPathis the specific signal for account-existence change: written only byCreateAccountandSelfDestruct, never byUpdateAccountData, never byWriteChanges(BAL pre-pop). The existingversionedReadstorage-zero-out path already usesIncarnationPaththis way forStoragePath.This also subsumes the secondary "precompile-touch records phantom AddressPath read" retry-storm path: precompiles are never
CreateAccount'd orSelfDestruct'd, so aTouchAccounton0x01-0x0ano longer trips invalidation under the new cross-check. No EVM-level change needed.Correctness
CreateAccountwritesIncarnationPath, so the stale-AddressPath-after-account-creation case is still caught. New unit testTestValidateRead_PriorAccountCreation_DetectedViaIncarnationPathasserts this (failed without the cross-check, passes with theIncarnationPathform).SelfDestructPathcross-check remains in place at the start of the arm.MVReadResultDonearm's value-tiebreaker, which is untouched by this change. New unit testTestNoBAL_SameSenderTxs_DetectsConflictsexercises the 9-tx-same-sender flow and asserts 8 invalidations.WriteChanges(BAL pre-pop) doesn't touchIncarnationPath, so for pure balance/nonce/code/storage updates the cross-check stays silent. New unit testTestBALPrePop_SameSenderTxs_NoConflictsconfirms zero conflicts on the synthesised 9-tx BAL workload.Tests added
execution/state/versionmap_test.go:TestBALPrePop_SameSenderTxs_NoConflicts— same-sender BAL retry-storm reproducer; green after the fix.TestValidateRead_PriorAccountCreation_DetectedViaIncarnationPath— PR Fix bal-devnet-2 system address filter and AccessedAddresses propagation #19628 regression coverage; red without the cross-check, green with theIncarnationPathversion.TestNoBAL_SameSenderTxs_DetectsConflicts— confirms the no-BAL conflict-detection path still fires.execution/engineapi/amsterdam_bal_point_eval_bench_test.go(new file, gated byEXEC3_PARALLEL=true) — in-process Amsterdam BAL benchmark that produced the performance numbers above. Mirrors the EEST point_evaluation workload (9 txs × ~5 M gas calling KZG-loop contract) end-to-end viaMockCl.BuildNewPayload+InsertNewPayload.