[bal-devnet-3] execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks#21241

Merged: taratorio merged 4 commits into bal-devnet-3 from bal-devnet-3-precompile-perf on May 19, 2026
@taratorio

Member

cherry-pick of #21240 for bal-devnet-3 branch

need to verify with benchmarkoor first, do not merge for now

@taratorio taratorio requested review from mh0lt and yperbasis as code owners May 18, 2026 02:09
@taratorio taratorio changed the title cherry-pick bal parallel exec conflict resolution fix [bal-devnet-3] execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks May 18, 2026
@taratorio taratorio changed the title [bal-devnet-3] execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks [DO-NOT-MERGE][bal-devnet-3] execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks May 18, 2026
@taratorio taratorio changed the title [DO-NOT-MERGE][bal-devnet-3] execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks [bal-devnet-3] execution: optimise parallel exec with BALs for same-sender conflicts in precompile benchmarks May 19, 2026
@taratorio

Member Author

Benchmarkoor: Baseline (before fixes) vs After fixes

Baseline run: 1779064698_0ecabe5f_erigon-bal-full — before any of our fixes; default cpuset_count: 6 config; default lazy KZG init.

After run: 1779109056_a101d4ed_erigon-bal-full — with the BAL tx.Apply read fix, KZG async warmup at backend.New(), and cpuset_count: 6

Image: erigon-local:traced for both. 17 tests, 2 messages each (engine_forkchoiceUpdatedV3 + engine_newPayloadV5). All values are engine_newPayloadV5.last MGas/s. Sorted by speedup desc.

| Test | Baseline (MGas/s) | After fixes (MGas/s) | Speedup |
| --- | --- | --- | --- |
| test_point_evaluation-120M | 43.7 | 238.5 | 5.45× |
| test_mod-MOD-mod127-120M | 113.8 | 532.9 | 4.68× |
| test_p256verify-120M | 137.9 | 478.0 | 3.47× |
| test_mod-MOD-mod191-120M | 184.5 | 631.7 | 3.42× |
| test_mod-MOD-mod63-120M | 246.5 | 837.2 | 3.40× |
| test_mod-SMOD-mod191-120M | 172.1 | 584.2 | 3.39× |
| test_ecrecover-120M | 107.9 | 362.0 | 3.35× |
| test_mod-SMOD-mod127-120M | 159.6 | 521.6 | 3.27× |
| test_mod-MOD-mod255-120M | 198.3 | 639.2 | 3.22× |
| test_mod_arithmetic-MULMOD-mod191-120M | 133.3 | 386.8 | 2.90× |
| test_mod_arithmetic-ADDMOD-mod191-120M | 208.7 | 602.4 | 2.89× |
| test_mod-SMOD-mod255-120M | 192.6 | 538.6 | 2.80× |
| test_mod-SMOD-mod63-120M | 242.7 | 639.6 | 2.64× |
| test_blake2f_benchmark-rounds12-120M | 138.8 | 318.0 | 2.29× |
| test_blake2f_benchmark-rounds24-120M | 133.7 | 305.6 | 2.29× |
| test_blake2f_benchmark-rounds1-120M | 135.4 | 293.2 | 2.16× |
| test_blake2f_benchmark-rounds6-120M | 139.5 | 282.2 | 2.02× |
| average MGas/s | 158.2 | 481.8 | 3.05× |
| total wall-time | 15.24s | 4.95s | 3.08× |
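
For anyone re-deriving the Speedup column: it is the "after" throughput divided by the baseline throughput. A minimal Go sketch of that arithmetic on three rows from the table follows. The `speedup` helper is an illustrative name and is not part of benchmarkoor. The printed values are rounded to two decimals, so the last digit can differ from the table if the table was truncated instead.

```go
package main

import "fmt"

// speedup returns after/baseline, i.e. how many times faster the
// "after" run is than the baseline run for the same test.
// Illustrative helper only; not taken from benchmarkoor.
func speedup(baselineMGas, afterMGas float64) float64 {
	return afterMGas / baselineMGas
}

func main() {
	// Three rows copied from the table (engine_newPayloadV5 MGas/s).
	rows := []struct {
		name            string
		baseline, after float64
	}{
		{"test_point_evaluation-120M", 43.7, 238.5},
		{"test_ecrecover-120M", 107.9, 362.0},
		{"test_blake2f_benchmark-rounds6-120M", 139.5, 282.2},
	}
	for _, r := range rows {
		fmt.Printf("%-40s %.2fx\n", r.name, speedup(r.baseline, r.after))
	}
}
```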

Where the speedup comes from

  • test_point_evaluation jumps 5.45× (43.7 → 238.5 MGas/s) primarily from the KZG async warmup in backend.New() — the very first block no longer pays the ~1.2s InitKZGCtx() cost on a worker goroutine.
  • The remaining ~2.0–4.7× wins on test_mod, test_p256verify, test_ecrecover, test_blake2f_benchmark, test_mod_arithmetic come from the BAL tx.Apply read fix (PR #21240's same-sender retry storm fix only landed because the BAL bytes are now visible to the validation overlay).
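
The KZG point above is an instance of a general pattern: start a slow one-time initialisation in the background at startup so the first hot-path caller does not pay for it. A minimal Go sketch of that pattern follows, assuming a `sync.Once`-guarded initialiser. `expensiveCtx`, `Ensure` and `WarmupAsync` are hypothetical names for illustration and are not Erigon's actual code; the 50 ms sleep stands in for the real setup cost.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// expensiveCtx is a hypothetical stand-in for a context with a slow,
// one-time setup (the PR cites ~1.2s for InitKZGCtx()).
type expensiveCtx struct {
	once  sync.Once
	setup func() // the slow initialiser; runs at most once
}

// Ensure blocks until setup has run. If nothing warmed the context up
// beforehand, the first caller pays the full setup cost.
func (c *expensiveCtx) Ensure() {
	c.once.Do(c.setup)
}

// WarmupAsync starts setup on a background goroutine and returns a
// channel that is closed once setup has finished. Later Ensure calls
// return immediately, or wait only for the rest of an in-flight setup.
func (c *expensiveCtx) WarmupAsync() <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		c.Ensure()
	}()
	return done
}

func main() {
	ctx := &expensiveCtx{setup: func() { time.Sleep(50 * time.Millisecond) }}

	// Startup path: kick off the warmup. This demo simply waits for it;
	// a real process would continue with the rest of its startup.
	<-ctx.WarmupAsync()

	// Hot path: the first use no longer includes the setup cost.
	start := time.Now()
	ctx.Ensure()
	fmt.Println("first use after warmup took", time.Since(start))
}
```

Because `sync.Once` makes concurrent callers wait for the in-flight initialisation, a hot-path call that arrives before the warmup finishes is still correct; it only waits for the remaining setup time.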

@taratorio taratorio merged commit 93fafcc into bal-devnet-3 May 19, 2026
1 check failed
@taratorio taratorio deleted the bal-devnet-3-precompile-perf branch May 19, 2026 04:44