[bal-devnet-3] perf: exec parallel workers=numcpu #21296

Merged
taratorio merged 6 commits into bal-devnet-3 from bal-devnet-3-precompile-perf on May 20, 2026
@taratorio (Member)

Benchmarkoor: NumCPU/2 (previous) vs NumCPU (new) for Exec3Workers default

**Previous run** `1779109056_a101d4ed_erigon-bal-full`: `numWorkers = runtime.NumCPU() / 2 = 8` (on this 16-core box).
**New run** `1779246695_cc99a1f0_erigon-bal-full`: `numWorkers = runtime.NumCPU() = 16`.
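
To make the two settings concrete, here is a minimal Go sketch of how such a default could be derived from the CPU count. Only the expressions `runtime.NumCPU() / 2` and `runtime.NumCPU()` come from this PR; the helper name `workerCount`, its `halve` parameter, and the floor of one worker are assumptions for illustration and are not Erigon's actual code.

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount is a hypothetical helper (not Erigon's actual code).
// It returns a default number of parallel execution workers for a
// machine with numCPU logical CPUs.
func workerCount(numCPU int, halve bool) int {
	n := numCPU // new default in this PR: runtime.NumCPU()
	if halve {
		n = numCPU / 2 // previous default: runtime.NumCPU() / 2
	}
	if n < 1 {
		// Assumed guard: integer division gives 0 on a single-core
		// machine, so fall back to one worker.
		n = 1
	}
	return n
}

func main() {
	cpus := runtime.NumCPU()
	fmt.Printf("logical CPUs:     %d\n", cpus)
	fmt.Printf("previous default: %d workers\n", workerCount(cpus, true))
	fmt.Printf("new default:      %d workers\n", workerCount(cpus, false))
}
```

On the 16-core benchmark box described above, the two calls would yield 8 and 16 workers respectively.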

Both runs use the same `erigon-local:traced` image with the KZG async warmup and the BAL `tx.Apply` read fix; only the worker count changed. Rows are sorted by speedup, descending.

| Test | NumCPU/2 (MGas/s) | NumCPU (MGas/s) | Speedup |
|---|---:|---:|---:|
| `test_mod_arithmetic-MULMOD-mod191-120M` | 386.8 | 593.3 | **1.53×** |
| `test_mod-SMOD-mod255-120M` | 538.6 | 794.3 | 1.47× |
| `test_mod-SMOD-mod63-120M` | 639.6 | 929.2 | 1.45× |
| `test_point_evaluation-120M` | 238.5 | 343.2 | 1.44× |
| `test_mod-MOD-mod127-120M` | 532.9 | 711.1 | 1.33× |
| `test_mod-MOD-mod191-120M` | 631.7 | 821.4 | 1.30× |
| `test_mod_arithmetic-ADDMOD-mod191-120M` | 602.4 | 782.5 | 1.30× |
| `test_mod-SMOD-mod127-120M` | 521.6 | 668.8 | 1.28× |
| `test_mod-MOD-mod255-120M` | 639.2 | 803.9 | 1.26× |
| `test_ecrecover-120M` | 362.0 | 455.0 | 1.26× |
| `test_blake2f_benchmark-rounds6-120M` | 282.2 | 352.8 | 1.25× |
| `test_mod-SMOD-mod191-120M` | 584.2 | 720.8 | 1.23× |
| `test_blake2f_benchmark-rounds24-120M` | 305.6 | 356.1 | 1.17× |
| `test_p256verify-120M` | 478.0 | 556.2 | 1.16× |
| `test_blake2f_benchmark-rounds1-120M` | 293.2 | 336.3 | 1.15× |
| `test_mod-MOD-mod63-120M` | 837.2 | 872.6 | 1.04× |
| `test_blake2f_benchmark-rounds12-120M` | 318.0 | 313.9 | 0.99× |
| **average MGas/s** | **481.8** | **612.4** | **1.27×** |

Doubling the default worker count gives **~27% throughput improvement on average**. The biggest wins are on `MULMOD-mod191`, `SMOD-mod255`, and `SMOD-mod63` (45–53%); the smallest impact is on `test_blake2f_benchmark-rounds12-120M` (0.99×, basically noise).
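
The speedup column and the average row can be recomputed from the raw throughput figures. The Go sketch below does that arithmetic over the numbers reported in the table; the type and function names (`result`, `speedup`, `meanThroughput`) are illustrative and not part of Benchmarkoor or Erigon.

```go
package main

import "fmt"

// result holds one benchmark row; throughput is in MGas/s.
type result struct {
	name    string
	halfCPU float64 // run with NumCPU/2 workers
	fullCPU float64 // run with NumCPU workers
}

// speedup returns the new throughput divided by the previous one.
func speedup(prev, next float64) float64 { return next / prev }

// meanThroughput returns the arithmetic mean of each column.
// It assumes rows is non-empty.
func meanThroughput(rows []result) (half, full float64) {
	for _, r := range rows {
		half += r.halfCPU
		full += r.fullCPU
	}
	n := float64(len(rows))
	return half / n, full / n
}

// rows reproduces the figures reported in the benchmark table.
var rows = []result{
	{"test_mod_arithmetic-MULMOD-mod191-120M", 386.8, 593.3},
	{"test_mod-SMOD-mod255-120M", 538.6, 794.3},
	{"test_mod-SMOD-mod63-120M", 639.6, 929.2},
	{"test_point_evaluation-120M", 238.5, 343.2},
	{"test_mod-MOD-mod127-120M", 532.9, 711.1},
	{"test_mod-MOD-mod191-120M", 631.7, 821.4},
	{"test_mod_arithmetic-ADDMOD-mod191-120M", 602.4, 782.5},
	{"test_mod-SMOD-mod127-120M", 521.6, 668.8},
	{"test_mod-MOD-mod255-120M", 639.2, 803.9},
	{"test_ecrecover-120M", 362.0, 455.0},
	{"test_blake2f_benchmark-rounds6-120M", 282.2, 352.8},
	{"test_mod-SMOD-mod191-120M", 584.2, 720.8},
	{"test_blake2f_benchmark-rounds24-120M", 305.6, 356.1},
	{"test_p256verify-120M", 478.0, 556.2},
	{"test_blake2f_benchmark-rounds1-120M", 293.2, 336.3},
	{"test_mod-MOD-mod63-120M", 837.2, 872.6},
	{"test_blake2f_benchmark-rounds12-120M", 318.0, 313.9},
}

func main() {
	for _, r := range rows {
		fmt.Printf("%-42s %.2fx\n", r.name, speedup(r.halfCPU, r.fullCPU))
	}
	half, full := meanThroughput(rows)
	fmt.Printf("average MGas/s: %.1f -> %.1f (%.2fx)\n", half, full, speedup(half, full))
}
```

The average speedup here is taken as the ratio of the two column means; averaging the per-test ratios instead is another reasonable choice and may differ slightly in the last digit.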

taratorio requested review from mh0lt and yperbasis May 20, 2026 03:34
taratorio merged commit 9de957d into bal-devnet-3 May 20, 2026
1 check failed
taratorio deleted the bal-devnet-3-precompile-perf branch May 20, 2026 03:35
taratorio added a commit that referenced this pull request May 20, 2026