common/dbg, execution: PERF_PROFILES env knob + pprof labels for parallel exec phases by mh0lt · Pull Request #21516 · erigontech/erigon

mh0lt · 2026-05-29T21:33:33Z

Summary

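Adds an opt-in profiling surface for the parallel execution stack. Two pieces: block/mutex profiling is default-off behind an env knob, and the goroutine labels are always set but cost only cheap pointer writes, so this is a no-functional-change PR when the env knob is unset.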

1. ERIGON_PERF_PROFILES=true env knob — at package init in common/dbg, enables runtime.SetBlockProfileRate(1) and runtime.SetMutexProfileFraction(1), populating /debug/pprof/{block,mutex} for blocking and contention analysis. Default false matches today's behaviour.

2. pprof goroutine labels on the parallel exec hot path. These are set unconditionally (cheap pointer writes to G-local label storage) but are only useful when the CPU profiler is on. Labels:

| label | location |
| --- | --- |
| `phase=pe-exec` | `parallelExecutor.exec` is wrapped in `pprof.Do(...)`, so all child goroutines inherit via context |
| `sub=exec-worker` | `(*Worker).Run` per-task workers |
| `sub=exec-loop` | `parallelExecutor.execLoop` block scheduler |
| `sub=calculator` | `commitmentCalculator.loop` commitment computation |

These make it possible to filter /debug/pprof/profile to the parallel-exec phase via the pprof tags axis and separate dispatch from EVM from commitment without code-side wall-clock instrumentation.

Why

Parallel-exec perf work needs to attribute CPU to four buckets — dispatch overhead, EVM execution, IO reads, and in-memory writes/version-map — to know where each optimisation lands. Without phase/sub labels, every pprof read mixes pe-exec CPU with txpool, p2p, GC, snapshot-build, etc. With this PR, one tags query separates them cleanly.

Validation

Pulled two 30s CPU profiles against a mainnet node executing live with ERIGON_PERF_PROFILES=true.

Catchup window (5000-block big-jump, 257% CPU)

```
phase: Total 67.04s of 77.70s (86.28%)
67.04s (86.28%): pe-exec

sub: exec-worker 49.13s (63.23%)
exec-loop 13.53s (17.41%)
calculator 1.47s ( 1.89%)
```

Tip window (NewPayload at slot tip, 39.6% CPU, mostly idle)

```
phase: Total 1.27s of 11.92s (10.65%)
1.27s (10.65%): pe-exec

sub: calculator 0.62s (5.20%)
exec-worker 0.47s (3.94%)
exec-loop 0.14s (1.17%)
```

Different regimes flip which sub dominates — at catchup workers saturate, at tip commitment is the largest slice. The labels split both cleanly. CPU under phase=pe-exec with no sub is the apply-loop main goroutine (per-block result handling, ~3.7% in catchup, ~3% at tip).

Test plan

- `make erigon` clean
- `make lint` clean (two passes; linter is non-deterministic)
- Live on a mainnet node: process stable 6h+, no behaviour change vs main, RSS steady
- `/debug/pprof/profile` tags axis populated as documented in catchup and tip windows
- `/debug/pprof/{block,mutex}` populated when env knob is on

…llel exec phases

Adds an opt-in profiling surface for the parallel execution stack, default-off.

ERIGON_PERF_PROFILES=true enables runtime.SetBlockProfileRate(1) and SetMutexProfileFraction(1) at package init in common/dbg, populating /debug/pprof/{block,mutex} for blocking and contention analysis.

Wraps parallelExecutor.exec in pprof.Do(phase=pe-exec) so child goroutines inherit the phase tag via context. Adds sub-labels on the goroutines:

- exec-worker (exec.Worker.Run per-task workers)
- exec-loop (parallelExecutor.execLoop block scheduler)
- calculator (commitmentCalculator.loop commitment computation)

Pure-additive: no behavior change when env unset (default false), and only cheap label sets at goroutine entry when on.

…ctor (L2 on top of #21516)

Lifts the typed-vio refactor (AddressEntry struct with per-AccountPath typed fields, generic WriteCell[T], typed VersionedRead/Write[T], typed PerPath ReadSet/WriteSet on IBS, per-T sync.Pool) onto current main on top of the L1 profiling instrumentation merged as #21516.

Replaces the untyped any-boxed WriteCell/ReadSet shape across:

- execution/state/versionmap.go: VersionMap.s now map[Address]*AddressEntry with typed btree.Map[int, *WriteCell[T]] per AccountPath
- execution/state/versionedio.go: typed VersionedRead[T] / VersionedWrite[T] primitives + per-T pools; AnyVersionedWrite interface for collections
- execution/state/intra_block_state.go: typed PerPath{Read,Write}Set fields on IBS + recordVR/recordVW helpers
- execution/state/state_object.go, journal.go, rw_v3.go, read_paths.go (new): consumers updated to typed shape
- execution/stagedsync/exec3_*.go, committer.go, calc_state.go, exec3_filter.go: VersionedWrites consumers migrated to AnyVersionedWrite iteration; commitment touch path uses VersionedWrites.TouchUpdates over the typed shape
- execution/exec/block_assembler.go, txtask.go: PerPathReadSet/WriteSet threading
- execution/types/accounts/code.go (new): canonical accounts.Code value type used by CodePath WriteCell[T]

Shims used to isolate from the non-vio refactors landed on mh/perf-all-followups that vio is NOT structurally dependent on:

- execution/cache/state_cache.go: PutCode(hash, code) pass-through (no weak-pointer code-cache landing required)
- execution/commitment/vio_shims.go: no-op Record{Sstore*, HasStorageMiss} counters (commitment-metrics landing not required)
- common/dbg: BAL{DrivenCommitment, ShadowCompute} stubbed false (BAL-driven commitment landing not required)
- execution/balcache: stub package with CachedBlockAccessList returning no-cached (BAL cache landing not required)
- db/state/aggregator_vio_shim.go: SetMaxCollationTxNum no-op
- execution/state/prewarm_shim.go: PrewarmBlockStateCacheFromBAL no-op
- execution/state/access_set_shim.go: AccessSet type + IBS.AccessedAddresses returning nil (access tracking folded into ReadSet under typed vio; api-compat shim for txtask + block_assembler)
- execution/state/vio_test_shim.go: VersionedIO.RecordAccesses no-op used only by execution/tests/blockgen chain_makers.go
- types.CodeChange.Hash literal stripped (typed-CodeChange landing not required; main's CodeChange has only Index+Bytecode)
- mdgas.MdGas refund flattened back to uint64 across IBS/journal/state_object (multi-dim gas landing not required)
- Tracing reasons restored on IBS.SetNonce/SetCode + stateObject.SetNonce/SetCode signatures (main has them; chain-end dropped them; restored so main's evm/vm/aura/bor call sites work)
- TouchPlainKeyDirect call sites in versionedio.go and calc_state.go rewritten to main's untyped (string, *Update) signature (typed-commitment-handle landing not required)
- Hash.Bytes() → Hash[:] across stagedsync (no Hash.Bytes method on main)
- td declared *uint256.Int to match main's chainReader.GetTd return type

Foundation for tip-perf alloc reduction in the parallel-exec hot path (versionedRead[T] CPU + WriteSet.Set/ReadSet.Set allocs). Builds clean end-to-end; lint and on-tip characterisation follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mh0lt requested review from AskAlexSharov and yperbasis as code owners May 29, 2026 21:33

AskAlexSharov approved these changes May 30, 2026

AskAlexSharov added this pull request to the merge queue May 30, 2026

Merged via the queue into main with commit a192da3 May 30, 2026
90 checks passed

AskAlexSharov deleted the mh/perf-profiling-labels branch May 30, 2026 05:21

mh0lt mentioned this pull request May 31, 2026

execution/state: typed-vio refactor (L2 — typed AddressEntry + WriteCell[T] + sync.Pool) #21536

Open

4 tasks

AskAlexSharov merged 1 commit into main from mh/perf-profiling-labels
