common/dbg, execution: PERF_PROFILES env knob + pprof labels for parallel exec phases #21516
Merged
…llel exec phases
Adds an opt-in profiling surface for the parallel execution stack, default-off.
ERIGON_PERF_PROFILES=true enables runtime.SetBlockProfileRate(1) and
SetMutexProfileFraction(1) at package init in common/dbg, populating
/debug/pprof/{block,mutex} for blocking and contention analysis.
Wraps parallelExecutor.exec in pprof.Do(phase=pe-exec) so child goroutines
inherit the phase tag via context. Adds sub-labels on the goroutines:
- exec-worker (exec.Worker.Run per-task workers)
- exec-loop (parallelExecutor.execLoop block scheduler)
- calculator (commitmentCalculator.loop commitment computation)
Pure-additive: no behavior change when env unset (default false), and only
cheap label sets at goroutine entry when on.
AskAlexSharov approved these changes on May 30, 2026
mh0lt added a commit that referenced this pull request on May 31, 2026
…ctor (L2 on top of #21516)

Lifts the typed-vio refactor (AddressEntry struct with per-AccountPath typed fields, generic WriteCell[T], typed VersionedRead/Write[T], typed PerPath ReadSet/WriteSet on IBS, per-T sync.Pool) onto current main on top of the L1 profiling instrumentation merged as #21516.

Replaces the untyped any-boxed WriteCell/ReadSet shape across:
- execution/state/versionmap.go: VersionMap.s now map[Address]*AddressEntry with typed btree.Map[int, *WriteCell[T]] per AccountPath
- execution/state/versionedio.go: typed VersionedRead[T] / VersionedWrite[T] primitives + per-T pools; AnyVersionedWrite interface for collections
- execution/state/intra_block_state.go: typed PerPath{Read,Write}Set fields on IBS + recordVR/recordVW helpers
- execution/state/state_object.go, journal.go, rw_v3.go, read_paths.go (new): consumers updated to typed shape
- execution/stagedsync/exec3_*.go, committer.go, calc_state.go, exec3_filter.go: VersionedWrites consumers migrated to AnyVersionedWrite iteration; commitment touch path uses VersionedWrites.TouchUpdates over the typed shape
- execution/exec/block_assembler.go, txtask.go: PerPathReadSet/WriteSet threading
- execution/types/accounts/code.go (new): canonical accounts.Code value type used by CodePath WriteCell[T]

Shims used to isolate from the non-vio refactors landed on mh/perf-all-followups that vio is NOT structurally dependent on:
- execution/cache/state_cache.go: PutCode(hash, code) pass-through (no weak-pointer code-cache landing required)
- execution/commitment/vio_shims.go: no-op Record{Sstore*, HasStorageMiss} counters (commitment-metrics landing not required)
- common/dbg: BAL{DrivenCommitment, ShadowCompute} stubbed false (BAL-driven commitment landing not required)
- execution/balcache: stub package with CachedBlockAccessList returning no-cached (BAL cache landing not required)
- db/state/aggregator_vio_shim.go: SetMaxCollationTxNum no-op
- execution/state/prewarm_shim.go: PrewarmBlockStateCacheFromBAL no-op
- execution/state/access_set_shim.go: AccessSet type + IBS.AccessedAddresses returning nil (access tracking folded into ReadSet under typed vio; api-compat shim for txtask + block_assembler)
- execution/state/vio_test_shim.go: VersionedIO.RecordAccesses no-op used only by execution/tests/blockgen chain_makers.go
- types.CodeChange.Hash literal stripped (typed-CodeChange landing not required; main's CodeChange has only Index+Bytecode)
- mdgas.MdGas refund flattened back to uint64 across IBS/journal/state_object (multi-dim gas landing not required)
- Tracing reasons restored on IBS.SetNonce/SetCode + stateObject.SetNonce/SetCode signatures (main has them; chain-end dropped them; restored so main's evm/vm/aura/bor call sites work)
- TouchPlainKeyDirect call sites in versionedio.go and calc_state.go rewritten to main's untyped (string, *Update) signature (typed-commitment-handle landing not required)
- Hash.Bytes() → Hash[:] across stagedsync (no Hash.Bytes method on main)
- td declared *uint256.Int to match main's chainReader.GetTd return type

Foundation for tip-perf alloc reduction in the parallel-exec hot path (versionedRead[T] CPU + WriteSet.Set/ReadSet.Set allocs). Builds clean end-to-end; lint and on-tip characterisation follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Adds an opt-in profiling surface for the parallel execution stack. It has two pieces; with the env knob unset, the PR makes no functional change.
1. `ERIGON_PERF_PROFILES=true` env knob: at package init in `common/dbg`, enables `runtime.SetBlockProfileRate(1)` and `runtime.SetMutexProfileFraction(1)`, populating `/debug/pprof/{block,mutex}` for blocking and contention analysis. Default `false` matches today's behaviour.
2. pprof goroutine labels on the parallel exec hot path: fires unconditionally (cheap pointer writes to G-local label storage), but only useful when the CPU profiler is on. Labels:
   - `phase=pe-exec`: `parallelExecutor.exec` is wrapped in `pprof.Do(...)`, so all child goroutines inherit via context
   - `sub=exec-worker`: `(*Worker).Run` per-task workers
   - `sub=exec-loop`: `parallelExecutor.execLoop` block scheduler
   - `sub=calculator`: `commitmentCalculator.loop` commitment computation

These make it possible to filter `/debug/pprof/profile` to the parallel-exec phase via the pprof tags axis and separate dispatch from EVM from commitment without code-side wall-clock instrumentation.

Why
Parallel-exec perf work needs to attribute CPU to four buckets — dispatch overhead, EVM execution, IO reads, and in-memory writes/version-map — to know where each optimisation lands. Without phase/sub labels, every pprof read mixes pe-exec CPU with txpool, p2p, GC, snapshot-build, etc. With this PR, one tags query separates them cleanly.
Validation
Pulled two 30s CPU profiles against a mainnet node executing live with `ERIGON_PERF_PROFILES=true`.

Catchup window (5000-block big-jump, 257% CPU)
```
phase: Total 67.04s of 77.70s (86.28%)
67.04s (86.28%): pe-exec
sub: exec-worker 49.13s (63.23%)
exec-loop 13.53s (17.41%)
calculator 1.47s ( 1.89%)
```
Tip window (NewPayload at slot tip, 39.6% CPU, mostly idle)
```
phase: Total 1.27s of 11.92s (10.65%)
1.27s (10.65%): pe-exec
sub: calculator 0.62s (5.20%)
exec-worker 0.47s (3.94%)
exec-loop 0.14s (1.17%)
```
Different regimes flip which sub dominates: at catchup the workers saturate, while at tip commitment is the largest slice. The labels split both cleanly. CPU under `phase=pe-exec` with no sub is the apply-loop main goroutine (per-block result handling): by subtraction from the listings above, about 2.91s in catchup (~3.7% of all samples) and about 0.04s at tip (~3% of the pe-exec slice).

Test plan