common/dbg, execution: PERF_PROFILES env knob + pprof labels for parallel exec phases #21516

Merged
AskAlexSharov merged 1 commit into main from mh/perf-profiling-labels on May 30, 2026

mh0lt commented May 29, 2026

Contributor

Summary

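Adds an opt-in profiling surface for the parallel execution stack, in two pieces. The profile-rate knob is default-off, and the goroutine labels are inert unless a CPU profile is being collected, so this is a no-functional-change PR when the env knob is unset.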

1. ERIGON_PERF_PROFILES=true env knob — at package init in common/dbg, enables runtime.SetBlockProfileRate(1) and runtime.SetMutexProfileFraction(1), populating /debug/pprof/{block,mutex} for blocking and contention analysis. Default false matches today's behaviour.

2. pprof goroutine labels on the parallel exec hot path — fires unconditionally (cheap pointer writes to G-local label storage), but only useful when the CPU profiler is on. Labels:

| label | location |
| --- | --- |
| `phase=pe-exec` | `parallelExecutor.exec` is wrapped in `pprof.Do(...)`, so all child goroutines inherit via context |
| `sub=exec-worker` | `(*Worker).Run` per-task workers |
| `sub=exec-loop` | `parallelExecutor.execLoop` block scheduler |
| `sub=calculator` | `commitmentCalculator.loop` commitment computation |

These make it possible to filter /debug/pprof/profile to the parallel-exec phase via the pprof tags axis and separate dispatch from EVM from commitment without code-side wall-clock instrumentation.

Why

Parallel-exec perf work needs to attribute CPU to four buckets — dispatch overhead, EVM execution, IO reads, and in-memory writes/version-map — to know where each optimisation lands. Without phase/sub labels, every pprof read mixes pe-exec CPU with txpool, p2p, GC, snapshot-build, etc. With this PR, one tags query separates them cleanly.

Validation

Pulled two 30s CPU profiles against a mainnet node executing live with ERIGON_PERF_PROFILES=true.

Catchup window (5000-block big-jump, 257% CPU)

```
phase: Total 67.04s of 77.70s (86.28%)
67.04s (86.28%): pe-exec

sub: exec-worker 49.13s (63.23%)
exec-loop 13.53s (17.41%)
calculator 1.47s ( 1.89%)
```

Tip window (NewPayload at slot tip, 39.6% CPU, mostly idle)

```
phase: Total 1.27s of 11.92s (10.65%)
1.27s (10.65%): pe-exec

sub: calculator 0.62s (5.20%)
exec-worker 0.47s (3.94%)
exec-loop 0.14s (1.17%)
```

Different regimes flip which sub dominates: at catchup workers saturate, at tip commitment is the largest slice. The labels split both cleanly. CPU under phase=pe-exec with no sub is the apply-loop main goroutine (per-block result handling; ~3.7% of all samples in catchup, ~3% of the pe-exec total at tip).

Test plan

  • `make erigon` clean
  • `make lint` clean (two passes — linter is non-deterministic)
  • Live on a mainnet node — process stable 6h+, no behaviour change vs main, RSS steady
  • `/debug/pprof/profile` tags axis populated as documented in catchup and tip windows
  • `/debug/pprof/{block,mutex}` populated when env knob is on

…llel exec phases

Adds an opt-in profiling surface for the parallel execution stack, default-off.
ERIGON_PERF_PROFILES=true enables runtime.SetBlockProfileRate(1) and
SetMutexProfileFraction(1) at package init in common/dbg, populating
/debug/pprof/{block,mutex} for blocking and contention analysis.

Wraps parallelExecutor.exec in pprof.Do(phase=pe-exec) so child goroutines
inherit the phase tag via context. Adds sub-labels on the goroutines:
- exec-worker (exec.Worker.Run per-task workers)
- exec-loop (parallelExecutor.execLoop block scheduler)
- calculator (commitmentCalculator.loop commitment computation)

Pure-additive: no behavior change when env unset (default false), and only
cheap label sets at goroutine entry when on.

AskAlexSharov added this pull request to the merge queue May 30, 2026
Merged via the queue into main with commit a192da3 May 30, 2026
90 checks passed
AskAlexSharov deleted the mh/perf-profiling-labels branch May 30, 2026 05:21
mh0lt added a commit that referenced this pull request May 31, 2026
…ctor (L2 on top of #21516)

Lifts the typed-vio refactor (AddressEntry struct with per-AccountPath
typed fields, generic WriteCell[T], typed VersionedRead/Write[T], typed
PerPath ReadSet/WriteSet on IBS, per-T sync.Pool) onto current main on top
of the L1 profiling instrumentation merged as #21516.

Replaces the untyped any-boxed WriteCell/ReadSet shape across:
- execution/state/versionmap.go: VersionMap.s now map[Address]*AddressEntry
  with typed btree.Map[int, *WriteCell[T]] per AccountPath
- execution/state/versionedio.go: typed VersionedRead[T] / VersionedWrite[T]
  primitives + per-T pools; AnyVersionedWrite interface for collections
- execution/state/intra_block_state.go: typed PerPath{Read,Write}Set fields
  on IBS + recordVR/recordVW helpers
- execution/state/state_object.go, journal.go, rw_v3.go, read_paths.go (new):
  consumers updated to typed shape
- execution/stagedsync/exec3_*.go, committer.go, calc_state.go,
  exec3_filter.go: VersionedWrites consumers migrated to AnyVersionedWrite
  iteration; commitment touch path uses VersionedWrites.TouchUpdates over
  the typed shape
- execution/exec/block_assembler.go, txtask.go: PerPathReadSet/WriteSet
  threading
- execution/types/accounts/code.go (new): canonical accounts.Code value
  type used by CodePath WriteCell[T]

Shims used to isolate from the non-vio refactors landed on
mh/perf-all-followups that vio is NOT structurally dependent on:

- execution/cache/state_cache.go: PutCode(hash, code) pass-through (no
  weak-pointer code-cache landing required)
- execution/commitment/vio_shims.go: no-op Record{Sstore*, HasStorageMiss}
  counters (commitment-metrics landing not required)
- common/dbg: BAL{DrivenCommitment, ShadowCompute} stubbed false
  (BAL-driven commitment landing not required)
- execution/balcache: stub package with CachedBlockAccessList returning
  no-cached (BAL cache landing not required)
- db/state/aggregator_vio_shim.go: SetMaxCollationTxNum no-op
- execution/state/prewarm_shim.go: PrewarmBlockStateCacheFromBAL no-op
- execution/state/access_set_shim.go: AccessSet type +
  IBS.AccessedAddresses returning nil (access tracking folded into
  ReadSet under typed vio; api-compat shim for txtask + block_assembler)
- execution/state/vio_test_shim.go: VersionedIO.RecordAccesses no-op
  used only by execution/tests/blockgen chain_makers.go
- types.CodeChange.Hash literal stripped (typed-CodeChange landing not
  required; main's CodeChange has only Index+Bytecode)
- mdgas.MdGas refund flattened back to uint64 across IBS/journal/
  state_object (multi-dim gas landing not required)
- Tracing reasons restored on IBS.SetNonce/SetCode +
  stateObject.SetNonce/SetCode signatures (main has them; chain-end
  dropped them — restored so main's evm/vm/aura/bor call sites work)
- TouchPlainKeyDirect call sites in versionedio.go and calc_state.go
  rewritten to main's untyped (string, *Update) signature
  (typed-commitment-handle landing not required)
- Hash.Bytes() → Hash[:] across stagedsync (no Hash.Bytes method on main)
- td declared *uint256.Int to match main's chainReader.GetTd return type

Foundation for tip-perf alloc reduction in the parallel-exec hot path
(versionedRead[T] CPU + WriteSet.Set/ReadSet.Set allocs). Builds clean
end-to-end; lint and on-tip characterisation follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>