Skip to content

consensus: standardize slow block JSON output for cross-client metrics#11400

Merged
Marchhill merged 60 commits into
masterfrom
feature/execution-metrics-standardization
May 26, 2026
Merged

consensus: standardize slow block JSON output for cross-client metrics#11400
Marchhill merged 60 commits into
masterfrom
feature/execution-metrics-standardization

Conversation

@Marchhill

@Marchhill Marchhill commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Updates #10288

Summary

Implement standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research.

This change is part of the Cross-Client Execution Metrics initiative proposed by Gary Rong and CPerezz.

Motivation

Standardized execution metrics are critical for:

  • Cross-client performance comparison
  • Network health monitoring
  • Data-driven protocol research

Real-world example: The EIP-7907 analysis used execution metrics to measure code read latency, per-call overhead scaling, and block execution breakdown. Without standardized metrics across clients, such analysis cannot be validated cross-client.

References

Configuration

Two new flags in Blocks config section:

Flag Default Description
Blocks.SlowBlockThresholdMs -1 (disabled) Threshold in ms for logging slow blocks. Set to 1000 to log blocks slower than 1s. Set to 0 to log all blocks.
Blocks.SlowBlockPerTxThresholdMs -1 (disabled) Per-transaction threshold in ms. When enabled, individual transactions exceeding this threshold are included in the slow block JSON. Set to 0 to log all transactions.

Both flags default to -1 (disabled). When slow-block logging is disabled, the block-level delta capture, per-transaction timing, and JSON serialization paths are skipped.

The exported execution counters added by this PR are controlled by the compile-time NETHERMIND_NO_EXECUTION_METRICS symbol, not by the runtime slow-block flags. Default builds keep those counters enabled for Prometheus; benchmark builds can define NETHERMIND_NO_EXECUTION_METRICS to let the JIT fold the gated counter/timer paths away.

Example usage

# Log blocks slower than 1s with all transactions
--Blocks.SlowBlockThresholdMs=1000 --Blocks.SlowBlockPerTxThresholdMs=0

# Log blocks slower than 500ms, only transactions slower than 50ms
--Blocks.SlowBlockThresholdMs=500 --Blocks.SlowBlockPerTxThresholdMs=50

# Log every block, no per-tx breakdown
--Blocks.SlowBlockThresholdMs=0

JSON Output Format

{
  "level": "warn",
  "msg": "Slow block",
  "parallel_execution": false,
  "block": {
    "number": 22075123,
    "hash": "0xabc...",
    "gas_used": 29850000,
    "gas_limit": 30000000,
    "tx_count": 142,
    "blob_count": 6
  },
  "timing": {
    "execution_ms": 52.1,
    "evm_ms": 45.2,
    "blooms_ms": 1.3,
    "receipts_root_ms": 3.8,
    "commit_ms": 2.1,
    "storage_merkle_ms": 18.4,
    "state_root_ms": 5.2,
    "state_hash_ms": 23.6,
    "total_ms": 82.0
  },
  "throughput": {
    "mgas_per_sec": 364.1
  },
  "state_reads": {
    "accounts": 322,
    "storage_slots": 3201,
    "code": 35,
    "code_bytes": 482310
  },
  "state_writes": {
    "accounts": 312,
    "accounts_deleted": 0,
    "storage_slots": 1805,
    "storage_slots_deleted": 12,
    "code": 3,
    "code_bytes": 8420,
    "eip7702_delegations_set": 0,
    "eip7702_delegations_cleared": 0
  },
  "cache": {
    "account": { "hits": 1520, "misses": 322, "hit_rate": 82.52 },
    "storage": { "hits": 2800, "misses": 401, "hit_rate": 87.48 },
    "code": { "hits": 180, "misses": 35, "hit_rate": 83.72 }
  },
  "evm": {
    "opcodes": 1542000,
    "sload": 3201,
    "sstore": 1805,
    "calls": 2412,
    "empty_calls": 48,
    "creates": 3,
    "self_destructs": 0,
    "contracts_analyzed": 118,
    "cached_contracts_used": 97
  },
  "transactions": [
    {
      "index": 42,
      "hash": "0xdef...",
      "gas_used": 8200000,
      "execution_ms": 12.3,
      "type": "Legacy",
      "to": "0x7a25..."
    },
    {
      "index": 107,
      "hash": "0x123...",
      "gas_used": 5100000,
      "execution_ms": 8.7,
      "type": "EIP1559",
      "to": "0xdead..."
    }
  ]
}

State read field definitions

The state_reads fields are derived from existing cache counters where possible so the hot path does not record the same event twice:

Field Definition
state_reads.accounts Account trie cache misses, equal to cache.account.misses.
state_reads.storage_slots Storage trie accesses served through the storage trie cache, equal to cache.storage.hits + cache.storage.misses.
state_reads.code Code cache misses, equal to cache.code.misses.
state_reads.code_bytes Total bytes loaded for code cache misses.

EVM field definitions

Field Definition
evm.cached_contracts_used Existing code cache hit counter for the block, equal to cache.code.hits; retained because it is part of the original cross-client metrics spec.

Timing field definitions

Field Definition
total_ms Wall-clock block processing time
execution_ms total - state_hash - commit (backwards-compatible)
evm_ms total - state_hash - commit - blooms - receipts_root (pure EVM)
blooms_ms Bloom filter calculation (parallel)
receipts_root_ms Receipts Patricia trie root computation
commit_ms State commit without root recalculation
storage_merkle_ms Storage trie merkleization (Commit(commitRoots: true))
state_root_ms State root recalculation + commit tree
state_hash_ms storage_merkle + state_root (total merkleization)

The transactions array is only present when SlowBlockPerTxThresholdMs >= 0.

Implementation

  • Account and storage read counts are derived from existing Db.Metrics trie cache/read counters.
  • Code cache misses and code bytes read are tracked internally for slow-block JSON; they are not exported as duplicate Prometheus counters.
  • Write counters and EIP-7702 delegation counters are compile-time gated via ExecutionMetricsFlag.
  • Timing uses Stopwatch.GetTimestamp() only at coarse block-phase boundaries and for optional per-transaction timing.
  • JSON is serialized via Utf8JsonWriter + ArrayBufferWriter<byte> on the ThreadPool, not the block-processing thread.
  • Per-transaction timing uses pooled ArrayPoolList<long> storage and one Stopwatch.GetTimestamp() per transaction when enabled.
  • Delta computation and JSON output are queued to ThreadPool via UnsafeQueueUserWorkItem.

CPerezz and others added 30 commits January 21, 2026 12:00
Implements execution metrics following the cross-client specification:
https://github.com/ethereum/execution-specs/blob/main/docs/execution-metrics-spec.md

- Add slow block JSON logging (threshold: 1000ms)
- Include timing, throughput, and EVM operation counts
- Leverage existing SLOAD/SSTORE/CALL metrics infrastructure
- Output format matches cross-client standardization spec
Implement standardized JSON format for slow block logging to enable
cross-client performance analysis and protocol research.

This change is part of the Cross-Client Execution Metrics initiative
proposed by Gary Rong and CPerezz: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ

The standardized metrics enabled data-driven analysis like the EIP-7907
research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850

JSON format includes:
- block: number, hash, gas_used, tx_count
- timing: execution_ms, total_ms
- throughput: mgas_per_sec
- state_reads: accounts, storage_slots, code, code_bytes
- state_writes: accounts, storage_slots
- cache: account/storage/code hits, misses, hit_rate
- evm: sload, sstore, calls, creates

Also adds ThreadLocal accessors to Db.Metrics for per-block cache
statistics tracking without global synchronization overhead.
Convert timing calculations from long to double to preserve
sub-millisecond precision in slow block JSON output.

Changes:
- ProcessingStats: timing variables changed to double with float division
- Math.Round(..., 3) for consistent 3 decimal place output
- ZeroContentionCounter: add Increment(long) overload for type safety
Add tracking for EIP-7702 delegation set/cleared operations as part of
the cross-client execution metrics standardization effort.

New metrics in Nethermind.Evm.Metrics:
- Eip7702DelegationsSet: Number of EIP-7702 delegations set
- Eip7702DelegationsCleared: Number of EIP-7702 delegations cleared

Both metrics include thread-local variants for use in slow block
logging. The slow block JSON output now includes:
- state_writes.eip7702_delegations_set
- state_writes.eip7702_delegations_cleared

These fields will be 0 for pre-Pectra blocks (per spec).
ContractsAnalysed tracks jump destination analysis, not cache misses.
CodeReads is incremented when code is loaded from DB (cache miss).
Instead of duplicating total_ms, calculate:
execution_ms = total_ms - state_read_ms - state_hash_ms - commit_ms

Falls back to total_ms if timing metrics not fully captured.
- Add IncrementCodeWrites/IncrementCodeBytesWritten in InsertCode
- Add IncrementAccountWrites in SetState
- Add IncrementStorageWrites in Set
Move IncrementAccountReads inside GetState to only count trie/DB reads,
matching storage read semantics. Previously counted all reads including
in-memory cache hits.
…metrics

Pipe timing data to EvmMetrics for state_read_ms, state_hash_ms, and commit_ms:

- WorldStateMetricsDecorator: pipe existing timing to EvmMetrics for
  RecalculateStateRoot, Commit (both overloads), and CommitTree
- StateProvider.GetState: add timing around DB reads (not cache hits)
- PersistentStorageProvider.LoadFromTree: add timing for storage reads

Uses Stopwatch.GetTimestamp()/GetElapsedTime() with TimeSpan.Ticks
(100ns precision) for minimal overhead (~10-20ns per measurement).
…tandardization

Add new metrics for tracking account and storage deletions during EVM execution:
- AccountDeleted: incremented when SetState is called with null account
- StorageDeleted: incremented when SSTORE sets a slot to zero
- Both metrics flow through ProcessingStats to slow block JSON output
Wire up existing but unused EIP-7702 delegation metric increment methods:
- IncrementEip7702DelegationsSet() when setting delegation (non-zero address)
- IncrementEip7702DelegationsCleared() when clearing delegation (zero address)
Add comprehensive integration tests for all cross-client execution metrics:
- EIP-7702 delegation tests (set, clear, multiple)
- Account metrics tests (reads, writes, deleted)
- Storage metrics tests (reads, writes, deleted)
- Code metrics tests (loaded, updated)
Allow customization of the slow block logging threshold via constructor
parameter. Default remains 1000ms for backward compatibility.

- Add slowBlockThresholdMs parameter to ProcessingStats constructor
- Use >= comparison so threshold=0 logs all blocks (useful for testing)
Add InternalsVisibleTo attributes to allow test projects to access
ThreadLocal metric values for verification in slow block tests.
Add comprehensive unit tests for slow block JSON output:
- Verify all 31 spec-required fields are present
- Test configurable threshold (0 = log all blocks)
- Validate field names match snake_case spec format
- Test timing, state read/write, and cache metrics
Dead code — the MetricsSnapshot and MetricsDelta classes were never
referenced anywhere in the codebase.
…cumulator

Replace 16 separate ZeroContentionCounter instances (each with its own
ThreadLocal<BoxedLong>) with a single ThreadLocal<ExecutionMetricsAccumulator>.
This addresses Ben's review feedback about the number of additional thread
locals on hot execution paths.

The consolidated accumulator reduces ThreadLocal overhead by requiring only
one TLS lookup per increment instead of one per counter. Prometheus exposure
is preserved via SumExecutionMetric() which iterates the shared ThreadLocal's
Values collection. Call sites now use direct field increments
(e.g. ThreadExecutionMetrics.AccountReads++) instead of method calls.
…simplify test spec

- Replace anonymous object in LogSlowBlock with named record types for
  structured JSON serialization (SlowBlockLog, SlowBlockInfo, etc.)
- Simplify spec retrieval in MetricsIntegrationTests to use Prague.Instance
  directly instead of resolving via MainnetSpecProvider
…flicts with master

- Replace ExecutionMetricsAccumulator with ZeroContentionCounter pattern
- Remove Stopwatch.GetTimestamp() from per-read hot paths (StateProvider, PersistentStorageProvider)
- Use Utf8JsonWriter for zero-allocation slow block JSON serialization
- Collapse BlockData Start/Current pairs into pre-computed deltas
- Adapt to master's refactored CodeInfoRepository, WrappedWorldState, IProcessingStats
- Condense tests from ~1600 to ~530 lines
…block log

Add per-phase timing (blooms_ms, receipts_root_ms, storage_merkle_ms,
state_root_ms, evm_ms) and extended EVM/block fields (opcodes,
self_destructs, empty_calls, contracts_analyzed, cached_contracts_used,
blob_count, gas_limit) to the slow block JSON output.

execution_ms retains its original definition for backwards compatibility.
All new timing is captured via the same ZeroContentionCounter pattern
used by existing metrics — ~25μs overhead per block.
Wire slow block threshold to IBlocksConfig so it can be set via config
file or CLI. Default: 1000ms (log blocks slower than 1s).
Set to 0 to log all blocks, -1 to disable slow block logging entirely.

When disabled (-1), CaptureStartStats and delta computation are skipped
on the block-processing thread — zero overhead from slow block analysis.
Slow block logging is opt-in. Set Blocks.SlowBlockThresholdMs=1000
to enable logging blocks slower than 1s.
New config flag Blocks.SlowBlockPerTxThresholdMs (default -1 = disabled).
When enabled alongside SlowBlockThresholdMs, each transaction exceeding
the per-tx threshold is included individually in the slow block JSON:

  "transactions": [
    { "index": 42, "hash": "0x...", "gas_used": 8200000,
      "execution_ms": 12.3, "type": "Legacy", "to": "0x..." }
  ]

Uses a thread-static PerTxTimingCollector with one Stopwatch.GetTimestamp
per tx (~20ns). When disabled, zero overhead — the collector is not even
checked in the tx loop.
@claude

claude Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @Marchhill's task in 3m 35s —— View job


PR Review: consensus: standardize slow block JSON output for cross-client metrics

Tasks:

  • Post initial comment
  • Read rule files and gather context
  • Read changed files and assess current state
  • Evaluate remaining blockers from Round 5 (LukaszRozmej's items)
  • Check for new issues
  • Finalize review summary

Summary

All five blockers from LukaszRozmej's 2026-05-15 review have been addressed in the latest commit (c8e6afa). No new Medium or High issues found. This PR is ready to merge.


LukaszRozmej Blockers — All Resolved

# Finding Status
1 Move SlowBlockLogEntry out of Nethermind.Core.Test ✅ Fixed — renamed + moved to Nethermind.Consensus.Test/Processing/SlowBlockLogEntry.cs
2 Document parallel-execution caveat for breakdown timings ✅ Fixed — <remarks> added to LogSlowBlock (ProcessingStats.cs:598–609) documenting that EVM/state-I/O counters accumulate across all parallel workers
3 Resolve cached_contracts_used / cache.code.hits duplication ✅ Fixed (Round 5)
4 Remove or wire up ExecutionMetricsFlag.IsActive ✅ Fixed — now a proper #if NETHERMIND_NO_EXECUTION_METRICS gate; in default builds counters fire unconditionally, define the symbol at build time for zero overhead
5 Fix "zero overhead when disabled" claim or gate counter increments ✅ Fixed — compile-time NETHERMIND_NO_EXECUTION_METRICS symbol elides all new Interlocked calls via JIT constant-folding + AggressiveInlining

Also fixed from Round 5 (Low):

  • ProcessingStats ctor default discrepancy — lower-level ctor now defaults to slowBlockThresholdMs = -1 (disabled) with an explanatory <remarks>, consistent with production defaults.

All Prior Findings — Confirmed Fixed

Finding Round Status
Storage cache hit rate bug (StateTreeCache subtracted from wrong start) 1 ✅ Fixed
Per-tx timing all-zeros in parallel mode ([ThreadStatic] → process-wide static) 1 ✅ Fixed
StorageWrites counted TSTORE 1 ✅ Fixed
Interface pollution (default no-op methods on IBlockTransactionsExecutor) 1 ✅ Fixed
State I/O metrics contaminated by prewarmer 3 ✅ Fixed — MainThread* variants used
cache.code.hits numerator/denominator asymmetry 4 ✅ Fixed — MainThreadCodeDbCache
Account/storage trie cache prewarmer contamination 4 ✅ Fixed — DbMetrics.MainThread* variants
PerTxTimingCollector._ticksPerTx plain heap array 4 ✅ Fixed — ArrayPoolList<long>
cached_contracts_usedcache.code.hits 5 ✅ Fixed — removed from JSON

Outstanding Low Items (acknowledged, non-blocking)

  • PerTxTimingCollector statics not volatile_enabled, _ticksPerTx, and _count are plain static fields. Prepare() sets _ticksPerTx on the block-processing thread; parallel workers read it in Record() without a formal barrier. Safe on x86/.NET (TSO hardware + ThreadPool dispatch guarantees ordering). Class doc now explicitly states the single-block assumption.
  • Thread.Sleep(50) polling in ProcessingStatsTests.cs:46–50 — polling up to 5s for each assertion; fragile under load. A ManualResetEventSlim would be deterministic but is a test-quality concern only.
  • Missing parallel path test — no coverage of ProcessTransactionsParallel for the per-tx timing collector; the static-per-block assumption is not tested under concurrent workers.
  • MetricsIntegrationTests uses combined Metrics.AccountReads (not MainThread*) — susceptible to counter contamination from parallel test runs; Is.GreaterThanOrEqualTo softens this but weakens the spec.
  • Test fixture duplicationMetricsIntegrationTests and SlowBlockIntegrationTests share ~30 lines of setup boilerplate; per AGENTS.md this should be factored into a shared helper.

Positive Notes

All structural quality from prior reviews remains fully intact:

  • ExecutionMetricsFlag + MetricsTimer<TSink> pattern: JIT-foldable, clean zero-overhead compile path.
  • ArrayPoolList<long> for per-tx ticks with ownership transfer and pool return in BlockDataPolicy.Return.
  • ArrayBufferWriter<byte> + Utf8JsonWriter zero-allocation JSON path.
  • Object pool with proper reference nulling in Return.
  • Deltas pre-computed on block-processing thread before UnsafeQueueUserWorkItem dispatch.
  • StopTxTimer in finally — timing captured even for failing transactions.
  • Default interface methods for timing hooks — no-op inheritance without stub boilerplate.

| Branch

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces standardized “slow block” JSON logging (intended for cross-client execution metrics) by adding new execution counters (state/code reads+writes, EIP-7702 delegation set/clear) and block-phase timing, plus configuration flags to enable slow-block and per-transaction reporting.

Changes:

  • Add new EVM/state execution counters and block-phase timing sinks used by slow-block JSON logging.
  • Add slow-block JSON emission in ProcessingStats with optional per-transaction timing capture.
  • Add new Blocks config flags and integration tests validating metrics and JSON schema.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/Nethermind/Nethermind.State/StateProvider.cs Adds account/code read/write metric increments during state access and code insertion.
src/Nethermind/Nethermind.State/PersistentStorageProvider.cs Adds storage read/write metric increments via storage provider hooks.
src/Nethermind/Nethermind.State/PartialStorageProviderBase.cs Makes Set virtual to allow metric-instrumented overrides.
src/Nethermind/Nethermind.Evm/Metrics.cs Adds new execution counters + compile-time flag gating.
src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.Storage.cs Adds storage-deletion metric increments on SSTORE-to-zero.
src/Nethermind/Nethermind.Evm/CodeInfoRepository.cs Adds code read/byte counters and EIP-7702 delegation set/clear counters.
src/Nethermind/Nethermind.Evm.Test/MetricsIntegrationTests.cs New tests asserting new EVM counters are incremented during tx processing.
src/Nethermind/Nethermind.Db/Metrics.cs Exposes main-thread-only counters for block-level delta computation.
src/Nethermind/Nethermind.Core/Metric/MetricsTimer.cs Adds reusable timer helper for tick-based metric sinks.
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs Adds slow-block JSON logging, config-driven thresholds, and per-tx timing snapshotting.
src/Nethermind/Nethermind.Consensus/Processing/PerTxTimingCollector.cs Adds process-wide collector for per-transaction execution timing.
src/Nethermind/Nethermind.Consensus/Processing/IBlockProcessor.cs Extends executor interface with optional per-tx timing hooks (default no-ops).
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.ParallelBlockValidationTransactionsExecutor.cs Wires per-tx timers into parallel tx execution path.
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.cs Wraps block phases with MetricsTimer sinks for timing breakdown counters.
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.BlockValidationTransactionsExecutor.cs Wires per-tx timers into sequential tx execution path.
src/Nethermind/Nethermind.Consensus.Test/Processing/SlowBlockLogEntry.cs Adds typed model for deserializing slow-block JSON in tests.
src/Nethermind/Nethermind.Consensus.Test/Processing/SlowBlockIntegrationTests.cs Adds integration tests validating metrics flow into slow-block JSON.
src/Nethermind/Nethermind.Consensus.Test/Processing/ProcessingStatsTests.cs Adds schema/threshold/timing consistency tests for slow-block JSON.
src/Nethermind/Nethermind.Config/IBlocksConfig.cs Adds SlowBlockThresholdMs and SlowBlockPerTxThresholdMs config items.
src/Nethermind/Nethermind.Config/BlocksConfig.cs Implements new config properties with default -1 (disabled).
Comments suppressed due to low confidence (6)

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:351

  • Slow-block threshold comparison truncates microseconds to whole milliseconds (processingMicros / 1000). This can miss blocks that are just over the configured threshold due to integer floor. Compare in microseconds (e.g. processingMicros >= thresholdMs * 1000) or use a double ms value consistently.
            // Log slow blocks in JSON format for cross-client performance analysis
            // Only log when slow block threshold is enabled (>= 0)
            if (_slowBlockThresholdMs >= 0)
            {
                long processingMs = data.ProcessingMicroseconds / 1000;
                if (processingMs >= _slowBlockThresholdMs)
                {
                    LogSlowBlock(block, data, mgasPerSec);
                }

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:176

  • PerTxTimingCollector is a static process-wide switch, but CaptureStartStats only ever enables it (SetEnabled(true)) and never disables it when SlowBlockPerTxThresholdMs < 0. If per-tx logging is enabled once (or another ProcessingStats instance enables it), later instances/blocks with per-tx disabled can still record timings and LogSlowBlock will treat a -1 threshold as 'log all' (negative ticks threshold). SetEnabled should be called with the current config value (true/false) each block, or make the collector instance-scoped.
            // Enable per-tx timing on the current block-processing thread.
            // Must be set here (not in Start()) because the async processing loop
            // can resume on a different ThreadPool thread after each await.
            if (_slowBlockPerTxThresholdMs >= 0)
            {
                PerTxTimingCollector.SetEnabled(true);
            }

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:699

  • The slow-block JSON 'evm' object omits cached-contract/cache-usage info even though the PR description schema includes it (e.g. cached_contracts_used) and ProcessingStats already tracks CurrentCachedContractsUsed/StartCachedContractsUsed. If cross-client standardization expects this field, add it to the JSON and the test schema/model accordingly.
                    writer.WriteStartObject("evm");
                    writer.WriteNumber("opcodes", data.CurrentOpCodes - data.StartOpCodes);
                    writer.WriteNumber("sload", data.CurrentSLoadOps - data.StartSLoadOps);
                    writer.WriteNumber("sstore", data.CurrentSStoreOps - data.StartSStoreOps);
                    writer.WriteNumber("calls", data.CurrentCallOps - data.StartCallOps);
                    writer.WriteNumber("empty_calls", data.CurrentEmptyCalls - data.StartEmptyCalls);
                    writer.WriteNumber("creates", data.CurrentCreatesOps - data.StartCreateOps);
                    writer.WriteNumber("self_destructs", data.CurrentSelfDestructOps - data.StartSelfDestructOps);
                    writer.WriteNumber("contracts_analyzed", data.CurrentContractsAnalyzed - data.StartContractsAnalyzed);
                    writer.WriteEndObject();

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:739

  • LogSlowBlock swallows exceptions and logs only ex.Message at debug level, losing the stack trace and exception details. Log the exception object (or include stack trace) to make failures diagnosable, especially since slow-block logging runs on the ThreadPool and issues may be intermittent.
            catch (Exception ex)
            {
                if (_logger.IsDebug) _logger.Debug($"Error logging slow block: {ex.Message}");
            }

src/Nethermind/Nethermind.State/StateProvider.cs:733

  • AccountReads is only incremented on cache misses (when the account isn't in _blockChanges). On cache hits you increment StateTreeCacheHits but not AccountReads, so slow-block JSON 'state_reads.accounts' ends up representing only misses and will duplicate cache.account.misses rather than total logical reads. If the intent is total reads (as implied by having separate cache hit/miss counts), increment AccountReads on both hit and miss paths (or rename the JSON field to clarify it's DB/state-trie reads only).
        ref ChangeTrace accountChanges = ref CollectionsMarshal.GetValueRefOrAddDefault(_blockChanges, addressAsKey, out bool exists);
        if (!exists)
        {
            Metrics.IncrementStateTreeReads();
            EvmMetrics.IncrementAccountReads();
            Account? account = _tree.Get(address);

            accountChanges = new(account, account);
        }
        else
        {
            Metrics.IncrementStateTreeCacheHits();
        }

src/Nethermind/Nethermind.Evm/Metrics.cs:171

  • These new execution metrics use shared Interlocked increments. Under ParallelExecution, worker threads inherit IsBlockProcessingThread=true (see ProcessingStats remarks), so multiple workers will contend on the same '_main*' counters. If the goal is minimal overhead for cross-client metrics, consider per-thread counters (e.g., ThreadLocal/ThreadStatic/striped counters) and aggregating at snapshot time to reduce contention.
    [CounterMetric]
    [Description("Number of account reads during execution.")]
    public static long AccountReads => _mainAccountReads + _otherAccountReads;
    private static long _mainAccountReads;
    private static long _otherAccountReads;
    // Exposed for ProcessingStats so block-level deltas exclude background prewarmer activity.
    internal static long MainThreadAccountReads => _mainAccountReads;
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static void IncrementAccountReads()
    {
        if (!ExecutionMetricsFlag.IsActive) return;
        Interlocked.Increment(ref IsBlockProcessingThread ? ref _mainAccountReads : ref _otherAccountReads);
    }

    [CounterMetric]
    [Description("Number of storage slot reads during execution.")]
    public static long StorageReads => _mainStorageReads + _otherStorageReads;
    private static long _mainStorageReads;
    private static long _otherStorageReads;
    internal static long MainThreadStorageReads => _mainStorageReads;
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static void IncrementStorageReads()
    {
        if (!ExecutionMetricsFlag.IsActive) return;
        Interlocked.Increment(ref IsBlockProcessingThread ? ref _mainStorageReads : ref _otherStorageReads);
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Comment thread src/Nethermind/Nethermind.State/PersistentStorageProvider.cs Outdated
Comment thread src/Nethermind/Nethermind.Evm/Metrics.cs
Comment thread src/Nethermind/Nethermind.Evm/CodeInfoRepository.cs
@LukaszRozmej

Copy link
Copy Markdown
Member

Review — focused on correctness and performance

Verdict

Feature is well-designed and test coverage is excellent. However, the PR description's claim that "there is zero overhead on block processing when disabled" is not quite accurate. The new EVM-side metric counters fire unconditionally in default builds (gated only by compile-time ExecutionMetricsFlag, not by SlowBlockThresholdMs). On hot paths (SSTORE, SLOAD, account reads/writes), this adds one atomic Interlocked op per state operation, always on, even when the feature is configured to -1.


Performance concerns

1. Hot-path atomics added regardless of SlowBlockThresholdMs (HIGH)

ExecutionMetricsFlag.IsActive is a compile-time-folded constant, true by default. Only a recompile with NETHERMIND_NO_EXECUTION_METRICS disables it. So in production builds, even with SlowBlockThresholdMs = -1:

  • PersistentStorageProvider.SetIncrementStorageWrites() on every SSTORE that mutates
  • EvmInstructions.Storage.cs (both InstructionSStoreMetered and InstructionSStoreUnmetered) → IncrementStorageDeleted() on every SSTORE-to-zero
  • PersistentStorageProvider.LoadFromTreeIncrementStorageReads() on every uncached SLOAD
  • StateProvider.GetState (via WarmUp) → IncrementAccountReads() on every uncached account touch
  • StateProvider.SetStateIncrementAccountWrites() (+ IncrementAccountDeleted() when null) on every state mutation
  • CodeInfoRepository.GetCodeInfoIncrementCodeReads() + IncrementCodeBytesRead() on every code DB fetch

Each is Interlocked.Increment(ref IsBlockProcessingThread ? ref _main... : ref _other...). On x64 that's ~10–15 ns per call (uncontested atomic) plus a thread-static read. For the example block in the PR description (~1800 SSTOREs, ~3200 SLOADs, ~2000 account changes), that's ~70–100 µs/block of overhead even when the feature is "disabled" by config. The PR description quotes ~31 µs/block for slow-block capture; the real per-block tax in default builds is larger and unavoidable without recompiling.

Suggestion: Gate Increment* calls behind a runtime flag (e.g. _blocksConfig.SlowBlockThresholdMs >= 0 exposed as a volatile static), or set ExecutionMetricsFlag.IsActive from config at startup. If you keep the compile-time flag, please correct the PR description.

2. Per-tx try/finally is added unconditionally

BlockProcessor.BlockValidationTransactionsExecutor.ProcessTransaction now wraps transactionProcessor.ProcessTransaction(...) in try/finally. Even with _enableTxTimingMetrics = false, the JIT still emits the exception-filter table. Tiny per-tx cost but always on.

In the parallel executor, inner.StartTxTimer() / inner.StopTxTimer() are interface calls through IBlockProcessor.IBlockTransactionsExecutor (default interface methods). The JIT likely cannot devirtualize since inner is interface-typed at the call site — 2 vcalls per tx in the parallel hot path.

3. LogSlowBlock recomputes blob count

Transaction.GetBlobCount() is summed both in UpdateStats (existing) and again in LogSlowBlock (new). Cheap, but redundant — store the blob count on BlockData and reuse.


Correctness concerns

1. PerTxTimingCollector is process-wide static

_ticksPerTx is a static field, written on the block-processing thread (Prepare), read/written on parallel ThreadPool workers (Record). It is not volatile. Workers see writes only because ParallelUnbalancedWork.For provides the implicit synchronisation barrier. This works today but is fragile — please document the dependency explicitly. Same applies to BlockValidationTransactionsExecutor._enableTxTimingMetrics.

2. PerTxTimingCollector.SetEnabled(true) is sticky

CaptureStartStats calls SetEnabled(true) per block when _slowBlockPerTxThresholdMs >= 0, but there is no corresponding SetEnabled(false). If the config is hot-reloaded or in a multi-ProcessingStats scenario, the static stays true. The Prepare/Snapshot flow tolerates this, but the API is asymmetric.

3. Stopwatch ticks vs TimeSpan ticks

MetricsTimer<T>.Dispose uses Stopwatch.GetElapsedTime(_start).Ticks which returns TimeSpan ticks (100 ns) — dividing by TimeSpan.TicksPerMillisecond later is correct. Worth a doc comment on IncrementStateHashTime(long ticks) clarifying the unit is TimeSpan ticks, not Stopwatch ticks, to avoid future regressions.

4. Negative-time fallback hides a real bug

double executionMs = totalMs - stateHashMs - commitMs;
if (executionMs < 0) executionMs = totalMs;

If this branch is ever taken, the log entry silently lies (claims execution ≥ total). At minimum, debug-log when clamping so the bug surfaces.

5. BlockData pool — new Delta* fields not reset on Return

BlockDataPolicy.Return resets PerTxTicks and the existing fields but not the new DeltaAccountReads, DeltaStorageReads, etc. Safe only because UpdateStats either writes all of them (threshold ≥ 0) or never reads them (threshold < 0). If the threshold ever changes at runtime, stale deltas leak. Belt-and-braces fix: reset them too.

6. Cross-client metric semantics under parallel execution

The LogSlowBlock <remarks> correctly notes that under ParallelExecution the EVM and state-read/write counters sum across workers while wall-clock timings remain per-block. Good documentation, but this is a real caveat for "cross-client" comparability — Geth/Reth don't run txs in parallel by default. Consumers will see inflated counter values relative to wall-clock ms. Worth surfacing in the JSON itself (e.g. parallel_execution: true) so analysers can normalise.


Minor

  • private bool _enableTxTimingMetrics = false;= false is redundant.
  • Double blank line after StopTxTimer.
  • IBlockProcessor.IBlockTransactionsExecutor gains 3 new default interface methods — public-API change and a small ABI risk for plugins implementing the interface.
  • MetricsTimer<T> as a parameterless readonly ref struct is nice — stack-only, inlinable, scope-based timing.

Suggested fixes before merge

  1. Either gate hot-path Increment* by a runtime flag tied to SlowBlockThresholdMs >= 0, or correct the "zero overhead when disabled" claim in the PR body and docs.
  2. Document the synchronisation dependency of PerTxTimingCollector / _enableTxTimingMetrics on ParallelUnbalancedWork.For.
  3. Reset the new Delta* fields in BlockDataPolicy.Return.
  4. Add a parallel_execution boolean to the JSON so cross-client analyses can normalise counter values.
  5. Tiny: store blob count in BlockData; remove = false init; debug-log the executionMs < 0 fallback.

- Document ExecutionMetricsFlag's compile-time gate and unconditional default-build cost
- Cache BlobCount on BlockData; reuse in LogSlowBlock instead of re-summing transactions
- Reset all Delta* fields in BlockDataPolicy.Return
- Emit parallel_execution flag in the slow-block JSON (wired via BlocksConfig)
- Debug-log when executionMs/evmMs clamp negative (surfaces inconsistent sub-phase deltas)
- Document that the timing increments take TimeSpan ticks, not Stopwatch ticks
- Document PerTxTimingCollector / _enableTxTimingMetrics reliance on the parallel join barrier
- Drop redundant = false initializer and double blank line in BlockValidationTransactionsExecutor

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Marchhill

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Claude finished @Marchhill's task in 5m 50s —— View job


PR Review: consensus: standardize slow block JSON output for cross-client metrics

Tasks:

  • Post initial comment
  • Read rule files and gather context
  • Read changed files and assess current state
  • Evaluate LukaszRozmej's blocker items (from 2026-05-20 review)
  • Check for new issues
  • Finalize review summary

Summary

This is round 7. LukaszRozmej posted a review at 2026-05-20 14:15 UTC with 5 "Suggested fixes before merge." Marchhill committed two fixes today (56776b6 at 11:46 and ebd66dd at 16:30, the latter post-review) that address all 5 blockers. No new Critical or High issues found.


LukaszRozmej's Blockers (from 2026-05-20) — All Resolved

# Finding Status
1 Gate hot-path Increment* or correct "zero overhead" claim ✅ Fixed — ExecutionMetricsFlag doc comment now explicitly states "regardless of SlowBlockThresholdMs" and documents the ~10–15 ns/call cost. PR number references also removed from code comments (56776b6).
2 Document PerTxTimingCollector / _enableTxTimingMetrics synchronisation dependency ✅ Fixed — PerTxTimingCollector.cs remarks now documents the ParallelUnbalancedWork.For join-barrier contract; BlockValidationTransactionsExecutor._enableTxTimingMetrics field also has an inline comment cross-referencing the contract (ebd66dd).
3 Reset new Delta* fields in BlockDataPolicy.Return ✅ Fixed — All 18 Delta* fields are reset in BlockDataPolicy.Return (ProcessingStats.cs:817–841).
4 Add parallel_execution boolean to slow-block JSON ✅ Fixed — writer.WriteBoolean("parallel_execution", _parallelExecution) at ProcessingStats.cs:661; wired from BlocksConfig.ParallelExecution; SlowBlockLogEntry updated accordingly.
5 Tiny: store blob count in BlockData; remove = false init; debug-log negative clamp ✅ Fixed — data.BlobCount reused in LogSlowBlock; = false removed from _enableTxTimingMetrics; double blank line removed; both executionMs < 0 and evmMs < 0 clamping paths now debug-log with diagnostic context.

Additional items addressed this round

  • _slowBlockThresholdMs doc comment now says "Defaults to -1 (disabled)" instead of the previously inaccurate "default: 1000ms" (56776b6).
  • TimeSpan-ticks unit clarified with block comment in Metrics.cs (lines 310–313) explaining that these are 100 ns TimeSpan ticks, not raw Stopwatch timestamp ticks.
  • CodeReads and StorageReads semantics documented: both count only DB-backed reads (cache misses), and the inline comments now say so explicitly (56776b6).

All Prior Findings — Confirmed Fixed

Finding Round Status
Storage cache hit rate bug (StateTreeCache wrong source) 1 ✅ Fixed
Per-tx timing all-zeros in parallel mode 1 ✅ Fixed
StorageWrites counted TSTORE 1 ✅ Fixed
Interface pollution (default no-op methods) 1 ✅ Fixed
State I/O prewarmer contamination 3 ✅ Fixed
cache.code.hits numerator/denominator asymmetry 4 ✅ Fixed
Account/storage trie cache prewarmer contamination 4 ✅ Fixed
PerTxTimingCollector._ticksPerTx plain heap array 4 ✅ Fixed
cached_contracts_usedcache.code.hits 5 ✅ Fixed
SlowBlockLogEntry in wrong test assembly 5 ✅ Fixed
ProcessingStats ctor default discrepancy 5 ✅ Fixed
Parallel caveat not documented 5/6 ✅ Fixed

Remaining Low Items (non-blocking, acknowledged)

  • Test fixture duplicationMetricsIntegrationTests and SlowBlockIntegrationTests share ~30 lines of fixture setup (per AGENTS.md, this should be factored into a helper). Low impact since these are test files only.
  • Thread.Sleep(50) polling in tests — Fragile under load; a ManualResetEventSlim would be deterministic. Non-blocking test quality concern.
  • Missing parallel path test — No test covers ProcessTransactionsParallel for per-tx timing; the static-collector design and join-barrier contract are not tested under actual concurrent workers.
  • Exception logging drops stack traceLogSlowBlock catch block logs only ex.Message at Debug level (ProcessingStats.cs:760). Acceptable as a "don't fail block processing" guard, but a regressions from a serialization bug would be harder to diagnose.

Positive Notes (all still valid)

  • ExecutionMetricsFlag + MetricsTimer<TSink> pattern: JIT-foldable; overhead truly zero when built with NETHERMIND_NO_EXECUTION_METRICS.
  • ArrayPoolList<long> for per-tx ticks with ownership transfer and pool return in BlockDataPolicy.Return.
  • ArrayBufferWriter<byte> + Utf8JsonWriter zero-allocation JSON path.
  • Object pool with all field resets in Return (including all new Delta* fields).
  • Deltas pre-computed on block-processing thread before UnsafeQueueUserWorkItem dispatch.
  • StopTxTimer in finally: timing captured even for failing transactions.
  • Default interface methods for timing hooks: no stub boilerplate on implementors.

| Branch

Marchhill and others added 3 commits May 21, 2026 13:35
…eptions

- Replace SpinWait.SpinUntil polling in ProcessingStatsTests / SlowBlockIntegrationTests
  with a Monitor.Wait/PulseAll-based WaitableTestLogger; the test thread unblocks the
  instant the ThreadPool worker emits the slow-block JSON, no fixed sleeps under load.
- LogSlowBlock catch now logs at Error with the full Exception (stack trace included)
  instead of dropping it via ex.Message at Debug, matching the existing
  ExecuteFromThreadPool pattern. Block processing is still protected from log failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@benaadams

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Claude finished @benaadams's task in 6m 3s —— View job


PR Re-Review (Round 8): consensus: standardize slow block JSON output for cross-client metrics

Tasks:

  • Post initial comment
  • Read rule files and gather context
  • Read changed files and assess current state
  • Verify all prior findings are addressed
  • Analyze a03e6d1 (benaadams' new commit: "Remove duplicate execution read metrics")
  • Finalize review summary

Summary

All High/Medium findings from Rounds 1–7 remain confirmed fixed. The latest commit (a03e6d1b) by benaadams removes dedicated AccountReads, StorageReads, and CodeReads counter increments from Evm.Metrics and derives state_reads.* from existing cache statistics instead — a good performance improvement. This commit introduces one new Medium concern (PR description documentation now out of sync with implementation) and two Low items.


New Findings from a03e6d1b

Medium — PR description example is stale after read-counter removal

a03e6d1b made state_reads.* entirely derived from cache.* values:

// ProcessingStats.cs — post-commit
blockData.DeltaAccountReads = blockData.DeltaAccountCacheMisses;          // always == cache.account.misses
blockData.DeltaStorageReads = blockData.DeltaStorageCacheHits + blockData.DeltaStorageCacheMisses;
blockData.DeltaCodeReads    = blockData.DeltaCodeCacheMisses;              // always == cache.code.misses

The PR description example still shows:

"state_reads": { "accounts": 1842, ... },
"cache": { "account": { "hits": 1520, "misses": 322, ... } }

With the current implementation state_reads.accounts will always equal cache.account.misses (322 in that example, not 1842). The example implies state_reads.accounts = hits + misses = total logical reads, but the implementation outputs only DB-backed reads (misses). Since this PR's description functions as the cross-client specification, the example should be updated to reflect actual semantics — or state_reads.accounts / state_reads.code should be removed (they're now redundant with cache.account.misses / cache.code.misses).

Low — Asymmetric semantics across state_reads.* sub-fields

After the change:

  • state_reads.accounts = account trie DB cache misses (reads that go to the trie/DB)
  • state_reads.code = code DB cache misses
  • state_reads.storage_slots = all storage trie node accesses (hits + misses)

The three fields use different counting methods. The PR description doesn't document this asymmetry and consumers will likely expect consistent semantics across all three.

Low — MetricsIntegrationTests uses the combined counter after read-counter removal

The test was updated in a03e6d1b to use DbMetrics.StateTreeReads (= _main + _other) instead of the now-removed Metrics.MainThreadAccountReads:

// MetricsIntegrationTests.cs:80
long startReads = DbMetrics.StateTreeReads;  // combined, not MainThread*
...
Assert.That(DbMetrics.StateTreeReads - startReads, Is.GreaterThanOrEqualTo(2));

The test sets ProcessingThread.IsBlockProcessingThread = true, so the increments land in _main*. But another parallel test's prewarmer/non-main-thread activity could increment _other*, making the combined delta spuriously large. DbMetrics.MainThreadStateTreeReads would give a tighter, isolated delta.


Prior Findings — All Confirmed Fixed

Finding Round Status
Storage cache hit rate bug 1 ✅ Fixed
Per-tx timing all-zeros in parallel mode 1 ✅ Fixed
StorageWrites counted TSTORE 1 ✅ Fixed
Interface pollution (default no-op methods) 1 ✅ Fixed
State I/O prewarmer contamination 3 ✅ Fixed
cache.code.hits numerator/denominator asymmetry 4 ✅ Fixed
Account/storage trie cache prewarmer contamination 4 ✅ Fixed
PerTxTimingCollector._ticksPerTx plain heap array 4 ✅ Fixed
cached_contracts_usedcache.code.hits 5 ✅ Fixed
SlowBlockLogEntry in wrong test assembly 5 ✅ Fixed
ProcessingStats ctor default discrepancy 5 ✅ Fixed
Parallel caveat not documented 5–6 ✅ Fixed
All five LukaszRozmej blockers 5–7 ✅ Fixed
Unconditional IsActive => true dead toggle 6–7 ✅ Fixed — #if NETHERMIND_NO_EXECUTION_METRICS
Delta* fields not reset in BlockDataPolicy.Return 7 ✅ Fixed — all 18 fields reset
parallel_execution not in JSON 7 ✅ Fixed
Negative-time fallback silently lies 7 ✅ Fixed — now debug-logs clamp
PR description "zero overhead" claim 7 ✅ Fixed — ExecutionMetricsFlag docs now explicit
Duplicate AccountReads/StorageReads Interlocked overhead implicit ✅ Removed in a03e6d1b

Acknowledged Low (non-blocking, unchanged)

  • PerTxTimingCollector statics not volatile — class <remarks> explicitly documents the join-barrier dependency on ParallelUnbalancedWork.For; safe on x86/.NET.
  • Missing parallel path test — no coverage of ProcessTransactionsParallel for per-tx timing; single-block assumption is undocumented under actual concurrent workers.
  • Test fixture duplicationMetricsIntegrationTests / SlowBlockIntegrationTests share ~30 lines of setup boilerplate; AGENTS.md recommends factoring this out.

Positive Notes

Everything praised in prior rounds still holds. The a03e6d1b commit is a net improvement: removing three hot-path Interlocked.Increment calls per state access (two for account reads, one for storage reads) reduces per-block overhead with no loss of accuracy, since the DB-level Db.Metrics trie counters already track this information. WaitableTestLogger with Monitor.Wait/PulseAll (replacing the old Thread.Sleep(50) polling) is a clean deterministic test fix.


| Branch

@benaadams

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Claude finished @benaadams's task in 6m 9s —— View job


PR Re-Review (Round 9): consensus: standardize slow block JSON output for cross-client metrics

Tasks:

  • Post initial comment
  • Read rule files and gather context
  • Read changed files and assess current state vs Round 8
  • Evaluate new commits (68d6f79, 43a0b97, 2c66f53)
  • Check for new issues
  • Finalize review summary

Summary

Three new commits since Round 8. All prior High/Medium findings remain fixed. The new commits are improvements; no new High or Medium issues found. This PR is ready to merge.


New Commits — Analysis

2c66f538 — Use main-thread DB metrics in EVM metric tests ✅

Fixes the Round 8 Low finding. MetricsIntegrationTests now uses DbMetrics.MainThreadStateTreeReads and DbMetrics.MainThreadStorageTreeReads + MainThreadStorageTreeCache instead of combined counters, eliminating susceptibility to parallel-test-run contamination. Also adds InternalsVisibleTo("Nethermind.Evm.Test") to Nethermind.Db.

43a0b97c — Restore cached contracts slow block metric ✅

Restores evm.cached_contracts_used in the slow-block JSON. Both evm.cached_contracts_used and cache.code.hits still use the MainThreadCodeDbCache delta, so they remain equal at runtime. The PR description's "EVM field definitions" table explicitly documents this equality ("retained because it is part of the original cross-client metrics spec"), which is the right place for that note.

Minor: The JSON example in the PR body shows cached_contracts_used: 97 and cache.code.hits: 180 — these should be equal but differ in the example. The text in the table explains they're always equal, so consumers who read that table will understand, but the inconsistent example could confuse cross-client implementors who parse the example first.

68d6f79e — Use thread-static block processing attribution ✅

Key improvement. Switches ProcessingThread.IsBlockProcessingThread from AsyncLocal<bool> to [ThreadStatic], with a new test PreWarmCaches_DoesNotFlowIsBlockProcessingThread_IntoTask that validates the critical invariant.

Correctness: With AsyncLocal, tasks spawned by the prewarmer (via BlockCachePreWarmer.PreWarmCaches) inherited IsBlockProcessingThread = true from the calling thread's ExecutionContext. That caused prewarmer EVM metric increments to land in _main* counters — the root cause of the prewarmer contamination that previous rounds worked around with MainThread* accessor additions. With [ThreadStatic], the flag is not inherited by new tasks or threads: prewarmer tasks correctly see IsBlockProcessingThread = false and route their increments to _other*. The new test pins this behavior.

Performance: [ThreadStatic] reads are ~1–2 ns vs ~5–20 ns for AsyncLocal, benefiting all existing and new Increment* calls across every hot-path state operation.

BlockchainProcessor safety: The loop correctly saves/restores the flag in each iteration (bool previousMainThread = IsBlockProcessingThread; ... finally { IsBlockProcessingThread = previousMainThread; }). After await _blockQueue.Reader.WaitToReadAsync, the continuation may run on a fresh thread with IsBlockProcessingThread = false; the code immediately sets it to IsMainProcessor before any processing — no stale state window.


Round 8 Findings — Status

Finding Status
PR description example stale after a03e6d1b ✅ Non-issue — PR description already shows state_reads.accounts: 322 = cache.account.misses: 322; numbers are self-consistent
Asymmetric state_reads.* semantics (accounts/code = misses, storage = hits+misses) ✅ Documented in "State read field definitions" table; intentional by design
MetricsIntegrationTests uses combined counter ✅ Fixed in 2c66f538

All Prior Findings — All Confirmed Fixed

Finding Round Status
Storage cache hit rate bug 1
Per-tx timing all-zeros in parallel mode 1
StorageWrites counted TSTORE 1
Interface pollution (default no-op methods) 1
State I/O prewarmer contamination 3
cache.code.hits numerator/denominator asymmetry 4
Account/storage trie cache prewarmer contamination 4
PerTxTimingCollector._ticksPerTx plain heap array 4
cached_contracts_usedcache.code.hits (removed) 5 ✅ (restored with doc in 9)
SlowBlockLogEntry in wrong test assembly 5
ProcessingStats ctor default discrepancy 5
Parallel caveat not documented 5–6
All five LukaszRozmej Round 7 blockers 7
ExecutionMetricsFlag.IsActive dead toggle 7
Delta* fields not reset in BlockDataPolicy.Return 7
Negative-time clamp silently lies 7
Prewarmer contamination at source 8→9 ✅ Fixed in 68d6f79e
MetricsIntegrationTests combined counter 8 ✅ Fixed in 2c66f538

Remaining Low Items (non-blocking, acknowledged)

  • evm.cached_contracts_used example inconsistency — the JSON example in the PR body shows cached_contracts_used: 97 and cache.code.hits: 180, but at runtime they are always equal (same MainThreadCodeDbCache delta). The "EVM field definitions" table documents the equivalence; consider updating the example to show equal values.
  • PerTxTimingCollector statics not volatile — documented in class <remarks> as relying on the ParallelUnbalancedWork.For join-barrier; safe on x86/.NET.
  • Missing parallel path test — no test covers ProcessTransactionsParallel for per-tx timing under actual concurrent workers.
  • Test fixture duplicationMetricsIntegrationTests / SlowBlockIntegrationTests share ~30 lines of setup boilerplate.

Positive Notes

All structural quality from prior reviews still holds, and the [ThreadStatic] change in 68d6f79e makes the whole metrics attribution model materially cleaner: the MainThread* accessors now truly reflect main-thread-only work at the source, not just at the query boundary.
| Branch

- Factor the duplicated SpecProvider / WorldState / VM / TxProcessor / Ecdsa setup
  out of MetricsIntegrationTests + SlowBlockIntegrationTests into an IDisposable
  EvmTestHarness in Nethermind.Evm.Test/Helpers/. Add Nethermind.Evm.Test as a
  ProjectReference of Nethermind.Consensus.Test so the harness is visible to both
  fixtures.
- Add PerTxTimingCollectorTests covering ProcessTransactionsParallel's contract on
  the static collector: 512 concurrent ParallelUnbalancedWork.For workers writing
  unique values via Record, plus zero-tx prepare, disabled-collector returns null,
  ownership-transfer cleanliness across blocks, and out-of-range Record being
  silently dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Marchhill Marchhill merged commit 06761a4 into master May 26, 2026
562 of 1080 checks passed
@Marchhill Marchhill deleted the feature/execution-metrics-standardization branch May 26, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants