consensus: standardize slow block JSON output for cross-client metrics by Marchhill · Pull Request #11400 · NethermindEth/nethermind

Marchhill · 2026-04-29T09:28:20Z

Summary

Implement standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research.

This change is part of the Cross-Client Execution Metrics initiative proposed by Gary Rong and CPerezz.

Motivation

Standardized execution metrics are critical for:

Cross-client performance comparison
Network health monitoring
Data-driven protocol research

Real-world example: The EIP-7907 analysis used execution metrics to measure code read latency, per-call overhead scaling, and block execution breakdown. Without standardized metrics across clients, such analysis cannot be validated cross-client.

References

Original proposal: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ (Gary Rong)
EIP-7907 analysis: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850

Configuration

Two new flags in Blocks config section:

Flag	Default	Description
`Blocks.SlowBlockThresholdMs`	`-1` (disabled)	Threshold in ms for logging slow blocks. Set to `1000` to log blocks slower than 1s. Set to `0` to log all blocks.
`Blocks.SlowBlockPerTxThresholdMs`	`-1` (disabled)	Per-transaction threshold in ms. When enabled, individual transactions exceeding this threshold are included in the slow block JSON. Set to `0` to log all transactions.

Both flags default to -1 (disabled). When slow-block logging is disabled, the block-level delta capture, per-transaction timing, and JSON serialization paths are skipped.

The exported execution counters added by this PR are controlled by the compile-time NETHERMIND_NO_EXECUTION_METRICS symbol, not by the runtime slow-block flags. Default builds keep those counters enabled for Prometheus; benchmark builds can define NETHERMIND_NO_EXECUTION_METRICS to let the JIT fold the gated counter/timer paths away.

Example usage

# Log blocks slower than 1s with all transactions
--Blocks.SlowBlockThresholdMs=1000 --Blocks.SlowBlockPerTxThresholdMs=0

# Log blocks slower than 500ms, only transactions slower than 50ms
--Blocks.SlowBlockThresholdMs=500 --Blocks.SlowBlockPerTxThresholdMs=50

# Log every block, no per-tx breakdown
--Blocks.SlowBlockThresholdMs=0

JSON Output Format

{
  "level": "warn",
  "msg": "Slow block",
  "parallel_execution": false,
  "block": {
    "number": 22075123,
    "hash": "0xabc...",
    "gas_used": 29850000,
    "gas_limit": 30000000,
    "tx_count": 142,
    "blob_count": 6
  },
  "timing": {
    "execution_ms": 52.1,
    "evm_ms": 45.2,
    "blooms_ms": 1.3,
    "receipts_root_ms": 3.8,
    "commit_ms": 2.1,
    "storage_merkle_ms": 18.4,
    "state_root_ms": 5.2,
    "state_hash_ms": 23.6,
    "total_ms": 82.0
  },
  "throughput": {
    "mgas_per_sec": 364.1
  },
  "state_reads": {
    "accounts": 322,
    "storage_slots": 3201,
    "code": 35,
    "code_bytes": 482310
  },
  "state_writes": {
    "accounts": 312,
    "accounts_deleted": 0,
    "storage_slots": 1805,
    "storage_slots_deleted": 12,
    "code": 3,
    "code_bytes": 8420,
    "eip7702_delegations_set": 0,
    "eip7702_delegations_cleared": 0
  },
  "cache": {
    "account": { "hits": 1520, "misses": 322, "hit_rate": 82.52 },
    "storage": { "hits": 2800, "misses": 401, "hit_rate": 87.48 },
    "code": { "hits": 180, "misses": 35, "hit_rate": 83.72 }
  },
  "evm": {
    "opcodes": 1542000,
    "sload": 3201,
    "sstore": 1805,
    "calls": 2412,
    "empty_calls": 48,
    "creates": 3,
    "self_destructs": 0,
    "contracts_analyzed": 118,
    "cached_contracts_used": 97
  },
  "transactions": [
    {
      "index": 42,
      "hash": "0xdef...",
      "gas_used": 8200000,
      "execution_ms": 12.3,
      "type": "Legacy",
      "to": "0x7a25..."
    },
    {
      "index": 107,
      "hash": "0x123...",
      "gas_used": 5100000,
      "execution_ms": 8.7,
      "type": "EIP1559",
      "to": "0xdead..."
    }
  ]
}

State read field definitions

The state_reads fields are derived from existing cache counters where possible so the hot path does not record the same event twice:

Field	Definition
`state_reads.accounts`	Account trie cache misses, equal to `cache.account.misses`.
`state_reads.storage_slots`	Storage trie accesses served through the storage trie cache, equal to `cache.storage.hits + cache.storage.misses`.
`state_reads.code`	Code cache misses, equal to `cache.code.misses`.
`state_reads.code_bytes`	Total bytes loaded for code cache misses.

EVM field definitions

Field	Definition
`evm.cached_contracts_used`	Existing code cache hit counter for the block, equal to `cache.code.hits`; retained because it is part of the original cross-client metrics spec.

Timing field definitions

Field	Definition
`total_ms`	Wall-clock block processing time
`execution_ms`	`total - state_hash - commit` (backwards-compatible)
`evm_ms`	`total - state_hash - commit - blooms - receipts_root` (pure EVM)
`blooms_ms`	Bloom filter calculation (parallel)
`receipts_root_ms`	Receipts Patricia trie root computation
`commit_ms`	State commit without root recalculation
`storage_merkle_ms`	Storage trie merkleization (`Commit(commitRoots: true)`)
`state_root_ms`	State root recalculation + commit tree
`state_hash_ms`	`storage_merkle + state_root` (total merkleization)

The transactions array is only present when SlowBlockPerTxThresholdMs >= 0.

Implementation

Account and storage read counts are derived from existing Db.Metrics trie cache/read counters.
Code cache misses and code bytes read are tracked internally for slow-block JSON; they are not exported as duplicate Prometheus counters.
Write counters and EIP-7702 delegation counters are compile-time gated via ExecutionMetricsFlag.
Timing uses Stopwatch.GetTimestamp() only at coarse block-phase boundaries and for optional per-transaction timing.
JSON is serialized via Utf8JsonWriter + ArrayBufferWriter<byte> on the ThreadPool, not the block-processing thread.
Per-transaction timing uses pooled ArrayPoolList<long> storage and one Stopwatch.GetTimestamp() per transaction when enabled.
Delta computation and JSON output are queued to ThreadPool via UnsafeQueueUserWorkItem.

Implements execution metrics following the cross-client specification: https://github.com/ethereum/execution-specs/blob/main/docs/execution-metrics-spec.md - Add slow block JSON logging (threshold: 1000ms) - Include timing, throughput, and EVM operation counts - Leverage existing SLOAD/SSTORE/CALL metrics infrastructure - Output format matches cross-client standardization spec

Implement standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research. This change is part of the Cross-Client Execution Metrics initiative proposed by Gary Rong and CPerezz: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ The standardized metrics enabled data-driven analysis like the EIP-7907 research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850 JSON format includes: - block: number, hash, gas_used, tx_count - timing: execution_ms, total_ms - throughput: mgas_per_sec - state_reads: accounts, storage_slots, code, code_bytes - state_writes: accounts, storage_slots - cache: account/storage/code hits, misses, hit_rate - evm: sload, sstore, calls, creates Also adds ThreadLocal accessors to Db.Metrics for per-block cache statistics tracking without global synchronization overhead.

Convert timing calculations from long to double to preserve sub-millisecond precision in slow block JSON output. Changes: - ProcessingStats: timing variables changed to double with float division - Math.Round(..., 3) for consistent 3 decimal place output - ZeroContentionCounter: add Increment(long) overload for type safety

Add tracking for EIP-7702 delegation set/cleared operations as part of the cross-client execution metrics standardization effort. New metrics in Nethermind.Evm.Metrics: - Eip7702DelegationsSet: Number of EIP-7702 delegations set - Eip7702DelegationsCleared: Number of EIP-7702 delegations cleared Both metrics include thread-local variants for use in slow block logging. The slow block JSON output now includes: - state_writes.eip7702_delegations_set - state_writes.eip7702_delegations_cleared These fields will be 0 for pre-Pectra blocks (per spec).

ContractsAnalysed tracks jump destination analysis, not cache misses. CodeReads is incremented when code is loaded from DB (cache miss).

Instead of duplicating total_ms, calculate: execution_ms = total_ms - state_read_ms - state_hash_ms - commit_ms Falls back to total_ms if timing metrics not fully captured.

- Add IncrementCodeWrites/IncrementCodeBytesWritten in InsertCode - Add IncrementAccountWrites in SetState - Add IncrementStorageWrites in Set

Move IncrementAccountReads inside GetState to only count trie/DB reads, matching storage read semantics. Previously counted all reads including in-memory cache hits.

…metrics Pipe timing data to EvmMetrics for state_read_ms, state_hash_ms, and commit_ms: - WorldStateMetricsDecorator: pipe existing timing to EvmMetrics for RecalculateStateRoot, Commit (both overloads), and CommitTree - StateProvider.GetState: add timing around DB reads (not cache hits) - PersistentStorageProvider.LoadFromTree: add timing for storage reads Uses Stopwatch.GetTimestamp()/GetElapsedTime() with TimeSpan.Ticks (100ns precision) for minimal overhead (~10-20ns per measurement).

…tandardization Add new metrics for tracking account and storage deletions during EVM execution: - AccountDeleted: incremented when SetState is called with null account - StorageDeleted: incremented when SSTORE sets a slot to zero - Both metrics flow through ProcessingStats to slow block JSON output

Wire up existing but unused EIP-7702 delegation metric increment methods: - IncrementEip7702DelegationsSet() when setting delegation (non-zero address) - IncrementEip7702DelegationsCleared() when clearing delegation (zero address)

Add comprehensive integration tests for all cross-client execution metrics: - EIP-7702 delegation tests (set, clear, multiple) - Account metrics tests (reads, writes, deleted) - Storage metrics tests (reads, writes, deleted) - Code metrics tests (loaded, updated)

Allow customization of the slow block logging threshold via constructor parameter. Default remains 1000ms for backward compatibility. - Add slowBlockThresholdMs parameter to ProcessingStats constructor - Use >= comparison so threshold=0 logs all blocks (useful for testing)

Add InternalsVisibleTo attributes to allow test projects to access ThreadLocal metric values for verification in slow block tests.

Add comprehensive unit tests for slow block JSON output: - Verify all 31 spec-required fields are present - Test configurable threshold (0 = log all blocks) - Validate field names match snake_case spec format - Test timing, state read/write, and cache metrics

Dead code — the MetricsSnapshot and MetricsDelta classes were never referenced anywhere in the codebase.

…cumulator Replace 16 separate ZeroContentionCounter instances (each with its own ThreadLocal<BoxedLong>) with a single ThreadLocal<ExecutionMetricsAccumulator>. This addresses Ben's review feedback about the number of additional thread locals on hot execution paths. The consolidated accumulator reduces ThreadLocal overhead by requiring only one TLS lookup per increment instead of one per counter. Prometheus exposure is preserved via SumExecutionMetric() which iterates the shared ThreadLocal's Values collection. Call sites now use direct field increments (e.g. ThreadExecutionMetrics.AccountReads++) instead of method calls.

…simplify test spec - Replace anonymous object in LogSlowBlock with named record types for structured JSON serialization (SlowBlockLog, SlowBlockInfo, etc.) - Simplify spec retrieval in MetricsIntegrationTests to use Prague.Instance directly instead of resolving via MainnetSpecProvider

…flicts with master - Replace ExecutionMetricsAccumulator with ZeroContentionCounter pattern - Remove Stopwatch.GetTimestamp() from per-read hot paths (StateProvider, PersistentStorageProvider) - Use Utf8JsonWriter for zero-allocation slow block JSON serialization - Collapse BlockData Start/Current pairs into pre-computed deltas - Adapt to master's refactored CodeInfoRepository, WrappedWorldState, IProcessingStats - Condense tests from ~1600 to ~530 lines

Address PR review feedback from flcl42.

…block log Add per-phase timing (blooms_ms, receipts_root_ms, storage_merkle_ms, state_root_ms, evm_ms) and extended EVM/block fields (opcodes, self_destructs, empty_calls, contracts_analyzed, cached_contracts_used, blob_count, gas_limit) to the slow block JSON output. execution_ms retains its original definition for backwards compatibility. All new timing is captured via the same ZeroContentionCounter pattern used by existing metrics — ~25μs overhead per block.

Wire slow block threshold to IBlocksConfig so it can be set via config file or CLI. Default: 1000ms (log blocks slower than 1s). Set to 0 to log all blocks, -1 to disable slow block logging entirely. When disabled (-1), CaptureStartStats and delta computation are skipped on the block-processing thread — zero overhead from slow block analysis.

Slow block logging is opt-in. Set Blocks.SlowBlockThresholdMs=1000 to enable logging blocks slower than 1s.

New config flag Blocks.SlowBlockPerTxThresholdMs (default -1 = disabled). When enabled alongside SlowBlockThresholdMs, each transaction exceeding the per-tx threshold is included individually in the slow block JSON: "transactions": [ { "index": 42, "hash": "0x...", "gas_used": 8200000, "execution_ms": 12.3, "type": "Legacy", "to": "0x..." } ] Uses a thread-static PerTxTimingCollector with one Stopwatch.GetTimestamp per tx (~20ns). When disabled, zero overhead — the collector is not even checked in the tx loop.

claude · 2026-05-18T11:28:52Z

Claude finished @Marchhill's task in 3m 35s —— View job

PR Review: consensus: standardize slow block JSON output for cross-client metrics

Tasks:

Post initial comment
Read rule files and gather context
Read changed files and assess current state
Evaluate remaining blockers from Round 5 (LukaszRozmej's items)
Check for new issues
Finalize review summary

Summary

All five blockers from LukaszRozmej's 2026-05-15 review have been addressed in the latest commit (c8e6afa). No new Medium or High issues found. This PR is ready to merge.

LukaszRozmej Blockers — All Resolved

#	Finding	Status
1	Move `SlowBlockLogEntry` out of `Nethermind.Core.Test`	✅ Fixed — renamed + moved to `Nethermind.Consensus.Test/Processing/SlowBlockLogEntry.cs`
2	Document parallel-execution caveat for breakdown timings	✅ Fixed — `<remarks>` added to `LogSlowBlock` (`ProcessingStats.cs:598–609`) documenting that EVM/state-I/O counters accumulate across all parallel workers
3	Resolve `cached_contracts_used` / `cache.code.hits` duplication	✅ Fixed (Round 5)
4	Remove or wire up `ExecutionMetricsFlag.IsActive`	✅ Fixed — now a proper `#if NETHERMIND_NO_EXECUTION_METRICS` gate; in default builds counters fire unconditionally, define the symbol at build time for zero overhead
5	Fix "zero overhead when disabled" claim or gate counter increments	✅ Fixed — compile-time `NETHERMIND_NO_EXECUTION_METRICS` symbol elides all new `Interlocked` calls via JIT constant-folding + `AggressiveInlining`

Also fixed from Round 5 (Low):

ProcessingStats ctor default discrepancy — lower-level ctor now defaults to slowBlockThresholdMs = -1 (disabled) with an explanatory <remarks>, consistent with production defaults.

All Prior Findings — Confirmed Fixed

Finding	Round	Status
Storage cache hit rate bug (`StateTreeCache` subtracted from wrong start)	1	✅ Fixed
Per-tx timing all-zeros in parallel mode (`[ThreadStatic]` → process-wide static)	1	✅ Fixed
`StorageWrites` counted TSTORE	1	✅ Fixed
Interface pollution (default no-op methods on `IBlockTransactionsExecutor`)	1	✅ Fixed
State I/O metrics contaminated by prewarmer	3	✅ Fixed — `MainThread*` variants used
`cache.code.hits` numerator/denominator asymmetry	4	✅ Fixed — `MainThreadCodeDbCache`
Account/storage trie cache prewarmer contamination	4	✅ Fixed — `DbMetrics.MainThread*` variants
`PerTxTimingCollector._ticksPerTx` plain heap array	4	✅ Fixed — `ArrayPoolList<long>`
`cached_contracts_used` ≡ `cache.code.hits`	5	✅ Fixed — removed from JSON

Outstanding Low Items (acknowledged, non-blocking)

PerTxTimingCollector statics not volatile — _enabled, _ticksPerTx, and _count are plain static fields. Prepare() sets _ticksPerTx on the block-processing thread; parallel workers read it in Record() without a formal barrier. Safe on x86/.NET (TSO hardware + ThreadPool dispatch guarantees ordering). Class doc now explicitly states the single-block assumption.
Thread.Sleep(50) polling in ProcessingStatsTests.cs:46–50 — polling up to 5s for each assertion; fragile under load. A ManualResetEventSlim would be deterministic but is a test-quality concern only.
Missing parallel path test — no coverage of ProcessTransactionsParallel for the per-tx timing collector; the static-per-block assumption is not tested under concurrent workers.
MetricsIntegrationTests uses combined Metrics.AccountReads (not MainThread*) — susceptible to counter contamination from parallel test runs; Is.GreaterThanOrEqualTo softens this but weakens the spec.
Test fixture duplication — MetricsIntegrationTests and SlowBlockIntegrationTests share ~30 lines of setup boilerplate; per AGENTS.md this should be factored into a shared helper.

Positive Notes

All structural quality from prior reviews remains fully intact:

ExecutionMetricsFlag + MetricsTimer<TSink> pattern: JIT-foldable, clean zero-overhead compile path.
ArrayPoolList<long> for per-tx ticks with ownership transfer and pool return in BlockDataPolicy.Return.
ArrayBufferWriter<byte> + Utf8JsonWriter zero-allocation JSON path.
Object pool with proper reference nulling in Return.
Deltas pre-computed on block-processing thread before UnsafeQueueUserWorkItem dispatch.
StopTxTimer in finally — timing captured even for failing transactions.
Default interface methods for timing hooks — no-op inheritance without stub boilerplate.

| Branch

Copilot

Pull request overview

This PR introduces standardized “slow block” JSON logging (intended for cross-client execution metrics) by adding new execution counters (state/code reads+writes, EIP-7702 delegation set/clear) and block-phase timing, plus configuration flags to enable slow-block and per-transaction reporting.

Changes:

Add new EVM/state execution counters and block-phase timing sinks used by slow-block JSON logging.
Add slow-block JSON emission in ProcessingStats with optional per-transaction timing capture.
Add new Blocks config flags and integration tests validating metrics and JSON schema.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
src/Nethermind/Nethermind.State/StateProvider.cs	Adds account/code read/write metric increments during state access and code insertion.
src/Nethermind/Nethermind.State/PersistentStorageProvider.cs	Adds storage read/write metric increments via storage provider hooks.
src/Nethermind/Nethermind.State/PartialStorageProviderBase.cs	Makes `Set` virtual to allow metric-instrumented overrides.
src/Nethermind/Nethermind.Evm/Metrics.cs	Adds new execution counters + compile-time flag gating.
src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.Storage.cs	Adds storage-deletion metric increments on SSTORE-to-zero.
src/Nethermind/Nethermind.Evm/CodeInfoRepository.cs	Adds code read/byte counters and EIP-7702 delegation set/clear counters.
src/Nethermind/Nethermind.Evm.Test/MetricsIntegrationTests.cs	New tests asserting new EVM counters are incremented during tx processing.
src/Nethermind/Nethermind.Db/Metrics.cs	Exposes main-thread-only counters for block-level delta computation.
src/Nethermind/Nethermind.Core/Metric/MetricsTimer.cs	Adds reusable timer helper for tick-based metric sinks.
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs	Adds slow-block JSON logging, config-driven thresholds, and per-tx timing snapshotting.
src/Nethermind/Nethermind.Consensus/Processing/PerTxTimingCollector.cs	Adds process-wide collector for per-transaction execution timing.
src/Nethermind/Nethermind.Consensus/Processing/IBlockProcessor.cs	Extends executor interface with optional per-tx timing hooks (default no-ops).
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.ParallelBlockValidationTransactionsExecutor.cs	Wires per-tx timers into parallel tx execution path.
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.cs	Wraps block phases with `MetricsTimer` sinks for timing breakdown counters.
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.BlockValidationTransactionsExecutor.cs	Wires per-tx timers into sequential tx execution path.
src/Nethermind/Nethermind.Consensus.Test/Processing/SlowBlockLogEntry.cs	Adds typed model for deserializing slow-block JSON in tests.
src/Nethermind/Nethermind.Consensus.Test/Processing/SlowBlockIntegrationTests.cs	Adds integration tests validating metrics flow into slow-block JSON.
src/Nethermind/Nethermind.Consensus.Test/Processing/ProcessingStatsTests.cs	Adds schema/threshold/timing consistency tests for slow-block JSON.
src/Nethermind/Nethermind.Config/IBlocksConfig.cs	Adds `SlowBlockThresholdMs` and `SlowBlockPerTxThresholdMs` config items.
src/Nethermind/Nethermind.Config/BlocksConfig.cs	Implements new config properties with default `-1` (disabled).

Comments suppressed due to low confidence (6)

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:351

Slow-block threshold comparison truncates microseconds to whole milliseconds (processingMicros / 1000). This can miss blocks that are just over the configured threshold due to integer floor. Compare in microseconds (e.g. processingMicros >= thresholdMs * 1000) or use a double ms value consistently.

            // Log slow blocks in JSON format for cross-client performance analysis
            // Only log when slow block threshold is enabled (>= 0)
            if (_slowBlockThresholdMs >= 0)
            {
                long processingMs = data.ProcessingMicroseconds / 1000;
                if (processingMs >= _slowBlockThresholdMs)
                {
                    LogSlowBlock(block, data, mgasPerSec);
                }

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:176

PerTxTimingCollector is a static process-wide switch, but CaptureStartStats only ever enables it (SetEnabled(true)) and never disables it when SlowBlockPerTxThresholdMs < 0. If per-tx logging is enabled once (or another ProcessingStats instance enables it), later instances/blocks with per-tx disabled can still record timings and LogSlowBlock will treat a -1 threshold as 'log all' (negative ticks threshold). SetEnabled should be called with the current config value (true/false) each block, or make the collector instance-scoped.

            // Enable per-tx timing on the current block-processing thread.
            // Must be set here (not in Start()) because the async processing loop
            // can resume on a different ThreadPool thread after each await.
            if (_slowBlockPerTxThresholdMs >= 0)
            {
                PerTxTimingCollector.SetEnabled(true);
            }

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:699

The slow-block JSON 'evm' object omits cached-contract/cache-usage info even though the PR description schema includes it (e.g. cached_contracts_used) and ProcessingStats already tracks CurrentCachedContractsUsed/StartCachedContractsUsed. If cross-client standardization expects this field, add it to the JSON and the test schema/model accordingly.

                    writer.WriteStartObject("evm");
                    writer.WriteNumber("opcodes", data.CurrentOpCodes - data.StartOpCodes);
                    writer.WriteNumber("sload", data.CurrentSLoadOps - data.StartSLoadOps);
                    writer.WriteNumber("sstore", data.CurrentSStoreOps - data.StartSStoreOps);
                    writer.WriteNumber("calls", data.CurrentCallOps - data.StartCallOps);
                    writer.WriteNumber("empty_calls", data.CurrentEmptyCalls - data.StartEmptyCalls);
                    writer.WriteNumber("creates", data.CurrentCreatesOps - data.StartCreateOps);
                    writer.WriteNumber("self_destructs", data.CurrentSelfDestructOps - data.StartSelfDestructOps);
                    writer.WriteNumber("contracts_analyzed", data.CurrentContractsAnalyzed - data.StartContractsAnalyzed);
                    writer.WriteEndObject();

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:739

LogSlowBlock swallows exceptions and logs only ex.Message at debug level, losing the stack trace and exception details. Log the exception object (or include stack trace) to make failures diagnosable, especially since slow-block logging runs on the ThreadPool and issues may be intermittent.

            catch (Exception ex)
            {
                if (_logger.IsDebug) _logger.Debug($"Error logging slow block: {ex.Message}");
            }

src/Nethermind/Nethermind.State/StateProvider.cs:733

AccountReads is only incremented on cache misses (when the account isn't in _blockChanges). On cache hits you increment StateTreeCacheHits but not AccountReads, so slow-block JSON 'state_reads.accounts' ends up representing only misses and will duplicate cache.account.misses rather than total logical reads. If the intent is total reads (as implied by having separate cache hit/miss counts), increment AccountReads on both hit and miss paths (or rename the JSON field to clarify it's DB/state-trie reads only).

        ref ChangeTrace accountChanges = ref CollectionsMarshal.GetValueRefOrAddDefault(_blockChanges, addressAsKey, out bool exists);
        if (!exists)
        {
            Metrics.IncrementStateTreeReads();
            EvmMetrics.IncrementAccountReads();
            Account? account = _tree.Get(address);

            accountChanges = new(account, account);
        }
        else
        {
            Metrics.IncrementStateTreeCacheHits();
        }

src/Nethermind/Nethermind.Evm/Metrics.cs:171

These new execution metrics use shared Interlocked increments. Under ParallelExecution, worker threads inherit IsBlockProcessingThread=true (see ProcessingStats remarks), so multiple workers will contend on the same '_main*' counters. If the goal is minimal overhead for cross-client metrics, consider per-thread counters (e.g., ThreadLocal/ThreadStatic/striped counters) and aggregating at snapshot time to reduce contention.

    [CounterMetric]
    [Description("Number of account reads during execution.")]
    public static long AccountReads => _mainAccountReads + _otherAccountReads;
    private static long _mainAccountReads;
    private static long _otherAccountReads;
    // Exposed for ProcessingStats so block-level deltas exclude background prewarmer activity.
    internal static long MainThreadAccountReads => _mainAccountReads;
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static void IncrementAccountReads()
    {
        if (!ExecutionMetricsFlag.IsActive) return;
        Interlocked.Increment(ref IsBlockProcessingThread ? ref _mainAccountReads : ref _otherAccountReads);
    }

    [CounterMetric]
    [Description("Number of storage slot reads during execution.")]
    public static long StorageReads => _mainStorageReads + _otherStorageReads;
    private static long _mainStorageReads;
    private static long _otherStorageReads;
    internal static long MainThreadStorageReads => _mainStorageReads;
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    internal static void IncrementStorageReads()
    {
        if (!ExecutionMetricsFlag.IsActive) return;
        Interlocked.Increment(ref IsBlockProcessingThread ? ref _mainStorageReads : ref _otherStorageReads);
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

LukaszRozmej · 2026-05-20T14:15:05Z

Review — focused on correctness and performance

Verdict

Feature is well-designed and test coverage is excellent. However, the PR description's claim that "there is zero overhead on block processing when disabled" is not quite accurate. The new EVM-side metric counters fire unconditionally in default builds (gated only by compile-time ExecutionMetricsFlag, not by SlowBlockThresholdMs). On hot paths (SSTORE, SLOAD, account reads/writes), this adds one atomic Interlocked op per state operation, always on, even when the feature is configured to -1.

Performance concerns

1. Hot-path atomics added regardless of `SlowBlockThresholdMs` (HIGH)

ExecutionMetricsFlag.IsActive is a compile-time-folded constant, true by default. Only a recompile with NETHERMIND_NO_EXECUTION_METRICS disables it. So in production builds, even with SlowBlockThresholdMs = -1:

PersistentStorageProvider.Set → IncrementStorageWrites() on every SSTORE that mutates
EvmInstructions.Storage.cs (both InstructionSStoreMetered and InstructionSStoreUnmetered) → IncrementStorageDeleted() on every SSTORE-to-zero
PersistentStorageProvider.LoadFromTree → IncrementStorageReads() on every uncached SLOAD
StateProvider.GetState (via WarmUp) → IncrementAccountReads() on every uncached account touch
StateProvider.SetState → IncrementAccountWrites() (+ IncrementAccountDeleted() when null) on every state mutation
CodeInfoRepository.GetCodeInfo → IncrementCodeReads() + IncrementCodeBytesRead() on every code DB fetch

Each is Interlocked.Increment(ref IsBlockProcessingThread ? ref _main... : ref _other...). On x64 that's ~10–15 ns per call (uncontested atomic) plus a thread-static read. For the example block in the PR description (~1800 SSTOREs, ~3200 SLOADs, ~2000 account changes), that's ~70–100 µs/block of overhead even when the feature is "disabled" by config. The PR description quotes ~31 µs/block for slow-block capture; the real per-block tax in default builds is larger and unavoidable without recompiling.

Suggestion: Gate Increment* calls behind a runtime flag (e.g. _blocksConfig.SlowBlockThresholdMs >= 0 exposed as a volatile static), or set ExecutionMetricsFlag.IsActive from config at startup. If you keep the compile-time flag, please correct the PR description.

2. Per-tx `try/finally` is added unconditionally

BlockProcessor.BlockValidationTransactionsExecutor.ProcessTransaction now wraps transactionProcessor.ProcessTransaction(...) in try/finally. Even with _enableTxTimingMetrics = false, the JIT still emits the exception-filter table. Tiny per-tx cost but always on.

In the parallel executor, inner.StartTxTimer() / inner.StopTxTimer() are interface calls through IBlockProcessor.IBlockTransactionsExecutor (default interface methods). The JIT likely cannot devirtualize since inner is interface-typed at the call site — 2 vcalls per tx in the parallel hot path.

3. `LogSlowBlock` recomputes blob count

Transaction.GetBlobCount() is summed both in UpdateStats (existing) and again in LogSlowBlock (new). Cheap, but redundant — store the blob count on BlockData and reuse.

Correctness concerns

1. `PerTxTimingCollector` is process-wide static

_ticksPerTx is a static field, written on the block-processing thread (Prepare), read/written on parallel ThreadPool workers (Record). It is not volatile. Workers see writes only because ParallelUnbalancedWork.For provides the implicit synchronisation barrier. This works today but is fragile — please document the dependency explicitly. Same applies to BlockValidationTransactionsExecutor._enableTxTimingMetrics.

2. `PerTxTimingCollector.SetEnabled(true)` is sticky

CaptureStartStats calls SetEnabled(true) per block when _slowBlockPerTxThresholdMs >= 0, but there is no corresponding SetEnabled(false). If the config is hot-reloaded or in a multi-ProcessingStats scenario, the static stays true. The Prepare/Snapshot flow tolerates this, but the API is asymmetric.

3. `Stopwatch` ticks vs `TimeSpan` ticks

MetricsTimer<T>.Dispose uses Stopwatch.GetElapsedTime(_start).Ticks which returns TimeSpan ticks (100 ns) — dividing by TimeSpan.TicksPerMillisecond later is correct. Worth a doc comment on IncrementStateHashTime(long ticks) clarifying the unit is TimeSpan ticks, not Stopwatch ticks, to avoid future regressions.

4. Negative-time fallback hides a real bug

double executionMs = totalMs - stateHashMs - commitMs;
if (executionMs < 0) executionMs = totalMs;

If this branch is ever taken, the log entry silently lies (claims execution ≥ total). At minimum, debug-log when clamping so the bug surfaces.

5. `BlockData` pool — new `Delta*` fields not reset on `Return`

BlockDataPolicy.Return resets PerTxTicks and the existing fields but not the new DeltaAccountReads, DeltaStorageReads, etc. Safe only because UpdateStats either writes all of them (threshold ≥ 0) or never reads them (threshold < 0). If the threshold ever changes at runtime, stale deltas leak. Belt-and-braces fix: reset them too.

6. Cross-client metric semantics under parallel execution

The LogSlowBlock <remarks> correctly notes that under ParallelExecution the EVM and state-read/write counters sum across workers while wall-clock timings remain per-block. Good documentation, but this is a real caveat for "cross-client" comparability — Geth/Reth don't run txs in parallel by default. Consumers will see inflated counter values relative to wall-clock ms. Worth surfacing in the JSON itself (e.g. parallel_execution: true) so analysers can normalise.

Minor

private bool _enableTxTimingMetrics = false; — = false is redundant.
Double blank line after StopTxTimer.
IBlockProcessor.IBlockTransactionsExecutor gains 3 new default interface methods — public-API change and a small ABI risk for plugins implementing the interface.
MetricsTimer<T> as a parameterless readonly ref struct is nice — stack-only, inlinable, scope-based timing.

Suggested fixes before merge

Either gate hot-path Increment* by a runtime flag tied to SlowBlockThresholdMs >= 0, or correct the "zero overhead when disabled" claim in the PR body and docs.
Document the synchronisation dependency of PerTxTimingCollector / _enableTxTimingMetrics on ParallelUnbalancedWork.For.
Reset the new Delta* fields in BlockDataPolicy.Return.
Add a parallel_execution boolean to the JSON so cross-client analyses can normalise counter values.
Tiny: store blob count in BlockData; remove = false init; debug-log the executionMs < 0 fallback.

- Document ExecutionMetricsFlag's compile-time gate and unconditional default-build cost - Cache BlobCount on BlockData; reuse in LogSlowBlock instead of re-summing transactions - Reset all Delta* fields in BlockDataPolicy.Return - Emit parallel_execution flag in the slow-block JSON (wired via BlocksConfig) - Debug-log when executionMs/evmMs clamp negative (surfaces inconsistent sub-phase deltas) - Document that the timing increments take TimeSpan ticks, not Stopwatch ticks - Document PerTxTimingCollector / _enableTxTimingMetrics reliance on the parallel join barrier - Drop redundant = false initializer and double blank line in BlockValidationTransactionsExecutor Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Marchhill · 2026-05-20T15:31:43Z

@claude review

claude · 2026-05-20T15:40:06Z

Claude finished @Marchhill's task in 5m 50s —— View job

PR Review: consensus: standardize slow block JSON output for cross-client metrics

Tasks:

Post initial comment
Read rule files and gather context
Read changed files and assess current state
Evaluate LukaszRozmej's blocker items (from 2026-05-20 review)
Check for new issues
Finalize review summary

Summary

This is round 7. LukaszRozmej posted a review at 2026-05-20 14:15 UTC with 5 "Suggested fixes before merge." Marchhill committed two fixes today (56776b6 at 11:46 and ebd66dd at 16:30, the latter post-review) that address all 5 blockers. No new Critical or High issues found.

LukaszRozmej's Blockers (from 2026-05-20) — All Resolved

#	Finding	Status
1	Gate hot-path `Increment*` or correct "zero overhead" claim	✅ Fixed — `ExecutionMetricsFlag` doc comment now explicitly states "regardless of `SlowBlockThresholdMs`" and documents the ~10–15 ns/call cost. PR number references also removed from code comments (`56776b6`).
2	Document `PerTxTimingCollector` / `_enableTxTimingMetrics` synchronisation dependency	✅ Fixed — `PerTxTimingCollector.cs` remarks now documents the `ParallelUnbalancedWork.For` join-barrier contract; `BlockValidationTransactionsExecutor._enableTxTimingMetrics` field also has an inline comment cross-referencing the contract (`ebd66dd`).
3	Reset new `Delta*` fields in `BlockDataPolicy.Return`	✅ Fixed — All 18 `Delta*` fields are reset in `BlockDataPolicy.Return` (ProcessingStats.cs:817–841).
4	Add `parallel_execution` boolean to slow-block JSON	✅ Fixed — `writer.WriteBoolean("parallel_execution", _parallelExecution)` at ProcessingStats.cs:661; wired from `BlocksConfig.ParallelExecution`; `SlowBlockLogEntry` updated accordingly.
5	Tiny: store blob count in `BlockData`; remove `= false` init; debug-log negative clamp	✅ Fixed — `data.BlobCount` reused in `LogSlowBlock`; `= false` removed from `_enableTxTimingMetrics`; double blank line removed; both `executionMs < 0` and `evmMs < 0` clamping paths now debug-log with diagnostic context.

Additional items addressed this round

_slowBlockThresholdMs doc comment now says "Defaults to -1 (disabled)" instead of the previously inaccurate "default: 1000ms" (56776b6).
TimeSpan-ticks unit clarified with block comment in Metrics.cs (lines 310–313) explaining that these are 100 ns TimeSpan ticks, not raw Stopwatch timestamp ticks.
CodeReads and StorageReads semantics documented: both count only DB-backed reads (cache misses), and the inline comments now say so explicitly (56776b6).

All Prior Findings — Confirmed Fixed

Finding	Round	Status
Storage cache hit rate bug (`StateTreeCache` wrong source)	1	✅ Fixed
Per-tx timing all-zeros in parallel mode	1	✅ Fixed
`StorageWrites` counted TSTORE	1	✅ Fixed
Interface pollution (default no-op methods)	1	✅ Fixed
State I/O prewarmer contamination	3	✅ Fixed
`cache.code.hits` numerator/denominator asymmetry	4	✅ Fixed
Account/storage trie cache prewarmer contamination	4	✅ Fixed
`PerTxTimingCollector._ticksPerTx` plain heap array	4	✅ Fixed
`cached_contracts_used` ≡ `cache.code.hits`	5	✅ Fixed
`SlowBlockLogEntry` in wrong test assembly	5	✅ Fixed
`ProcessingStats` ctor default discrepancy	5	✅ Fixed
Parallel caveat not documented	5/6	✅ Fixed

Remaining Low Items (non-blocking, acknowledged)

Test fixture duplication — MetricsIntegrationTests and SlowBlockIntegrationTests share ~30 lines of fixture setup (per AGENTS.md, this should be factored into a helper). Low impact since these are test files only.
Thread.Sleep(50) polling in tests — Fragile under load; a ManualResetEventSlim would be deterministic. Non-blocking test quality concern.
Missing parallel path test — No test covers ProcessTransactionsParallel for per-tx timing; the static-collector design and join-barrier contract are not tested under actual concurrent workers.
Exception logging drops stack trace — LogSlowBlock catch block logs only ex.Message at Debug level (ProcessingStats.cs:760). Acceptable as a "don't fail block processing" guard, but a regressions from a serialization bug would be harder to diagnose.

Positive Notes (all still valid)

ExecutionMetricsFlag + MetricsTimer<TSink> pattern: JIT-foldable; overhead truly zero when built with NETHERMIND_NO_EXECUTION_METRICS.
ArrayPoolList<long> for per-tx ticks with ownership transfer and pool return in BlockDataPolicy.Return.
ArrayBufferWriter<byte> + Utf8JsonWriter zero-allocation JSON path.
Object pool with all field resets in Return (including all new Delta* fields).
Deltas pre-computed on block-processing thread before UnsafeQueueUserWorkItem dispatch.
StopTxTimer in finally: timing captured even for failing transactions.
Default interface methods for timing hooks: no stub boilerplate on implementors.

| Branch

…eptions - Replace SpinWait.SpinUntil polling in ProcessingStatsTests / SlowBlockIntegrationTests with a Monitor.Wait/PulseAll-based WaitableTestLogger; the test thread unblocks the instant the ThreadPool worker emits the slow-block JSON, no fixed sleeps under load. - LogSlowBlock catch now logs at Error with the full Exception (stack trace included) instead of dropping it via ex.Message at Debug, matching the existing ExecuteFromThreadPool pattern. Block processing is still protected from log failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

benaadams · 2026-05-26T06:51:33Z

@claude re-review

claude · 2026-05-26T06:51:54Z

Claude finished @benaadams's task in 6m 3s —— View job

PR Re-Review (Round 8): consensus: standardize slow block JSON output for cross-client metrics

Tasks:

Post initial comment
Read rule files and gather context
Read changed files and assess current state
Verify all prior findings are addressed
Analyze a03e6d1 (benaadams' new commit: "Remove duplicate execution read metrics")
Finalize review summary

Summary

All High/Medium findings from Rounds 1–7 remain confirmed fixed. The latest commit (a03e6d1b) by benaadams removes dedicated AccountReads, StorageReads, and CodeReads counter increments from Evm.Metrics and derives state_reads.* from existing cache statistics instead — a good performance improvement. This commit introduces one new Medium concern (PR description documentation now out of sync with implementation) and two Low items.

New Findings from `a03e6d1b`

Medium — PR description example is stale after read-counter removal

a03e6d1b made state_reads.* entirely derived from cache.* values:

// ProcessingStats.cs — post-commit
blockData.DeltaAccountReads = blockData.DeltaAccountCacheMisses;          // always == cache.account.misses
blockData.DeltaStorageReads = blockData.DeltaStorageCacheHits + blockData.DeltaStorageCacheMisses;
blockData.DeltaCodeReads    = blockData.DeltaCodeCacheMisses;              // always == cache.code.misses

The PR description example still shows:

"state_reads": { "accounts": 1842, ... },
"cache": { "account": { "hits": 1520, "misses": 322, ... } }

With the current implementation state_reads.accounts will always equal cache.account.misses (322 in that example, not 1842). The example implies state_reads.accounts = hits + misses = total logical reads, but the implementation outputs only DB-backed reads (misses). Since this PR's description functions as the cross-client specification, the example should be updated to reflect actual semantics — or state_reads.accounts / state_reads.code should be removed (they're now redundant with cache.account.misses / cache.code.misses).

Low — Asymmetric semantics across `state_reads.*` sub-fields

After the change:

state_reads.accounts = account trie DB cache misses (reads that go to the trie/DB)
state_reads.code = code DB cache misses
state_reads.storage_slots = all storage trie node accesses (hits + misses)

The three fields use different counting methods. The PR description doesn't document this asymmetry and consumers will likely expect consistent semantics across all three.

Low — `MetricsIntegrationTests` uses the combined counter after read-counter removal

The test was updated in a03e6d1b to use DbMetrics.StateTreeReads (= _main + _other) instead of the now-removed Metrics.MainThreadAccountReads:

// MetricsIntegrationTests.cs:80
long startReads = DbMetrics.StateTreeReads;  // combined, not MainThread*
...
Assert.That(DbMetrics.StateTreeReads - startReads, Is.GreaterThanOrEqualTo(2));

The test sets ProcessingThread.IsBlockProcessingThread = true, so the increments land in _main*. But another parallel test's prewarmer/non-main-thread activity could increment _other*, making the combined delta spuriously large. DbMetrics.MainThreadStateTreeReads would give a tighter, isolated delta.

Prior Findings — All Confirmed Fixed

Finding	Round	Status
Storage cache hit rate bug	1	✅ Fixed
Per-tx timing all-zeros in parallel mode	1	✅ Fixed
`StorageWrites` counted TSTORE	1	✅ Fixed
Interface pollution (default no-op methods)	1	✅ Fixed
State I/O prewarmer contamination	3	✅ Fixed
`cache.code.hits` numerator/denominator asymmetry	4	✅ Fixed
Account/storage trie cache prewarmer contamination	4	✅ Fixed
`PerTxTimingCollector._ticksPerTx` plain heap array	4	✅ Fixed
`cached_contracts_used` ≡ `cache.code.hits`	5	✅ Fixed
`SlowBlockLogEntry` in wrong test assembly	5	✅ Fixed
`ProcessingStats` ctor default discrepancy	5	✅ Fixed
Parallel caveat not documented	5–6	✅ Fixed
All five LukaszRozmej blockers	5–7	✅ Fixed
Unconditional `IsActive => true` dead toggle	6–7	✅ Fixed — `#if NETHERMIND_NO_EXECUTION_METRICS`
`Delta*` fields not reset in `BlockDataPolicy.Return`	7	✅ Fixed — all 18 fields reset
`parallel_execution` not in JSON	7	✅ Fixed
Negative-time fallback silently lies	7	✅ Fixed — now debug-logs clamp
PR description "zero overhead" claim	7	✅ Fixed — `ExecutionMetricsFlag` docs now explicit
Duplicate `AccountReads`/`StorageReads` `Interlocked` overhead	implicit	✅ Removed in `a03e6d1b`

Acknowledged Low (non-blocking, unchanged)

PerTxTimingCollector statics not volatile — class <remarks> explicitly documents the join-barrier dependency on ParallelUnbalancedWork.For; safe on x86/.NET.
Missing parallel path test — no coverage of ProcessTransactionsParallel for per-tx timing; single-block assumption is undocumented under actual concurrent workers.
Test fixture duplication — MetricsIntegrationTests / SlowBlockIntegrationTests share ~30 lines of setup boilerplate; AGENTS.md recommends factoring this out.

Positive Notes

Everything praised in prior rounds still holds. The a03e6d1b commit is a net improvement: removing three hot-path Interlocked.Increment calls per state access (two for account reads, one for storage reads) reduces per-block overhead with no loss of accuracy, since the DB-level Db.Metrics trie counters already track this information. WaitableTestLogger with Monitor.Wait/PulseAll (replacing the old Thread.Sleep(50) polling) is a clean deterministic test fix.

| Branch

benaadams · 2026-05-26T08:34:22Z

@claude re-review

claude · 2026-05-26T08:34:38Z

Claude finished @benaadams's task in 6m 9s —— View job

PR Re-Review (Round 9): consensus: standardize slow block JSON output for cross-client metrics

Tasks:

Post initial comment
Read rule files and gather context
Read changed files and assess current state vs Round 8
Evaluate new commits (68d6f79, 43a0b97, 2c66f53)
Check for new issues
Finalize review summary

Summary

Three new commits since Round 8. All prior High/Medium findings remain fixed. The new commits are improvements; no new High or Medium issues found. This PR is ready to merge.

New Commits — Analysis

`2c66f538` — Use main-thread DB metrics in EVM metric tests ✅

Fixes the Round 8 Low finding. MetricsIntegrationTests now uses DbMetrics.MainThreadStateTreeReads and DbMetrics.MainThreadStorageTreeReads + MainThreadStorageTreeCache instead of combined counters, eliminating susceptibility to parallel-test-run contamination. Also adds InternalsVisibleTo("Nethermind.Evm.Test") to Nethermind.Db.

`43a0b97c` — Restore cached contracts slow block metric ✅

Restores evm.cached_contracts_used in the slow-block JSON. Both evm.cached_contracts_used and cache.code.hits still use the MainThreadCodeDbCache delta, so they remain equal at runtime. The PR description's "EVM field definitions" table explicitly documents this equality ("retained because it is part of the original cross-client metrics spec"), which is the right place for that note.

Minor: The JSON example in the PR body shows cached_contracts_used: 97 and cache.code.hits: 180 — these should be equal but differ in the example. The text in the table explains they're always equal, so consumers who read that table will understand, but the inconsistent example could confuse cross-client implementors who parse the example first.

`68d6f79e` — Use thread-static block processing attribution ✅

Key improvement. Switches ProcessingThread.IsBlockProcessingThread from AsyncLocal<bool> to [ThreadStatic], with a new test PreWarmCaches_DoesNotFlowIsBlockProcessingThread_IntoTask that validates the critical invariant.

Correctness: With AsyncLocal, tasks spawned by the prewarmer (via BlockCachePreWarmer.PreWarmCaches) inherited IsBlockProcessingThread = true from the calling thread's ExecutionContext. That caused prewarmer EVM metric increments to land in _main* counters — the root cause of the prewarmer contamination that previous rounds worked around with MainThread* accessor additions. With [ThreadStatic], the flag is not inherited by new tasks or threads: prewarmer tasks correctly see IsBlockProcessingThread = false and route their increments to _other*. The new test pins this behavior.

Performance: [ThreadStatic] reads are ~1–2 ns vs ~5–20 ns for AsyncLocal, benefiting all existing and new Increment* calls across every hot-path state operation.

BlockchainProcessor safety: The loop correctly saves/restores the flag in each iteration (bool previousMainThread = IsBlockProcessingThread; ... finally { IsBlockProcessingThread = previousMainThread; }). After await _blockQueue.Reader.WaitToReadAsync, the continuation may run on a fresh thread with IsBlockProcessingThread = false; the code immediately sets it to IsMainProcessor before any processing — no stale state window.

Round 8 Findings — Status

Finding	Status
PR description example stale after `a03e6d1b`	✅ Non-issue — PR description already shows `state_reads.accounts: 322 = cache.account.misses: 322`; numbers are self-consistent
Asymmetric `state_reads.*` semantics (accounts/code = misses, storage = hits+misses)	✅ Documented in "State read field definitions" table; intentional by design
`MetricsIntegrationTests` uses combined counter	✅ Fixed in `2c66f538`

All Prior Findings — All Confirmed Fixed

Finding	Round	Status
Storage cache hit rate bug	1	✅
Per-tx timing all-zeros in parallel mode	1	✅
`StorageWrites` counted TSTORE	1	✅
Interface pollution (default no-op methods)	1	✅
State I/O prewarmer contamination	3	✅
`cache.code.hits` numerator/denominator asymmetry	4	✅
Account/storage trie cache prewarmer contamination	4	✅
`PerTxTimingCollector._ticksPerTx` plain heap array	4	✅
`cached_contracts_used` ≡ `cache.code.hits` (removed)	5	✅ (restored with doc in 9)
`SlowBlockLogEntry` in wrong test assembly	5	✅
`ProcessingStats` ctor default discrepancy	5	✅
Parallel caveat not documented	5–6	✅
All five LukaszRozmej Round 7 blockers	7	✅
`ExecutionMetricsFlag.IsActive` dead toggle	7	✅
`Delta*` fields not reset in `BlockDataPolicy.Return`	7	✅
Negative-time clamp silently lies	7	✅
Prewarmer contamination at source	8→9	✅ Fixed in `68d6f79e`
`MetricsIntegrationTests` combined counter	8	✅ Fixed in `2c66f538`

Remaining Low Items (non-blocking, acknowledged)

evm.cached_contracts_used example inconsistency — the JSON example in the PR body shows cached_contracts_used: 97 and cache.code.hits: 180, but at runtime they are always equal (same MainThreadCodeDbCache delta). The "EVM field definitions" table documents the equivalence; consider updating the example to show equal values.
PerTxTimingCollector statics not volatile — documented in class <remarks> as relying on the ParallelUnbalancedWork.For join-barrier; safe on x86/.NET.
Missing parallel path test — no test covers ProcessTransactionsParallel for per-tx timing under actual concurrent workers.
Test fixture duplication — MetricsIntegrationTests / SlowBlockIntegrationTests share ~30 lines of setup boilerplate.

Positive Notes

All structural quality from prior reviews still holds, and the [ThreadStatic] change in 68d6f79e makes the whole metrics attribution model materially cleaner: the MainThread* accessors now truly reflect main-thread-only work at the source, not just at the query boundary.
| Branch

- Factor the duplicated SpecProvider / WorldState / VM / TxProcessor / Ecdsa setup out of MetricsIntegrationTests + SlowBlockIntegrationTests into an IDisposable EvmTestHarness in Nethermind.Evm.Test/Helpers/. Add Nethermind.Evm.Test as a ProjectReference of Nethermind.Consensus.Test so the harness is visible to both fixtures. - Add PerTxTimingCollectorTests covering ProcessTransactionsParallel's contract on the static collector: 512 concurrent ParallelUnbalancedWork.For workers writing unique values via Record, plus zero-tx prepare, disabled-collector returns null, ownership-transfer cleanliness across blocks, and out-of-range Record being silently dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CPerezz and others added 30 commits January 21, 2026 12:00

fix(metrics): set slow block threshold to 1000ms per spec

170e2cf

fix(metrics): use CodeReads for code cache miss tracking

1c0573d

ContractsAnalysed tracks jump destination analysis, not cache misses. CodeReads is incremented when code is loaded from DB (cache miss).

fix(metrics): derive execution_ms from timing breakdown

a355db7

Instead of duplicating total_ms, calculate: execution_ms = total_ms - state_read_ms - state_hash_ms - commit_ms Falls back to total_ms if timing metrics not fully captured.

feat(metrics): add state write tracking

20aadd6

- Add IncrementCodeWrites/IncrementCodeBytesWritten in InsertCode - Add IncrementAccountWrites in SetState - Add IncrementStorageWrites in Set

fix(metrics): track account reads at DB level only

84c84c7

Move IncrementAccountReads inside GetState to only count trie/DB reads, matching storage read semantics. Previously counted all reads including in-memory cache hits.

fix(chainspec): add maxCodeSize to spaceneth for EIP-3860

6d0a208

fix(chainspec): add explicit chainID to spaceneth

2061afc

fix(chainspec): add Prague system contracts to spaceneth genesis

b28789e

chore: expose internal metrics to test projects

db2bdf1

Add InternalsVisibleTo attributes to allow test projects to access ThreadLocal metric values for verification in slow block tests.

Merge branch 'master' into feature/execution-metrics-standardization

3c5d62a

chore: remove unused MetricsSnapshot.cs

d53bebc

Dead code — the MetricsSnapshot and MetricsDelta classes were never referenced anywhere in the codebase.

Merge origin/master into feature/execution-metrics-standardization

b5f2594

test: use Prague.Instance directly instead of spec provider lookups

88b20ab

Address PR review feedback from flcl42.

chore: default SlowBlockThresholdMs to -1 (disabled)

7103fa3

Slow block logging is opt-in. Set Blocks.SlowBlockThresholdMs=1000 to enable logging blocks slower than 1s.

Marchhill added 2 commits May 18, 2026 14:08

small fixes

572ee0f

reduce comments

fb62e7c

Marchhill requested review from LukaszRozmej and Copilot May 19, 2026 14:08

Copilot started reviewing on behalf of Marchhill May 19, 2026 14:09 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

review comments

56776b6

Marchhill and others added 3 commits May 21, 2026 13:35

Merge branch 'master' into feature/execution-metrics-standardization

ad8519c

Remove duplicate execution read metrics

a03e6d1

benaadams approved these changes May 26, 2026

View reviewed changes

benaadams added 2 commits May 26, 2026 08:05

Use main-thread DB metrics in EVM metric tests

2c66f53

Restore cached contracts slow block metric

43a0b97

benaadams approved these changes May 26, 2026

View reviewed changes

Use thread-static block processing attribution

68d6f79

benaadams added the skip-evm-benchmarks label May 26, 2026

fix lint

2801b23

benaadams approved these changes May 26, 2026

View reviewed changes

Marchhill merged commit 06761a4 into master May 26, 2026
562 of 1080 checks passed

Marchhill deleted the feature/execution-metrics-standardization branch May 26, 2026 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consensus: standardize slow block JSON output for cross-client metrics#11400

consensus: standardize slow block JSON output for cross-client metrics#11400
Marchhill merged 60 commits into
masterfrom
feature/execution-metrics-standardization

Marchhill commented Apr 29, 2026 •

edited by benaadams

Loading

Uh oh!

claude Bot commented May 18, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LukaszRozmej commented May 20, 2026

Uh oh!

Marchhill commented May 20, 2026

Uh oh!

claude Bot commented May 20, 2026 •

edited

Loading

Uh oh!

benaadams commented May 26, 2026

Uh oh!

claude Bot commented May 26, 2026 •

edited

Loading

Uh oh!

benaadams commented May 26, 2026

Uh oh!

claude Bot commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

Marchhill commented Apr 29, 2026 • edited by benaadams Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

References

Configuration

Example usage

JSON Output Format

State read field definitions

EVM field definitions

Timing field definitions

Implementation

Uh oh!

claude Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

LukaszRozmej Blockers — All Resolved

All Prior Findings — Confirmed Fixed

Outstanding Low Items (acknowledged, non-blocking)

Positive Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LukaszRozmej commented May 20, 2026

Review — focused on correctness and performance

Verdict

Performance concerns

1. Hot-path atomics added regardless of SlowBlockThresholdMs (HIGH)

2. Per-tx try/finally is added unconditionally

3. LogSlowBlock recomputes blob count

Correctness concerns

1. PerTxTimingCollector is process-wide static

2. PerTxTimingCollector.SetEnabled(true) is sticky

3. Stopwatch ticks vs TimeSpan ticks

4. Negative-time fallback hides a real bug

5. BlockData pool — new Delta* fields not reset on Return

6. Cross-client metric semantics under parallel execution

Minor

Suggested fixes before merge

Uh oh!

Marchhill commented May 20, 2026

Uh oh!

claude Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

LukaszRozmej's Blockers (from 2026-05-20) — All Resolved

Additional items addressed this round

All Prior Findings — Confirmed Fixed

Remaining Low Items (non-blocking, acknowledged)

Positive Notes (all still valid)

Uh oh!

benaadams commented May 26, 2026

Uh oh!

claude Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Re-Review (Round 8): consensus: standardize slow block JSON output for cross-client metrics

Summary

New Findings from a03e6d1b

Medium — PR description example is stale after read-counter removal

Low — Asymmetric semantics across state_reads.* sub-fields

Low — MetricsIntegrationTests uses the combined counter after read-counter removal

Prior Findings — All Confirmed Fixed

Acknowledged Low (non-blocking, unchanged)

Positive Notes

Uh oh!

benaadams commented May 26, 2026

Uh oh!

claude Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New Commits — Analysis

Marchhill commented Apr 29, 2026 •

edited by benaadams

Loading

claude Bot commented May 18, 2026 •

edited

Loading

1. Hot-path atomics added regardless of `SlowBlockThresholdMs` (HIGH)

2. Per-tx `try/finally` is added unconditionally

3. `LogSlowBlock` recomputes blob count

1. `PerTxTimingCollector` is process-wide static

2. `PerTxTimingCollector.SetEnabled(true)` is sticky

3. `Stopwatch` ticks vs `TimeSpan` ticks

5. `BlockData` pool — new `Delta*` fields not reset on `Return`

claude Bot commented May 20, 2026 •

edited

Loading

claude Bot commented May 26, 2026 •

edited

Loading

New Findings from `a03e6d1b`

Low — Asymmetric semantics across `state_reads.*` sub-fields

Low — `MetricsIntegrationTests` uses the combined counter after read-counter removal

claude Bot commented May 26, 2026 •

edited

Loading

`2c66f538` — Use main-thread DB metrics in EVM metric tests ✅

`43a0b97c` — Restore cached contracts slow block metric ✅

`68d6f79e` — Use thread-static block processing attribution ✅