prefetcher: builder-phase prefetch + streaming worker pool by lucca30 · Pull Request #2192 · 0xPolygon/bor

lucca30 · 2026-04-20T18:23:49Z

Purpose

Today's prefetcher only runs in a speculative idle phase: before block-building starts, it scans the tx pool and warms state for what it guesses the block might contain. Whatever it misses — late-arriving p2p txs, txs unlocked by freed gas, anything the pool view didn't reflect at guess-time — pays the full cache-miss cost at commit time. On contract-heavy blocks this is the tail of the miss-rate distribution.

This PR adds three more prefetch phases, synchronized to the live block build, so the prefetcher stops guessing and starts following what the builder is actually about to do. All three target exactly the gap the idle phase leaves: near-certain txs the builder will commit, warmed with near-zero speculation.

Prefetch lifecycle (after this PR)

One long-lived worker pool per block; four sequential tx providers feed it. The pool is never torn down between phases — the mode switch is a provider swap, not a prefetcher restart.

Phase	When	Source of txs	What it warms	Precision character
1. Idle	Pre-build, while `buildAndCommitBlock` is still assembling its environment	Pool snapshot, scanned by price-nonce heap with a gas-capped budget	Whatever idle can fit before builder starts	Speculative — bounded by pool staleness
2. Builder: upfront plan scan	Moment builder mode begins, once per `commitTransactions` pass (priority + normal)	`sendPlan` clones the price-nonce heap; `buildTxPlan` walks it with `remainingGas()` as budget, emits every non-prefetched tx that fits	The bulk of builder-phase prefetch — the full predicted block contents based on declared gas limits, minus whatever idle already warmed	Near-deterministic — same heap the builder will consume, filtered by the same gas arithmetic
3. Builder: per-tx forwarding	During `commitTransactions`, right before each apply	Each tx forwarded through `builderPlanCh` as the builder reaches it	Last-mile catch for anything the upfront scan missed (late pool arrivals, prefetch-status changes in flight)	Deterministic — one channel hop ahead of the EVM
4. Builder: freed-gas overflow	Continuously during commit, as each tx completes	`ltx.Gas − actualUsed` delivered via `builderGasFreedCh`; overflow heap scanned when budget accumulates	Bonus txs the builder promotes into the tail once freed gas accrues — including txs the upfront scan had to `Pop()` as "too large"	Semi-deterministic — tracks the same heap the builder promotes from

All three builder phases dedup against prefetchedTxHashes and a sentThisPhase local set, so no tx is re-executed. The three-edge dedup matrix (plan↔overflow within iter, plan→overflow cross-iter, overflow→plan cross-iter) is fully closed.

Handoff sequence (idle → builder)

Coordinated by a two-signal interrupt to avoid pool teardown:

1. Builder sets builderStarted → coordinator sets evmAbort.Store(true). In-flight idle EVM execution aborts via the EVM interrupt; workers entering the loop see the flag and skip.
2. Coordinator drains any buffered idle txs from txsCh non-blockingly.
3. evmAbort.Store(false). Workers resume, now fed by the builder provider (upfront-plan + per-tx + freed-gas overflow).

No duplicate prefetches, no lost builder txs, no worker pool churn.

Precision impact

New headline metric: worker/prefetch/builder_added_percent — fraction of a block's txs the builder phase (upfront plan + per-tx + overflow combined) warmed on its own, beyond what idle had already done. Attributes each prefetch completion to the phase it fired under, so operators can see at a glance how much of a block's cache warming idle alone could not have reached.

Also split: worker/txApplyDuration/{prefetched,notPrefetched} — quantifies the cache-miss penalty this lifecycle is closing.

Together these two metrics answer: how many txs needed builder-phase help, and how much apply-time that help saved.

Implementation notes

Streaming worker pool. Workers block on range txsCh (buffer 4096, ≈33 KB); phase switches change producer only, not pool.
Block-import path unchanged. Prefetch(block, ...) keeps its original signature as a thin wrapper around PrefetchStream. Same topology, same parallelism, same PrefetchResult shape.
Per-tx isolation unchanged. Each worker still calls statedb.Copy() — no shared state, no consensus-path coupling.
Complexity. diffguard: no new hot-path functions exceed the 10-cognitive bar; all flagged functions are pre-existing.

Validation

go build ./... + golangci-lint run clean
go test -race ./core/... ./miner/... — 121s, all prefetch + slow-tx tests pass. Includes 13 new unit tests on the pure primitives (scanOverflow, forwardTxs, collectPlanBatch, buildTxPlan, no-duplicate-forward invariant across 10k iterations) and 3 stream-lifecycle integration tests.
diffguard --base origin/develop — no new complexity violations.
Local kurtosis-pos devnet, 5 scenarios (baseline / builder-heavy / handoff-stress / panic-fidelity / dup-probe). Cross-scenario invariants held on all of them:
- Zero duplicate forwards across 369 blocks with ~500 tx/block density.
- processed == fails + successful in every PrefetchStream exit (2180+ closures).
- Panic recovery: synthetic panic every 500 tx × 525K txs → chain_prefetch_worker_panic meter = 1050, harness log count = 1050 (exact match). Node kept producing blocks throughout.
- pprof on a live node: zero leaked prefetch goroutines between blocks (sync.Once shutdown verified).
- Full lifecycle log sequence (STREAM-START → IDLE-EXIT → HANDOFF-BEGIN/DRAIN/RESET → BUILDER-EXIT → STREAM-END) intact per block across all scenarios.
Ultra-review pass: one actionable finding (prepareWork early-return bypassing defer interruptPrefetch.Store(true)) — fixed.
5 standard review rounds addressed; final 3 findings closed the cross-iteration dedup matrix.

Replace per-call Prefetch() invocations with a long-running PrefetchStream that spans the block lifetime. A fixed worker pool pulls transactions from a channel, eliminating pool-startup overhead across the ~500 builder-mode and ~20 idle-mode calls previously issued per block. Idle and builder phases share the same stream, acting as swappable tx providers. Phase handoff uses a two-signal interrupt: hardKill for permanent exit and evmAbort for soft, per-phase aborts that discard in-flight idle work before the builder provider takes over. Prefetch(block, ...) keeps its original signature as a thin wrapper so blockchain.go behavior stays byte-identical.

…mers, builder contribution metric

Correlate tail latency with prefetch miss rate so we can tell whether the slowest block txs are the ones we failed to prefetch or whether prefetch quality is not the lever. Three additions:

* slow_tx_tracker 10-min log: each of the top-K slowest txs is now annotated inline with MGas/s and a prefetched/not-prefetched flag. MGas/s uses integer math on nanoseconds, because per-tx durations are in the tens of microseconds and float seconds lose precision.
* worker/txApplyDuration/{prefetched,notPrefetched}: side-by-side p50/p95/p99 of the two populations. The aggregate worker/txApplyDuration stays for dashboard compatibility.
* worker/prefetch/builder_added_percent: share of block txs that the builder phase prefetched, isolating the payoff of the new builder-phase prefetch beyond the existing overall miss-rate metric.

Sizes the buffer to the idle provider's theoretical per-loop burst (IdleGasBudget / 21k ≈ 4762 at default config) so the scan never blocks on a full channel. Workers drain at ~200k tx/s so the channel sits mostly empty in practice; the buffer exists to absorb the brief scan-phase burst where idle dumps a heap's worth of txs. Memory cost: 4096 * 8 B + channel header ≈ 33 KB, negligible. Handoff drain is O(actual-occupancy), not capacity, so a larger buffer doesn't slow phase transitions.
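The arithmetic behind these figures can be checked with a short sketch. The value of `assumedIdleGasBudget` is an assumption inferred from the "≈ 4762" figure (4762 × 21k ≈ 100M gas); the actual default is not stated in this thread. The 8-byte slot size assumes a pointer-sized channel element on a 64-bit platform.

```go
package main

import "fmt"

const (
	assumedIdleGasBudget = 100_000_000 // ASSUMPTION: inferred, not from the source
	minTxGas             = 21_000      // gas of a plain transfer
	txsChCapacity        = 4096        // buffer size quoted in the PR
	slotBytes            = 8           // ASSUMPTION: pointer-sized element, 64-bit
)

// maxBurst is the largest number of minimum-gas txs one scan can emit.
func maxBurst(gasBudget, perTxGas uint64) uint64 {
	return gasBudget / perTxGas
}

// bufferBytes is the slot storage of a buffered channel, header excluded.
func bufferBytes(capacity, slotSize uint64) uint64 {
	return capacity * slotSize
}

func main() {
	fmt.Println(maxBurst(assumedIdleGasBudget, minTxGas)) // 4761 (integer division)
	fmt.Println(bufferBytes(txsChCapacity, slotBytes))    // 32768 bytes, about 33 KB
}
```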

lucca30 · 2026-04-20T18:25:34Z

@claude review once

claude · 2026-04-20T18:25:52Z

Claude finished @lucca30's task in 3m 52s

Code Review: prefetcher builder-phase prefetch + streaming worker pool

Read and analyze core/state_prefetcher.go changes
Read and analyze miner/worker.go changes
Read and analyze miner/ordering.go changes
Read and analyze miner/slow_tx_tracker.go changes
Read and analyze test files
Check concurrency safety (atomics, channels, goroutine lifecycle)
Check for resource leaks (goroutines, channels)
Check for DoS vectors (unbounded allocations, missing limits)
Post comprehensive review

Overall Assessment

This is a well-structured PR that introduces a builder-phase prefetch path alongside the existing idle-phase prefetcher. The architecture is clean: a single long-lived worker pool shared across phases, with a soft-interrupt (evmAbort) for phase transitions and a hard-interrupt (hardKill) for permanent exit. The code is thoroughly tested, channel lifecycle management is careful, and the backward-compatible Prefetch() wrapper is a good call.

No critical or high-severity issues found. One medium finding on metric attribution accuracy, and several low/nit observations below.

Medium

1. Race window in builder-phase metric attribution (`worker.go:2306-2317`)

The onSuccess callback checks builderStarted.Load() to decide whether to attribute a successful prefetch to the builder phase:

if genParams.builderPrefetchedTxHashes != nil &&
    genParams.builderStarted != nil && genParams.builderStarted.Load() {
    genParams.builderPrefetchedTxHashes.Store(hash, struct{}{})
}

The comment claims "The handoff (evmAbort + drainTxChan + EVM interrupt) guarantees no idle-phase tx's onSuccess fires after builderStarted flips." This isn't strictly true. There's a window between builderStarted.Store(true) (set in buildAndCommitBlock at line 2188) and evmAbort.Store(true) (set later in runPrefetcher at line 2338 after runIdleTxProvider returns). During this window:

A worker already past the evmAbort check at line 173 is executing an idle-phase tx
The EVM finishes before evmAbort is set (so EVM interrupt doesn't fire)
onSuccess runs, sees builderStarted=true, and records the tx in builderPrefetchedTxHashes

Impact: Metric-only — builder_added_percent could be inflated by 1-2 txs per block. No correctness or security issue. The aggregate miss-rate metric (prefetchedTxHashes) is unaffected since the tx was genuinely prefetched.

Suggestion: Either update the comment to acknowledge the small attribution inaccuracy, or gate builder attribution on a separate flag set after evmAbort.Store(false) in the handoff sequence. Given the metric-only impact, updating the comment is probably sufficient.

Low / Nit

2. `runWorker` skips txs under `evmAbort` — subtle channel consumption semantics (`state_prefetcher.go:169-177`)

func (s *streamCtx) runWorker() {
    for tx := range s.txsCh {
        if s.hardKill != nil && s.hardKill.Load() {
            return
        }
        if s.evmAbort != nil && s.evmAbort.Load() {
            continue
        }
        s.processTx(tx)
    }
}

When evmAbort=true, workers consume txs from the channel and discard them. This is correct for the handoff (idle txs need to be consumed so they don't block the channel), and drainTxChan handles the bulk drain. But it does mean a tx can be "lost" — consumed but never processed. This is fine because prefetch is best-effort and the same txs will be re-sent by the builder provider if still relevant. No change needed, just noting for reviewers.

3. Non-blocking sends can silently drop plan txs (`worker.go:2005-2009`)

for _, tx := range buildTxPlan(clone, gasLimit, prefetchedHashes) {
    select {
    case ch <- tx:
    default:
    }
}

Also in forwardTxs (line 2533). If the 4096-deep channel is full, plan txs are silently dropped. This is acceptable given the buffer sizing rationale in the PR description (4096 ≈ full block at 21k gas), and prefetch is explicitly best-effort. But if block gas limits increase significantly in the future, this could silently degrade plan coverage. Consider a debug-level log or metric for dropped sends to aid future diagnosis.

4. `buildOverflowHeap` takes a fresh pool snapshot (`worker.go:2545-2551`)

The overflow heap is built by querying Pending() again, which may include txs that arrived after fillTransactions took its snapshot. This means the overflow view can diverge from the builder's actual view. Not a bug (prefetch is speculative), but worth a brief comment noting the intentional staleness tradeoff.

5. `txIndex` as ordering hint is approximate (`state_prefetcher.go:181`)

idx := int(s.txIndex.Add(1) - 1)

Since workers pull from the channel concurrently, txIndex doesn't reflect the actual nonce/execution order — it's the order in which workers claim txs. This is fine because SkipNonceChecks = true (line 231) and the index is only used for SetTxContext. Worth noting that this is a deliberate design choice.

6. Minor: `clone()` shares `signer` and `baseFee` references (`ordering.go:198-213`)

The clone() method shares signer and baseFee with the original. Both are effectively immutable after creation (signer is stateless, baseFee is a uint256 value), so this is safe. No change needed.

What looks good

Two-signal interrupt design (hardKill + evmAbort) is clean and well-documented. The evmAbort as both a soft worker-level skip and an EVM interrupt is elegant.
Channel lifecycle is carefully managed: planWg.Wait() before close(builderPlanCh), close(builderGasFreedCh) before close(builderPlanCh), defer interruptPrefetch.Store(true) covering all exit paths.
Prefetch() backward compatibility — wrapping the new streaming path via a buffered channel + close is minimal and correct.
Test coverage is thorough: TestPrefetchStream_HardKillExits, TestPrefetchStream_EvmAbortSkipsAndResumes, TestPrefetchStream_BlockEquivalence directly exercise the core concurrency contracts. TestBuilderTxProvider_FiltersAlreadyPrefetched and TestBuilderTxProvider_FreedGasFeedback validate the builder provider in isolation.
Goroutine leak test (TestPrefetchFromPool_BuilderModeSwitch) with runtime goroutine counting is a practical safeguard.
collectPlanBatch batching with a 2ms window is a good approach to amortize overhead while keeping latency low.
Panic recovery in the prefetch goroutine (line 2156) prevents a prefetcher bug from crashing the node.
Metrics design — splitting txApplyDuration by prefetch status while preserving the aggregate is backward-compatible. The builder_added_percent metric directly measures the feature's value proposition.

…mports fixes Review feedback: the old onSuccess closure checked genParams.builderStarted.Load() to decide whether to attribute a successful prefetch to the builder phase. That check races with buildAndCommitBlock's builderStarted.Store(true), which fires before runPrefetcher reaches the handoff — an idle-phase tx whose EVM execution finishes in the gap between those two moments would be miscounted as builder. Impact was metric-only but the old comment overstated the guarantee. Route attribution through a dedicated inBuilderPhase *atomic.Bool that the coordinator flips to true only after the handoff completes (evmAbort drain + reset). Any onSuccess firing after that point is known to come from post-handoff work, so builder_added_percent now reflects genuine builder-phase contribution. Also fix goimports formatting in core/state_prefetcher.go and miner/worker.go flagged by CI lint.
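A minimal sketch of gating attribution on a post-handoff flag is shown below. The `attribution` type, its fields, and the percentage formula are assumptions for illustration; only the idea of an `inBuilderPhase` flag flipped after the handoff comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// attribution records which successful prefetches happened after the
// idle-to-builder handoff completed. (Hypothetical type.)
type attribution struct {
	inBuilderPhase atomic.Bool
	builderHashes  sync.Map
	builderCount   atomic.Int64
}

// onSuccess attributes a prefetch to the builder phase only when the
// flag is already set; duplicate hashes are counted once.
func (a *attribution) onSuccess(hash string) {
	if !a.inBuilderPhase.Load() {
		return
	}
	if _, loaded := a.builderHashes.LoadOrStore(hash, struct{}{}); !loaded {
		a.builderCount.Add(1)
	}
}

// builderAddedPercent is an assumed formula: builder-attributed txs as
// an integer percentage of the block's tx count.
func (a *attribution) builderAddedPercent(blockTxs int64) int64 {
	if blockTxs <= 0 {
		return 0
	}
	return a.builderCount.Load() * 100 / blockTxs
}

func main() {
	a := &attribution{}
	a.onSuccess("idle-1") // before handoff: not attributed

	a.inBuilderPhase.Store(true) // coordinator flips this after drain + reset
	a.onSuccess("builder-1")
	a.onSuccess("builder-2")

	fmt.Println(a.builderAddedPercent(8)) // 25
}
```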

claude · 2026-04-20T18:40:45Z

Code review

Found 1 issue. Checked for bugs and CLAUDE.md compliance.

Bug: Spin loop in streamIdleBatch when channel buffer is full

File: miner/worker.go lines 2429-2442

When txsCh is full, the default case fires but txs.Shift() still executes unconditionally. Since the tx was not sent, it is not added to localPrefetched and gas is not subtracted from gaspool. This means nextViableIdleTx keeps finding viable transactions (gas budget never decreases), and the loop churns through the entire heap doing O(N log N) heap operations (Peek + Shift) without sending anything.

The comment at line 2412 says Dropped sends (buffer full) are not retried -- but the current behavior is worse than not retrying: it burns through every remaining transaction doing useless work.

Suggested fix: Add return in the default case so the function stops when the channel is full.

Review feedback: when the stream channel fills mid-batch, the default branch in streamIdleBatch was dropping the tx but still calling txs.Shift() and continuing to walk the heap. Since dropped sends don't subtract from the gas budget or populate localPrefetched, the outer loop's viability check (nextViableIdleTx) kept returning valid txs, so every remaining entry burned Peek + Shift cycles only to drop. Return on the default branch instead. The outer runIdleTxProvider loop re-snapshots the pool every ~100ms, by which time workers have drained the channel and there's capacity for new sends. Dropping one tx per batch and re-entering the outer loop is strictly cheaper than walking the rest of the heap dropping everything.
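The fix described above can be sketched roughly as follows. `idleTx` and `streamBatchSketch` are hypothetical names, and a plain slice stands in for the price-nonce heap; the point is only that the function returns on the first full-channel send and charges gas only for txs actually sent.

```go
package main

import "fmt"

// idleTx is a simplified stand-in for a pool transaction.
type idleTx struct {
	hash string
	gas  uint64
}

// streamBatchSketch sends txs that fit the gas budget using non-blocking
// sends. It stops at the first full-channel send instead of walking the
// rest of the list, and reports how many txs were sent and how much
// budget is left.
func streamBatchSketch(ch chan<- idleTx, txs []idleTx, budget uint64) (sent int, remaining uint64) {
	remaining = budget
	for _, tx := range txs {
		if tx.gas > remaining {
			continue // does not fit; try the next candidate
		}
		select {
		case ch <- tx:
			remaining -= tx.gas // charge only after a successful send
			sent++
		default:
			return sent, remaining // channel full: stop, caller retries later
		}
	}
	return sent, remaining
}

func main() {
	ch := make(chan idleTx, 2) // small buffer, no receiver
	txs := []idleTx{{"a", 10}, {"b", 10}, {"c", 10}, {"d", 10}}

	sent, remaining := streamBatchSketch(ch, txs, 100)
	fmt.Println(sent, remaining) // 2 80
}
```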

claude · 2026-04-20T18:48:06Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

… resolves

Review feedback on buildTxPlan / scanOverflow / collectPlanBatch:

* buildTxPlan decremented remaining before calling ltx.Resolve(), so when a LazyTransaction failed to resolve (tx evicted between heap listing and resolution) the budget was consumed for a tx that never entered the plan. Reorder: charge gas only after Resolve succeeds, matching scanOverflow. The prefetched-skip path keeps consuming gas intentionally, since those txs are still bound for the block.
* scanOverflow and collectPlanBatch called prefetchedHashes.Load() without a nil guard, while the sibling buildTxPlan explicitly guarded. Production is safe because commitWork always initializes the map, but the API inconsistency is a real footgun for unit tests or future callers. Add the missing guards so all three functions accept a nil *sync.Map as "no hashes known, don't skip".

No functional change in the current call paths; tests unchanged.

lucca30 · 2026-04-20T19:08:51Z

@claude Review it again. Do a careful analysis of the block-building lifecycle and how prefetch interacts with block building. Check whether any code introduced may lead to a panic, dead code, or a decrease in performance. This PR aims to widen the prefetch window by also prefetching while the block is being built.

Some mechanisms, such as buildTxPlan, were introduced to better predict the txs most likely to be included, so we can improve prefetch precision.

claude · 2026-04-20T19:09:11Z

Claude finished @lucca30's task in 9m 23s

Deep Review: Builder-Phase Prefetch + Block Building Lifecycle

Note: The three findings from the prior reviews (buildTxPlan gas-before-resolve, nil guards, idle batch spin loop) have all been properly fixed in commits fdf704d and 9a91340. This review is a fresh pass.

Block Building ↔ Prefetch Lifecycle Trace

Traced the full flow to verify correctness of the two-phase design:

commitWork()
  ├─ genParams{prefetchedTxHashes: &sync.Map{}, builderStarted: new(atomic.Bool)}
  │
  ├─ go runPrefetcher(parent, throwaway, genParams, &interruptPrefetch)
  │    │
  │    ├─ Creates txsCh (4096 buf), evmAbort, inBuilderPhase
  │    ├─ go PrefetchStream(header, throwaway, ..., hardKill=&interruptPrefetch, evmAbort, txsCh, onSuccess)
  │    │    └─ Workers: for tx := range txsCh { if hardKill→return; if evmAbort→skip; processTx(tx) }
  │    │
  │    ├─ PHASE 1: runIdleTxProvider(txsCh, ...)
  │    │    └─ Loop ~100ms: Pending() → heap → streamIdleBatch → non-blocking sends to txsCh
  │    │       Exits when: builderStarted=true OR hardKill OR gas budget exhausted
  │    │
  │    ├─ HANDOFF (if builderStarted):
  │    │    evmAbort.Store(true)   ← interrupts in-flight EVM + workers skip
  │    │    drainTxChan(txsCh)     ← empties buffered idle txs
  │    │    evmAbort.Store(false)  ← workers resume
  │    │    inBuilderPhase.Store(true) ← metric attribution gate
  │    │
  │    ├─ PHASE 2: runBuilderTxProvider(txsCh, ...)
  │    │    ├─ buildOverflowHeap() — fresh pool snapshot for freed-gas overflow
  │    │    └─ Loop: collectPlanBatch(planCh, gasFreedCh, 2ms) → scanOverflow → forwardTxs(txsCh)
  │    │       Exits when: planCh closes OR hardKill
  │    │
  │    └─ SHUTDOWN: evmAbort.Store(true); close(txsCh); <-streamDone
  │
  └─ buildAndCommitBlock(interrupt, noempty, genParams, &interruptPrefetch)
       ├─ Creates builderPlanCh (4096), builderGasFreedCh (256)
       ├─ builderStarted.Store(true)  ← triggers idle→builder handoff
       ├─ defer interruptPrefetch.Store(true)  ← kills prefetcher on ALL exit paths
       │
       ├─ fillTransactions(interrupt, work, genParams)
       │    ├─ sendPlan: clone heap → go buildTxPlan → non-blocking sends to builderPlanCh
       │    └─ commitTransactions: for each tx {
       │         apply(tx) → on success: freed_gas = ltx.Gas - actualUsed → builderGasFreedCh
       │       }
       │
       ├─ planWg.Wait()            ← sendPlan goroutines done
       ├─ close(builderGasFreedCh) ← prefetcher sees gas stream end
       ├─ close(builderPlanCh)     ← prefetcher exits builder loop
       └─ commit(block)

Lifecycle verdict: The sequencing is sound. Channel creation precedes builderStarted flip. Channel closes follow the correct order (gasFreedCh before planCh). The planWg.Wait() ensures no sends to a closed channel. The defer interruptPrefetch.Store(true) covers all exit paths including panics.
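The "wait for senders, then close" ordering from the verdict above can be illustrated with a small sketch. `produceAndClose` is a hypothetical helper, not code from the PR; it only shows why waiting on the WaitGroup before `close` rules out a send on a closed channel.

```go
package main

import (
	"fmt"
	"sync"
)

// produceAndClose starts several producer goroutines that do
// non-blocking sends, waits for all of them, and only then closes the
// channel. It returns how many items were received.
func produceAndClose(producers, perProducer int) int {
	ch := make(chan int, producers*perProducer) // large enough: no drops here
	var wg sync.WaitGroup
	for p := 0; p < producers; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perProducer; i++ {
				select {
				case ch <- i:
				default: // best-effort send; never blocks
				}
			}
		}()
	}

	wg.Wait() // all senders have returned...
	close(ch) // ...so closing cannot race with a send

	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	fmt.Println(produceAndClose(4, 25)) // 100
}
```

Reversing the two lines (`close` before `Wait`) would allow a producer to send on a closed channel, which panics even inside a `select` with a `default` branch.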

Panic Analysis

Source	Path	Protected?	Severity
`runPrefetcher` goroutine	`commitWork` line 2161	Yes — defer/recover + `prefetchPanicMeter`	Safe
`sendPlan` goroutine	`fillTransactions` line 2009	No — but only does channel sends (non-blocking), `buildTxPlan` is arithmetic + heap walks. Very low panic surface.	Low risk
`PrefetchStream` worker goroutines	`state_prefetcher.go` line 125	No — no recover. A panic in `processTx` → `prefetchOneTx` → `ApplyMessage` kills the process.	See finding below
Nil deref on `prefetchedHashes`	`buildTxPlan`, `scanOverflow`, `collectPlanBatch`	Yes — all three now have nil guards (fixed in `fdf704d`)	Safe
Send to closed channel	`txsCh`, `builderPlanCh`, `builderGasFreedCh`	Yes — sequencing guarantees no concurrent senders at close time	Safe
Freed gas underflow `ltx.Gas - actualUsed`	`commitTransactions` line 1571	Yes — `actualUsed = gasPoolBefore - gasPool.Gas()` = `msg.GasLimit - gasRemaining` ≤ `ltx.Gas` always (gas pool charges full limit, refunds remaining)	Safe

Findings

1. 🔴 Worker goroutines in `PrefetchStream` lack panic recovery — node crash on any EVM edge case

File: core/state_prefetcher.go lines 124-128

for i := 0; i < workers; i++ {
    go func() {
        defer pool.Done()
        ctx.runWorker()  // no recover()
    }()
}

Each worker calls processTx → prefetchOneTx → statedb.Copy() → ApplyMessage → full EVM execution → IntermediateRoot. An unrecovered panic in any of these operations (corrupted state object, unexpected nil in trie path, precompile edge case) kills the entire node process.

The parent runPrefetcher goroutine has a recover (line 2161), but Go's recover only catches panics in the goroutine that calls it — it does not protect child goroutines. The worker goroutines are separate goroutines.

This is a pre-existing issue (old Prefetch code had the same pattern), but this PR expands the surface: workers now run longer (streaming vs. single-batch), process builder-phase txs in addition to idle-phase txs, and run IntermediateRoot(true) on every tx. More time in workers = more exposure to edge-case panics.

Impact: Node crash from a prefetcher bug. The prefetcher operates on throwaway state copies, so any panic is by definition non-critical to correctness — it should be recovered, not fatal.

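Suggested fix: Add a deferred function that calls recover() to the worker goroutine (a bare defer recover() does not stop a panic, because recover must be called directly by the deferred function), matching the pattern used in commitWork's prefetcher goroutine: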

go func() {
    defer pool.Done()
    defer func() {
        if r := recover(); r != nil {
            log.Error("Prefetch worker panicked", "err", r)
        }
    }()
    ctx.runWorker()
}()

2. 🟡 `inBuilderPhase` attribution has a residual edge case (IntermediateRoot tail)

File: miner/worker.go lines 2311-2353

The inBuilderPhase flag (replacing the racier builderStarted check from the prior review's finding) is a significant improvement. However, a narrow edge remains:

evmAbort.Store(true) — interrupts in-flight EVM execution
A worker was past ApplyMessage (EVM done) but inside IntermediateRoot(true) (not interruptible by evmAbort)
drainTxChan(txsCh) + evmAbort.Store(false) + inBuilderPhase.Store(true) all execute while that worker is still in IntermediateRoot
Worker's processTx completes → onSuccess fires → inBuilderPhase.Load() returns true → idle tx miscounted as builder

Impact: Metric-only — builder_added_percent could be inflated by at most 1 tx per block. The overall miss-rate metric (prefetchedTxHashes) is unaffected. The handoff itself (draining + abort) typically takes microseconds, so IntermediateRoot would need to span that entire window. In practice this is near-impossible.

Suggestion: A comment acknowledging this residual edge case would prevent future readers from assuming the attribution is strictly exact. No code change needed.

3. 🟡 `sendPlan` goroutines have no panic recovery

File: miner/worker.go lines 2009-2017

genParams.planWg.Add(1)
go func() {
    defer genParams.planWg.Done()
    for _, tx := range buildTxPlan(clone, gasLimit, prefetchedHashes) {
        select {
        case ch <- tx:
        default:
        }
    }
}()

buildTxPlan walks the cloned heap calling ltx.Resolve() and doing arithmetic. While the panic surface is small (no EVM, no state), Resolve() calls ltx.Pool.Get(ltx.Hash) which touches pool internals. An unrecovered panic here would crash the node.

The planWg.Wait() in buildAndCommitBlock (line 2234) would also hang forever if the goroutine panics before planWg.Done() fires — but defer genParams.planWg.Done() ensures it fires even on panic, so the WaitGroup is safe. The process crash is the real concern.

Suggested fix: Add defer func() { if r := recover(); r != nil { ... } }() inside the goroutine. Given buildTxPlan is pure computation (no IO, no state mutation), this is lower priority than finding #1.

4. 🟢 Performance analysis — no regressions found on the critical path

Additions to commitTransactions hot loop (per tx):

gasPoolBefore := env.gasPool.Gas() — one uint64 read
Freed gas calculation + non-blocking channel send — one subtraction + one select (skipped entirely when builderGasFreedCh == nil)
sync.Map.Load() for prefetch status — one atomic map read
slowTxTracker.Add() — O(1) fast-path reject, O(log 10) on accept

Additions to fillTransactions:

sendPlan: clone() is O(N) pointer copies (not deep copies), done synchronously. The buildTxPlan scan runs in a goroutine — does not block the builder.

Overall: The hot-path overhead is ~3 uint64 ops + one non-blocking select per tx. The heap clone is the heaviest addition but is O(N) pointer copies for N≈pool size, which is sub-microsecond for typical pools. No allocations in the hot loop beyond what already existed.

buildOverflowHeap calls Pending() once per block — this is a full pool snapshot. It's additional work vs. the pre-PR baseline, but equivalent to one iteration of the idle provider's loop. Acceptable.

5. 🟢 Dead code analysis — none found

All new functions are reachable:

PrefetchStream ← runPrefetcher + Prefetch wrapper
buildTxPlan, scanOverflow, collectPlanBatch ← fillTransactions, runBuilderTxProvider
clone() ← sendPlan
streamIdleBatch, nextViableIdleTx, waitUntilNextLoop ← runIdleTxProvider
forwardTxs, buildOverflowHeap, drainTxChan ← runPrefetcher / runBuilderTxProvider
formatSlowTxs, mgasPerSecond ← flushSlowTxWindow

One minor waste: In the runPrefetcher path, PrefetchStream returns a PrefetchResult that is discarded. The successfulTxs slice is populated under txsMutex per successful tx, but never read. The useful output goes via the onSuccess callback. This is negligible overhead (mutex is uncontended in practice — workers spend orders of magnitude more time in EVM execution than in the lock).

What looks good

inBuilderPhase replaces direct builderStarted check — addresses the metric attribution race from the prior review with a proper post-handoff flag.
streamIdleBatch returns on full channel (fixed in 9a91340) — eliminates the spin-loop risk.
buildTxPlan gas deduction after Resolve() (fixed in fdf704d) — gas budget is no longer consumed for evicted txs.
All three *sync.Map consumers have nil guards (fixed in fdf704d) — consistent API contract.
Channel lifecycle is rigorous: planWg.Wait() → close(gasFreedCh) → close(planCh) ordering prevents all send-to-closed-channel panics. The defer interruptPrefetch.Store(true) covers all exit paths.
Two-signal interrupt (hardKill + evmAbort) is clean. evmAbort serving dual duty as both worker-level skip and EVM interpreter interrupt is elegant and reduces synchronization complexity.
Freed gas arithmetic is underflow-safe: ltx.Gas - actualUsed where actualUsed = msg.GasLimit - gasRemaining ≤ msg.GasLimit = ltx.Gas.
Concurrent clone() + commitTransactions is safe: the cloned heap shares LazyTransaction pointers with the original, but Resolve() is read-only (no caching write). Heap operations only rearrange slices, not pointed-to structs.
Test coverage is thorough — TestPrefetchStream_HardKillExits, _EvmAbortSkipsAndResumes, _BlockEquivalence directly exercise the core concurrency contracts. The goroutine-leak test (TestPrefetchFromPool_BuilderModeSwitch) with runtime goroutine counting is a practical safeguard.

Summary

The design is sound and well-engineered. The prior review's findings have all been addressed. I found one actionable issue: PrefetchStream worker goroutines lack panic recovery (finding #1), which is pre-existing but worth fixing given the expanded worker lifetime in this PR. The metric attribution edge case (finding #2) is theoretical-only. No dead code, no performance regressions on the critical path, and the channel lifecycle management is careful throughout.

Branch: lmartins/lmartins/builder-prefetch-mode-switch

codecov · 2026-04-20T19:30:13Z

Codecov Report

❌ Patch coverage is 90.40140% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.68%. Comparing base (92e427a) to head (1696a5d).

Files with missing lines	Patch %	Lines
core/state_prefetcher.go	78.51%	22 Missing and 7 partials ⚠️
miner/worker.go	93.61%	21 Missing and 5 partials ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2192      +/-   ##
===========================================
+ Coverage    52.58%   52.68%   +0.10%     
===========================================
  Files          885      885              
  Lines       156286   156673     +387     
===========================================
+ Hits         82179    82544     +365     
- Misses       68845    68863      +18     
- Partials      5262     5266       +4

Files with missing lines	Coverage Δ
core/blockchain.go	`62.20% <ø> (ø)`
miner/ordering.go	`94.54% <100.00%> (+0.86%)`	⬆️
miner/slow_tx_tracker.go	`73.61% <100.00%> (+16.71%)`	⬆️
miner/worker.go	`77.17% <93.61%> (+4.02%)`	⬆️
core/state_prefetcher.go	`79.16% <78.51%> (-12.40%)`	⬇️

... and 22 files with indirect coverage changes

Review feedback: worker goroutines in PrefetchStream ran without their own recover. Go's recover only catches panics in its own goroutine — the parent runPrefetcher's recover does not protect the children. Any panic in statedb.Copy / ApplyMessage / IntermediateRoot / precompile edge cases would therefore kill the node even though prefetching runs on a throwaway state copy and is best-effort by definition. Add defer/recover per worker (new chain/prefetch/worker/panic meter) and match it on the sendPlan goroutine in fillTransactions (reuses the existing worker/prefetch/panic meter). Also documents the residual attribution edge where a worker stuck in IntermediateRoot (not interruptible by evmAbort) can still reach onSuccess after inBuilderPhase flips. Metric-only impact, bounded by at most one tx per block; added a comment instead of a runtime fix since the window is tiny in practice and closing it would require a more invasive barrier.
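
To illustrate the point about recover being scoped to its own goroutine, here is a minimal sketch of a pool where each job runs under its own deferred recover. The names runPool, process, and the panics counter are hypothetical stand-ins, not the PR's code, and this sketch recovers per job so the worker keeps draining, which may differ from the per-worker placement the commit describes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runPool starts n workers that apply process to every item on jobs.
// Each job runs under its own deferred recover, so a panic is counted
// and the worker moves on instead of taking the whole process down.
func runPool(n int, jobs <-chan int, process func(int)) (int64, int64) {
	var wg sync.WaitGroup
	var done, panics atomic.Int64 // panics stands in for a panic meter

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				func() {
					defer func() {
						if r := recover(); r != nil {
							panics.Add(1)
						}
					}()
					process(j)
					done.Add(1)
				}()
			}
		}()
	}
	wg.Wait()
	return done.Load(), panics.Load()
}

func main() {
	jobs := make(chan int, 10)
	for i := 0; i < 10; i++ {
		jobs <- i
	}
	close(jobs)

	done, panics := runPool(3, jobs, func(j int) {
		if j%5 == 0 {
			panic("simulated failure")
		}
	})
	fmt.Println("done:", done, "panics:", panics)
}
```

A recover installed only in the goroutine that calls runPool would not see these panics, which is why the guard sits inside the worker.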

claude

The block building lifecycle looks correct: channel setup happens before , prevents sends-after-close, and the defer on covers all exit paths. No panics, dead code, or correctness regressions found. The two new nit-level issues found by this review (gas limit overestimation in the second call, and a metrics overcount for interrupted txs) are attached as inline comments — they affect only prefetch efficiency and metric accuracy, not block building correctness. The spin loop and nil-guard issues from prior reviews remain unaddressed and worth fixing before merge.

Extended reasoning...

Overview

This PR introduces a builder-phase prefetch path over ~600 lines across , , , , and tests. The review focused on the block building lifecycle, potential panics, dead code, and performance regressions as requested.

Block Building Lifecycle

The sequencing is correct: creates and before calling , ensuring the prefetcher goroutine always finds a valid channel. before channel close prevents any send-to-closed-channel race from goroutines. The covers all exit paths from . No deadlock or goroutine leak paths were found.

New Issues Found (Nit-Level)

Two new nit-level bugs are attached as inline comments: (1) the second call uses the full block gas limit rather than the remaining gas after priority txs, causing to over-plan normal-tier txs that will be rejected — wasting prefetch worker capacity; (2) is incremented in before checks the interrupt flag, so early-abort txs inflate . Both are metrics/efficiency issues only; block building correctness is unaffected.

Security Risks

No security risks. The prefetch path operates on throwaway state copies and cannot affect committed block state. All channels are properly bounded (4096 buffer) with non-blocking sends, preventing DoS from a stuck consumer.

Level of Scrutiny

This is a production-critical path in the Bor block builder. The streaming worker pool, two-signal interrupt scheme, and idle→builder handoff are novel concurrency patterns that warrant human review before merge, especially given several issues identified across multiple review passes that remain unaddressed (spin loop in , nil guards in /, gas ordering in ).

…cution

Round 4 of review fixes, four independent issues:

* sendPlan gasLimit stale for the second call. The closure captured env.header.GasLimit once, but the second invocation (normal-tier txs) runs after commitTransactions has already consumed gas from env.gasPool. Plan was over-sized, wasting prefetch capacity on txs that the builder would reject. Thread gasLimit as an explicit argument and pass env.gasPool.Gas() on the second call (with a nil guard for the first, where env.gasPool is still nil).
* prefetchOneTx early-interrupt return skipped fails.Add(1). txIndex is incremented unconditionally in processTx, so every interrupt-aborted tx was counted as a successful prefetch in blockPrefetchTxsValidMeter. The function docstring already promises fails is bumped on every (0,false) return; the interrupt path was the one branch that didn't.
* runPrefetcher shutdown (evmAbort + close(txsCh) + <-streamDone) was sequential, so a panic in runIdleTxProvider or runBuilderTxProvider unwound past it. The PrefetchStream goroutine and its N workers would then block forever on `range txsCh` (hardKill is only checked after a dequeue). Wrap the close in a sync.Once-guarded closure and defer it so the channel is released on every exit path, panics included.
* Builder phase could double-execute an in-flight plan tx. scanOverflow checked prefetchedHashes, which is only written by onSuccess after EVM completion, so a tx forwarded in one batch but still mid-EVM when the next batch's overflow scan ran could be re-emitted and picked up by a second worker. Track locally-sent hashes in runBuilderTxProvider (single-threaded, plain map is fine) and have scanOverflow consult it alongside prefetchedHashes.

- scanOverflow: break instead of Pop on gas-too-large so accumulated extendedBudget can cover price-leading accounts on later iterations; popping permanently evicted them from the long-lived overflow heap.
- runBuilderTxProvider: populate sentThisPhase with the current plan batch before scanOverflow so bonus collection can't re-emit a tx already returned by collectPlanBatch in the same window.
- PrefetchStream worker recover: increment ctx.fails so the {valid,invalid} meters stay consistent with txIndex when a panic aborts prefetchOneTx mid-flight.
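
The break-versus-Pop distinction can be sketched with a much simpler structure than the real price-nonce heap. Everything below (candidate, overflow, scan, and the slice used in place of a heap) is a hypothetical simplification: when the head does not fit the current budget, the scan stops and leaves it in place so that a later scan with a larger budget can still reach it.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for one overflow-heap entry.
type candidate struct {
	hash string
	gas  uint64
}

// overflow keeps candidates in priority order across repeated scans.
type overflow struct{ queue []candidate }

// scan emits candidates that fit in budget. When the head does not fit it
// stops and leaves the head in place, so a later scan with a larger budget
// can still reach it.
func (o *overflow) scan(budget uint64) []string {
	var out []string
	for len(o.queue) > 0 {
		head := o.queue[0]
		if head.gas > budget {
			break // not evicted: retried on the next scan
		}
		budget -= head.gas
		out = append(out, head.hash)
		o.queue = o.queue[1:]
	}
	return out
}

func main() {
	o := &overflow{queue: []candidate{{"a", 50}, {"b", 10}}}
	fmt.Println(o.scan(20)) // budget too small for the head
	fmt.Println(o.scan(60)) // grown budget now covers both
}
```

Had the first scan removed "a" instead of breaking, the second scan could never emit it, which is the eviction the commit message describes.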

Covers the helpers that prior review rounds kept surfacing bugs in:

- scanOverflow: zero budget, budget growth preserves popped accounts (regression test for the h.Pop() → break fix), in-flight skip without budget cost, prefetched skip without budget cost.
- forwardTxs: sent-hash recording, dropped-on-full channel must not record, nil sentThisPhase is safe.
- collectPlanBatch: closed planCh, timer-fires-on-empty, freed-gas accumulation, prefetched filter, closed gas channel nilled out.
- runBuilderTxProvider: no-duplicate-forward invariant across interleaved planCh + freed-gas signals; regression test for the within-iteration dedup fix (sentThisPhase pre-population).

lucca30 · 2026-04-22T22:55:07Z

@claude ultra-review this PR.

Context:

8 commits on top of develop; feature is a streaming prefetcher with idle→builder phase handoff.
5 rounds of standard @claude review already landed; all findings addressed through 827c017fb.
13 unit tests cover the pure helpers (scanOverflow, forwardTxs, collectPlanBatch, buildTxPlan, runBuilderTxProvider no-dup invariant).
Kurtosis devnet run #2 empirically validated: BUILDER-DUP=0 across 369 blocks, panic meter == log count exactly (1050==1050), 0 leaked goroutines in pprof, STREAM-END arithmetic 2180/2180 ok, builder_added_percent p50 climbed 0→86 under tight idle cap.

Please focus only on angles the above does not already cover:

1. Concurrency correctness. Trace every interaction between hardKill, evmAbort, inBuilderPhase, sync.Once-shutdown, close(txsCh), and the worker pool. Look for:

lost-wakeup between evmAbort.Store(true) and drainTxChan
double-close of txsCh
a goroutine that can observe hardKill=false, evmAbort=true, closed txsCh in an order that stalls it
panic during handoff hold that skips sync.Once

2. Resource-accounting invariants, proven not just asserted:

(a) txIndex == valid + invalid on every PrefetchStream exit (normal / hardKill / evmAbort+close / panic / interrupt)
(b) No tx hash reaches txsCh twice in a single runBuilderTxProvider invocation
(c) builderPrefetchedTxHashes is only written while inBuilderPhase==true

3. Perf regression on the hot import path. Compare the new PrefetchStream wrapper (used by blockchain.go for block import) to the old per-call Prefetch(). Any path where the streaming version does strictly more work for the same input?

4. Failure modes:

builderPlanCh closed before runBuilderTxProvider starts
empty txPool at buildOverflowHeap time
stale pool snapshot (overflow heap built at T, consumed at T+100ms after pool churned)
concurrent hardKill + evmAbort set from different goroutines mid-phase

Scope exclusions:

Don't re-flag findings already fixed in 88c4e44..827c017fb.
Don't comment on test coverage (unit tests recent; kurtosis evidence above).
Don't suggest naming / style changes.
Flag only findings reachable in production.

claude · 2026-04-22T22:55:27Z

Claude finished @lucca30's task in 5m 26s

Ultra-Review: Streaming Prefetcher Concurrency & Correctness

Read core/state_prefetcher.go — worker pool, hardKill, evmAbort, sync.Once shutdown
Read miner/worker.go — runPrefetcher, handoff, runBuilderTxProvider, channel lifecycle
Read miner/ordering.go — clone() and heap interactions
Read core/blockchain.go — compare old Prefetch vs new PrefetchStream wrapper
Analysis 1: Concurrency correctness
Analysis 2: Resource-accounting invariants
Analysis 3: Perf regression on hot import path
Analysis 4: Failure modes
Post findings

All prior review findings (88c4e44..827c017) verified as addressed. This review covers only the four scoped areas. One actionable finding; remainder is clean.

1. Concurrency Correctness

Traced all interactions between hardKill, evmAbort, inBuilderPhase, sync.Once-shutdown, close(txsCh), and the worker pool.

Lost-wakeup between evmAbort.Store(true) and drainTxChan: Not possible. Both workers and drainTxChan consume from txsCh concurrently — Go channels are thread-safe. Every consumed item during evmAbort=true is correctly skipped by workers (line 188-189) or drained (line 2421-2428). No idle items are produced between drain and evmAbort.Store(false) because runIdleTxProvider has already returned. No builder items are produced yet because runBuilderTxProvider hasn't started. ✅

Double-close of txsCh: Prevented by sync.Once (line 2381-2387). The shutdown closure wraps close(txsCh) inside shutdownOnce.Do. The explicit shutdown() at line 2414 and the deferred shutdown() at line 2388 both call the same Once-guarded function. ✅

Goroutine stall ordering — hardKill=false, evmAbort=true, closed txsCh: This combination occurs during the handoff's shutdown sequence (line 2383-2386: evmAbort.Store(true); close(txsCh)). Workers in runWorker (line 184) are in for tx := range s.txsCh. When txsCh closes, range returns zero-value and terminates the loop — workers exit regardless of evmAbort state. No stall. ✅

Panic during handoff that skips sync.Once: If runIdleTxProvider or runBuilderTxProvider panics, the deferred shutdown() at line 2388 fires, closing txsCh via sync.Once. Workers drain, pool.Wait() returns, PrefetchStream completes, and the PrefetchStream goroutine self-terminates. The <-streamDone at line 2415 is not reached on the panic path, but this is acceptable — the PrefetchStream goroutine holds no persistent resources and self-resolves within microseconds. ✅
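
The Once-guarded shutdown pattern discussed in the two paragraphs above can be sketched as follows. runProducer and produce are hypothetical names, and the recover is added here only so the example can keep running after the simulated panic; the point is that the deferred shutdown closes the channel on the panic path and the explicit call on the normal path never causes a double close.

```go
package main

import (
	"fmt"
	"sync"
)

// runProducer lets produce send on ch and guarantees ch is closed exactly
// once, whether produce returns normally or panics.
func runProducer(ch chan int, produce func(chan<- int)) (recovered bool) {
	var once sync.Once
	shutdown := func() { once.Do(func() { close(ch) }) }

	// Deferred calls run last-in first-out: shutdown runs first, then recover.
	defer func() {
		if r := recover(); r != nil {
			recovered = true
		}
	}()
	defer shutdown() // runs on every exit path, panics included

	produce(ch)
	shutdown() // explicit call on the normal path; the deferred one is then a no-op
	return false
}

func main() {
	ch := make(chan int, 2)
	recovered := runProducer(ch, func(out chan<- int) {
		out <- 1
		out <- 2
		panic("provider failed")
	})
	fmt.Println("recovered:", recovered)
	for v := range ch { // terminates because ch was closed on the panic path
		fmt.Println("drained", v)
	}
}
```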

Concurrent hardKill + evmAbort from different goroutines: Both are atomic.Bool — no data race. Workers check hardKill first (line 185), which takes priority. If both are true, worker returns immediately. ✅

No concurrency issues found.

2. Resource-Accounting Invariants

(a) `txIndex == valid + invalid` on every `PrefetchStream` exit

Traced every path through processTx and runWorker:

Path	txIndex incremented?	fails incremented?	Counted?
`processTx` → `prefetchOneTx` succeeds	Yes (line 196)	No	valid
`processTx` → `prefetchOneTx` interrupt	Yes (line 196)	Yes (line 233)	invalid
`processTx` → `prefetchOneTx` error (any)	Yes (line 196)	Yes (lines 239, 247, 258)	invalid
`processTx` → worker panic in `prefetchOneTx`	Yes (line 196)	Yes (recover at line 137)	invalid
`runWorker` → `evmAbort=true` → `continue`	No	No	not counted (correct)
`runWorker` → `hardKill=true` → `return`	No	No	not counted (correct)

At stream exit (line 147-149):

processed := ctx.txIndex.Load()
blockPrefetchTxsValidMeter.Mark(processed - ctx.fails.Load())
blockPrefetchTxsInvalidMeter.Mark(ctx.fails.Load())

valid + invalid = (processed - fails) + fails = processed = txIndex. Invariant holds on all exit paths. ✅
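
The accounting rule in the table can be modelled in a few lines. The outcome enum and the counters type below are hypothetical; only the arithmetic (valid is derived as processed minus fails) mirrors the stream-exit snippet quoted above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type outcome int

const (
	success outcome = iota
	interrupted
	failed
	panicked
	skipped // dropped before processing (abort or kill): not counted at all
)

type counters struct {
	txIndex atomic.Int64
	fails   atomic.Int64
}

// record applies the accounting rule for one dequeued item.
func (c *counters) record(o outcome) {
	if o == skipped {
		return
	}
	c.txIndex.Add(1)
	if o != success {
		c.fails.Add(1)
	}
}

// meters derives the two reported values from the raw counters.
func (c *counters) meters() (valid, invalid int64) {
	processed := c.txIndex.Load()
	fails := c.fails.Load()
	return processed - fails, fails
}

func main() {
	var c counters
	for _, o := range []outcome{success, interrupted, failed, panicked, skipped, success} {
		c.record(o)
	}
	valid, invalid := c.meters()
	fmt.Println(valid, invalid, valid+invalid == c.txIndex.Load())
}
```

Because valid is never counted independently, the identity can only break if some path bumps txIndex without a matching success or fails increment, which is what the interrupt and panic fixes closed.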

(b) No tx hash reaches `txsCh` twice in a single `runBuilderTxProvider` invocation

Three dedup layers:

collectPlanBatch (line 2665-2666): checks prefetchedHashes.Load(tx.Hash()) — skips txs already completed by idle/prior builder work.
sentThisPhase pre-populated (lines 2600-2602) with current plan-batch hashes before scanOverflow runs — prevents within-iteration plan-vs-overflow duplicates (fixed in 62a72c0).
forwardTxs (line 2624) records each successfully-sent tx in sentThisPhase — prevents cross-iteration re-emission.

The one gap: forwardTxs drops on a full txsCh buffer (default branch, line 2626-2627), but the pre-populate at lines 2600-2602 already added the hash to sentThisPhase unconditionally. A dropped plan tx still has a sentThisPhase entry, so it won't be re-emitted by scanOverflow in future iterations. However, a dropped bonus tx from scanOverflow was h.Shift()-ed out of the overflow heap (line 1950) and its hash was not pre-populated in sentThisPhase (only plan txs are pre-populated). If it was added to sentThisPhase by forwardTxs on the successful-send branch, it's tracked. If it was dropped, it's gone from the heap and not in sentThisPhase — it can't reappear. So no hash can reach txsCh twice. ✅
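
The drop-on-full behaviour referred to here is the standard select-with-default idiom. The sketch below uses a hypothetical forward function over string hashes; it records a hash only on the successful-send branch and tolerates a nil set, matching the properties listed for the forwardTxs tests rather than reproducing the function itself.

```go
package main

import "fmt"

// forward attempts a non-blocking send for each hash. A hash is recorded in
// inFlight only when the send succeeds; a full buffer drops it silently.
func forward(out chan<- string, hashes []string, inFlight map[string]struct{}) (sent, dropped int) {
	for _, h := range hashes {
		select {
		case out <- h:
			if inFlight != nil { // a nil set is allowed and simply not updated
				inFlight[h] = struct{}{}
			}
			sent++
		default:
			dropped++ // buffer full: never block the caller
		}
	}
	return sent, dropped
}

func main() {
	out := make(chan string, 2)
	inFlight := map[string]struct{}{}

	sent, dropped := forward(out, []string{"a", "b", "c"}, inFlight)
	_, recorded := inFlight["c"]
	fmt.Println("sent:", sent, "dropped:", dropped, "c recorded:", recorded)
}
```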

(c) `builderPrefetchedTxHashes` is only written while `inBuilderPhase==true`

onSuccess at line 2363: if inBuilderPhase.Load() && .... inBuilderPhase is set to true at line 2406, strictly after evmAbort.Store(false) at line 2403. The residual edge case (worker completing IntermediateRoot during the handoff, acknowledged in comments at lines 2351-2356) could misattribute at most 1 tx per block. Documented and acceptable. ✅

3. Perf Regression on Hot Import Path

The import path (blockchain.go:782) calls Prefetch(block, throwaway, vmCfg, false, followupInterrupt), which now wraps PrefetchStream via a pre-filled channel.

Strictly new work per tx on the import path vs the old errgroup code:

1 channel send (non-blocking, buffer == len(txs)) + 1 channel receive per tx
1 txIndex.Add(1) atomic increment (old code used loop variable i directly)
2 nil-checks (evmAbort, onSuccess) that short-circuit immediately since both are nil

Total overhead: ~2-5 nanoseconds per tx. Negligible against statedb.Copy() + ApplyMessage + IntermediateRoot (~1-10ms each).

Structural improvement: Old code called statedb.Copy() in the dispatching goroutine before spawning work. New code calls statedb.Copy() inside the worker goroutine (prefetchOneTx line 244), allowing copies to happen concurrently — slightly lower latency under contention.

Worker pool size: Identical — max(1, 4*runtime.NumCPU()/5) in both old and new.

No performance regression on the import path. ✅
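
As a sketch of the wrapping described in this section, a batch entry point can reuse a streaming worker pool by pre-filling a channel whose buffer equals the input length, so no send ever blocks. processStream, processBatch, and the integer payload are hypothetical; poolSize uses the sizing formula quoted above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// poolSize applies the formula quoted in the review: 80% of cores, minimum 1.
func poolSize() int { return max(1, 4*runtime.NumCPU()/5) }

// processStream is the streaming core: workers consume until items closes.
func processStream(items <-chan int, workers int) int64 {
	var wg sync.WaitGroup
	var sum atomic.Int64
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for it := range items {
				sum.Add(int64(it)) // stand-in for per-item work
			}
		}()
	}
	wg.Wait()
	return sum.Load()
}

// processBatch adapts a slice to the streaming core. The buffer is sized to
// len(items), so the fill loop cannot block, and the channel is closed
// before the workers start.
func processBatch(items []int) int64 {
	ch := make(chan int, len(items))
	for _, it := range items {
		ch <- it
	}
	close(ch)
	return processStream(ch, poolSize())
}

func main() {
	fmt.Println(processBatch([]int{1, 2, 3, 4}))
}
```

The extra cost relative to a direct loop is one send and one receive per item, which is the overhead the review estimates.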

4. Failure Modes

`builderPlanCh` closed before `runBuilderTxProvider` starts

builderPlanCh is created at line 2223 and closed at line 2274, both in buildAndCommitBlock (main goroutine). runBuilderTxProvider runs in the prefetcher goroutine. The ordering: builderStarted.Store(true) (line 2225) → runIdleTxProvider exits (sees builderStarted=true) → handoff → runBuilderTxProvider starts.

Can close(builderPlanCh) at line 2274 fire before runBuilderTxProvider starts? Yes — if fillTransactions + planWg.Wait() completes before the prefetcher goroutine reaches line 2408. In that case, collectPlanBatch reads from a closed planCh and gets builderDone=true immediately (line 2660). runBuilderTxProvider returns immediately. Any plan txs were buffered in the 4096-deep channel and consumed; any freed-gas signals were buffered in the 256-deep builderGasFreedCh. These are lost. This is acceptable: the entire block was already built; prefetching for an already-committed block is pointless. ✅
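
For reference, this is how Go itself treats a receive on a closed buffered channel: values buffered before close are still delivered, and only then does the receive report closed. The drain helper below is hypothetical and is not collectPlanBatch; it only demonstrates the language behaviour that the early-close case relies on.

```go
package main

import "fmt"

// drain collects whatever is immediately available on planCh. It reports
// done=true once the channel is closed and empty, and done=false if the
// channel is still open but has nothing buffered.
func drain(planCh <-chan string) (batch []string, done bool) {
	for {
		select {
		case tx, ok := <-planCh:
			if !ok {
				return batch, true // closed and fully drained
			}
			batch = append(batch, tx)
		default:
			return batch, false // open, nothing ready right now
		}
	}
}

func main() {
	planCh := make(chan string, 4)
	planCh <- "tx1"
	planCh <- "tx2"
	close(planCh) // closed before the consumer ever started

	batch, done := drain(planCh)
	fmt.Println(batch, done)
}
```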

Empty txPool at `buildOverflowHeap` time

buildOverflowHeap (line 2635-2641) calls Pending() which returns an empty map. newTransactionsByPriceAndNonce with empty txs returns an empty heap. scanOverflow immediately breaks at line 2917 (ltx == nil). No bonus txs emitted. No crash, no wasted work. ✅

Stale pool snapshot (overflow heap built at T, consumed at T+100ms after pool churned)

buildOverflowHeap is called once at the start of runBuilderTxProvider (line 2576). The overflow heap is then scanned repeatedly over the 2ms batch windows. Pool churn means:

Txs in the overflow heap may have been evicted from the pool → ltx.Resolve() returns nil → h.Pop() at line 1944, skipped gracefully.
New txs arriving after the snapshot are invisible to the overflow scan → missed opportunity, but prefetch is best-effort. The plan channel covers the primary path (builder's own view).

No crash, no incorrect behavior. Stale snapshot is a deliberate design tradeoff acknowledged in the PR. ✅

Concurrent `hardKill` + `evmAbort` set from different goroutines mid-phase

Already covered in Analysis 1. Both are atomic.Bool; no ordering dependency. Workers prioritize hardKill. The shutdown() function sets evmAbort=true then close(txsCh), both inside sync.Once. If hardKill=true races with shutdown(), workers exit on whichever signal they see first. ✅

🟡 One Remaining Unfixed Finding

`prepareWork` early-return bypasses `defer interruptPrefetch.Store(true)` — zombie prefetcher on `prepareWork` failure

File: miner/worker.go lines 2209-2228

func (w *worker) buildAndCommitBlock(..., interruptPrefetch *atomic.Bool) {
    work, err := w.prepareWork(genParams, w.makeWitness)
    if err != nil {
        return  // ← line 2212: exits BEFORE the defer at line 2228 is registered
    }
    ...
    defer interruptPrefetch.Store(true)  // ← line 2228: too late

The prefetcher goroutine (spawned unconditionally at line 2191) is never signaled to stop. It runs runIdleTxProvider which loops until gas exhaustion (totalGasPool.Gas() == 0), burning through the full header gas budget (~100M gas at default PrefetchGasLimitPercent=100) across 4*NumCPU/5 workers doing throwaway EVM executions.

This was flagged in the previous review round but not fixed.

Why it's bounded, not a leak: runPrefetcher's deferred shutdown() (via sync.Once) ensures txsCh is closed once runIdleTxProvider returns, so the PrefetchStream goroutine and workers self-terminate. No goroutine leak — just wasted CPU.

Production trigger: prepareWork → engine.Prepare can fail via UnauthorizedSignerError, Heimdall snapshot/span lookup failures, or makeEnv → stateless.NewWitness() errors. On a Giugliano primary producer, Prepare includes the succession-delay time.After — but the runPrefetcher call to makeHeader(..., false) skips this wait and starts streaming immediately, so the zombie is active for the full duration of the wasted work.

Impact: ~100M gas of throwaway EVM work across ~6 workers per prepareWork failure. No correctness/consensus impact, but real CPU waste on an uncommon-but-realistic error path. If prepareWork fails repeatedly (e.g., prolonged Heimdall issues), zombie prefetchers from successive commitWork calls compete for CPU.

Fix: Move the defer before the err check:

func (w *worker) buildAndCommitBlock(..., interruptPrefetch *atomic.Bool) {
    defer interruptPrefetch.Store(true)  // ← moved up: covers ALL exit paths
    work, err := w.prepareWork(genParams, w.makeWitness)
    if err != nil {
        return
    }

Summary

Area	Verdict
Lost-wakeup in handoff	Clean
Double-close of txsCh	Protected by sync.Once
Goroutine stall ordering	Clean — channel close unblocks all paths
Panic during handoff	sync.Once defer covers it; PrefetchStream goroutine self-terminates
`txIndex == valid + invalid`	Holds on all 6 exit paths (verified per-path)
No-dup in `runBuilderTxProvider`	Three dedup layers cover all reachable cases
`builderPrefetchedTxHashes` attribution	Correctly gated; 1-tx residual acknowledged
Import path perf	~2-5ns overhead per tx; `statedb.Copy()` moved to worker (slight improvement)
`builderPlanCh` closed early	Graceful — `collectPlanBatch` sees closed channel, exits
Empty pool	Graceful — empty heap, no work
Stale snapshot	By design — `Resolve()` nil-check handles evicted txs
Concurrent kill+abort	Atomic stores, no ordering dependency, workers prioritize hardKill
prepareWork early-return	Unfixed — zombie prefetcher burns CPU until gas exhaustion

The concurrency design is sound. The sync.Once shutdown, two-signal interrupt model, and channel lifecycle are correctly composed. The one actionable item is the defer placement in buildAndCommitBlock — a one-line move.

Move `defer interruptPrefetch.Store(true)` to the top of buildAndCommitBlock so the prefetcher goroutine is signaled to stop on every return path — including the early return when prepareWork fails (UnauthorizedSignerError, Heimdall snapshot/span failures, stateless.NewWitness errors). Previously the defer sat after the err check, so on prepareWork failure runIdleTxProvider would loop until gas exhaustion, burning ~100M gas of throwaway EVM work across the worker pool before self-terminating. No correctness/consensus impact; fixes a real CPU-waste path on an uncommon-but-realistic error branch. Bounded by PrefetchStream's sync.Once shutdown (no goroutine leak), but the wasted work was material under prolonged Heimdall issues when consecutive commitWork calls could pile up zombie prefetchers.

Pass sentThisPhase into collectPlanBatch and skip txs already present, mirroring the existing scanOverflow dedup pattern. Closes the third edge of the builder-phase dedup matrix:

1. within-iter plan↔overflow: covered by pre-populate sentThisPhase before scanOverflow (commit 62a72c0)
2. plan→overflow cross-iter: covered by scanOverflow's sentThisPhase read (commit 88c4e44)
3. overflow→plan cross-iter: this commit

Scenario: scanOverflow emits tx T in iteration N (h.Shift() past T, sentThisPhase[T]=true). Worker W1 begins multi-ms EVM on T. In iteration N+1, collectPlanBatch reads a buffered copy of T from planCh; prefetchedHashes is still empty because onSuccess hasn't fired, so T slips through and gets forwarded a second time.

Impact is wasted worker capacity only (throwaway state, no consensus/correctness effect), but the race opens wider exactly on contract-heavy workloads where builder-phase prefetch matters most. Kurtosis run #2 scenario E didn't catch this because value-transfer EVM finishes in ~10µs, well inside the 2ms batch window, so prefetchedHashes raced fast enough to plug the gap. Added TestCollectPlanBatch_SkipsInflight as the regression test.
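
The dedup rule this commit describes amounts to consulting two sets before forwarding: one for completed work and one for work that has been sent but not yet completed. filterBatch and the string hashes below are hypothetical, and the sketch records in-flight hashes at filter time for brevity, whereas the source records them on a successful send.

```go
package main

import "fmt"

// filterBatch returns the hashes that are neither already prefetched nor
// currently in flight, and marks the returned ones as in flight.
// inFlight must be non-nil because it is written to.
func filterBatch(batch []string, prefetched, inFlight map[string]struct{}) []string {
	var out []string
	for _, h := range batch {
		if _, ok := prefetched[h]; ok {
			continue // finished earlier
		}
		if _, ok := inFlight[h]; ok {
			continue // sent earlier, completion callback not fired yet
		}
		inFlight[h] = struct{}{}
		out = append(out, h)
	}
	return out
}

func main() {
	prefetched := map[string]struct{}{"done": {}}
	inFlight := map[string]struct{}{"T": {}} // T was emitted by an earlier scan

	fmt.Println(filterBatch([]string{"done", "T", "new", "new"}, prefetched, inFlight))
}
```

Checking only the prefetched set would let T through while its first execution is still running, which is the overflow→plan edge listed above.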

- hoist prefetch tunables (chan buf, idle loop interval, gas-pct default/cap) into the file's top const block
- rename sentThisPhase to inFlightHashes; the variable tracks txs forwarded on txsCh whose onSuccess hasn't fired yet
- promote sendPlan from a closure inside fillTransactions to a free function alongside buildTxPlan

The streaming prefetcher's per-tx IntermediateRoot call (introduced via the intermediateRootPrefetch flag, hardcoded true) was found to add 80–130% prefetch wall time for at most ~10% commit speedup (≈0.1 ms in absolute terms). With snapshots active, EVM reads bypass the trie entirely, so the only path to value is warming pebble's block cache for the subsequent Commit. Under realistic clean-cache sizes that working set is already resident, leaving the warming work redundant in nearly every regime. Upstream go-ethereum's prefetcher likewise does not compute intermediate roots; its Prefetch is (block, statedb, cfg, interrupt) only.

Flips the call site to false (parameter retained for API stability) and checks in three Go benchmarks documenting the methodology and findings:

- TestIntermediateRootPrefetch_AccuracyVsCost: in-memory baseline, three access patterns (hot 3-slot, unique-spread, per-sender counter)
- TestIntermediateRootPrefetch_PebbleAccuracyVsCost: real pebble disk with prewarm + Commit, hashdb meter deltas
- TestIntermediateRootPrefetch_HeavyColdContract: 100k pre-populated storage slots with a constrained clean cache, modelling fat contracts that haven't been touched recently

Across all three, flag=true produced zero cache-hit benefit measurable in StorageHitFromPrefetch / clean cache deltas, with consistent prefetch CPU overhead. The residual ~10% commit speedup observed in the heavy-cold regime is attributable to pebble block-cache warming and amounts to ≈0.13 ms per affected block.

…lict # Conflicts: # miner/worker_test.go

claude

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, reopen this pull request to trigger a review.

…lict # Conflicts: # miner/worker.go

Extract resolveEvmInterrupt helper so the evmAbort/hardKill fallback wiring is unit-testable. Add focused unit tests for preloadReaderForTx (bad-signature, contract-To code preload, EOA/nil-To) and streamIdleBatch (per-loop gas budget). Brings T1 logic and T2 semantic mutation kill rate to 100% on the diff.

claude · 2026-05-19T18:32:34Z

Code review

Found 1 issue. Checked for bugs and CLAUDE.md compliance.

Latent nil-dereference in block production path (miner/worker.go:1963-1975)

Line 1970 nil-checks genParams before accessing prefetchedTxHashes, but line 1973 unconditionally dereferences genParams.planWg.Add(1). If sendPlan is ever called with a non-nil builderPlanCh but nil genParams, the node panics in the block production goroutine.

Currently unreachable: both call sites in fillTransactions (line 2112, line 2122) derive builderPlanCh from genParams.builderPlanCh at line 2091, which guarantees genParams != nil. But the inconsistent guard is misleading about the function contract and will cause a panic if a future caller doesn't hold this invariant.

Per security-common.md: No panics in consensus, sync, or block production paths.

Suggested fix -- add genParams to the nil guard on line 1964:

if builderPlanCh == nil || genParams == nil || plainTxs == nil {

Drops a redundant uint64() conversion in the streamIdleBatch test and trims the trailing blank line goimports flagged.

The function nil-checked genParams before reading prefetchedTxHashes but then unconditionally dereferenced genParams.planWg.Add(1). Currently unreachable — callers in fillTransactions derive builderPlanCh from genParams.builderPlanCh so genParams is always non-nil — but the inconsistent guard is misleading and would panic on the block-production path if a future caller drops that invariant.

sonarqubecloud · 2026-05-19T19:51:05Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.0% Duplication on New Code

See analysis details on SonarQube Cloud

* metrics: fix ResettingSample Prometheus _count monotonicity (#2174) * metrics: fix ResettingSample Prometheus _count monotonicity * use atomic count and remove cumulative sum in resettingSample * metrics: add Clear method for resetting sample and add more tests * metrics: avoid cloning slice --------- Co-authored-by: Manav Darji <manavdarji.india@gmail.com> * ethapi: include pre-Madhugiri state-sync logs in bor_getLogs (#2155) * ethapi: include pre-Madhugiri state-sync logs in bor_getLogs * test: directly validate getBlockAndReceipts state-sync behavior * security: harden gRPC server — disable by default, bind to localhost (#2078) * security: harden gRPC server - disable by default, bind localhost - Add 'Enabled' field to GRPCConfig (default: false), requiring explicit --grpc.enabled flag to start the gRPC server - Change default bind address from 0.0.0.0:3131 to 127.0.0.1:3131 - Guard gRPC server startup with Enabled check in NewServer() - Add --grpc.enabled CLI flag in flags.go - Fix GetGrpcAddr() to use net.SplitHostPort instead of string slice - Update docs and test helper for compatibility The gRPC server currently starts unconditionally on all interfaces (0.0.0.0:3131) with no authentication, no TLS, and no way to disable it. This is inconsistent with HTTP-RPC and WS-RPC which are disabled by default. An attacker with network access can invoke sensitive RPCs including ChainSetHead (reorg), PeersAdd/Remove (eclipse attacks), and StatusBorStatus (reconnaissance) without credentials. 
* remove grpc.enabled flag, simplify changes * internal/cli/server: update default toml for tests --------- Co-authored-by: harshinsecurity <hi@harshinsecurity.in> Co-authored-by: Manav Darji <manavdarji.india@gmail.com> * eth: stop dialing before waiting on handler shutdown (#2158) * core/state: fix GetWriteMapDump write-set output (#2156) * metrics, build: fix time.Tick resource leaks and oss-fuzz shebang (#2145) * metrics, build: fix time.Tick resource leaks and oss-fuzz shebang Two fixes inherited from upstream go-ethereum: 1. metrics: replace time.Tick() with time.NewTicker() + defer Stop() in 6 monitoring goroutines. time.Tick() leaks ticker resources (SA1015). Files: debug.go, json.go, log.go, opentsdb.go, syslog.go, writer.go. 2. oss-fuzz.sh: fix missing '!' in shebang (#/bin/sh → #!/bin/sh) of generated fuzz test runner scripts. Both fixes have been accepted upstream: - ethereum/go-ethereum#34044 (shebang, merged) - ethereum/go-ethereum#34046 (time.Tick, open) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * revert: remove time.Tick changes per maintainer request Go 1.23+ garbage-collects unreferenced tickers created by time.Tick, making the NewTicker replacement unnecessary. Revert to stay close to upstream go-ethereum and avoid merge conflicts. The oss-fuzz.sh shebang fix is retained as it is independent. 
--------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * build: bump Go from 1.26.1 to 1.26.2 to fix stdlib CVEs (#2189) Addresses 6 vulnerabilities in the Go standard library fixed in go1.26.2: - GO-2026-4947: Unexpected work during chain building in crypto/x509 - GO-2026-4946: Inefficient policy validation in crypto/x509 - GO-2026-4866: Case-sensitive excludedSubtrees auth bypass in crypto/x509 - GO-2026-4870: Unauthenticated TLS 1.3 KeyUpdate DoS in crypto/tls - GO-2026-4869: Unbounded allocation for old GNU sparse in archive/tar - GO-2026-4865: JsBraceDepth XSS in html/template * triedb/pathdb: fix exponential memory growth in preload BFS (#2187) * triedb/pathdb: replace exponential BFS with trie-aware child decoding in preload The preload BFS previously called gatherChildPaths which always enqueued all 16 nibble children for every node regardless of node type. This caused the BFS queue and visited map to grow as O(16^depth), leading to visited maps of ~14 GB per goroutine for large storage tries (8 GB caches). On machines with limited RAM this caused severe swapping that stalled block import for the duration of the preload (~2 hours for the largest caches). Replace with decodeChildPaths which decodes the RLP-encoded trie node and returns only the actual children: - Branch nodes (17 elements): enqueue slots with non-nil children only - Extension nodes (2 elements, no terminator): enqueue the single child path using the decoded compact key nibbles - Leaf nodes (2 elements, with terminator): no children Because MPT child paths are strictly longer than the parent path, the traversal is cycle-free without a visited set. Remove the visited map entirely. Queue size is now O(trie width) rather than O(16^depth). Add compactKeyToNibbles helper and replace the gatherChildPaths test with comprehensive tests covering all node types, the round-trip compact encoding, and a cycle-free proof via a 5-node trie where all nodes must be loaded exactly once. 
* triedb/pathdb: address PR review feedback on biased cache preload - Change nil-child check from exact 0x80 match to len <= 1 for robustness against non-canonical RLP encodings (both branch and extension nodes) - Make decodeChildPaths a package-level function (receiver was unused) - Pre-allocate nibbles slice in compactKeyToNibbles to avoid reallocations - Replace time.Sleep with cache.wg.Wait() in two tests for deterministic sync * triedb/pathdb: fix legacy rate-limit tests to use valid RLP trie nodes TestAddressBiasedCache_RateLimitInterruption and ShutdownDuringRateLimitWait were silently passing without exercising their stated scenario: both used raw non-RLP bytes as the root node, causing decodeChildPaths to return nil immediately and the rate limiter to never block. Rewrite both tests to use valid branch/leaf RLP nodes so the BFS actually traverses children and exhausts the 64KB burst, making WaitN genuinely block before Close() is called. * build(deps): bump go.opentelemetry.io/otel/sdk from 1.40.0 to 1.43.0 (#2182) Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.43.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.43.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.43.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * eth/fetcher: refactor witness_manager to reduce complexity (#2188) * eth/fetcher: refactor witness_manager to reduce complexity and close test gaps Structural refactor of witness_manager.go plus targeted tests, guided by diffguard (https://github.com/0xPolygon/diffguard). 
Max cognitive complexity 44 → 10; Tier 1 / Tier 2 mutation score 100%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: add diffguard workflow for PR quality metrics Runs diffguard (https://github.com/0xPolygon/diffguard) on every PR to report cognitive complexity, function/file sizes, dependency structure, churn-weighted complexity, and mutation testing at 10% sample rate against the base branch. Text report goes to the step summary (visible in the PR check UI) and a JSON artifact is retained for 14 days for offline triage. Uses the 16-core bor runner to keep mutation turnaround bounded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * eth/fetcher: fix inverted condition in armTimerChan The refactor introduced an inverted condition where armTimerChan returned a nil timer channel when pending requests existed, preventing tick() from ever firing through the timer path. The retry loop only appeared to work because rescheduleWitness also pokes pokeCh. Reported in PR #2188 review. Adds TestWitnessLoopDrivesFetchesForPending to guard against regressions — it exercises the full loop→tick→ fetchWitness pipeline via real channels, which the existing TestLoop did not. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * eth/fetcher: fix distance check drift and test nil derefs Three issues from PR #2188 code review: - handleNeed's distance check was accidentally tightened from `dist < -maxUncleDist` to `<=` during the refactor, introducing a boundary mismatch with block_fetcher.go (which uses `<`). A block at exactly dist == -maxUncleDist would be accepted by block_fetcher but dropped by witness_manager, potentially importing blocks without their witnesses. Reverted to match block_fetcher. - TestWitnessHandleFilterResultSkipsAlreadyPending dereferenced `m.pending[hash].op.origin` inside the nil-check branch, panicking on the exact mutation it was meant to catch (pending entry removal). 
Now fails cleanly with t.Fatal when entry is nil. - Same nil deref pattern in TestWitnessCheckCompletingSkipsAlreadyPending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * all: fix nightly race test flakes (#2193) The nightly-race workflow had been failing on every run for a month. Most failures were either race-detector overhead pushing test timings past narrow assertion margins, tests mutating shared global state that collided with parallel tests, or a single real data race that cascaded into many victim tests. internal/ethapi: testBackendWithPreMadhuguriBorReceipt.ChainConfig() did a shallow copy of params.AllEthashProtocolChanges, leaving cfg.Bor aliased to the global BorConfig, then mutated MadhugiriBlock on it. Concurrent tests calling IsMadhugiri raced with the write. Deep-copy BorConfig like the sibling testBackendWithNilBorTx already does. This single race was the root cause of the ~10 cascading ethapi test failures (TestBorForks, TestBorGetLogs_*, TestCoinbase, TestEstimateGas, etc.). consensus/bor/heimdall: TestFailover_SwitchOnPrimaryDown and TestRegistry_MarkUnhealthyOnRealFailure set only getSpanFn on the primary mock, leaving FetchStatus returning success. The registry's background probe then raced with the test's MarkUnhealthy call and could flip the primary back to healthy (or the active gauge back to 0) before the assertion ran. Make the primary mock fail FetchStatus too so probe and API are consistent. core/state: TestConcurrentUsedParallelism measures wall-clock parallel speedup and asserts >=2x. Race instrumentation serializes atomic/mutex ops and skews the measurement to ~1.7x. Skip under -race via a new race_{on,off}_test.go build-tag pair; the test still guards against the global-lock regression in non-race runs. 
core/txpool/legacypool: TestLockOrdering_{PricedHeapNoDeadlock, ReplacePendingNoDeadlock,RemovedNoDeadlock} used a 10s deadlock-detect timeout that is genuinely too short once -race instrumentation is added (the bare test takes ~12s). Bump to 60s, which still catches real deadlocks and no longer fires on legitimate completion.

eth/downloader: TestBeaconSync68/69Full used a 3s sync timeout that CI routinely exceeded under -race. Bump to 30s. TestSkeletonSyncRetrievals had four 2s polling loops that could exit before background serving goroutines finished incrementing the served counter (the assertion then read a partial count). Bump the budget to 30s, cap per-iteration sleep at 500ms so exponential backoff stays responsive, and fold the served counter into the polling condition so we wait for both subchain state and served totals before asserting.

eth/relay: TestCheckTxStatus, TestSubmitPreconfTx, TestSubmitPrivateTx parallel subtests had handlers sleeping for rpcTimeout-100ms, leaving only 100ms of slack before the client-side 2s timeout fired; under -race that slack was easily exhausted. Halve the handler sleep to rpcTimeout/2, which still proves parallelism (3 serial calls would exceed the unchanged 2s upper bound) with comfortable margin for -race.

metrics: TestExpDecaySampleNanosecondRegression relied on the unseeded global RNG and wall-clock timing, making the reservoir's average a noisy statistic that occasionally drifted outside [14, 16] under -race. Drive the test with a seeded RNG and synthesised monotonic timestamps; override t0/t1 after NewExpDecaySample so dt in update() stays positive. Test still catches the priority-overflow regression (average would stick at 10 under the buggy formula).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Merge pull request #2191 from 0xPolygon/psp-enable-wit-fs internal/cli: enabled witness filestore by default * eth: cancel milestone subscriber on shutdown (#2199) * Merge pull request #2175 from 0xPolygon/manav/disable-pending-loop (feat): disable pending block creation loop via flag * rawdb: recover witness size from filesystem metadata miss (#2200) * ci: fix kurtosis setup/cleanup action paths (#2223) kurtosis-pos moved the setup and cleanup composite actions under .github/actions/kurtosis/, so the bor workflows fail with "Can't find action.yml" at the Pre/Post kurtosis run steps after checkout. Update both kurtosis-e2e.yml and kurtosis-stateless-e2e.yml to use ./.github/actions/kurtosis/setup and ./.github/actions/kurtosis/cleanup. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fixing amoy static peers (#2209) * core/types: guard (*Header).GetValidatorBytes against short Extra (#2222) * core/types: guard (*Header).GetValidatorBytes against short Extra (*Header).GetValidatorBytes panics with 'slice bounds out of range' when len(h.Extra) < ExtraVanityLength+ExtraSealLength on pre-Cancun chain configs. The post-Cancun branch and companion helpers already guard this path; hoist the check to the top so both branches are safe. Add TestGetValidatorBytesShortExtra as regression test. Discovered with Zorya (concolic execution). 
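A minimal sketch of the hoisted length check described above, assuming a vanity prefix of 32 bytes and a seal suffix of 65 bytes. The constants and the free function `validatorBytes` are simplified stand-ins for illustration; the real method is `(*Header).GetValidatorBytes` and also handles the post-Cancun encoding, which is omitted here.

```go
package main

import "fmt"

const (
	extraVanityLength = 32 // assumed size of the vanity prefix
	extraSealLength   = 65 // assumed size of the signature suffix
)

// validatorBytes returns the bytes between the vanity prefix and the seal
// suffix. The guard runs first, so a short Extra yields nil instead of a
// "slice bounds out of range" panic.
func validatorBytes(extra []byte) []byte {
	if len(extra) < extraVanityLength+extraSealLength {
		return nil
	}
	return extra[extraVanityLength : len(extra)-extraSealLength]
}

func main() {
	fmt.Println(validatorBytes(make([]byte, 10)) == nil) // short Extra is rejected
	fmt.Println(len(validatorBytes(make([]byte, 32+40+65))))
}
```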
Fixes #2221 * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * chore: pos workspace setup (#2218) * miner, consensus: more block building metrics (#2228) * consensus/bor, miner: fix stale pending state for non-validator nodes (#2183) * consensus/bor, miner: fix stale pending state for non-validator nodes Nodes whose signer is configured but not in the active validator set (e.g. after a validator rotation) returned "insufficient funds for transfer" on eth_estimateGas and other RPC calls against the "pending" block. The root cause was that Prepare() returned UnauthorizedSignerError when the signer was not in the validator set, which caused prepareWork() to fail on every block. Since commit() was never called, the pending snapshot was never refreshed, and its underlying pathdb trie layers were eventually garbage-collected — making all state reads return zero. Fix Prepare() to not fail for unauthorized signers. Instead, default the succession number to 0 and let the block building pipeline continue normally. Seal() already independently rejects blocks from unauthorized signers, so no invalid blocks can be produced. This ensures the pending snapshot is refreshed on every chain head, keeping RPC queries functional regardless of whether the node is actively producing blocks. Also remove a redundant second call to GetSignerSuccessionNumber in the Giugliano waitOnPrepare block, reusing the succession value already computed earlier in Prepare(). 
* fix linter

* Merge pull request #2229 from 0xPolygon/psp-pos-3598

docs, .claude: add hardfork rollout review guidance

* build: bump Go to 1.26.3 (#2230)

* consensus/bor, internal/cli: full grpc implementation (#2194)

* implement full grpc comms with heimdall
* address comments
* address comments
* address comments
* address comments
* address comments
* revert tracers/data.csv
* internal/cli, docs: skip gRPC on empty addr, warn unauth non-loopback
* internal/cli: address comments
* internal/cli: address comment
* internal/cli: implement additional tests
* internal/cli: address comments and improve tests/validation
* internal/cli: address comments and add new tests
* internal/cli/server: map reorg & non-contiguous range sentinels; tidy cmd/keeper go.sum

mapBorAPIError now translates the two GetRootHash sentinels that previously fell through to codes.Internal:
- "reorg occurred while computing checkpoint root" -> codes.Aborted (transient/retriable; clients can retry once the chain settles)
- "non-contiguous headers in checkpoint range" -> codes.DataLoss (DB inconsistency / ancient pruning; conveys unrecoverable server state without the "this is a bug" implication of Internal)

Also runs go mod tidy in cmd/keeper to add the missing golang.org/x/sys v0.42.0 h1: source hash that the previous bump left out.

* internal/cli: add maxGRPCMessageSize to mirror HTTP behaviour
* internal/cli: add nil check on grpc addr.
Bump go to fix vulncheck * chore: fix govuln * internal/cli: lower chunkSize * chore: bump polyproto * templates: replace dead witnessprotocol with witness.enable hint (#2204) * templates: replace dead witnessprotocol with witness.enable hint * ci: stabilize codecov/diffguard and skip fork-only e2e constraints * templates: move witness hint and drop workflow edits * prefetcher: builder-phase prefetch + streaming worker pool (#2192) * prefetcher: streaming worker pool with unified idle/builder tx providers Replace per-call Prefetch() invocations with a long-running PrefetchStream that spans the block lifetime. A fixed worker pool pulls transactions from a channel, eliminating pool-startup overhead across the ~500 builder-mode and ~20 idle-mode calls previously issued per block. Idle and builder phases share the same stream, acting as swappable tx providers. Phase handoff uses a two-signal interrupt: hardKill for permanent exit and evmAbort for soft, per-phase aborts that discard in-flight idle work before the builder provider takes over. Prefetch(block, ...) keeps its original signature as a thin wrapper so blockchain.go behavior stays byte-identical. * miner: prefetch observability — slow-tx prefetch flag, split apply timers, builder contribution metric Correlate tail latency with prefetch miss rate so we can tell whether the slowest block txs are the ones we failed to prefetch or whether prefetch quality is not the lever. Three additions: * slow_tx_tracker 10-min log: each of the top-K slowest txs is now annotated inline with MGas/s and a prefetched/not-prefetched flag. MGas/s uses integer math on nanoseconds — per-tx durations are in the tens of microseconds and float seconds lose precision. * worker/txApplyDuration/{prefetched,notPrefetched}: side-by-side p50/ p95/p99 of the two populations. The aggregate worker/txApplyDuration stays for dashboard compatibility. 
* worker/prefetch/builder_added_percent: share of block txs that the builder phase prefetched, isolating the payoff of the new builder- phase prefetch beyond the existing overall miss-rate metric. * prefetcher: grow stream channel buffer from 512 to 4096 Sizes the buffer to the idle provider's theoretical per-loop burst (IdleGasBudget / 21k ≈ 4762 at default config) so the scan never blocks on a full channel. Workers drain at ~200k tx/s so the channel sits mostly empty in practice; the buffer exists to absorb the brief scan-phase burst where idle dumps a heap's worth of txs. Memory cost: 4096 * 8 B + channel header ≈ 33 KB, negligible. Handoff drain is O(actual-occupancy), not capacity, so a larger buffer doesn't slow phase transitions. * prefetcher: gate builder-phase attribution on post-handoff flag + goimports fixes Review feedback: the old onSuccess closure checked genParams.builderStarted.Load() to decide whether to attribute a successful prefetch to the builder phase. That check races with buildAndCommitBlock's builderStarted.Store(true), which fires before runPrefetcher reaches the handoff — an idle-phase tx whose EVM execution finishes in the gap between those two moments would be miscounted as builder. Impact was metric-only but the old comment overstated the guarantee. Route attribution through a dedicated inBuilderPhase *atomic.Bool that the coordinator flips to true only after the handoff completes (evmAbort drain + reset). Any onSuccess firing after that point is known to come from post-handoff work, so builder_added_percent now reflects genuine builder-phase contribution. Also fix goimports formatting in core/state_prefetcher.go and miner/worker.go flagged by CI lint. * prefetcher: stop idle batch on full channel instead of spinning the heap Review feedback: when the stream channel fills mid-batch, the default branch in streamIdleBatch was dropping the tx but still calling txs.Shift() and continuing to walk the heap. 
Since dropped sends don't subtract from the gas budget or populate localPrefetched, the outer loop's viability check (nextViableIdleTx) kept returning valid txs, so every remaining entry burned Peek + Shift cycles only to drop. Return on the default branch instead. The outer runIdleTxProvider loop re-snapshots the pool every ~100ms, by which time workers have drained the channel and there's capacity for new sends. Dropping one tx per batch and re-entering the outer loop is strictly cheaper than walking the rest of the heap dropping everything. * prefetcher: nil-guard prefetchedHashes + stop charging gas for failed resolves Review feedback on buildTxPlan / scanOverflow / collectPlanBatch: * buildTxPlan decremented remaining before calling ltx.Resolve(), so when a LazyTransaction failed to resolve (tx evicted between heap listing and resolution) the budget was consumed for a tx that never entered the plan. Reorder: charge gas only after Resolve succeeds, matching scanOverflow. The prefetched-skip path keeps consuming gas intentionally — those txs are still bound for the block. * scanOverflow and collectPlanBatch called prefetchedHashes.Load() without a nil guard, while the sibling buildTxPlan explicitly guarded. Production is safe because commitWork always initializes the map, but the API inconsistency is a real footgun for unit tests or future callers. Add the missing guards so all three functions accept a nil *sync.Map as "no hashes known, don't skip". No functional change in the current call paths; tests unchanged. * prefetcher: recover panics in worker and sendPlan goroutines Review feedback: worker goroutines in PrefetchStream ran without their own recover. Go's recover only catches panics in its own goroutine — the parent runPrefetcher's recover does not protect the children. 
Any panic in statedb.Copy / ApplyMessage / IntermediateRoot / precompile edge cases would therefore kill the node even though prefetching runs on a throwaway state copy and is best-effort by definition. Add defer/recover per worker (new chain/prefetch/worker/panic meter) and match it on the sendPlan goroutine in fillTransactions (reuses the existing worker/prefetch/panic meter). Also documents the residual attribution edge where a worker stuck in IntermediateRoot (not interruptible by evmAbort) can still reach onSuccess after inBuilderPhase flips. Metric-only impact, bounded by at most one tx per block; added a comment instead of a runtime fix since the window is tiny in practice and closing it would require a more invasive barrier. * prefetcher: fix gas budget, interrupt accounting, panic leak, dup execution Round 4 of review fixes — four independent issues: * sendPlan gasLimit stale for the second call. The closure captured env.header.GasLimit once, but the second invocation (normal-tier txs) runs after commitTransactions has already consumed gas from env.gasPool. Plan was over-sized, wasting prefetch capacity on txs that the builder would reject. Thread gasLimit as an explicit argument and pass env.gasPool.Gas() on the second call (with a nil guard for the first, where env.gasPool is still nil). * prefetchOneTx early-interrupt return skipped fails.Add(1). txIndex is incremented unconditionally in processTx, so every interrupt-aborted tx was counted as a successful prefetch in blockPrefetchTxsValidMeter. The function docstring already promises fails is bumped on every (0,false) return; the interrupt path was the one branch that didn't. * runPrefetcher shutdown (evmAbort + close(txsCh) + <-streamDone) was sequential, so a panic in runIdleTxProvider or runBuilderTxProvider unwound past it. The PrefetchStream goroutine and its N workers would then block forever on `range txsCh` (hardKill is only checked after a dequeue). 
Wrap the close in a sync.Once-guarded closure and defer it so the channel is released on every exit path, panics included.

* Builder phase could double-execute an in-flight plan tx. scanOverflow checked prefetchedHashes, which is only written by onSuccess after EVM completion, so a tx forwarded in one batch but still mid-EVM when the next batch's overflow scan ran could be re-emitted and picked up by a second worker. Track locally-sent hashes in runBuilderTxProvider (single-threaded, plain map is fine) and have scanOverflow consult it alongside prefetchedHashes.

* prefetcher: address within-iter dup, heap preservation, panic fails

- scanOverflow: break instead of Pop on gas-too-large so accumulated extendedBudget can cover price-leading accounts on later iterations; popping permanently evicted them from the long-lived overflow heap.
- runBuilderTxProvider: populate sentThisPhase with the current plan batch before scanOverflow so bonus collection can't re-emit a tx already returned by collectPlanBatch in the same window.
- PrefetchStream worker recover: increment ctx.fails so the {valid,invalid} meters stay consistent with txIndex when a panic aborts prefetchOneTx mid-flight.

* miner: unit tests for prefetch stream primitives

Covers the helpers that prior review rounds kept surfacing bugs in:
- scanOverflow: zero budget, budget growth preserves popped accounts (regression test for the h.Pop() → break fix), in-flight skip without budget cost, prefetched skip without budget cost.
- forwardTxs: sent-hash recording, dropped-on-full channel must not record, nil sentThisPhase is safe.
- collectPlanBatch: closed planCh, timer-fires-on-empty, freed-gas accumulation, prefetched filter, closed gas channel nilled out.
- runBuilderTxProvider: no-duplicate-forward invariant across interleaved planCh + freed-gas signals; this is the regression test for the within-iteration dedup fix (sentThisPhase pre-population).
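The dedup behaviour these tests pin down (skip anything already prefetched or already sent, record a hash only when the channel accepts it, tolerate a nil sent-set) can be sketched as follows. `forward` and `txHash` are hypothetical, simplified stand-ins and not the PR's `forwardTxs`. Stopping at the first full channel is an assumption of this sketch, borrowed from the idle-batch fix earlier in the log.

```go
package main

import "fmt"

type txHash string

// forward does a non-blocking send of each hash in batch. A hash is skipped
// if a worker already finished it (prefetched) or it was already sent and is
// still in flight. A hash is recorded in inFlight only after the channel has
// accepted it, so a dropped send can be retried later.
func forward(ch chan<- txHash, batch []txHash, prefetched, inFlight map[txHash]bool) int {
	sent := 0
	for _, h := range batch {
		if prefetched[h] || inFlight[h] { // reads on a nil map are safe
			continue
		}
		select {
		case ch <- h:
			if inFlight != nil { // writes on a nil map would panic
				inFlight[h] = true
			}
			sent++
		default:
			// Channel full: stop rather than walk the rest of the batch
			// dropping every entry.
			return sent
		}
	}
	return sent
}

func main() {
	ch := make(chan txHash, 2)
	inFlight := map[txHash]bool{}
	batch := []txHash{"a", "b", "c"}

	fmt.Println("first pass sent:", forward(ch, batch, nil, inFlight))
	<-ch
	<-ch // workers drain the channel; "a" and "b" are still mid-execution
	fmt.Println("second pass sent:", forward(ch, batch, nil, inFlight))
}
```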
* prefetcher: signal interrupt on prepareWork early-return Move `defer interruptPrefetch.Store(true)` to the top of buildAndCommitBlock so the prefetcher goroutine is signaled to stop on every return path — including the early return when prepareWork fails (UnauthorizedSignerError, Heimdall snapshot/span failures, stateless.NewWitness errors). Previously the defer sat after the err check, so on prepareWork failure runIdleTxProvider would loop until gas exhaustion, burning ~100M gas of throwaway EVM work across the worker pool before self-terminating. No correctness/consensus impact; fixes a real CPU-waste path on an uncommon-but-realistic error branch. Bounded by PrefetchStream's sync.Once shutdown (no goroutine leak), but the wasted work was material under prolonged Heimdall issues when consecutive commitWork calls could pile up zombie prefetchers. * prefetcher: close scanOverflow→plan cross-iter dedup edge Pass sentThisPhase into collectPlanBatch and skip txs already present, mirroring the existing scanOverflow dedup pattern. Closes the third edge of the builder-phase dedup matrix: 1. within-iter plan↔overflow — covered by pre-populate sentThisPhase before scanOverflow (commit 62a72c0) 2. plan→overflow cross-iter — covered by scanOverflow's sentThisPhase read (commit 88c4e44) 3. overflow→plan cross-iter — this commit Scenario: scanOverflow emits tx T in iteration N (h.Shift() past T, sentThisPhase[T]=true). Worker W1 begins multi-ms EVM on T. In iteration N+1, collectPlanBatch reads a buffered copy of T from planCh; prefetchedHashes is still empty because onSuccess hasn't fired, so T slips through and gets forwarded a second time. Impact is wasted worker capacity only (throwaway state, no consensus/correctness effect), but the race opens wider exactly on contract-heavy workloads where builder-phase prefetch matters most. 
Kurtosis run #2 scenario E didn't catch this because value-transfer EVM finishes in ~10µs, well inside the 2ms batch window — so prefetchedHashes raced fast enough to plug the gap. Added TestCollectPlanBatch_SkipsInflight as the regression test. * prefetcher: address review feedback (constants, naming, sendPlan) - hoist prefetch tunables (chan buf, idle loop interval, gas-pct default/cap) into the file's top const block - rename sentThisPhase to inFlightHashes — the variable tracks txs forwarded on txsCh whose onSuccess hasn't fired yet - promote sendPlan from a closure inside fillTransactions to a free function alongside buildTxPlan * prefetcher: disable IntermediateRoot per tx; add evaluation benchmarks The streaming prefetcher's per-tx IntermediateRoot call (introduced via the intermediateRootPrefetch flag, hardcoded true) was found to add 80–130% prefetch wall time for at most ~10% commit speedup (≈0.1 ms in absolute terms). With snapshots active, EVM reads bypass the trie entirely, so the only path to value is warming pebble's block cache for the subsequent Commit. Under realistic clean-cache sizes that working set is already resident, leaving the warming work redundant in nearly every regime. Upstream go-ethereum's prefetcher likewise does not compute intermediate roots — its Prefetch is (block, statedb, cfg, interrupt) only. 
Flips the call site to false (parameter retained for API stability) and checks in three Go benchmarks documenting the methodology and findings: - TestIntermediateRootPrefetch_AccuracyVsCost: in-memory baseline, three access patterns (hot 3-slot, unique-spread, per-sender counter) - TestIntermediateRootPrefetch_PebbleAccuracyVsCost: real pebble disk with prewarm + Commit, hashdb meter deltas - TestIntermediateRootPrefetch_HeavyColdContract: 100k pre-populated storage slots with a constrained clean cache, modelling fat contracts that haven't been touched recently Across all three, flag=true produced zero cache-hit benefit measurable in StorageHitFromPrefetch / clean cache deltas, with consistent prefetch CPU overhead. The residual ~10% commit speedup observed in the heavy-cold regime is attributable to pebble block-cache warming and amounts to ≈0.13 ms per affected block. * prefetcher: kill diffguard mutation survivors Extract resolveEvmInterrupt helper so the evmAbort/hardKill fallback wiring is unit-testable. Add focused unit tests for preloadReaderForTx (bad-signature, contract-To code preload, EOA/nil-To) and streamIdleBatch (per-loop gas budget). Brings T1 logic and T2 semantic mutation kill rate to 100% on the diff. * prefetcher: fix lint (unconvert + trailing newline) Drops a redundant uint64() conversion in the streamIdleBatch test and trims the trailing blank line goimports flagged. * prefetcher: nil-guard genParams in sendPlan The function nil-checked genParams before reading prefetchedTxHashes but then unconditionally dereferenced genParams.planWg.Add(1). Currently unreachable — callers in fillTransactions derive builderPlanCh from genParams.builderPlanCh so genParams is always non-nil — but the inconsistent guard is misleading and would panic on the block-production path if a future caller drops that invariant. 
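As a rough illustration of the worker-pool hardening described in the prefetcher commits above (a recover inside each worker, plus a sync.Once-guarded channel close that is deferred so it runs on every exit path), here is a small self-contained sketch. `runPool` and its counters are hypothetical; the real PrefetchStream works on transactions and throwaway state copies rather than integers.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runPool feeds jobs to a fixed set of workers and returns how many jobs
// completed and how many panicked. A panic in one job is counted and
// swallowed, so it never takes down the process or the other workers.
func runPool(workers int, jobs []int, process func(int)) (done, panics int64) {
	ch := make(chan int, len(jobs))
	var okCount, panicCount atomic.Int64
	var wg sync.WaitGroup

	// The close is idempotent and deferred, so workers blocked on
	// `range ch` are released even if the producer below exits early.
	var once sync.Once
	closeCh := func() { once.Do(func() { close(ch) }) }
	defer closeCh()

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range ch {
				func() {
					// recover only catches panics raised in this goroutine,
					// so each worker needs its own.
					defer func() {
						if r := recover(); r != nil {
							panicCount.Add(1)
						}
					}()
					process(j)
					okCount.Add(1)
				}()
			}
		}()
	}

	for _, j := range jobs {
		ch <- j
	}
	closeCh()
	wg.Wait()
	return okCount.Load(), panicCount.Load()
}

func main() {
	done, panics := runPool(4, []int{1, 2, 3, 4, 5, 6}, func(n int) {
		if n%3 == 0 {
			panic("simulated prefetch failure")
		}
	})
	fmt.Println("done:", done, "panics:", panics)
}
```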
* core/vm: add hardfork precompile continuity checks (#2234)

* core/vm: add hardfork precompile continuity checks
* docs: clarify Pro precompile continuity guidance

* packaging/docs: replace deprecated rpc.batchlimit key (#2216)

* packaging/docs: replace deprecated rpc.batchlimit key
* packaging/docs: set rpc.batch-request-limit default to 1000

* p2p: add flag to disable snap/1 serving (#2152)

* flag to disable snap serving
* p2p: add clarifying comment on NoSnapServing behaviour

Document that NoSnapServing=true disables snap/1 entirely (no serving and no snap-syncing from peers), while the in-memory snapshot tree remains active for local state reads.

* miner, consensus/bor: fix leaked-wedge family in worker state machine (#2220)

* miner, consensus/bor: fix leaked-wedge family in worker state machine

Closes four silent stall paths in the producer state machine:

1. miner.mainLoop: when PeerCount==0 on production chains, the dropped newWorkReq branch now clears pendingWorkBlock instead of leaking it.
2. miner.commitWork: the defer that clears pendingWorkBlock is now registered above the early syncing-check return.
3. miner.taskLoop: interrupt() now deletes the previous sealhash's pendingTasks entry. Bor.Seal's stop-branch returns silently without posting to resultCh, so resultLoop never cleans the entry.
4. consensus/bor.Seal: the result-delivery goroutine's second select no longer has a silent default branch; it blocks on send or exits via <-stop.
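Item 4 describes a common Go pattern: a two-case select that either delivers the result or observes the stop signal, with no default branch to drop the value silently. A minimal sketch follows, using a hypothetical `deliver` helper and string payloads in place of sealed blocks.

```go
package main

import "fmt"

// deliver blocks until the result is accepted or stop is closed. Without a
// default case, a full results channel makes the sender wait instead of
// discarding the value.
func deliver(results chan<- string, stop <-chan struct{}, block string) bool {
	select {
	case results <- block:
		return true
	case <-stop:
		return false
	}
}

func main() {
	results := make(chan string, 1)
	stop := make(chan struct{})

	fmt.Println(deliver(results, stop, "block-1")) // buffer has room

	// The buffer is now full. Closing stop is the only way out until a
	// consumer reads from results.
	close(stop)
	fmt.Println(deliver(results, stop, "block-2"))
}
```

The boolean return makes the stop-branch exit visible to the caller, so any cleanup tied to an abandoned delivery can be triggered explicitly.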
Regression tests cover each path (four unit + one integration): - TestMainLoopClearsPendingWorkBlockOnPeerCountZero - TestCommitWorkLeaksPendingWorkBlockWhenSyncing - TestTaskLoopInterruptCleansStalePendingTasks - TestSeal_BlocksOnFullResultChannelInsteadOfSilentDrop - TestProducerRecoversAfterMiningRestart (tests/bor, integration tag) * miner: gate PeerCount check on heimdall presence, not chain-ID The previous chain-ID gate (BorMainnet/Mumbai/Amoy) blocked the PeerCount==0 drop from being exercised by kurtosis chaos tests on chain ID 4927. Replacing it with chain-ID list omissions broke dev/test setups (Clique single-node, intentional-disconnection tests like TestValidatorWentOffline) where peer count is legitimately zero by design. Gate on heimdall presence instead. The HeimdallClient is nil on all test/dev setups (--bor.withoutheimdall, Clique, Ethash) and non-nil for any real-network Bor node, so the semantic 'this is a real network node' is captured directly. Forward-compatible: new production chain IDs don't need to be added to a list. Also restore the !IsRio removal in the veblop fallback: the post-Rio kurtosis bootstrap needs the periodic retry trigger after the PeerCount drop. Gate the fallback on !isBor instead so Clique/Ethash don't accidentally fire it. Fixes CI failures introduced by the prior commit: - TestServer_DeveloperMode (internal/cli/server) - TestCommand_DebugBlock (internal/cli) - TestValidatorWentOffline (tests/bor, integration) * miner: fix data race on pendingTasks length in stall warning Caller in newWorkLoop's veblopTimer case captured len(w.pendingTasks) under pendingMu, but the helper warnIfStalled read len(w.pendingTasks) again unguarded in its log statement. Concurrent taskLoop / resultLoop mutations would race with the read. Capture the count once under the lock and pass it through to the helper instead. 
Also drop the redundant DevFakeAuthor=false write from TestMainLoopClearsPendingWorkBlockOnPeerCountZero — NewFakeBor already constructs with DevFakeAuthor=false, and writing after mainLoop has captured the value at startup is itself a data race that -race flagged. Reported in PR #2220 review. * miner, tests/bor: update regression-test docstrings to match heimdall-gate The PeerCount==0 fix was originally chain-ID-gated (BorMainnet/Mumbai/Amoy) and the test docstrings reflected that. After review the gate was changed to heimdall-presence (`bor.HeimdallClient != nil`), but the docstrings were not updated. This commit corrects the inaccuracies: - TestMainLoopClearsPendingWorkBlockOnPeerCountZero now correctly describes the heimdall-presence trigger and notes that the test uses the mock heimdall client (non-nil) from NewFakeBor — no ChainID override exists. - TestProducerRecoversAfterMiningRestart now correctly explains that Bug 1's drop branch isn't exercised because the test uses withoutHeimdall=true (HeimdallClient nil), not because of chain-ID. * miner, consensus/bor: scope pendingTasks cleanup to Seal stop-branch exits The previous taskLoop fix made interrupt() unconditionally call deletePendingTask(prev), which races with Bor.Seal's success path: when the goroutine has already delivered the result to resultCh but resultLoop is busy, interrupt() deletes the entry resultLoop is about to look up, and the !exist branch silently drops the validly-sealed block. Move cleanup into Bor.Seal itself, gated to stop-branch exits, via a new SealWithStopHook(..., onStopExit func()) method. The existing Seal becomes a thin wrapper passing nil to preserve consensus.Engine. taskLoop type-asserts to *bor.Bor and uses SealWithStopHook with a per-sealhash cleanup closure. interrupt() now only closes stopCh. 
Adds three bor-level tests covering both stop-branch exits and the success path, and rewrites the worker-level test to assert the new contract: interrupt() must NOT delete pendingTasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * miner, tests/bor: update docstrings for SealWithStopHook cleanup direction Two docstrings still described the pre-revision design where interrupt() deleted pendingTasks directly: - tests/bor/bor_test.go bullet #3 named a unit test (TestTaskLoopInterruptCleansStalePendingTasks) that no longer exists; the actual test is TestTaskLoopInterruptPreservesPendingTasks asserting the opposite semantic. - miner/worker.go deletePendingTask docstring said "Used by taskLoop.interrupt" but the caller is the per-task onStopExit closure passed to Bor.SealWithStopHook. Documentation-only. * miner: drop PeerCount==0 gate in mainLoop (#2238) * miner: drop PeerCount==0 gate in mainLoop The peer-count drop branch in `mainLoop` was introduced in #977 to delay block production until at least one peer was connected, scoped to BorMainnet/Mumbai (Amoy was added later in e8157bbe6). #2220 widened the gate from a chain-ID list to "any heimdall-connected node", which drew kurtosis devnets into the gate for the first time: in a single-validator kurtosis devnet PeerCount() is 0 by design (no other node to peer with), the gate drops every newWorkReq, and the veblop fallback timer re-fires straight back into the same drop. Veblop's primary/backup producer election already handles outage safety at the protocol level — peer count adds no safety, only a foot-gun. Strip the gate; `mainLoop` now routes every newWorkReq directly to `commitWork` (except the pre-existing DisablePendingBlock-on-non-validator short-circuit). Verified end-to-end on a kurtosis 1-validator devnet (chain ID 4927, veblop active). * miner: drop stale PeerCount==0 gate references in test backends Follow-up to the gate removal. 
With no test exercising peer count anymore, the `testWorkerBackend.peerCount` atomic.Int32 plumbing in `worker_test.go` was just ceremony — drop the field, the `Store(1)` default, and simplify `PeerCount()` to return a constant. The `mockBackend` / `mockBackendBor` PeerCount comments in `miner_test.go` and `fake_miner.go` still named the removed gate; rewrite them to describe what the function does rather than what it no longer guards. No production change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * miner: drop dangling incident references in syncing-leak test docstring The doc comment and t.Fatalf message for TestCommitWorkLeaksPendingWorkBlockWhenSyncing framed it against the "PeerCount==0 race fixed for the 2026-05-07 incident" and the related val4 stall — context that disappeared from the tree with the gate removal and the sibling regression test in this branch's earlier commits. Rewrite the docstring to describe the leak on its own terms and trim the t.Fatalf tail so a future reader hitting the failure doesn't grep for a name that no longer exists in-tree. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tests/bor: drop dangling gate references in producer-recovery docstring TestProducerRecoversAfterMiningRestart's docstring framed itself against four leak paths; item (1) named the deleted unit test TestMainLoopClearsPendingWorkBlockOnPeerCountZero and the deleted gate (`realNetworkNode := bor.HeimdallClient != nil`). The integration-level limitations clause also leaned on Bug 1 / that gate. Drop both. Renumber the remaining three leak paths and add a short historical note that the fourth was closed by removing the gate itself in earlier commits on this branch. No behavior change. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tests/bor: drop dangling Bug 1 references in producer-recovery test After the previous commit renumbered the four-bug list down to three, three remaining "Bug 1" mentions in the docstring header and the test body's Phase 2 / Phase 3 comments became dangling (the enumeration no longer has a Bug 1 entry). Reword each to point at the generic "leak family" instead. The test's behavior is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * miner: format build-block log times as RFC3339Nano, gate on IsRunning (#2239) * consensus/bor: fix race in SpanStore.PurgeCache (#2235) * consensus/bor: fix race in SpanStore.PurgeCache PurgeCache cleared latestSpanCache via atomic Store(nil) but did not stop the background polling goroutine started by NewSpanStore. That goroutine ticks every 200ms and writes the latest span back into latestSpanCache, silently undoing the purge whenever a tick lands between the clear and the caller's next read. Fix: extract the loop into runPollLoop, track it with a WaitGroup, and have PurgeCache stop and join the goroutine before resetting state. Close uses the same path. PurgeCache no longer restarts the loop — on-demand reads via getLatestSpan fall back to updateLatestSpan inline, so callers still get fresh data without the race window. Adds TestSpanStore_PurgeCache_RaceWithPollLoop, a deterministic reproducer that sleeps past one tick before asserting. Fails reliably on develop, passes with the fix. * consensus/bor: clear heimdallStatus in PurgeCache With the poll loop stopped after PurgeCache, a stale heimdallStatus (typically CatchingUp:false) would persist and let waitUntilHeimdallIsSynced return immediately without refreshing against a freshly-swapped heimdall client. Clear it alongside the other atomics, and extend the reproducer to assert the invariant. 
* build: bump x/net for govulncheck (#2240) * core/forkid: include polygon-specific forks in wire forkid (#2237) * core/forkid: include polygon-specific forks in wire forkid NewID computes the eth p2p handshake forkid by calling gatherForks (lowercase), which uses reflection over the outer ChainConfig struct and is blind to the polygon-specific block fields nested under config.Bor (RioBlock, MadhugiriBlock, DandeliBlock, LisovoBlock, LisovoProBlock, GiuglianoBlock, ChicagoBlock). The polygon-aware GatherForks (uppercase) was added in #2063 to expose fork data via the new bor_forks RPC, but the wire path was never updated to use it. As a result, bor's wire forkid does not include any polygon-specific fork activation in its checksum. On chains where polygon forks activate at non-zero blocks (devnets, testnets, mainnet post-fork), bor's wire forkid is inconsistent with what the bor_forks RPC reports for the same node. This was latent until erigon v3.6.0 started including the new polygon forks (Lisovo, LisovoPro, Giugliano, Chicago) in its own GatherForks. erigon's wire forkid now correctly hashes those blocks while bor's still does not, causing the eth handshake to fail with 'fork ID rejected: local incompatible or needs update' on any deployment combining bor v2.8.0 with erigon v3.6.0 on a chain where those forks activate above block 0. Switch NewID to call GatherForks so the wire forkid includes the polygon-specific forks and matches what bor_forks RPC reports. * core/forkid: include polygon-specific forks in inbound filter newFilter (used by NewFilter / NewStaticFilter) builds the local table of valid fork checksums that the inbound eth handshake validator matches incoming peers against. It was calling lowercase gatherForks, which is the same polygon-blind reflection-only path the previous commit fixed in NewID. 
Without this, a patched bor (with the fixed NewID) would emit a forkid that includes the polygon-specific forks, but its own inbound filter would compute the local sums[] table without them. Incoming peer forkids would never match the table by exact rule #1 of EIP-2124 and would only be accepted via the rule #3 "remote is a superset" forgiveness path. The validator's accept/reject behaviour would be correct by accident rather than by design, and any nuance that depends on rule #1 (matching local fork state) would be wrong. Switch newFilter to GatherForks so the validator's sums[] table is symmetric with what NewID emits. After this, the eth handshake's forkid check is end-to-end polygon-aware on bor. * (feat): tracing support for state-sync transactions (#2236) * eth/tracers: add state-sync tracing support Add live tracing and refactor tracing methods for state-sync transactions. Co-authored-by: Milan Došlić <milan.doslic@tenderly.co> * core: call OnTxEnd hook only if state-sync is added * consensus/bor: remove duplicate import * internal/cli: use stream context * consensus/bor: use deterministic time for state-sync traces * eth/tracers: remove TODO * consensus/bor: fix build * eth/tracers: make cumulative gas used semantics same for tracing * core/types: removed unused system address * eth/tracers: sort statedb logs * eth/tracers: fix comment * eth/tracers: add transaction index to state-sync reciept * address claude comments * eth: move tx index check * core/types: use bor system address as sender address of state-sync transactions * eth/tracers: add state-sync custom tracer to handle invariants * core, eth, consensus: fix live state-sync tracing * eth/tracers: add state sync wrapped hooks test * eth, consensus: address claude comments * tests/bor: add e2e tests * eth/tracers: fix lint * core/types: undo sender change * consensus/bor: remove comments * eth, consensus: apply some simplifications * consensus/bor: apply diffguard changes * eth, core, consensus: 
increase code coverage * eth/tracers: fix prestate tracer for state-sync transactions * eth/tracers: add tests to run all tracers against state-sync transaction * eth, core: explain state-sync address representation in comments * core/types: add tests to validate empty values in state-sync tx getters * eth: disable live tracer completely if parallel evm is enabled * eth/tracers: record gas used for state-sync transactions in root frame * eth/tracers: add explainer log for sorting state db logs * consensus/bor: reuse reference * pass vmconfig while bor consensus creation * core: check for start and end hooks before tracing * consensus, eth: pass ctx in state-sync execution for timeout, add bloom to trace receipts * core/types: fix typo * eth/tracers: correct timeout ctx * eth/tracers: add more tests * eth, consensus: address bot comments * builder, packaging/templates: update all config.toml files --------- Co-authored-by: Milan Došlić <milan.doslic@tenderly.co> * Fix triejournaldirectory wiring (#2213) * wire trie journal directory from server config * server: guard trie journal path resolution * server: deduplicate trie journal default resolution --------- Co-authored-by: Lucca Martins <lucca_martins30@yahoo.com.br> * miner: fix TestPrefetchRaceWithSetExtra hanging on unbuffered channel send (#2233) Assisted-by: Claude:claude-sonnet-4-6 Co-authored-by: Lucca Martins <lucca_martins30@yahoo.com.br> * miner: disable tracer in vm.Config to prevent conflicts during block synchronization (#2206) * miner: disable tracer in vm.Config to prevent conflicts during block synchronization * Update miner/worker.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * miner: add test to verify vmConfig strips tracer during live tracing --------- Co-authored-by: StreamingFast Agent <agent@streamingfast.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * ethclient: bound tx indexing wait in tests (#2211) * chore: use Claude Opus 4.7 1M 
context in workflows (#2241) * ci: remove claude github actions (#2244) Replaced by the built-in Claude review bot now enabled on this repo. * params: version bump to v2.8.3-beta * v2.8.2 (#2245) * miner: track env.size in commitTransaction env.size feeds txFitsSize for the params.MaxBlockSize cap. It was seeded from header.Size() in makeEnv but never updated as transactions were committed, so the cap was effectively inert. * miner: set maxBlockSizeBufferZone to 1MB * Bump up version to v2.8.2 * params: version bump to v2.8.3-beta2 * params: version bump to v2.8.3-beta3 * Revert "core/forkid: include polygon-specific forks in wire forkid (#2237)" This reverts commit 5dc5adb9eebe7cfcd2707d54194a10e34d21481a. * params: version bump to v2.8.3-beta4 * Revert "core/forkid: include polygon-specific forks in wire forkid (#2237)" (#2248) This reverts commit 5dc5adb9eebe7cfcd2707d54194a10e34d21481a. * consensus: fix milestone-mismatch rewind deadlock (#2246) * params: bump version to v2.8.3-beta5 * params: bump version to v2.8.3 * build: bump Go to 1.26.4 to fix stdlib CVEs (GO-2026-5039, GO-2026-5037) * Revert "build: bump Go to 1.26.4 to fix stdlib CVEs (GO-2026-5039, GO-2026-5037)" This reverts commit b5a182b7906b066509af6db62f3e9527a73896a2. 
--------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Lake <lake@dunamu.com> Co-authored-by: Manav Darji <manavdarji.india@gmail.com> Co-authored-by: Larry Arnault <Larryarnaultla@gmail.com> Co-authored-by: Harsh kumar <51978908+harshinsecurity@users.noreply.github.com> Co-authored-by: harshinsecurity <hi@harshinsecurity.in> Co-authored-by: Aaron Ajose <aaronajosee@gmail.com> Co-authored-by: haoyu-haoyu <85037553+haoyu-haoyu@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Pratik Patil <pratikspatil024@gmail.com> Co-authored-by: Jerry <jerrycgh@gmail.com> Co-authored-by: Krishang <109511742+kamuikatsurgi@users.noreply.github.com> Co-authored-by: Sanket Saagar Karan <ssaagar@polygon.technology> Co-authored-by: Karolina G. <60556109+kajaaz@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Marcello Ardizzone <marcelloardizzone@hotmail.it> Co-authored-by: Andrei Petrov <andreipetrovqs@gmail.com> Co-authored-by: Olivier Dupont <olivierdupontvier@gmail.com> Co-authored-by: Léo Vincent <28714795+leovct@users.noreply.github.com> Co-authored-by: Milan Došlić <milan.doslic@tenderly.co> Co-authored-by: Jonas Schneider <jonasschnierderrrrrr@gmail.com> Co-authored-by: William Morriss <wjmelements@gmail.com> Co-authored-by: Matthieu Vachon <matt@streamingfast.io> Co-authored-by: StreamingFast Agent <agent@streamingfast.io> Co-authored-by: Adam Dossa <adossa@polygon.technology>

lucca30 added 3 commits April 19, 2026 11:55

claude Bot reviewed Apr 20, 2026

Comment thread miner/worker.go

Comment thread miner/worker.go Outdated

Comment thread miner/worker.go

claude Bot reviewed Apr 20, 2026

Comment thread miner/worker.go Outdated

Comment thread core/state_prefetcher.go

claude Bot reviewed Apr 20, 2026

Comment thread miner/worker.go Outdated

Comment thread miner/worker.go

claude Bot reviewed Apr 20, 2026

Comment thread miner/worker.go Outdated

Comment thread miner/worker.go Outdated

Comment thread core/state_prefetcher.go

lucca30 added 2 commits April 22, 2026 16:58

claude Bot reviewed Apr 22, 2026

Comment thread miner/worker.go

Comment thread miner/worker.go Outdated

claude Bot reviewed Apr 22, 2026

Comment thread miner/worker.go

Comment thread miner/worker.go

claude Bot reviewed Apr 22, 2026

Comment thread miner/worker.go

cffls reviewed Apr 25, 2026

Comment thread miner/worker.go Outdated

Comment thread miner/worker.go Outdated

Comment thread miner/worker.go Outdated

Comment thread miner/worker_test.go

Comment thread miner/worker.go Outdated

Comment thread miner/worker.go Outdated

lucca30 added 3 commits April 27, 2026 10:44

Merge remote-tracking branch 'origin/develop' into codex/pr-2192-conflict

33445c4

# Conflicts: # miner/worker_test.go

claude Bot reviewed Apr 29, 2026

adamdossa requested review from kamuikatsurgi and manav2401 May 5, 2026 15:22

manav2401 approved these changes May 8, 2026

manav2401 previously approved these changes May 13, 2026

Merge remote-tracking branch 'origin/develop' into codex/pr-2192-conflict

612ac22

# Conflicts: # miner/worker.go

lucca30 dismissed manav2401’s stale review via 612ac22 May 18, 2026 17:12

claude Bot reviewed May 18, 2026

lucca30 added 2 commits May 19, 2026 15:07

Merge remote-tracking branch 'origin/develop' into HEAD

4548bda

lucca30 added 2 commits May 19, 2026 16:43

prefetcher: fix lint (unconvert + trailing newline)

8eb595d

Drops a redundant uint64() conversion in the streamIdleBatch test and trims the trailing blank line goimports flagged.

manav2401 approved these changes May 19, 2026

lucca30 mentioned this pull request May 19, 2026

consensus/bor: fix race in SpanStore.PurgeCache #2235

Merged

4 tasks

cffls approved these changes May 19, 2026

lucca30 merged commit 32b1c8e into develop May 19, 2026
30 of 33 checks passed

Conversation

lucca30 commented Apr 20, 2026 • edited

Purpose

Prefetch lifecycle (after this PR)

Handoff sequence (idle → builder)

Precision impact

Implementation notes

Validation

claude Bot left a comment

Claude Code Review

lucca30 commented Apr 20, 2026

claude Bot commented Apr 20, 2026 • edited

Code Review: prefetcher builder-phase prefetch + streaming worker pool

Overall Assessment

Medium

1. Race window in builder-phase metric attribution (`worker.go:2306-2317`)

Low / Nit

2. `runWorker` skips txs under `evmAbort` — subtle channel consumption semantics (`state_prefetcher.go:169-177`)

3. Non-blocking sends can silently drop plan txs (`worker.go:2005-2009`)

4. `buildOverflowHeap` takes a fresh pool snapshot (`worker.go:2545-2551`)

5. `txIndex` as ordering hint is approximate (`state_prefetcher.go:181`)

6. Minor: `clone()` shares `signer` and `baseFee` references (`ordering.go:198-213`)
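
Finding 3 above concerns non-blocking channel sends. As a hedged illustration of the general Go idiom (not the code in `worker.go`), the sketch below shows how a `select` with a `default` branch drops a value when the buffer is full, and why counting drops keeps the loss visible. `trySend`, the buffer size, and the drop counter are hypothetical names introduced for this example.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the value was
// accepted. When the channel buffer is full, the value is dropped.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan int, 2) // hypothetical small buffer, no consumer running
	dropped := 0
	for i := 0; i < 5; i++ {
		if !trySend(ch, i) {
			dropped++ // without this counter the loss would be invisible
		}
	}
	fmt.Println("queued:", len(ch), "dropped:", dropped) // queued: 2 dropped: 3
}
```

For a best-effort prefetcher a dropped send only costs a cache miss later, so the idiom can be a reasonable trade against blocking the builder; a drop counter or metric makes that trade observable.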

What looks good

claude Bot commented Apr 20, 2026 • edited

Code review

claude Bot commented Apr 20, 2026

Code review

lucca30 commented Apr 20, 2026

claude Bot commented Apr 20, 2026 • edited

Deep Review: Builder-Phase Prefetch + Block Building Lifecycle

Block Building ↔ Prefetch Lifecycle Trace

Panic Analysis

Findings

1. 🔴 Worker goroutines in `PrefetchStream` lack panic recovery — node crash on any EVM edge case

2. 🟡 `inBuilderPhase` attribution has a residual edge case (IntermediateRoot tail)

3. 🟡 `sendPlan` goroutines have no panic recovery

4. 🟢 Performance analysis — no regressions found on the critical path

5. 🟢 Dead code analysis — none found
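
Findings 1 and 3 above both concern goroutines without panic recovery. The sketch below shows one common way to contain a panic inside a worker pool so that a single failing job neither crashes the process nor kills its worker. It is an illustration under assumptions, not the fix applied in this PR: `runPool`, the job type, and the counters are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runPool drains jobs with n workers. Each job runs under its own recover,
// so a panicking job is counted and skipped while the worker keeps going.
func runPool(n int, jobs <-chan int, process func(int)) (int64, int64) {
	var wg sync.WaitGroup
	var ok, bad atomic.Int64
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				func() {
					defer func() {
						if r := recover(); r != nil {
							bad.Add(1) // real code would also log r and a stack trace
						}
					}()
					process(j)
					ok.Add(1) // only reached when process did not panic
				}()
			}
		}()
	}
	wg.Wait()
	return ok.Load(), bad.Load()
}

func main() {
	jobs := make(chan int, 10)
	for i := 0; i < 10; i++ {
		jobs <- i
	}
	close(jobs)
	ok, bad := runPool(4, jobs, func(j int) {
		if j%5 == 0 {
			panic("simulated EVM edge case")
		}
	})
	fmt.Println("processed:", ok, "recovered panics:", bad) // processed: 8 recovered panics: 2
}
```

Placing the `recover` inside a per-job closure, rather than once at the top of the goroutine, keeps the worker alive for subsequent jobs.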

What looks good

Summary

codecov Bot commented Apr 20, 2026 • edited

Codecov Report

claude Bot left a comment

Overview

Block Building Lifecycle

New Issues Found (Nit-Level)

Security Risks

Level of Scrutiny

lucca30 commented Apr 22, 2026

claude Bot commented Apr 22, 2026 • edited

(a) `txIndex == valid + invalid` on every `PrefetchStream` exit

(b) No tx hash reaches `txsCh` twice in a single `runBuilderTxProvider` invocation

(c) `builderPrefetchedTxHashes` is only written while `inBuilderPhase==true`
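
Invariants (a) and (b) above can be expressed as small, checkable properties. The sketch below is a toy model rather than the PR's implementation: `feed`, `consume`, and the `stats` struct are hypothetical, and transactions are reduced to string hashes. It only illustrates what "no hash is forwarded twice" and "`txIndex == valid + invalid`" mean operationally.

```go
package main

import "fmt"

// stats mirrors the counters named in invariant (a); the struct is hypothetical.
type stats struct{ txIndex, valid, invalid int }

// feed forwards each hash at most once, which is invariant (b).
func feed(hashes []string) []string {
	seen := make(map[string]struct{}, len(hashes))
	out := make([]string, 0, len(hashes))
	for _, h := range hashes {
		if _, dup := seen[h]; dup {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, h)
	}
	return out
}

// consume counts every received hash exactly once as valid or invalid,
// so txIndex == valid + invalid holds on return, which is invariant (a).
func consume(hashes []string, isValid func(string) bool) stats {
	var s stats
	for _, h := range hashes {
		s.txIndex++
		if isValid(h) {
			s.valid++
		} else {
			s.invalid++
		}
	}
	return s
}

func main() {
	forwarded := feed([]string{"a", "b", "a", "c", "b"})
	s := consume(forwarded, func(h string) bool { return h != "b" })
	fmt.Println(forwarded, s.txIndex == s.valid+s.invalid) // [a b c] true
}
```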

`builderPlanCh` closed before `runBuilderTxProvider` starts

Empty txPool at `buildOverflowHeap` time

Concurrent `hardKill` + `evmAbort` set from different goroutines mid-phase

`prepareWork` early-return bypasses `defer interruptPrefetch.Store(true)` — zombie prefetcher on `prepareWork` failure