Skip to content

core/vm, core, miner: fix interrupt propagation to nested EVM calls#2092

Merged
lucca30 merged 4 commits intodevelopfrom
lmartins/interrupt-on-struct-field-to-address-nested-call-interruption
Mar 2, 2026
Merged

core/vm, core, miner: fix interrupt propagation to nested EVM calls#2092
lucca30 merged 4 commits intodevelopfrom
lmartins/interrupt-on-struct-field-to-address-nested-call-interruption

Conversation

@lucca30
Copy link
Copy Markdown
Contributor

@lucca30 lucca30 commented Feb 27, 2026

Description

Problem

The block-building interrupt flag (interruptBlockBuilding) is not propagated to nested EVM calls. When the interpreter executes opcodes that make nested calls — CALL, DELEGATECALL, STATICCALL, CALLCODE, CREATE — they all pass nil for the interrupt parameter to evm.Run(). This causes Run() to create a dummy new(atomic.Bool) that never fires, making nested contract execution completely uninterruptible.

Since virtually all real transactions involve at least one nested call (proxy patterns, DEX routers, multi-hop swaps), the per-opcode interrupt check is effectively bypassed for the bulk of EVM execution time.

Real-world impact

Block 83,527,864 on Polygon PoS mainnet took 5.654s to build instead of ~2s. The interrupt timer fired on schedule (~1.5s after block building started), but a transaction executing a complex nested call (proxy/DEX pattern) ran for an additional ~4.1 seconds because the interrupt couldn't reach the nested execution. The block was announced 3.7 seconds late.

Root cause

The interrupt flag flows through this chain for the top-level call:

worker.commitTransaction → core.ApplyTransaction → ApplyTransactionWithEVM
  → ApplyMessageNoFeeBurnOrTip → st.execute(interrupt)
    → evm.Call(..., interrupt) → evm.Run(contract, input, false, interrupt) ✓ PASSED

But when opcodes make nested calls, they all pass nil:

instructions.go  opCall:         evm.Call(..., nil)       // ← nil!
evm.go           CallCode:       evm.Run(..., nil)        // ← nil!
evm.go           DelegateCall:   evm.Run(..., nil)        // ← nil!
evm.go           StaticCall:     evm.Run(..., nil)        // ← nil!
evm.go           Create:         evm.Run(..., nil)        // ← nil!

When interrupt is nil, interpreter.go creates a dummy new(atomic.Bool) that is always false — the nested execution never checks the real interrupt flag.

Test: proving the bug exists

TestInterruptNestedCall in core/vm/runtime/runtime_test.go demonstrates the issue with two sub-tests:

  • TopLevel: An infinite loop at the top level, interrupted after 50ms. Proves the interrupt mechanism works for flat code.
  • NestedCall: The same infinite loop inside a STATICCALL. On pre-fix code, the interrupt is ignored inside the nested call.

Pre-fix results (commit 029c58680)

To reproduce the bug, checkout the test commit before the fix and run:

git checkout 029c58680
go test -run TestInterruptNestedCall -v ./core/vm/runtime/
=== RUN   TestInterruptNestedCall/TopLevel
--- PASS: (0.05s)
=== RUN   TestInterruptNestedCall/NestedCall
    runtime_test.go:204: Bug confirmed: nested call took 1.12s (interrupt ignored inside STATICCALL)
--- PASS: (1.12s)

The nested call took 1.12 seconds — it ran until gas exhaustion, completely ignoring the interrupt set at 50ms.

Post-fix results (commit c01a363ec)

git checkout c01a363ec
go test -run TestInterruptNestedCall -v ./core/vm/runtime/
=== RUN   TestInterruptNestedCall/TopLevel
--- PASS: (0.05s)
=== RUN   TestInterruptNestedCall/NestedCall
    runtime_test.go:201: Fix verified: nested call interrupted in 51ms
--- PASS: (0.05s)

The nested call now stops in ~51ms — a 22x improvement, matching the top-level interrupt latency.

Fix

Move interrupt from a function parameter to a field on the EVM struct. This ensures all call depths — top-level, nested, and deeply nested — share the same interrupt flag.

// New field on EVM struct
type EVM struct {
    // ...
    interrupt *atomic.Bool
}

// Set once at block-building start
func (evm *EVM) SetInterrupt(interrupt *atomic.Bool) {
    evm.interrupt = interrupt
}

// Run() reads from the struct — all call depths see the same flag
func (evm *EVM) Run(contract *Contract, input []byte, readOnly bool) (ret []byte, err error) {
    interrupt := evm.interrupt
    if interrupt == nil {
        interrupt = new(atomic.Bool)
    }
    for {
        if interrupt.Load() { return nil, ErrInterrupt }
        // ... execute opcode ...
    }
}

The interrupt parameter is removed from Call(), Run(), and the entire ApplyTransaction chain (29 files, removing parameter threading through ApplyMessage, ApplyMessageNoFeeBurnOrTip, execute(), etc.). The miner sets the interrupt once via evm.SetInterrupt(&w.interruptBlockBuilding) in makeEnv.

Performance validation

An earlier optimization replaced context.Done() with atomic.Bool for the per-opcode interrupt check, yielding a 25x throughput improvement (10.34 ns/op → 0.41 ns/op). We validated that our refactor does not regress this optimization.

Why there should be zero impact

Both pre-fix and post-fix produce identical hot-path code in the interpreter loop:

// Both versions: local variable used in the hot loop
interrupt := /* parameter (pre-fix) | evm.interrupt (post-fix) */
for {
    if interrupt.Load() { ... }  // ← same local *atomic.Bool in both cases
}

The only difference is one extra struct field read at function entry (evm.interrupt), which happens once per Run() call — negligible compared to millions of opcode iterations per transaction.

BenchmarkSimpleLoop comparison (100M gas, 6 runs, 5s benchtime)

                                       │   pre-fix   │              post-fix              │
                                       │   sec/op    │   sec/op     vs base               │
SimpleLoop/loop-100M                      133.6m ± 0%   134.5m ± 0%  +0.63% (p=0.002 n=6)
SimpleLoop/call-reverting-100M            465.3m ± 0%   466.5m ± 0%  +0.26% (p=0.026 n=6)
SimpleLoop/call-nonexist-100M             348.4m ± 0%   350.7m ± 0%  +0.67% (p=0.002 n=6)
SimpleLoop/call-EOA-100M                  312.9m ± 0%   315.5m ± 0%  +0.82% (p=0.002 n=6)
SimpleLoop/call-identity-100M             315.8m ± 1%   321.5m ± 0%  +1.80% (p=0.002 n=6)
SimpleLoop/staticcall-identity-100M       127.3m ± 1%   130.6m ± 1%  +2.58% (p=0.002 n=6)
SimpleLoop/loop2-100M                     155.2m ± 0%   166.8m ± 0%  +7.51% (p=0.002 n=6)

Memory allocations: unchanged across all benchmarks.

Interpretation: The critical loop-100M benchmark (pure opcode loop, most sensitive to per-opcode overhead) shows +0.63% — well within system noise. Most benchmarks fall in the 0.2–1.8% range. The loop2-100M outlier at +7.5% is attributed to thermal throttling: benchmarks were run sequentially (~4 minutes each), and the consistent slight increase across all benchmarks (never a decrease) is a classic sign of CPU thermal effects from back-to-back benchmark suites on Apple Silicon. The per-opcode interrupt check code is literally identical in both versions — the local *atomic.Bool variable and .Load() call produce the same machine code.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

No breaking changes

…truct field

The block-building interrupt flag was threaded as a function parameter
through Call(), Run(), ApplyTransaction(), ApplyMessage(), and execute().
This made it easy to pass nil at nested call sites, which is exactly what
happened — DelegateCall, StaticCall, CallCode, Create, and even the CALL
opcode all passed nil, silently disabling per-opcode interrupt checks for
nested execution.

Instead, store the interrupt once on the EVM struct via SetInterrupt()
and read it in Run(). This removes the parameter from the entire call
chain (29 files) and ensures the interrupt is checked at every opcode
regardless of call depth.
@lucca30 lucca30 requested review from a team, manav2401 and pratikspatil024 February 27, 2026 17:01
@sonarqubecloud
Copy link
Copy Markdown

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 68.62745% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.63%. Comparing base (fdbc857) to head (6514435).

Files with missing lines Patch % Lines
core/vm/evm.go 37.50% 5 Missing ⚠️
core/parallel_state_processor.go 0.00% 2 Missing ⚠️
eth/tracers/api.go 66.66% 0 Missing and 2 partials ⚠️
core/state_processor.go 87.50% 1 Missing ⚠️
core/vm/instructions.go 0.00% 1 Missing ⚠️
eth/gasestimator/gasestimator.go 0.00% 1 Missing ⚠️
eth/state_accessor.go 0.00% 1 Missing ⚠️
eth/tracers/api_bor.go 0.00% 1 Missing ⚠️
tests/bor/helper.go 0.00% 1 Missing ⚠️
tests/state_test_util.go 0.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (68.62%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2092      +/-   ##
===========================================
+ Coverage    50.59%   50.63%   +0.03%     
===========================================
  Files          875      875              
  Lines       151820   151824       +4     
===========================================
+ Hits         76815    76872      +57     
+ Misses       69929    69879      -50     
+ Partials      5076     5073       -3     
Files with missing lines Coverage Δ
accounts/abi/bind/backends/simulated.go 60.68% <100.00%> (ø)
consensus/bor/statefull/processor.go 0.00% <ø> (ø)
core/chain_makers.go 51.99% <100.00%> (ø)
core/state_prefetcher.go 91.56% <100.00%> (+0.10%) ⬆️
core/state_transition.go 70.00% <100.00%> (ø)
core/vm/interpreter.go 54.93% <100.00%> (+0.27%) ⬆️
core/vm/runtime/runtime.go 89.13% <100.00%> (ø)
internal/ethapi/api.go 39.70% <100.00%> (ø)
miner/worker.go 67.70% <100.00%> (-0.38%) ⬇️
core/state_processor.go 64.00% <87.50%> (ø)
... and 9 more

... and 19 files with indirect coverage changes

Files with missing lines Coverage Δ
accounts/abi/bind/backends/simulated.go 60.68% <100.00%> (ø)
consensus/bor/statefull/processor.go 0.00% <ø> (ø)
core/chain_makers.go 51.99% <100.00%> (ø)
core/state_prefetcher.go 91.56% <100.00%> (+0.10%) ⬆️
core/state_transition.go 70.00% <100.00%> (ø)
core/vm/interpreter.go 54.93% <100.00%> (+0.27%) ⬆️
core/vm/runtime/runtime.go 89.13% <100.00%> (ø)
internal/ethapi/api.go 39.70% <100.00%> (ø)
miner/worker.go 67.70% <100.00%> (-0.38%) ⬇️
core/state_processor.go 64.00% <87.50%> (ø)
... and 9 more

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lucca30
Copy link
Copy Markdown
Contributor Author

lucca30 commented Mar 2, 2026

@claude Do a detailed and careful review of this code changes. Analyze if any issue, bug, security breach was introduced. Analyze if the refactor kept the behavior as it should and if just addressed the missed opcode calls. Check if there are any potential issue with performance introduced.

@claude
Copy link
Copy Markdown

claude bot commented Mar 2, 2026

Claude finished @lucca30's task in 2m 9s —— View job


Detailed Code Review Complete ✅

  • Gather context and understand the changes
  • Analyze core/vm/evm.go - EVM struct changes
  • Analyze core/vm/interpreter.go - Run function changes
  • Analyze core/vm/instructions.go - opcode handler changes
  • Analyze core/state_transition.go - state transition changes
  • Analyze miner/worker.go - block building interrupt setup
  • Review all other modified files for consistency
  • Check for potential bugs or security issues
  • Check for performance implications
  • Provide final review summary

Summary

This PR fixes a significant bug where the block-building interrupt flag was not propagated to nested EVM calls (CALL, DELEGATECALL, STATICCALL, CALLCODE, CREATE). The fix moves the interrupt flag from a function parameter to a field on the EVM struct.

Review Findings

✅ Correctness: PASS

The refactor correctly addresses the issue:

  1. EVM Struct Change (core/vm/evm.go:135-146): New interrupt *atomic.Bool field added with SetInterrupt() method. Clean, well-documented approach.

  2. Interpreter Change (core/vm/interpreter.go:108-111): Run() now reads from evm.interrupt instead of a parameter. If nil, creates a dummy atomic.Bool - this maintains backward compatibility for tests and non-mining contexts.

  3. Call Path Cleanup: All nested call methods (Call, CallCode, DelegateCall, StaticCall, create) no longer pass nil for interrupt - they inherit the EVM-level interrupt automatically.

  4. Miner Integration (miner/worker.go:1078): SetInterrupt(&w.interruptBlockBuilding) is called once when the environment is created, ensuring all subsequent EVM operations share the same interrupt flag.

  5. Prefetcher Fix (core/state_prefetcher.go:121): Correctly calls evm.SetInterrupt(interrupt) before ApplyMessage.

✅ API Consistency: PASS

All 32 modified files consistently update the function signatures:

  • evm.Call() - removed interrupt parameter
  • evm.Run() - removed interrupt parameter
  • ApplyMessage() - removed interrupt parameter
  • ApplyMessageNoFeeBurnOrTip() - removed interrupt parameter
  • ApplyTransaction() - removed interrupt parameter
  • ApplyTransactionWithEVM() - removed interrupt parameter
  • execute() (stateTransition) - removed interrupt parameter

✅ Security: PASS

No security issues introduced:

  • The interrupt mechanism is still atomic (*atomic.Bool)
  • No new attack vectors - the interrupt is only writable by the block builder
  • The fix doesn't change consensus behavior, only execution timing for interrupted transactions

✅ Behavior Preservation: PASS

The refactor maintains identical behavior:

  • Interrupt checks happen at the same location in the interpreter loop (line 185-188)
  • The per-opcode interrupt.Load() check is unchanged
  • The existing evm.abort mechanism for manual cancellation is preserved
  • System calls (ProcessBeaconBlockRoot, ProcessParentBlockHash, processRequestsSystemCall) correctly work without interrupt since they don't call SetInterrupt()

✅ Performance: PASS

No performance regression expected:

  • The hot path code (interrupt.Load() in the interpreter loop) is identical before and after
  • Only one extra struct field read (evm.interrupt) at function entry of Run(), not per-opcode
  • Benchmarks in PR description show +0.63% for the critical loop, well within noise

✅ Test Coverage: PASS

New test TestInterruptNestedCall (core/vm/runtime/runtime_test.go:136-202) properly validates:

  • Top-level interrupt works (sanity check)
  • Nested call interrupt now works (the actual fix)
  • Both complete in ~50ms with 50ms interrupt timeout

Minor Observations (Non-blocking)

  1. Codecov Warning: Patch coverage at 68.6% - this is expected since many changes are in test utilities and tracing code that isn't fully exercised in unit tests. The core logic has adequate coverage.

  2. core/vm/instructions.go:812: The opCall function call to evm.Call() is clean. Previously passed nil explicitly, now simply omitted (inherited from EVM struct).

  3. Consistent nil handling: When evm.interrupt == nil, a dummy atomic.Bool is created in Run(). This is safe and maintains the invariant that interrupt is never nil during the loop.

Conclusion

This PR is well-implemented and ready to merge. The fix is minimal, focused, and correctly addresses the bug without introducing regressions or security concerns. The architectural decision to move interrupt to a struct field is sound - it follows the principle that the interrupt is a property of the entire EVM execution context, not individual call frames.


@lucca30 lucca30 merged commit 62c88ef into develop Mar 2, 2026
20 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants