Skip to content

execution: extract spec tests out of make test-all#21092

Merged
taratorio merged 23 commits into
mainfrom
worktree-spec-test-ci
May 12, 2026
Merged

execution: extract spec tests out of make test-all#21092
taratorio merged 23 commits into
mainfrom
worktree-spec-test-ci

Conversation

@taratorio

@taratorio taratorio commented May 11, 2026

Copy link
Copy Markdown
Member

@taratorio taratorio changed the title execution: extract spec tests out of make test-all [DO-NOT-MERGE] execution: extract spec tests out of make test-all May 11, 2026
@taratorio taratorio requested a review from domiwei as a code owner May 11, 2026 03:22
@taratorio taratorio changed the title [DO-NOT-MERGE] execution: extract spec tests out of make test-all execution: extract spec tests out of make test-all May 11, 2026
@taratorio

Copy link
Copy Markdown
Member Author

Extract EEST spec tests out of make test-all

Why

While looking into adding state-test coverage for the devnet EEST fixtures (which we were not running), it became clear that picking them up would have added ~120,000 tests to make test-all — driven largely by the large test surface of EIP-8037 in the current devnet. The blockchain-test devnet path is in the same situation and is already the biggest single cost in our PR CI today. Adding the EEST engine-x benchmark suite to make test-all would make it worse still — currently we do not run those tests in CI at all (only via a manual env-var gate).

This pattern of "more EEST coverage → slower make test-all" doesn't scale: PR feedback time is already too long. The goal is to keep PR CI around or under ~30 minutes total.

This PR extracts every EEST spec test out of make test-all (and make test-all-race) and runs them through the cmd/evm statetest / blocktest / enginextest binaries in a dedicated new EEST spec tests workflow (test-eest-spec.yml) added to ci-gate. The new workflow shards over state, blockchain, and engine-x test suites, each split into a stable shard and a devnet shard, plus per-gas-target shards for the engine-x benchmark suite (7 gas targets: 1M, 5M, 10M, 30M, 60M, 100M, 150M).

As a secondary cleanup, this PR also deduplicates the consensus spec test (cl/spectest) which was running twice — once inside make test-all and again in test-integration-caplin.yml — by skipping it in make test-all via an env-var gate.

End result:

  • make test-all drops from ~27 min to ~4 min on Linux (the longest running EEST tests and the cl/spectest consensus tests no longer live there).
  • make test-all-race drops its slowest shard from ~34 min to ~4 min, and two whole matrix cells (execution-eest-blockchain, execution-eest-devnet) disappear from the race matrix.
  • The EEST spec tests themselves now run dramatically more tests than before in the same wall-clock window, because every shard runs in parallel — and because we drop almost every SkipLoad pattern and previously-not-running test (state-devnet, engine-x stable, engine-x benchmark) joins the matrix.
  • New coverage scales horizontally going forward — add a gas target, add a shard. No new pressure on make test-all.

CI runs compared below are:

Duration: test-all-erigon.yml + test-all-erigon-race.yml

Per-shard wall time (slowest matrix cell highlighted in bold):

Workflow Shard Before After Δ
test-all-erigon.yml ubuntu-24.04 26m 53s 3m 51s −86%
test-all-erigon.yml windows-2025 23m 34s 15m 19s −35%
test-all-erigon.yml macos-15 5m 22s 4m 09s −23%
test-all-erigon-race.yml execution-tests 33m 39s 2m 27s −93%
test-all-erigon-race.yml consensus 15m 15s 2m 52s −81%
test-all-erigon-race.yml execution-eest-blockchain 15m 23s removed --
test-all-erigon-race.yml execution-eest-devnet 16m 40s removed --
test-all-erigon-race.yml execution-other 7m 14s 4m 07s −43%
test-all-erigon-race.yml other 6m 12s 3m 28s −44%
test-all-erigon-race.yml core-rpc 6m 02s 2m 41s −55%
test-integration-caplin.yml ubuntu-24.04 8m 57s 4m 01s −55%
test-integration-caplin.yml windows-2025 7m 50s 3m 08s −60%

The two workflows that previously dominated wall time on the PR gate (ubuntu test-all at 27m, race execution-tests at 34m) now finish in under 4 minutes each. The race workflow loses two whole matrix cells, freeing two runner slots and ~16 minutes each.

Coverage: test-eest-spec.yml (new) vs. previous Go test equivalents

The new workflow has 13 shards (one is currently skipped because the devnet tarball doesn't yet ship blockchain_tests_engine_x/). All shards run in parallel; the workflow finishes in ~23 min wall time (gated by the slowest shard, blocktests-devnet).

Shard After: tests After: duration Before: equivalent Before: tests
eest-spec-statetests-stable 63,556 10m 30s TestState (walked state_tests/static/state_tests, skipped stTimeConsuming/) ~40,855
eest-spec-statetests-devnet 120,073 14m 22s not run N/A
eest-spec-blocktests-stable 69,256 13m 52s TestExecutionSpecBlockchain + …CancunBlobs + …PragueCalldata + …OsakaCLZ + …FrontierScenarios (with ~10+ SkipLoad patterns combined) subset of after
eest-spec-blocktests-devnet 137,647 22m 51s TestExecutionSpecBlockchainDevnet (whitelisted for_amsterdam/, ~150 SkipLoad patterns) subset of after
eest-spec-enginextests-stable 63,920 16m 49s not run N/A
eest-spec-enginextests-devnet skipped (no fixture yet) 1m 57s not run N/A
eest-spec-enginextests-benchmark-1m 1,076 8m 20s BenchmarkEngineX* (gated behind BENCH_ENGINE_X_MANUAL_ALLOW) N/A
eest-spec-enginextests-benchmark-5m 1,077 8m 13s BenchmarkEngineX* N/A
eest-spec-enginextests-benchmark-10m 1,077 8m 19s BenchmarkEngineX* N/A
eest-spec-enginextests-benchmark-30m 1,077 10m 06s BenchmarkEngineX* N/A
eest-spec-enginextests-benchmark-60m 1,077 12m 27s BenchmarkEngineX* N/A
eest-spec-enginextests-benchmark-100m 1,077 14m 50s BenchmarkEngineX* N/A
eest-spec-enginextests-benchmark-150m 1,077 18m 55s BenchmarkEngineX* N/A
Totals (currently-running shards) 463,990 22m 51s (slowest) Subset of state-tests + blockchain-tests only; no devnet state, no engine-x at all --

The before-coverage rows that are not N/A were running, but in every case as a subset of the post-PR path — TestState walked one subdirectory of state_tests/, TestExecutionSpecBlockchainDevnet walked one fork dir under for_amsterdam/ with ~150 hand-curated SkipLoad regexes, and the stable blockchain-test variants similarly had SkipLoad exclusions. None of statetests-devnet, enginextests-stable, or enginextests-benchmark-* were exercised at all in CI before.

The post-PR matrix runs ~191,500 tests that were not running at all in CI before (state-devnet + engine-x stable + engine-x benchmark), and adds tens of thousands more from dropped SkipLoad lists on the previously-running suites — all while shrinking PR-critical-path workflows (test-all-erigon.yml, test-all-erigon-race.yml) to a fraction of their previous wall time.

@taratorio

Copy link
Copy Markdown
Member Author

@yperbasis this is ready for review

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions (non-blocking)

  1. Two sources of truth for failure budgets. Per-shard max-allowed-failures defaults exist in both .github/workflows/test-eest-spec.yml:209-221 and tools/run-eest-spec-test.sh:1582-1623, with a comment asking future editors to keep them in sync. Could be a single JSON/YAML manifest consumed by both. Minor.

  2. Unconditional timedExec in runEngineXGroup (cmd/evm/enginexrunner.go:280-282). The wrapper runs even when --time is off, paying two runtime.ReadMemStats (stop-the-world) calls per test. With ~64k tests this is ~128k µs-scale STW pauses — sub-second total, almost certainly fine, but if timeIt { … } else { … } would avoid it entirely. The inline comment ("Always timedExec(bench=false)…") is honest about the intent.

  3. Doc/code inconsistency. The runEngineXGroup docstring (cmd/evm/enginexrunner.go:275-277) says "When timeIt is true, each test's single Run is wrapped in timedExec" — but the body wraps unconditionally and only the r.Stats assignment is gated on timeIt. The inline comment three lines down is correct; consider tightening the docstring to match.

  4. tools/test-fixtures.sh edge case (tools/test-fixtures.sh:25-26). With exactly one positional arg, shift 2 2>/dev/null || true fails silently and keys=("$@") then captures the manifest path as a key. No current caller hits this, but (( $# > 2 )) && shift 2; keys=("$@") would be safer.

  5. make test-fixtures reachability. After the PR, test-fixtures is no longer a transitive dep of test-all/test-all-race/test-group/test-sonar-coverage. It still works when invoked directly and downloads the full set (cl + 3× eest), but it's effectively unreferenced from the Makefile. Consider whether to keep it as a convenience alias or delete it in favor of the two narrower targets.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes execution/consensus spec-test suites (EEST and cl/spectest) from the default make test-all / make test-group Go test sweep, and runs them via dedicated Make targets and CI workflows to avoid large fixture downloads and long-running tests in the main test job.

Changes:

  • Split fixture handling into targeted Make targets (test-fixtures-cl, test-fixtures-eest) and add key-filter support to tools/test-fixtures.sh.
  • Add a dedicated EEST spec-test workflow (test-eest-spec.yml) + Makefile shard targets driven by tools/run-eest-spec-test.sh.
  • Skip cl/spectest during make test-all (via env var) and remove EEST Go test packages/benchmarks previously executed under go test ./....

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tools/test-groups Removes EEST-specific Go test groups now that EEST is run outside go test ./....
tools/test-fixtures.sh Adds optional manifest key filtering for fixture downloads/extraction.
tools/run-eest-spec-test.sh New driver script to run one EEST shard via cmd/evm JSON output and enforce failure budgets.
Makefile Removes test-fixtures from test-all/test-group prerequisites; adds test-fixtures-cl, test-fixtures-eest, and eest-spec-* shard targets.
execution/tests/state_test.go Removes EEST state-test Go test; keeps legacy Cancun tests and updates commentary.
execution/tests/init_test.go Drops eestDir fixture path variable (no longer used).
execution/tests/eest_prague_calldata/testmain_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_prague_calldata/block_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_osaka_clz/testmain_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_osaka_clz/block_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_frontier_scenarios/testmain_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_frontier_scenarios/block_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_devnet/testmain_test.go Removes EEST devnet Go test package (moved out of go test).
execution/tests/eest_devnet/block_test.go Removes EEST devnet Go test package (moved out of go test).
execution/tests/eest_cancun_blobs/testmain_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_cancun_blobs/block_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_blockchain/testmain_test.go Removes EEST Go test package (moved out of go test).
execution/tests/eest_blockchain/block_test.go Removes EEST Go test package (moved out of go test).
execution/tests/benchmark_engine_x_test.go Removes Go benchmark that depended on EEST benchmark fixtures (replaced by shard workflow/flags).
execution/execmodule/execmoduletester/exec_module_tester.go Fixes spelling/clarifies comment around TempDir usage.
execution/engineapi/engineapitester/engine_x_test_runner.go Removes Execute helper (now relying on Run/other paths).
db/kv/memdb/memory_database.go Fixes spelling/clarifies comment around TempDir usage.
cmd/evm/main.go Adds --time flag definition for per-test timing/memstats capture.
cmd/evm/enginexrunner.go Wires --time flag and adds timing output to stderr + JSON stats attachment.
cl/spectest/tests_test.go Skips consensus spec tests when ERIGON_SKIP_CL_SPECTEST=true (used by make test-all).
cl/spectest/Makefile Switches fixture prerequisite from test-fixtures to test-fixtures-cl.
.github/workflows/test-integration-caplin.yml Adds caching for the cl_mainnet.tar.gz fixture tarball.
.github/workflows/test-eest-spec.yml New workflow to run EEST shards with failure budgets and cached tarballs.
.github/workflows/test-all-erigon.yml Removes generic fixture tarball caching now that spec tests are extracted from test-all.
.github/workflows/ci-gate.yml Adds the new EEST spec-test workflow to the CI gate.
.claude/skills/erigon-test-race/SKILL.md Updates guidance to reflect spec tests no longer run under make test-all-race.
.claude/skills/erigon-test-all/SKILL.md Updates guidance to reflect spec tests moved to dedicated targets/workflows.
.claude/skills/erigon-implement-eip/SKILL.md Updates guidance to reference EEST shards as primary EL validation.
.claude/skills/erigon-ci/SKILL.md Updates CI map to include the EEST spec-test workflow and new local commands.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/test-fixtures.sh Outdated
Comment thread cmd/evm/enginexrunner.go Outdated
Comment thread .claude/skills/erigon-test-all/SKILL.md
Comment thread .claude/skills/erigon-implement-eip/SKILL.md Outdated

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But let's make at least some of the new shards run with the race detector, especially given that we're working on parallel execution.

@taratorio

Copy link
Copy Markdown
Member Author

But let's make at least some of the new shards run with the race detector, especially given that we're working on parallel execution.

Unfortunately the race detector adds a 3X overhead and makes the shards very slow, e.g.:

Shard Without race With race (est.)
eest-spec-blocktests-stable-race 13m 52s ~41 min
eest-spec-blocktests-devnet-race 22m 51s ~67 min

I'm uncertain that -race is a good thing for us to do for 200,000 spec tests. Note these tests will continue growing with every fork so it just would be very hard to scale this.

@taratorio

Copy link
Copy Markdown
Member Author

Note that the -race detector remains for all the other tests covered by make test-all, just not for the spec tests that I've extracted.

@yperbasis

Copy link
Copy Markdown
Member

Hmm. Perhaps we could run --race on a subset of tests, but OK. Hopefully the rest of the test suite is enough to spot any concurrency issues.

@taratorio

Copy link
Copy Markdown
Member Author

Minor suggestions (non-blocking)

  1. Two sources of truth for failure budgets. Per-shard max-allowed-failures defaults exist in both .github/workflows/test-eest-spec.yml:209-221 and tools/run-eest-spec-test.sh:1582-1623, with a comment asking future editors to keep them in sync. Could be a single JSON/YAML manifest consumed by both. Minor.
  2. Unconditional timedExec in runEngineXGroup (cmd/evm/enginexrunner.go:280-282). The wrapper runs even when --time is off, paying two runtime.ReadMemStats (stop-the-world) calls per test. With ~64k tests this is ~128k µs-scale STW pauses — sub-second total, almost certainly fine, but if timeIt { … } else { … } would avoid it entirely. The inline comment ("Always timedExec(bench=false)…") is honest about the intent.
  3. Doc/code inconsistency. The runEngineXGroup docstring (cmd/evm/enginexrunner.go:275-277) says "When timeIt is true, each test's single Run is wrapped in timedExec" — but the body wraps unconditionally and only the r.Stats assignment is gated on timeIt. The inline comment three lines down is correct; consider tightening the docstring to match.
  4. tools/test-fixtures.sh edge case (tools/test-fixtures.sh:25-26). With exactly one positional arg, shift 2 2>/dev/null || true fails silently and keys=("$@") then captures the manifest path as a key. No current caller hits this, but (( $# > 2 )) && shift 2; keys=("$@") would be safer.
  5. make test-fixtures reachability. After the PR, test-fixtures is no longer a transitive dep of test-all/test-all-race/test-group/test-sonar-coverage. It still works when invoked directly and downloads the full set (cl + 3× eest), but it's effectively unreferenced from the Makefile. Consider whether to keep it as a convenience alias or delete it in favor of the two narrower targets.

all of these are now addressed

@taratorio

Copy link
Copy Markdown
Member Author

New race + parallel exec shards: measured local runtimes & CI projections

All measurements from local hardware. CI-projection column applies a 1.9× factor (derived from the existing blocktests-stable CI baseline of 832s ÷ local 428s = 1.94 on the GitHub-hosted ubuntu-24.04 runner).

Race-detector shards (5 total)

Built once with go build -race. Each sub-shard filters its workload via a --run fork_<X> regex.

Shard Tests Failed (max-allowed) Local wall Est. CI wall
blocktests-stable-race-pre-cancun 8,947 0 3m 27s (measured) ~7 min
blocktests-stable-race-cancun 17,685 0 ~6m 50s (extrapolated) ~13 min
blocktests-stable-race-prague 20,858 0 ~8m 00s (extrapolated) ~15 min
blocktests-stable-race-osaka 21,558 0 ~8m 20s (extrapolated) ~16 min
blocktests-devnet-race-amsterdam 24,270 6,082 8m 47s (measured) ~17 min
  • The -parallel siblings of every blocktests-stable-race-* shard (and the devnet-race-amsterdam shard) are listed in the next table — they share the same fixture filtering but add ERIGON_EXEC3_PARALLEL=true.
  • Extrapolated rows use the measured per-test rate from …-pre-cancun (~23 ms/test under race) scaled by each shard's test count.

Parallel-exec shards (5 new + 2 toggled)

Failure budgets come from a fresh sequential run after pulling latest main. exec3-parallel: true in tools/eest-spec-shards.json causes the runner script to export ERIGON_EXEC3_PARALLEL=true.

Shard Tests Failed (max-allowed) Local wall Est. CI wall Status
blocktests-stable-parallel 69,256 270 7m 03s ~13 min new
blocktests-stable-race-pre-cancun-parallel 8,947 159 3m 28s ~7 min new
blocktests-stable-race-cancun-parallel 17,783 40 4m 23s ~8 min new
blocktests-stable-race-prague-parallel 20,945 36 5m 49s ~11 min new
blocktests-stable-race-osaka-parallel 21,564 36 5m 36s ~11 min new
blocktests-devnet 137,647 6,383 15m 27s ~29 min now parallel (was serial)
blocktests-devnet-race-amsterdam 24,270 6,082 8m 47s ~17 min now parallel (was serial)

Notes:

  • 270 stable-parallel failures, 0 stable-serial failures — every one of those 270 is a parallel-only regression on stable fixtures (the serial path already runs as a separate shard with max-allowed-failures: 0, so any regression there is caught).
  • Devnet failure counts shifted a bit vs the prior measurements after the main pull:
    • blocktests-devnet: 6,206 → 6,383 (+177).
    • blocktests-devnet-race-amsterdam: 6,140 → 6,082 (−58).
  • Race + parallel sub-shard failure budgets distribute as: pre-cancun=159 (largest because spans 9 forks), cancun/prague/osaka all ~36–40 — most of the parallel-only stable regressions land in the older forks under race conditions.
  • The engineapitester (engine-x) test runner is currently incompatible with parallel exec3 (validateAndStorePayload: unexpected state step has more work). Engine-x benchmark -parallel clones were considered, measured (all 7,538 tests fail), and dropped from this PR — they'll come back in a follow-up once the runner is fixed.

Shard runtime budget summary

All race and parallel shards fit comfortably under a 30-min CI timeout. Slowest is blocktests-devnet at ~29 min projected (still under the target). The matrix runs all of them in parallel so the workflow's wall time is gated by the slowest shard, not the sum.

@taratorio

Copy link
Copy Markdown
Member Author

@yperbasis note I added several -race shards for blocktests plus some parallel exec shards too as per #21092 (comment)

overall CI time is under 30 mins

the only thing is that we now have a lot more parallel runs, so this may put pressure on our GH runners - it may end up being ok but we may also end up needing another runner or two (let's monitor, im not really sure what will happen once this lands but I can quickly disable some shards if it causes issues until we get more runners)

@taratorio

Copy link
Copy Markdown
Member Author

successful run: https://github.com/erigontech/erigon/actions/runs/25716252300?pr=21092
34 min (slowest is still kurtosis assertoor regular as before, no time impact)

@taratorio taratorio added this pull request to the merge queue May 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026
@taratorio taratorio enabled auto-merge May 12, 2026 07:11
@taratorio taratorio added this pull request to the merge queue May 12, 2026
@AskAlexSharov

Copy link
Copy Markdown
Collaborator

oh, nice

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026
@taratorio taratorio enabled auto-merge May 12, 2026 09:12
@taratorio taratorio added this pull request to the merge queue May 12, 2026
Merged via the queue into main with commit 4646132 May 12, 2026
58 checks passed
@taratorio taratorio deleted the worktree-spec-test-ci branch May 12, 2026 10:22
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request May 15, 2026
…3_PARALLEL flakes (erigontech#21136) (erigontech#21153)

Splits the substantive changes out of erigontech#21017 (which is left with just
the
serial-vs-parallel exec-mode CI matrix yamls). With these merged,
erigontech#21017's new
parallel CI legs go green.

## What's here

**Parallel-exec correctness — the SD-of-pre-existing-contract / recreate
/ fork-transition family**
- `execution/stagedsync, execution/state`: parallel
SD-of-pre-existing-contract fixes — `IteratePrefix(StorageDomain, addr)`
to emit the full `StoragePath=0` cascade on self-destruct (covers
genesis-allocated / prior-block storage that `vm.StorageKeys` doesn't
see), drop `StoragePath` writes for SD'd addresses, post-SD
baseline-is-zero handling, EIP-161 empty-removal gated by
SpuriousDragon, etc.
- `execution/state`: don't emit `StoragePath=0` writes from
`IBS.Selfdestruct`.
- `execution/stagedsync`: clear calc `Deleted` on a non-SD account-field
write even when zero (a 0-value transfer that re-creates a
previously-destroyed address as an empty account, pre-EIP-161).
- `execution/stagedsync`: install the per-block changeset accumulator
before any of the block's writes (idempotent
`ensureChangesetAccumulator`), so changeset capture is robust against
out-of-band block scheduling.
- `execution/stagedsync`: clean exit when a single-block fork-validation
batch already covered `maxBlockNum` (avoids a spurious
`ErrLoopExhausted` → "unexpected state step has more work").

**Performance**
- `execution/stagedsync`: O(1) `CollectorWrites` fee-balance update.
After a tx finalizes, `nextResult` re-applies fee-adjusted balances into
`result.CollectorWrites`; `CollectorWrites` is a flat slice, so doing
that per-`addWrites`-entry was O(len(addWrites)·len(CollectorWrites)).
For a block-end IBS reconstruction both are ~one entry per account the
block touched, so a block whose tx pays ~100k accounts
(`execution/tests`' `invalid-receipt-hash-high-mgas` corner) hit ~10^10
comparisons — `TestInvalidReceiptHashHighMgas` ran >11 min under
`EXEC3_PARALLEL` + `-race` and tripped `mock_cl`'s 10-min per-call
deadline (serial: ~10 s). Index `CollectorWrites`' `BalancePath` entries
by address once (first match wins, mirroring the old `SetBalance` linear
scan) and update in O(1); the test now matches serial. Removed the
now-dead `VersionedWrites.SetBalance`.

**Test gating for residual `EXEC3_PARALLEL` flakes — tracked in erigontech#21136**
- `t.Skip`/`SkipLoad` under `dbg.Exec3Parallel` for:
`TestEIP7708BurnLogWhenCoinbaseSelfDestructs`,
`TestLegacyBlockchain/ValidBlocks/bcEIP3675/tipInsideBlock`, and
`TestValidateChainAndUpdateForkChoiceWithSideForksThatGoBackAndForwardInHeight`
(intermittent wrong-trie-root under heavy concurrent test load — the
parallel reorg/commitment path is timing-sensitive). Serial coverage of
all of these is unaffected. (The `eest_blockchain` `SkipLoad`s from the
original gating commit are no longer applicable — that package was
removed by erigontech#21092 — and were dropped accordingly.)

## Notes
- Cleanly cherry-picked onto current `main`. `make lint` clean; `make
test-short` clean; `execution/state` + `execution/stagedsync` +
`execution/execmodule` tests pass serial **and** `EXEC3_PARALLEL=true`;
`TestInvalidReceiptHashHighMgas` passes both modes (~12 s parallel, no
`-race`).
- The serial/parallel CI matrix that exercises the new parallel legs
lives in erigontech#21017 — once this merges, erigontech#21017 will be trimmed to just those
workflow yamls and rebased on top.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
awskii added a commit that referenced this pull request May 21, 2026
Main's #21092 removed both execution-eest-blockchain and
execution-eest-devnet from tools/test-groups when spec tests were
extracted out of make test-all. The merge resolution mistakenly kept
the PR-side branch with the devnet entry plus a stale "Temporarily
disable the blockchain shard" comment. Drop the devnet entry and the
comment; keep only the PR's own execution-eest-zkevm contribution.
awskii added a commit that referenced this pull request Jun 5, 2026
Per PR #21629 review (the #21092 pattern): the ~22k-test zkevm
execution-witness conformance suite must not run under `go test` /
`make test-all`. Replace it with a `zkevmtest` cmd/evm subcommand executed as
parallel `eest-spec-tests` shards (incl. a -race variant).

- cmd/evm/zkevmrunner.go: new `zkevmtest` subcommand — imports each EEST zkevm
  blockchain_tests fixture, queries debug_executionWitness (canonical) per
  block, multiset-compares state/codes/headers. Expected-fail patterns for the
  known #21563/#21566 divergences are ported via TestMatcher so the shard
  budget stays 0.
- testutil: add BlockTest.RunWithTesterCLI (nil tb, caller-owned tester) and
  TestMatcher.CheckFailureForName for CLI runners; RunCLI now shares the path.
- shards: add zkevm-witness + zkevm-witness-race-all to eest-spec-shards.yml,
  route them in run-eest-spec-test.sh, cache the eest_zkevm tarball in
  test-eest-spec.yml, and add Makefile targets provisioning test-fixtures-zkevm.
- remove the go-test suite (execution/tests/eest_zkevm_witness), its
  execution-eest-zkevm test-group, the zkevm steps in test-all-erigon-race.yml,
  and the GOTEST_PACKAGES filter (back to ./...).
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request Jun 5, 2026
…21629)

Adds the zkEVM (canonical) conformance runner for
`debug_executionWitness`, the legacy-mode field population, the optional
`mode` request param, and a format spec.

## Changes

- **`zkevmtest` runner** — a `cmd/evm` subcommand (mirroring
`staterunner`/`blockrunner`/`enginexrunner`) that checks
`debug_executionWitness` (canonical mode) against the
`ethereum/execution-spec-tests` zkevm@v0.4.0 corpus (~22.5k tests). Runs
as the `zkevm-witness-race` shard in the `eest-spec-tests` workflow, not
under `go test`. Known divergences (erigontech#21563, erigontech#21566) are absorbed in the
runner so the shard budget stays 0.
- **legacy `keys`/`codes`** — `keys` holds the 20B address / 32B
storage-slot preimages; `codes` holds pre-state bytecode incl. EIP-7702
designators and the empty-code entry.
- **`mode` request param** (`legacy`/`canonical`).
- **Spec** at `docs/plans/witness-legacy-mode-spec.md`, re-verified
against code.

## erigontech#21487

Carries erigontech#21487's suite plus the fixes that make it pass; erigontech#21487 is
redundant once CI is green here. The open question from erigontech#21487 — whether
a fixture-gated suite belongs in the required race-gate — is resolved:
it's now a `zkevmtest` runner shard in the `eest-spec-tests` workflow
(the erigontech#21092 pattern), not a `go test` package.

## Mainnet legacy soak vs reth-legacy

Rolling soak near tip (block ~`25.25M`, ~3.7k blocks checked):

- 0 health failures — no 500s, no root mismatches; every witness passes
`verifyWitnessStateless`.
- 90.6% strict set-equal on `{keys, codes, state}`.
- Divergences are almost all over-inclusion (204 blocks `keys`, 86
`codes`, 62 `state`) — a few extra nodes per block, still re-executing
to the correct root. 3 blocks show minor `keys`/`codes` under-inclusion,
none affecting `verifyWitnessStateless`.
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request Jun 6, 2026
…h#21640)

## Problem

`tools/run-eest-spec-test.sh` invokes the cmd/evm runner with `|| true`
and decides shard pass/fail from the `--jsonout` report alone. That
design is correct for the non-race shards (the runners always exit 0 and
report via JSON; crashes surface as missing/truncated JSON, which the
script catches via the `total == 0` check and `set -euo pipefail` on
jq).

It is wrong for the `*-race-*` shards: the Go race runtime reports
detected races to **stderr** and overrides the process exit code to
**66** (even through `os.Exit(0)`), while the run completes normally and
the JSON stays fully valid with every test passing. The `|| true`
discards that exit code, so a race shard whose binary detected a data
race still went green — the `WARNING: DATA RACE` report only visible in
the job log. The race detector's whole value is catching races that
*don't* happen to corrupt results, so this silently weakened the
guarantee of every race shard:

```
blocktests-stable-race-{pre-cancun,cancun,prague,osaka}-{sequential,parallel}
blocktests-devnet-race-amsterdam
```

(and `zkevm-witness-race-all` once erigontech#21629 lands).

Pre-existing since the shards were introduced in erigontech#21092; surfaced while
reviewing erigontech#21629, which doubles the corpus running under the race
detector.

## Fix

Capture the runner's exit code and fail the shard explicitly on 66 with
a pointer to the race report above it in the log. All other exit-code
tolerance is unchanged.

## Verification (local repro)

Built a minimal `-race` Go binary containing a genuine data race that
emits valid passing JSON (`[{"name":"t1","pass":true}]`) and calls
`os.Exit(0)`, then ran it through the real script as `EVM_BIN` against
`blocktests-stable-race-cancun-sequential`:

- **Before** (unmodified script): binary prints `Found 1 data race(s)`
and exits 66 → script prints `ran 1 tests, 0 failed` and exits **0**
(green despite the race).
- **After**: same run exits **1** with `ERROR: data race detected (race
runtime exit code 66); see WARNING: DATA RACE in the log above`.

Regression matrix with stub binaries through the patched script,
confirming only the race case changes behavior:

| Scenario | Before | After |
|---|---|---|
| race detected (exit 66, valid passing JSON) | exit 0 | **exit 1** |
| clean pass (exit 0) | exit 0 | exit 0 |
| non-zero exit ≠ 66 with valid JSON | exit 0 (tolerated by design) |
exit 0 |
| test failures over budget | exit 1 | exit 1 |

`shellcheck --severity=warning` (the lint-workflow invocation) is clean
on the modified script; `make lint` clean.

The race shards running on this PR exercise the patched path end-to-end
in CI (race-free run → `rc=0` → behavior identical to before).

## Note on TDD

This is a shell-script CI guard with no test harness; the Red → Green
cycle above (reproduce the masked exit code through the unmodified
script, then flip it with the fix) stands in for a checked-in failing
test.

## End-to-end verification with a real race in erigon code

Repeated the verification with the full production pipeline instead of
stub binaries: planted a deliberate data race in
`IntraBlockState.CommitBlock` (two unsynchronized goroutines
incrementing a package var, executed once per imported block — never
committed), built the real `evm.race` via `make evm.race`, and ran a
single real EEST fixture
(`cancun/eip1153_tstore/test_basic_tload_after_store.json`, one
`fork_Cancun` test) through the
`blocktests-stable-race-cancun-sequential` shard:

- **old script** (origin/main): race report correctly attributes
`intra_block_state.go` in `CommitBlock.func1`, binary prints `Found 1
data race(s)` and exits 66 → script prints `ran 1 tests, 0 failed`,
**exit 0** — green despite the race.
- **this PR's script**: same binary, same fixture → **exit 1**, last
line `ERROR: data race detected (race runtime exit code 66); see
WARNING: DATA RACE in the log above`.
- **control**: race reverted, `evm.race` rebuilt clean, same fixture
through this PR's script → 0 race warnings, `ran 1 tests, 0 failed`,
**exit 0** — no false positive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants