execution: extract spec tests out of make test-all by taratorio · Pull Request #21092 · erigontech/erigon

taratorio · 2026-05-11T01:33:39Z

…c-test-ci

taratorio · 2026-05-11T06:50:27Z

Extract EEST spec tests out of `make test-all`

Why

While looking into adding state-test coverage for the devnet EEST fixtures (which we were not running), it became clear that picking them up would have added ~120,000 tests to make test-all — driven largely by the large test surface of EIP-8037 in the current devnet. The blockchain-test devnet path is in the same situation and is already the biggest single cost in our PR CI today. Adding the EEST engine-x benchmark suite to make test-all would make it worse still — currently we do not run those tests in CI at all (only via a manual env-var gate).

This pattern of "more EEST coverage → slower make test-all" doesn't scale: PR feedback time is already too long. The goal is to keep PR CI around or under ~30 minutes total.

This PR extracts every EEST spec test out of make test-all (and make test-all-race) and runs them through the cmd/evm statetest / blocktest / enginextest binaries in a dedicated new EEST spec tests workflow (test-eest-spec.yml) added to ci-gate. The new workflow shards over state, blockchain, and engine-x test suites, each split into a stable shard and a devnet shard, plus per-gas-target shards for the engine-x benchmark suite (7 gas targets: 1M, 5M, 10M, 30M, 60M, 100M, 150M).

As a secondary cleanup, this PR also deduplicates the consensus spec test (cl/spectest) which was running twice — once inside make test-all and again in test-integration-caplin.yml — by skipping it in make test-all via an env-var gate.

End result:

make test-all drops from ~27 min to ~4 min on Linux (the longest running EEST tests and the cl/spectest consensus tests no longer live there).
make test-all-race drops its slowest shard from ~34 min to ~4 min, and two whole matrix cells (execution-eest-blockchain, execution-eest-devnet) disappear from the race matrix.
The EEST spec tests themselves now run dramatically more tests than before in the same wall-clock window, because every shard runs in parallel — and because we drop almost every SkipLoad pattern and previously-not-running test (state-devnet, engine-x stable, engine-x benchmark) joins the matrix.
New coverage scales horizontally going forward — add a gas target, add a shard. No new pressure on make test-all.

CI runs compared below are:

Before: ci-gate for PR #21088, run 25635027985
After: ci-gate for PR #21092 (this PR), run 25652814164

Duration: `test-all-erigon.yml` + `test-all-erigon-race.yml`

Per-shard wall time (slowest matrix cell highlighted in bold):

Workflow	Shard	Before	After	Δ
`test-all-erigon.yml`	ubuntu-24.04	26m 53s	3m 51s	−86%
`test-all-erigon.yml`	windows-2025	23m 34s	15m 19s	−35%
`test-all-erigon.yml`	macos-15	5m 22s	4m 09s	−23%
`test-all-erigon-race.yml`	execution-tests	33m 39s	2m 27s	−93%
`test-all-erigon-race.yml`	consensus	15m 15s	2m 52s	−81%
`test-all-erigon-race.yml`	execution-eest-blockchain	15m 23s	removed	--
`test-all-erigon-race.yml`	execution-eest-devnet	16m 40s	removed	--
`test-all-erigon-race.yml`	execution-other	7m 14s	4m 07s	−43%
`test-all-erigon-race.yml`	other	6m 12s	3m 28s	−44%
`test-all-erigon-race.yml`	core-rpc	6m 02s	2m 41s	−55%
`test-integration-caplin.yml`	ubuntu-24.04	8m 57s	4m 01s	−55%
`test-integration-caplin.yml`	windows-2025	7m 50s	3m 08s	−60%

The two workflows that previously dominated wall time on the PR gate (ubuntu test-all at 27m, race execution-tests at 34m) now finish in under 4 minutes each. The race workflow loses two whole matrix cells, freeing two runner slots and ~16 minutes each.

Coverage: `test-eest-spec.yml` (new) vs. previous Go test equivalents

The new workflow has 13 shards (one is currently skipped because the devnet tarball doesn't yet ship blockchain_tests_engine_x/). All shards run in parallel; the workflow finishes in ~23 min wall time (gated by the slowest shard, blocktests-devnet).

Shard	After: tests	After: duration	Before: equivalent	Before: tests
`eest-spec-statetests-stable`	63,556	10m 30s	`TestState` (walked `state_tests/static/state_tests`, skipped `stTimeConsuming/`)	~40,855
`eest-spec-statetests-devnet`	120,073	14m 22s	not run	N/A
`eest-spec-blocktests-stable`	69,256	13m 52s	`TestExecutionSpecBlockchain` + `…CancunBlobs` + `…PragueCalldata` + `…OsakaCLZ` + `…FrontierScenarios` (with ~10+ `SkipLoad` patterns combined)	subset of after
`eest-spec-blocktests-devnet`	137,647	22m 51s	`TestExecutionSpecBlockchainDevnet` (whitelisted `for_amsterdam/`, ~150 `SkipLoad` patterns)	subset of after
`eest-spec-enginextests-stable`	63,920	16m 49s	not run	N/A
`eest-spec-enginextests-devnet`	skipped (no fixture yet)	1m 57s	not run	N/A
`eest-spec-enginextests-benchmark-1m`	1,076	8m 20s	`BenchmarkEngineX*` (gated behind `BENCH_ENGINE_X_MANUAL_ALLOW`)	N/A
`eest-spec-enginextests-benchmark-5m`	1,077	8m 13s	`BenchmarkEngineX*`	N/A
`eest-spec-enginextests-benchmark-10m`	1,077	8m 19s	`BenchmarkEngineX*`	N/A
`eest-spec-enginextests-benchmark-30m`	1,077	10m 06s	`BenchmarkEngineX*`	N/A
`eest-spec-enginextests-benchmark-60m`	1,077	12m 27s	`BenchmarkEngineX*`	N/A
`eest-spec-enginextests-benchmark-100m`	1,077	14m 50s	`BenchmarkEngineX*`	N/A
`eest-spec-enginextests-benchmark-150m`	1,077	18m 55s	`BenchmarkEngineX*`	N/A
Totals (currently-running shards)	463,990	22m 51s (slowest)	Subset of state-tests + blockchain-tests only; no devnet state, no engine-x at all	--

The before-coverage rows that are not N/A were running, but in every case as a subset of the post-PR path — TestState walked one subdirectory of state_tests/, TestExecutionSpecBlockchainDevnet walked one fork dir under for_amsterdam/ with ~150 hand-curated SkipLoad regexes, and the stable blockchain-test variants similarly had SkipLoad exclusions. None of statetests-devnet, enginextests-stable, or enginextests-benchmark-* were exercised at all in CI before.

The post-PR matrix runs ~191,500 tests that were not running at all in CI before (state-devnet + engine-x stable + engine-x benchmark), and adds tens of thousands more from dropped SkipLoad lists on the previously-running suites — all while shrinking PR-critical-path workflows (test-all-erigon.yml, test-all-erigon-race.yml) to a fraction of their previous wall time.

taratorio · 2026-05-11T06:51:11Z

@yperbasis this is ready for review

yperbasis

Minor suggestions (non-blocking)

Two sources of truth for failure budgets. Per-shard max-allowed-failures defaults exist in both .github/workflows/test-eest-spec.yml:209-221 and tools/run-eest-spec-test.sh:1582-1623, with a comment asking future editors to keep them in sync. Could be a single JSON/YAML manifest consumed by both. Minor.
Unconditional timedExec in runEngineXGroup (cmd/evm/enginexrunner.go:280-282). The wrapper runs even when --time is off, paying two runtime.ReadMemStats (stop-the-world) calls per test. With ~64k tests this is ~128k µs-scale STW pauses — sub-second total, almost certainly fine, but if timeIt { … } else { … } would avoid it entirely. The inline comment ("Always timedExec(bench=false)…") is honest about the intent.
Doc/code inconsistency. The runEngineXGroup docstring (cmd/evm/enginexrunner.go:275-277) says "When timeIt is true, each test's single Run is wrapped in timedExec" — but the body wraps unconditionally and only the r.Stats assignment is gated on timeIt. The inline comment three lines down is correct; consider tightening the docstring to match.
tools/test-fixtures.sh edge case (tools/test-fixtures.sh:25-26). With exactly one positional arg, shift 2 2>/dev/null || true fails silently and keys=("$@") then captures the manifest path as a key. No current caller hits this, but (( $# > 2 )) && shift 2; keys=("$@") would be safer.
make test-fixtures reachability. After the PR, test-fixtures is no longer a transitive dep of test-all/test-all-race/test-group/test-sonar-coverage. It still works when invoked directly and downloads the full set (cl + 3× eest), but it's effectively unreferenced from the Makefile. Consider whether to keep it as a convenience alias or delete it in favor of the two narrower targets.

Copilot

Pull request overview

This PR removes execution/consensus spec-test suites (EEST and cl/spectest) from the default make test-all / make test-group Go test sweep, and runs them via dedicated Make targets and CI workflows to avoid large fixture downloads and long-running tests in the main test job.

Changes:

Split fixture handling into targeted Make targets (test-fixtures-cl, test-fixtures-eest) and add key-filter support to tools/test-fixtures.sh.
Add a dedicated EEST spec-test workflow (test-eest-spec.yml) + Makefile shard targets driven by tools/run-eest-spec-test.sh.
Skip cl/spectest during make test-all (via env var) and remove EEST Go test packages/benchmarks previously executed under go test ./....

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tools/test-groups	Removes EEST-specific Go test groups now that EEST is run outside `go test ./...`.
tools/test-fixtures.sh	Adds optional manifest key filtering for fixture downloads/extraction.
tools/run-eest-spec-test.sh	New driver script to run one EEST shard via `cmd/evm` JSON output and enforce failure budgets.
Makefile	Removes `test-fixtures` from `test-all`/`test-group` prerequisites; adds `test-fixtures-cl`, `test-fixtures-eest`, and `eest-spec-*` shard targets.
execution/tests/state_test.go	Removes EEST state-test Go test; keeps legacy Cancun tests and updates commentary.
execution/tests/init_test.go	Drops `eestDir` fixture path variable (no longer used).
execution/tests/eest_prague_calldata/testmain_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_prague_calldata/block_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_osaka_clz/testmain_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_osaka_clz/block_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_frontier_scenarios/testmain_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_frontier_scenarios/block_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_devnet/testmain_test.go	Removes EEST devnet Go test package (moved out of `go test`).
execution/tests/eest_devnet/block_test.go	Removes EEST devnet Go test package (moved out of `go test`).
execution/tests/eest_cancun_blobs/testmain_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_cancun_blobs/block_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_blockchain/testmain_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/eest_blockchain/block_test.go	Removes EEST Go test package (moved out of `go test`).
execution/tests/benchmark_engine_x_test.go	Removes Go benchmark that depended on EEST benchmark fixtures (replaced by shard workflow/flags).
execution/execmodule/execmoduletester/exec_module_tester.go	Fixes spelling/clarifies comment around TempDir usage.
execution/engineapi/engineapitester/engine_x_test_runner.go	Removes `Execute` helper (now relying on `Run`/other paths).
db/kv/memdb/memory_database.go	Fixes spelling/clarifies comment around TempDir usage.
cmd/evm/main.go	Adds `--time` flag definition for per-test timing/memstats capture.
cmd/evm/enginexrunner.go	Wires `--time` flag and adds timing output to stderr + JSON stats attachment.
cl/spectest/tests_test.go	Skips consensus spec tests when `ERIGON_SKIP_CL_SPECTEST=true` (used by `make test-all`).
cl/spectest/Makefile	Switches fixture prerequisite from `test-fixtures` to `test-fixtures-cl`.
.github/workflows/test-integration-caplin.yml	Adds caching for the `cl_mainnet.tar.gz` fixture tarball.
.github/workflows/test-eest-spec.yml	New workflow to run EEST shards with failure budgets and cached tarballs.
.github/workflows/test-all-erigon.yml	Removes generic fixture tarball caching now that spec tests are extracted from `test-all`.
.github/workflows/ci-gate.yml	Adds the new EEST spec-test workflow to the CI gate.
.claude/skills/erigon-test-race/SKILL.md	Updates guidance to reflect spec tests no longer run under `make test-all-race`.
.claude/skills/erigon-test-all/SKILL.md	Updates guidance to reflect spec tests moved to dedicated targets/workflows.
.claude/skills/erigon-implement-eip/SKILL.md	Updates guidance to reference EEST shards as primary EL validation.
.claude/skills/erigon-ci/SKILL.md	Updates CI map to include the EEST spec-test workflow and new local commands.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

yperbasis

But let's make at least some of the new shards run with the race detector, especially given that we're working on parallel execution.

taratorio · 2026-05-11T11:44:48Z

But let's make at least some of the new shards run with the race detector, especially given that we're working on parallel execution.

Unfortunately the race detector adds a 3X overhead and makes the shards very slow, e.g.:

Shard	Without race	With race (est.)
`eest-spec-blocktests-stable-race`	13m 52s	~41 min
`eest-spec-blocktests-devnet-race`	22m 51s	~67 min

I'm uncertain that -race is a good thing for us to do for 200,000 spec tests. Note these tests will continue growing with every fork so it just would be very hard to scale this.

taratorio · 2026-05-11T11:49:51Z

Note that the -race detector remains for all the other tests covered by make test-all, just not for the spec tests that I've extracted.

yperbasis · 2026-05-11T12:08:20Z

Hmm. Perhaps we could run --race on a subset of tests, but OK. Hopefully the rest of the test suite is enough to spot any concurrency issues.

…c-test-ci

taratorio · 2026-05-12T04:47:13Z

Minor suggestions (non-blocking)

Two sources of truth for failure budgets. Per-shard max-allowed-failures defaults exist in both .github/workflows/test-eest-spec.yml:209-221 and tools/run-eest-spec-test.sh:1582-1623, with a comment asking future editors to keep them in sync. Could be a single JSON/YAML manifest consumed by both. Minor.

Unconditional timedExec in runEngineXGroup (cmd/evm/enginexrunner.go:280-282). The wrapper runs even when --time is off, paying two runtime.ReadMemStats (stop-the-world) calls per test. With ~64k tests this is ~128k µs-scale STW pauses — sub-second total, almost certainly fine, but if timeIt { … } else { … } would avoid it entirely. The inline comment ("Always timedExec(bench=false)…") is honest about the intent.

Doc/code inconsistency. The runEngineXGroup docstring (cmd/evm/enginexrunner.go:275-277) says "When timeIt is true, each test's single Run is wrapped in timedExec" — but the body wraps unconditionally and only the r.Stats assignment is gated on timeIt. The inline comment three lines down is correct; consider tightening the docstring to match.

tools/test-fixtures.sh edge case (tools/test-fixtures.sh:25-26). With exactly one positional arg, shift 2 2>/dev/null || true fails silently and keys=("$@") then captures the manifest path as a key. No current caller hits this, but (( $# > 2 )) && shift 2; keys=("$@") would be safer.

make test-fixtures reachability. After the PR, test-fixtures is no longer a transitive dep of test-all/test-all-race/test-group/test-sonar-coverage. It still works when invoked directly and downloads the full set (cl + 3× eest), but it's effectively unreferenced from the Makefile. Consider whether to keep it as a convenience alias or delete it in favor of the two narrower targets.

all of these are now addressed

taratorio · 2026-05-12T06:09:01Z

New race + parallel exec shards: measured local runtimes & CI projections

All measurements from local hardware. CI-projection column applies a 1.9× factor (derived from the existing blocktests-stable CI baseline of 832s ÷ local 428s = 1.94 on the GitHub-hosted ubuntu-24.04 runner).

Race-detector shards (5 total)

Built once with go build -race. Each sub-shard filters its workload via a --run fork_<X> regex.

Shard	Tests	Failed (max-allowed)	Local wall	Est. CI wall
`blocktests-stable-race-pre-cancun`	8,947	0	3m 27s (measured)	~7 min
`blocktests-stable-race-cancun`	17,685	0	~6m 50s (extrapolated)	~13 min
`blocktests-stable-race-prague`	20,858	0	~8m 00s (extrapolated)	~15 min
`blocktests-stable-race-osaka`	21,558	0	~8m 20s (extrapolated)	~16 min
`blocktests-devnet-race-amsterdam`	24,270	6,082	8m 47s (measured)	~17 min

The -parallel siblings of every blocktests-stable-race-* shard (and the devnet-race-amsterdam shard) are listed in the next table — they share the same fixture filtering but add ERIGON_EXEC3_PARALLEL=true.
Extrapolated rows use the measured per-test rate from …-pre-cancun (~23 ms/test under race) scaled by each shard's test count.

Parallel-exec shards (5 new + 2 toggled)

Failure budgets come from a fresh sequential run after pulling latest main. exec3-parallel: true in tools/eest-spec-shards.json causes the runner script to export ERIGON_EXEC3_PARALLEL=true.

Shard	Tests	Failed (max-allowed)	Local wall	Est. CI wall	Status
`blocktests-stable-parallel`	69,256	270	7m 03s	~13 min	new
`blocktests-stable-race-pre-cancun-parallel`	8,947	159	3m 28s	~7 min	new
`blocktests-stable-race-cancun-parallel`	17,783	40	4m 23s	~8 min	new
`blocktests-stable-race-prague-parallel`	20,945	36	5m 49s	~11 min	new
`blocktests-stable-race-osaka-parallel`	21,564	36	5m 36s	~11 min	new
`blocktests-devnet`	137,647	6,383	15m 27s	~29 min	now parallel (was serial)
`blocktests-devnet-race-amsterdam`	24,270	6,082	8m 47s	~17 min	now parallel (was serial)

Notes:

270 stable-parallel failures, 0 stable-serial failures — every one of those 270 is a parallel-only regression on stable fixtures (the serial path already runs as a separate shard with max-allowed-failures: 0, so any regression there is caught).
Devnet failure counts shifted a bit vs the prior measurements after the main pull:
- blocktests-devnet: 6,206 → 6,383 (+177).
- blocktests-devnet-race-amsterdam: 6,140 → 6,082 (−58).
Race + parallel sub-shard failure budgets distribute as: pre-cancun=159 (largest because spans 9 forks), cancun/prague/osaka all ~36–40 — most of the parallel-only stable regressions land in the older forks under race conditions.
The engineapitester (engine-x) test runner is currently incompatible with parallel exec3 (validateAndStorePayload: unexpected state step has more work). Engine-x benchmark -parallel clones were considered, measured (all 7,538 tests fail), and dropped from this PR — they'll come back in a follow-up once the runner is fixed.

Shard runtime budget summary

All race and parallel shards fit comfortably under a 30-min CI timeout. Slowest is blocktests-devnet at ~29 min projected (still under the target). The matrix runs all of them in parallel so the workflow's wall time is gated by the slowest shard, not the sum.

taratorio · 2026-05-12T06:11:36Z

@yperbasis note I added several -race shards for blocktests plus some parallel exec shards too as per #21092 (comment)

overall CI time is under 30 mins

the only thing is that we now have a lot more parallel runs, so this may put pressure on our GH runners - it may end up being ok but we may also end up needing another runner or two (let's monitor, im not really sure what will happen once this lands but I can quickly disable some shards if it causes issues until we get more runners)

taratorio · 2026-05-12T06:29:20Z

successful run: https://github.com/erigontech/erigon/actions/runs/25716252300?pr=21092
34 min (slowest is still kurtosis assertoor regular as before, no time impact)

…c-test-ci

AskAlexSharov · 2026-05-12T07:51:22Z

oh, nice

…3_PARALLEL flakes (erigontech#21136) (erigontech#21153) Splits the substantive changes out of erigontech#21017 (which is left with just the serial-vs-parallel exec-mode CI matrix yamls). With these merged, erigontech#21017's new parallel CI legs go green. ## What's here **Parallel-exec correctness — the SD-of-pre-existing-contract / recreate / fork-transition family** - `execution/stagedsync, execution/state`: parallel SD-of-pre-existing-contract fixes — `IteratePrefix(StorageDomain, addr)` to emit the full `StoragePath=0` cascade on self-destruct (covers genesis-allocated / prior-block storage that `vm.StorageKeys` doesn't see), drop `StoragePath` writes for SD'd addresses, post-SD baseline-is-zero handling, EIP-161 empty-removal gated by SpuriousDragon, etc. - `execution/state`: don't emit `StoragePath=0` writes from `IBS.Selfdestruct`. - `execution/stagedsync`: clear calc `Deleted` on a non-SD account-field write even when zero (a 0-value transfer that re-creates a previously-destroyed address as an empty account, pre-EIP-161). - `execution/stagedsync`: install the per-block changeset accumulator before any of the block's writes (idempotent `ensureChangesetAccumulator`), so changeset capture is robust against out-of-band block scheduling. - `execution/stagedsync`: clean exit when a single-block fork-validation batch already covered `maxBlockNum` (avoids a spurious `ErrLoopExhausted` → "unexpected state step has more work"). **Performance** - `execution/stagedsync`: O(1) `CollectorWrites` fee-balance update. After a tx finalizes, `nextResult` re-applies fee-adjusted balances into `result.CollectorWrites`; `CollectorWrites` is a flat slice, so doing that per-`addWrites`-entry was O(len(addWrites)·len(CollectorWrites)). For a block-end IBS reconstruction both are ~one entry per account the block touched, so a block whose tx pays ~100k accounts (`execution/tests`' `invalid-receipt-hash-high-mgas` corner) hit ~10^10 comparisons — `TestInvalidReceiptHashHighMgas` ran >11 min under `EXEC3_PARALLEL` + `-race` and tripped `mock_cl`'s 10-min per-call deadline (serial: ~10 s). Index `CollectorWrites`' `BalancePath` entries by address once (first match wins, mirroring the old `SetBalance` linear scan) and update in O(1); the test now matches serial. Removed the now-dead `VersionedWrites.SetBalance`. **Test gating for residual `EXEC3_PARALLEL` flakes — tracked in erigontech#21136** - `t.Skip`/`SkipLoad` under `dbg.Exec3Parallel` for: `TestEIP7708BurnLogWhenCoinbaseSelfDestructs`, `TestLegacyBlockchain/ValidBlocks/bcEIP3675/tipInsideBlock`, and `TestValidateChainAndUpdateForkChoiceWithSideForksThatGoBackAndForwardInHeight` (intermittent wrong-trie-root under heavy concurrent test load — the parallel reorg/commitment path is timing-sensitive). Serial coverage of all of these is unaffected. (The `eest_blockchain` `SkipLoad`s from the original gating commit are no longer applicable — that package was removed by erigontech#21092 — and were dropped accordingly.) ## Notes - Cleanly cherry-picked onto current `main`. `make lint` clean; `make test-short` clean; `execution/state` + `execution/stagedsync` + `execution/execmodule` tests pass serial **and** `EXEC3_PARALLEL=true`; `TestInvalidReceiptHashHighMgas` passes both modes (~12 s parallel, no `-race`). - The serial/parallel CI matrix that exercises the new parallel legs lives in erigontech#21017 — once this merges, erigontech#21017 will be trimmed to just those workflow yamls and rebased on top. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>

Main's #21092 removed both execution-eest-blockchain and execution-eest-devnet from tools/test-groups when spec tests were extracted out of make test-all. The merge resolution mistakenly kept the PR-side branch with the devnet entry plus a stale "Temporarily disable the blockchain shard" comment. Drop the devnet entry and the comment; keep only the PR's own execution-eest-zkevm contribution.

Per PR #21629 review (the #21092 pattern): the ~22k-test zkevm execution-witness conformance suite must not run under `go test` / `make test-all`. Replace it with a `zkevmtest` cmd/evm subcommand executed as parallel `eest-spec-tests` shards (incl. a -race variant). - cmd/evm/zkevmrunner.go: new `zkevmtest` subcommand — imports each EEST zkevm blockchain_tests fixture, queries debug_executionWitness (canonical) per block, multiset-compares state/codes/headers. Expected-fail patterns for the known #21563/#21566 divergences are ported via TestMatcher so the shard budget stays 0. - testutil: add BlockTest.RunWithTesterCLI (nil tb, caller-owned tester) and TestMatcher.CheckFailureForName for CLI runners; RunCLI now shares the path. - shards: add zkevm-witness + zkevm-witness-race-all to eest-spec-shards.yml, route them in run-eest-spec-test.sh, cache the eest_zkevm tarball in test-eest-spec.yml, and add Makefile targets provisioning test-fixtures-zkevm. - remove the go-test suite (execution/tests/eest_zkevm_witness), its execution-eest-zkevm test-group, the zkevm steps in test-all-erigon-race.yml, and the GOTEST_PACKAGES filter (back to ./...).

…21629) Adds the zkEVM (canonical) conformance runner for `debug_executionWitness`, the legacy-mode field population, the optional `mode` request param, and a format spec. ## Changes - **`zkevmtest` runner** — a `cmd/evm` subcommand (mirroring `staterunner`/`blockrunner`/`enginexrunner`) that checks `debug_executionWitness` (canonical mode) against the `ethereum/execution-spec-tests` zkevm@v0.4.0 corpus (~22.5k tests). Runs as the `zkevm-witness-race` shard in the `eest-spec-tests` workflow, not under `go test`. Known divergences (erigontech#21563, erigontech#21566) are absorbed in the runner so the shard budget stays 0. - **legacy `keys`/`codes`** — `keys` holds the 20B address / 32B storage-slot preimages; `codes` holds pre-state bytecode incl. EIP-7702 designators and the empty-code entry. - **`mode` request param** (`legacy`/`canonical`). - **Spec** at `docs/plans/witness-legacy-mode-spec.md`, re-verified against code. ## erigontech#21487 Carries erigontech#21487's suite plus the fixes that make it pass; erigontech#21487 is redundant once CI is green here. The open question from erigontech#21487 — whether a fixture-gated suite belongs in the required race-gate — is resolved: it's now a `zkevmtest` runner shard in the `eest-spec-tests` workflow (the erigontech#21092 pattern), not a `go test` package. ## Mainnet legacy soak vs reth-legacy Rolling soak near tip (block ~`25.25M`, ~3.7k blocks checked): - 0 health failures — no 500s, no root mismatches; every witness passes `verifyWitnessStateless`. - 90.6% strict set-equal on `{keys, codes, state}`. - Divergences are almost all over-inclusion (204 blocks `keys`, 86 `codes`, 62 `state`) — a few extra nodes per block, still re-executing to the correct root. 3 blocks show minor `keys`/`codes` under-inclusion, none affecting `verifyWitnessStateless`.

…h#21640) ## Problem `tools/run-eest-spec-test.sh` invokes the cmd/evm runner with `|| true` and decides shard pass/fail from the `--jsonout` report alone. That design is correct for the non-race shards (the runners always exit 0 and report via JSON; crashes surface as missing/truncated JSON, which the script catches via the `total == 0` check and `set -euo pipefail` on jq). It is wrong for the `*-race-*` shards: the Go race runtime reports detected races to **stderr** and overrides the process exit code to **66** (even through `os.Exit(0)`), while the run completes normally and the JSON stays fully valid with every test passing. The `|| true` discards that exit code, so a race shard whose binary detected a data race still went green — the `WARNING: DATA RACE` report only visible in the job log. The race detector's whole value is catching races that *don't* happen to corrupt results, so this silently weakened the guarantee of every race shard: ``` blocktests-stable-race-{pre-cancun,cancun,prague,osaka}-{sequential,parallel} blocktests-devnet-race-amsterdam ``` (and `zkevm-witness-race-all` once erigontech#21629 lands). Pre-existing since the shards were introduced in erigontech#21092; surfaced while reviewing erigontech#21629, which doubles the corpus running under the race detector. ## Fix Capture the runner's exit code and fail the shard explicitly on 66 with a pointer to the race report above it in the log. All other exit-code tolerance is unchanged. ## Verification (local repro) Built a minimal `-race` Go binary containing a genuine data race that emits valid passing JSON (`[{"name":"t1","pass":true}]`) and calls `os.Exit(0)`, then ran it through the real script as `EVM_BIN` against `blocktests-stable-race-cancun-sequential`: - **Before** (unmodified script): binary prints `Found 1 data race(s)` and exits 66 → script prints `ran 1 tests, 0 failed` and exits **0** (green despite the race). - **After**: same run exits **1** with `ERROR: data race detected (race runtime exit code 66); see WARNING: DATA RACE in the log above`. Regression matrix with stub binaries through the patched script, confirming only the race case changes behavior: | Scenario | Before | After | |---|---|---| | race detected (exit 66, valid passing JSON) | exit 0 | **exit 1** | | clean pass (exit 0) | exit 0 | exit 0 | | non-zero exit ≠ 66 with valid JSON | exit 0 (tolerated by design) | exit 0 | | test failures over budget | exit 1 | exit 1 | `shellcheck --severity=warning` (the lint-workflow invocation) is clean on the modified script; `make lint` clean. The race shards running on this PR exercise the patched path end-to-end in CI (race-free run → `rc=0` → behavior identical to before). ## Note on TDD This is a shell-script CI guard with no test harness; the Red → Green cycle above (reproduce the masked exit code through the unmodified script, then flip it with the fix) stands in for a checked-in failing test. ## End-to-end verification with a real race in erigon code Repeated the verification with the full production pipeline instead of stub binaries: planted a deliberate data race in `IntraBlockState.CommitBlock` (two unsynchronized goroutines incrementing a package var, executed once per imported block — never committed), built the real `evm.race` via `make evm.race`, and ran a single real EEST fixture (`cancun/eip1153_tstore/test_basic_tload_after_store.json`, one `fork_Cancun` test) through the `blocktests-stable-race-cancun-sequential` shard: - **old script** (origin/main): race report correctly attributes `intra_block_state.go` in `CommitBlock.func1`, binary prints `Found 1 data race(s)` and exits 66 → script prints `ran 1 tests, 0 failed`, **exit 0** — green despite the race. - **this PR's script**: same binary, same fixture → **exit 1**, last line `ERROR: data race detected (race runtime exit code 66); see WARNING: DATA RACE in the log above`. - **control**: race reverted, `evm.race` rebuilt clean, same fixture through this PR's script → 0 race warnings, `ran 1 tests, 0 failed`, **exit 0** — no false positive.

taratorio added 2 commits May 11, 2026 01:27

execution: extract spec tests out of make test-all

6ca4231

Merge branch 'main' of github.com:erigontech/erigon into worktree-spe…

d32025c

…c-test-ci

taratorio requested review from AskAlexSharov, Giulio2002, mh0lt, mriccobene, sudeepdino008 and yperbasis as code owners May 11, 2026 01:33

taratorio changed the title ~~execution: extract spec tests out of make test-all~~ [DO-NOT-MERGE] execution: extract spec tests out of make test-all May 11, 2026

taratorio added 2 commits May 11, 2026 01:41

fix yml skip matrix

fdb6cfd

dedup cl spectest

951e9fe

taratorio requested a review from domiwei as a code owner May 11, 2026 03:22

taratorio added 5 commits May 11, 2026 03:29

rename to eest spec test

0755e8f

add benchmark shard to eest-spec-test

7e2c7ea

Merge branch 'main' of github.com:erigontech/erigon into worktree-spe…

d5ea051

…c-test-ci

update skills

9d33a45

shard benchmarks into sub-gas targets

fcbee04

taratorio changed the title ~~[DO-NOT-MERGE] execution: extract spec tests out of make test-all~~ execution: extract spec tests out of make test-all May 11, 2026

yperbasis added the QA label May 11, 2026

yperbasis requested review from anacrolix and Copilot May 11, 2026 10:07

Copilot started reviewing on behalf of yperbasis May 11, 2026 10:08 View session

yperbasis approved these changes May 11, 2026

View reviewed changes

Copilot AI reviewed May 11, 2026

View reviewed changes

Comment thread tools/test-fixtures.sh Outdated

Comment thread cmd/evm/enginexrunner.go Outdated

Comment thread .claude/skills/erigon-test-all/SKILL.md

Comment thread .claude/skills/erigon-implement-eip/SKILL.md Outdated

yperbasis requested changes May 11, 2026

View reviewed changes

taratorio added 2 commits May 12, 2026 04:40

add parallel runs

0f7a5d1

Merge branch 'main' of github.com:erigontech/erigon into worktree-spe…

bd83072

…c-test-ci

taratorio added 2 commits May 12, 2026 05:22

update skilss

5f5cd52

update failures

f9e2989

taratorio added this pull request to the merge queue May 12, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026

taratorio added 2 commits May 12, 2026 07:08

update failures

6287ffa

Merge branch 'main' of github.com:erigontech/erigon into worktree-spe…

4942c28

…c-test-ci

taratorio enabled auto-merge May 12, 2026 07:11

taratorio added this pull request to the merge queue May 12, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026

update parallel exec failures with 100 buffer due to flakes

58b5f57

taratorio enabled auto-merge May 12, 2026 09:12

taratorio added this pull request to the merge queue May 12, 2026

Merged via the queue into main with commit 4646132 May 12, 2026
58 checks passed

taratorio deleted the worktree-spec-test-ci branch May 12, 2026 10:22

mh0lt mentioned this pull request May 12, 2026

execution: parallel-exec correctness + perf fixes; gate residual EXEC3_PARALLEL flakes (#21136) #21153

Merged

taratorio mentioned this pull request Jun 5, 2026

debug_executionWitness: legacy + canonical conformance #21629

Merged

taratorio mentioned this pull request Jun 5, 2026

tools: fail EEST spec shards on race-detector exit code 66 #21640

Merged

Conversation

taratorio commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taratorio commented May 11, 2026

Extract EEST spec tests out of make test-all

Why

Duration: test-all-erigon.yml + test-all-erigon-race.yml

Coverage: test-eest-spec.yml (new) vs. previous Go test equivalents

Uh oh!

taratorio commented May 11, 2026

Uh oh!

yperbasis left a comment

Choose a reason for hiding this comment

Minor suggestions (non-blocking)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yperbasis left a comment

Choose a reason for hiding this comment

Uh oh!

taratorio commented May 11, 2026

Uh oh!

taratorio commented May 11, 2026

Uh oh!

yperbasis commented May 11, 2026

Uh oh!

taratorio commented May 12, 2026

Minor suggestions (non-blocking)

Uh oh!

taratorio commented May 12, 2026

New race + parallel exec shards: measured local runtimes & CI projections

Race-detector shards (5 total)

Parallel-exec shards (5 new + 2 toggled)

Shard runtime budget summary

Uh oh!

taratorio commented May 12, 2026

Uh oh!

taratorio commented May 12, 2026

Uh oh!

Uh oh!

AskAlexSharov commented May 12, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

taratorio commented May 11, 2026 •

edited

Loading

Extract EEST spec tests out of `make test-all`

Duration: `test-all-erigon.yml` + `test-all-erigon-race.yml`

Coverage: `test-eest-spec.yml` (new) vs. previous Go test equivalents