execution: extract spec tests out of make test-all#21092
Conversation
Extract EEST spec tests out of
|
| Workflow | Shard | Before | After | Δ |
|---|---|---|---|---|
test-all-erigon.yml |
ubuntu-24.04 | 26m 53s | 3m 51s | −86% |
test-all-erigon.yml |
windows-2025 | 23m 34s | 15m 19s | −35% |
test-all-erigon.yml |
macos-15 | 5m 22s | 4m 09s | −23% |
test-all-erigon-race.yml |
execution-tests | 33m 39s | 2m 27s | −93% |
test-all-erigon-race.yml |
consensus | 15m 15s | 2m 52s | −81% |
test-all-erigon-race.yml |
execution-eest-blockchain | 15m 23s | removed | -- |
test-all-erigon-race.yml |
execution-eest-devnet | 16m 40s | removed | -- |
test-all-erigon-race.yml |
execution-other | 7m 14s | 4m 07s | −43% |
test-all-erigon-race.yml |
other | 6m 12s | 3m 28s | −44% |
test-all-erigon-race.yml |
core-rpc | 6m 02s | 2m 41s | −55% |
test-integration-caplin.yml |
ubuntu-24.04 | 8m 57s | 4m 01s | −55% |
test-integration-caplin.yml |
windows-2025 | 7m 50s | 3m 08s | −60% |
The two workflows that previously dominated wall time on the PR gate (ubuntu test-all at 27m, race execution-tests at 34m) now finish in under 4 minutes each. The race workflow loses two whole matrix cells, freeing two runner slots and ~16 minutes each.
Coverage: test-eest-spec.yml (new) vs. previous Go test equivalents
The new workflow has 13 shards (one is currently skipped because the devnet tarball doesn't yet ship blockchain_tests_engine_x/). All shards run in parallel; the workflow finishes in ~23 min wall time (gated by the slowest shard, blocktests-devnet).
| Shard | After: tests | After: duration | Before: equivalent | Before: tests |
|---|---|---|---|---|
eest-spec-statetests-stable |
63,556 | 10m 30s | TestState (walked state_tests/static/state_tests, skipped stTimeConsuming/) |
~40,855 |
eest-spec-statetests-devnet |
120,073 | 14m 22s | not run | N/A |
eest-spec-blocktests-stable |
69,256 | 13m 52s | TestExecutionSpecBlockchain + …CancunBlobs + …PragueCalldata + …OsakaCLZ + …FrontierScenarios (with ~10+ SkipLoad patterns combined) |
subset of after |
eest-spec-blocktests-devnet |
137,647 | 22m 51s | TestExecutionSpecBlockchainDevnet (whitelisted for_amsterdam/, ~150 SkipLoad patterns) |
subset of after |
eest-spec-enginextests-stable |
63,920 | 16m 49s | not run | N/A |
eest-spec-enginextests-devnet |
skipped (no fixture yet) | 1m 57s | not run | N/A |
eest-spec-enginextests-benchmark-1m |
1,076 | 8m 20s | BenchmarkEngineX* (gated behind BENCH_ENGINE_X_MANUAL_ALLOW) |
N/A |
eest-spec-enginextests-benchmark-5m |
1,077 | 8m 13s | BenchmarkEngineX* |
N/A |
eest-spec-enginextests-benchmark-10m |
1,077 | 8m 19s | BenchmarkEngineX* |
N/A |
eest-spec-enginextests-benchmark-30m |
1,077 | 10m 06s | BenchmarkEngineX* |
N/A |
eest-spec-enginextests-benchmark-60m |
1,077 | 12m 27s | BenchmarkEngineX* |
N/A |
eest-spec-enginextests-benchmark-100m |
1,077 | 14m 50s | BenchmarkEngineX* |
N/A |
eest-spec-enginextests-benchmark-150m |
1,077 | 18m 55s | BenchmarkEngineX* |
N/A |
| Totals (currently-running shards) | 463,990 | 22m 51s (slowest) | Subset of state-tests + blockchain-tests only; no devnet state, no engine-x at all | -- |
The before-coverage rows that are not N/A were running, but in every case as a subset of the post-PR path — TestState walked one subdirectory of state_tests/, TestExecutionSpecBlockchainDevnet walked one fork dir under for_amsterdam/ with ~150 hand-curated SkipLoad regexes, and the stable blockchain-test variants similarly had SkipLoad exclusions. None of statetests-devnet, enginextests-stable, or enginextests-benchmark-* were exercised at all in CI before.
The post-PR matrix runs ~191,500 tests that were not running at all in CI before (state-devnet + engine-x stable + engine-x benchmark), and adds tens of thousands more from dropped SkipLoad lists on the previously-running suites — all while shrinking PR-critical-path workflows (test-all-erigon.yml, test-all-erigon-race.yml) to a fraction of their previous wall time.
|
@yperbasis this is ready for review |
yperbasis
left a comment
There was a problem hiding this comment.
Minor suggestions (non-blocking)
-
Two sources of truth for failure budgets. Per-shard
max-allowed-failuresdefaults exist in both.github/workflows/test-eest-spec.yml:209-221andtools/run-eest-spec-test.sh:1582-1623, with a comment asking future editors to keep them in sync. Could be a single JSON/YAML manifest consumed by both. Minor. -
Unconditional
timedExecinrunEngineXGroup(cmd/evm/enginexrunner.go:280-282). The wrapper runs even when--timeis off, paying tworuntime.ReadMemStats(stop-the-world) calls per test. With ~64k tests this is ~128k µs-scale STW pauses — sub-second total, almost certainly fine, butif timeIt { … } else { … }would avoid it entirely. The inline comment ("Always timedExec(bench=false)…") is honest about the intent. -
Doc/code inconsistency. The
runEngineXGroupdocstring (cmd/evm/enginexrunner.go:275-277) says "When timeIt is true, each test's single Run is wrapped in timedExec" — but the body wraps unconditionally and only ther.Statsassignment is gated ontimeIt. The inline comment three lines down is correct; consider tightening the docstring to match. -
tools/test-fixtures.shedge case (tools/test-fixtures.sh:25-26). With exactly one positional arg,shift 2 2>/dev/null || truefails silently andkeys=("$@")then captures the manifest path as a key. No current caller hits this, but(( $# > 2 )) && shift 2; keys=("$@")would be safer. -
make test-fixturesreachability. After the PR,test-fixturesis no longer a transitive dep oftest-all/test-all-race/test-group/test-sonar-coverage. It still works when invoked directly and downloads the full set (cl + 3× eest), but it's effectively unreferenced from the Makefile. Consider whether to keep it as a convenience alias or delete it in favor of the two narrower targets.
There was a problem hiding this comment.
Pull request overview
This PR removes execution/consensus spec-test suites (EEST and cl/spectest) from the default make test-all / make test-group Go test sweep, and runs them via dedicated Make targets and CI workflows to avoid large fixture downloads and long-running tests in the main test job.
Changes:
- Split fixture handling into targeted Make targets (
test-fixtures-cl,test-fixtures-eest) and add key-filter support totools/test-fixtures.sh. - Add a dedicated EEST spec-test workflow (
test-eest-spec.yml) + Makefile shard targets driven bytools/run-eest-spec-test.sh. - Skip
cl/spectestduringmake test-all(via env var) and remove EEST Go test packages/benchmarks previously executed undergo test ./....
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/test-groups | Removes EEST-specific Go test groups now that EEST is run outside go test ./.... |
| tools/test-fixtures.sh | Adds optional manifest key filtering for fixture downloads/extraction. |
| tools/run-eest-spec-test.sh | New driver script to run one EEST shard via cmd/evm JSON output and enforce failure budgets. |
| Makefile | Removes test-fixtures from test-all/test-group prerequisites; adds test-fixtures-cl, test-fixtures-eest, and eest-spec-* shard targets. |
| execution/tests/state_test.go | Removes EEST state-test Go test; keeps legacy Cancun tests and updates commentary. |
| execution/tests/init_test.go | Drops eestDir fixture path variable (no longer used). |
| execution/tests/eest_prague_calldata/testmain_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_prague_calldata/block_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_osaka_clz/testmain_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_osaka_clz/block_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_frontier_scenarios/testmain_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_frontier_scenarios/block_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_devnet/testmain_test.go | Removes EEST devnet Go test package (moved out of go test). |
| execution/tests/eest_devnet/block_test.go | Removes EEST devnet Go test package (moved out of go test). |
| execution/tests/eest_cancun_blobs/testmain_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_cancun_blobs/block_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_blockchain/testmain_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/eest_blockchain/block_test.go | Removes EEST Go test package (moved out of go test). |
| execution/tests/benchmark_engine_x_test.go | Removes Go benchmark that depended on EEST benchmark fixtures (replaced by shard workflow/flags). |
| execution/execmodule/execmoduletester/exec_module_tester.go | Fixes spelling/clarifies comment around TempDir usage. |
| execution/engineapi/engineapitester/engine_x_test_runner.go | Removes Execute helper (now relying on Run/other paths). |
| db/kv/memdb/memory_database.go | Fixes spelling/clarifies comment around TempDir usage. |
| cmd/evm/main.go | Adds --time flag definition for per-test timing/memstats capture. |
| cmd/evm/enginexrunner.go | Wires --time flag and adds timing output to stderr + JSON stats attachment. |
| cl/spectest/tests_test.go | Skips consensus spec tests when ERIGON_SKIP_CL_SPECTEST=true (used by make test-all). |
| cl/spectest/Makefile | Switches fixture prerequisite from test-fixtures to test-fixtures-cl. |
| .github/workflows/test-integration-caplin.yml | Adds caching for the cl_mainnet.tar.gz fixture tarball. |
| .github/workflows/test-eest-spec.yml | New workflow to run EEST shards with failure budgets and cached tarballs. |
| .github/workflows/test-all-erigon.yml | Removes generic fixture tarball caching now that spec tests are extracted from test-all. |
| .github/workflows/ci-gate.yml | Adds the new EEST spec-test workflow to the CI gate. |
| .claude/skills/erigon-test-race/SKILL.md | Updates guidance to reflect spec tests no longer run under make test-all-race. |
| .claude/skills/erigon-test-all/SKILL.md | Updates guidance to reflect spec tests moved to dedicated targets/workflows. |
| .claude/skills/erigon-implement-eip/SKILL.md | Updates guidance to reference EEST shards as primary EL validation. |
| .claude/skills/erigon-ci/SKILL.md | Updates CI map to include the EEST spec-test workflow and new local commands. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
yperbasis
left a comment
There was a problem hiding this comment.
But let's make at least some of the new shards run with the race detector, especially given that we're working on parallel execution.
Unfortunately the race detector adds a 3X overhead and makes the shards very slow, e.g.:
I'm uncertain that -race is a good thing for us to do for 200,000 spec tests. Note these tests will continue growing with every fork so it just would be very hard to scale this. |
|
Note that the -race detector remains for all the other tests covered by |
|
Hmm. Perhaps we could run |
all of these are now addressed |
New race + parallel exec shards: measured local runtimes & CI projectionsAll measurements from local hardware. CI-projection column applies a 1.9× factor (derived from the existing Race-detector shards (5 total)Built once with
Parallel-exec shards (5 new + 2 toggled)Failure budgets come from a fresh sequential run after pulling latest
Notes:
Shard runtime budget summaryAll race and parallel shards fit comfortably under a 30-min CI timeout. Slowest is |
|
@yperbasis note I added several -race shards for blocktests plus some parallel exec shards too as per #21092 (comment) overall CI time is under 30 mins the only thing is that we now have a lot more parallel runs, so this may put pressure on our GH runners - it may end up being ok but we may also end up needing another runner or two (let's monitor, im not really sure what will happen once this lands but I can quickly disable some shards if it causes issues until we get more runners) |
|
successful run: https://github.com/erigontech/erigon/actions/runs/25716252300?pr=21092 |
|
oh, nice |
…3_PARALLEL flakes (erigontech#21136) (erigontech#21153) Splits the substantive changes out of erigontech#21017 (which is left with just the serial-vs-parallel exec-mode CI matrix yamls). With these merged, erigontech#21017's new parallel CI legs go green. ## What's here **Parallel-exec correctness — the SD-of-pre-existing-contract / recreate / fork-transition family** - `execution/stagedsync, execution/state`: parallel SD-of-pre-existing-contract fixes — `IteratePrefix(StorageDomain, addr)` to emit the full `StoragePath=0` cascade on self-destruct (covers genesis-allocated / prior-block storage that `vm.StorageKeys` doesn't see), drop `StoragePath` writes for SD'd addresses, post-SD baseline-is-zero handling, EIP-161 empty-removal gated by SpuriousDragon, etc. - `execution/state`: don't emit `StoragePath=0` writes from `IBS.Selfdestruct`. - `execution/stagedsync`: clear calc `Deleted` on a non-SD account-field write even when zero (a 0-value transfer that re-creates a previously-destroyed address as an empty account, pre-EIP-161). - `execution/stagedsync`: install the per-block changeset accumulator before any of the block's writes (idempotent `ensureChangesetAccumulator`), so changeset capture is robust against out-of-band block scheduling. - `execution/stagedsync`: clean exit when a single-block fork-validation batch already covered `maxBlockNum` (avoids a spurious `ErrLoopExhausted` → "unexpected state step has more work"). **Performance** - `execution/stagedsync`: O(1) `CollectorWrites` fee-balance update. After a tx finalizes, `nextResult` re-applies fee-adjusted balances into `result.CollectorWrites`; `CollectorWrites` is a flat slice, so doing that per-`addWrites`-entry was O(len(addWrites)·len(CollectorWrites)). For a block-end IBS reconstruction both are ~one entry per account the block touched, so a block whose tx pays ~100k accounts (`execution/tests`' `invalid-receipt-hash-high-mgas` corner) hit ~10^10 comparisons — `TestInvalidReceiptHashHighMgas` ran >11 min under `EXEC3_PARALLEL` + `-race` and tripped `mock_cl`'s 10-min per-call deadline (serial: ~10 s). Index `CollectorWrites`' `BalancePath` entries by address once (first match wins, mirroring the old `SetBalance` linear scan) and update in O(1); the test now matches serial. Removed the now-dead `VersionedWrites.SetBalance`. **Test gating for residual `EXEC3_PARALLEL` flakes — tracked in erigontech#21136** - `t.Skip`/`SkipLoad` under `dbg.Exec3Parallel` for: `TestEIP7708BurnLogWhenCoinbaseSelfDestructs`, `TestLegacyBlockchain/ValidBlocks/bcEIP3675/tipInsideBlock`, and `TestValidateChainAndUpdateForkChoiceWithSideForksThatGoBackAndForwardInHeight` (intermittent wrong-trie-root under heavy concurrent test load — the parallel reorg/commitment path is timing-sensitive). Serial coverage of all of these is unaffected. (The `eest_blockchain` `SkipLoad`s from the original gating commit are no longer applicable — that package was removed by erigontech#21092 — and were dropped accordingly.) ## Notes - Cleanly cherry-picked onto current `main`. `make lint` clean; `make test-short` clean; `execution/state` + `execution/stagedsync` + `execution/execmodule` tests pass serial **and** `EXEC3_PARALLEL=true`; `TestInvalidReceiptHashHighMgas` passes both modes (~12 s parallel, no `-race`). - The serial/parallel CI matrix that exercises the new parallel legs lives in erigontech#21017 — once this merges, erigontech#21017 will be trimmed to just those workflow yamls and rebased on top. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: milen <94537774+taratorio@users.noreply.github.com>
Main's #21092 removed both execution-eest-blockchain and execution-eest-devnet from tools/test-groups when spec tests were extracted out of make test-all. The merge resolution mistakenly kept the PR-side branch with the devnet entry plus a stale "Temporarily disable the blockchain shard" comment. Drop the devnet entry and the comment; keep only the PR's own execution-eest-zkevm contribution.
Per PR #21629 review (the #21092 pattern): the ~22k-test zkevm execution-witness conformance suite must not run under `go test` / `make test-all`. Replace it with a `zkevmtest` cmd/evm subcommand executed as parallel `eest-spec-tests` shards (incl. a -race variant). - cmd/evm/zkevmrunner.go: new `zkevmtest` subcommand — imports each EEST zkevm blockchain_tests fixture, queries debug_executionWitness (canonical) per block, multiset-compares state/codes/headers. Expected-fail patterns for the known #21563/#21566 divergences are ported via TestMatcher so the shard budget stays 0. - testutil: add BlockTest.RunWithTesterCLI (nil tb, caller-owned tester) and TestMatcher.CheckFailureForName for CLI runners; RunCLI now shares the path. - shards: add zkevm-witness + zkevm-witness-race-all to eest-spec-shards.yml, route them in run-eest-spec-test.sh, cache the eest_zkevm tarball in test-eest-spec.yml, and add Makefile targets provisioning test-fixtures-zkevm. - remove the go-test suite (execution/tests/eest_zkevm_witness), its execution-eest-zkevm test-group, the zkevm steps in test-all-erigon-race.yml, and the GOTEST_PACKAGES filter (back to ./...).
…21629) Adds the zkEVM (canonical) conformance runner for `debug_executionWitness`, the legacy-mode field population, the optional `mode` request param, and a format spec. ## Changes - **`zkevmtest` runner** — a `cmd/evm` subcommand (mirroring `staterunner`/`blockrunner`/`enginexrunner`) that checks `debug_executionWitness` (canonical mode) against the `ethereum/execution-spec-tests` zkevm@v0.4.0 corpus (~22.5k tests). Runs as the `zkevm-witness-race` shard in the `eest-spec-tests` workflow, not under `go test`. Known divergences (erigontech#21563, erigontech#21566) are absorbed in the runner so the shard budget stays 0. - **legacy `keys`/`codes`** — `keys` holds the 20B address / 32B storage-slot preimages; `codes` holds pre-state bytecode incl. EIP-7702 designators and the empty-code entry. - **`mode` request param** (`legacy`/`canonical`). - **Spec** at `docs/plans/witness-legacy-mode-spec.md`, re-verified against code. ## erigontech#21487 Carries erigontech#21487's suite plus the fixes that make it pass; erigontech#21487 is redundant once CI is green here. The open question from erigontech#21487 — whether a fixture-gated suite belongs in the required race-gate — is resolved: it's now a `zkevmtest` runner shard in the `eest-spec-tests` workflow (the erigontech#21092 pattern), not a `go test` package. ## Mainnet legacy soak vs reth-legacy Rolling soak near tip (block ~`25.25M`, ~3.7k blocks checked): - 0 health failures — no 500s, no root mismatches; every witness passes `verifyWitnessStateless`. - 90.6% strict set-equal on `{keys, codes, state}`. - Divergences are almost all over-inclusion (204 blocks `keys`, 86 `codes`, 62 `state`) — a few extra nodes per block, still re-executing to the correct root. 3 blocks show minor `keys`/`codes` under-inclusion, none affecting `verifyWitnessStateless`.
…h#21640) ## Problem `tools/run-eest-spec-test.sh` invokes the cmd/evm runner with `|| true` and decides shard pass/fail from the `--jsonout` report alone. That design is correct for the non-race shards (the runners always exit 0 and report via JSON; crashes surface as missing/truncated JSON, which the script catches via the `total == 0` check and `set -euo pipefail` on jq). It is wrong for the `*-race-*` shards: the Go race runtime reports detected races to **stderr** and overrides the process exit code to **66** (even through `os.Exit(0)`), while the run completes normally and the JSON stays fully valid with every test passing. The `|| true` discards that exit code, so a race shard whose binary detected a data race still went green — the `WARNING: DATA RACE` report only visible in the job log. The race detector's whole value is catching races that *don't* happen to corrupt results, so this silently weakened the guarantee of every race shard: ``` blocktests-stable-race-{pre-cancun,cancun,prague,osaka}-{sequential,parallel} blocktests-devnet-race-amsterdam ``` (and `zkevm-witness-race-all` once erigontech#21629 lands). Pre-existing since the shards were introduced in erigontech#21092; surfaced while reviewing erigontech#21629, which doubles the corpus running under the race detector. ## Fix Capture the runner's exit code and fail the shard explicitly on 66 with a pointer to the race report above it in the log. All other exit-code tolerance is unchanged. ## Verification (local repro) Built a minimal `-race` Go binary containing a genuine data race that emits valid passing JSON (`[{"name":"t1","pass":true}]`) and calls `os.Exit(0)`, then ran it through the real script as `EVM_BIN` against `blocktests-stable-race-cancun-sequential`: - **Before** (unmodified script): binary prints `Found 1 data race(s)` and exits 66 → script prints `ran 1 tests, 0 failed` and exits **0** (green despite the race). - **After**: same run exits **1** with `ERROR: data race detected (race runtime exit code 66); see WARNING: DATA RACE in the log above`. Regression matrix with stub binaries through the patched script, confirming only the race case changes behavior: | Scenario | Before | After | |---|---|---| | race detected (exit 66, valid passing JSON) | exit 0 | **exit 1** | | clean pass (exit 0) | exit 0 | exit 0 | | non-zero exit ≠ 66 with valid JSON | exit 0 (tolerated by design) | exit 0 | | test failures over budget | exit 1 | exit 1 | `shellcheck --severity=warning` (the lint-workflow invocation) is clean on the modified script; `make lint` clean. The race shards running on this PR exercise the patched path end-to-end in CI (race-free run → `rc=0` → behavior identical to before). ## Note on TDD This is a shell-script CI guard with no test harness; the Red → Green cycle above (reproduce the masked exit code through the unmodified script, then flip it with the fix) stands in for a checked-in failing test. ## End-to-end verification with a real race in erigon code Repeated the verification with the full production pipeline instead of stub binaries: planted a deliberate data race in `IntraBlockState.CommitBlock` (two unsynchronized goroutines incrementing a package var, executed once per imported block — never committed), built the real `evm.race` via `make evm.race`, and ran a single real EEST fixture (`cancun/eip1153_tstore/test_basic_tload_after_store.json`, one `fork_Cancun` test) through the `blocktests-stable-race-cancun-sequential` shard: - **old script** (origin/main): race report correctly attributes `intra_block_state.go` in `CommitBlock.func1`, binary prints `Found 1 data race(s)` and exits 66 → script prints `ran 1 tests, 0 failed`, **exit 0** — green despite the race. - **this PR's script**: same binary, same fixture → **exit 1**, last line `ERROR: data race detected (race runtime exit code 66); see WARNING: DATA RACE in the log above`. - **control**: race reverted, `evm.race` rebuilt clean, same fixture through this PR's script → 0 race warnings, `ran 1 tests, 0 failed`, **exit 0** — no false positive.
see #21092 (comment)