execution/tests: add zkevm execution-witness suite (zkevm@v0.4.0) #21487

Closed
awskii wants to merge 17 commits into main from awskii/eest-zkevm-witness

@awskii awskii commented May 28, 2026

Member

DRAFT — CI-gating decision needed before merge.

Adds a strict-gate suite validating debug_executionWitness against execution-spec-tests zkevm@v0.4.0, via the PR #21002 fixture-manifest machinery. No t.Skip / bt.Fails / SkipLoad / build-tags.

Plan: docs/plans/completed/20260527-eest-zkevm-witness.md.

Corpus run

~2,515 / 2,871 fixture files pass (~88%). ~356 fail. Failure breakdown:

| direction | count |
|---|---|
| Erigon returns more witness elements than EEST | 3,923 |
| Erigon returns fewer | 4,128 |
| same count, content differs | 0 |

Failures are bidirectional set-membership mismatches in the collected witness, never wrong bytes at matching count. Points at the witness-builder's node/code collection criteria, not serialization. Plus EIP-7702 stateless re-exec rejecting delegated-EOA senders.
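
A minimal sketch of how the three directions in the breakdown above could be computed, assuming each witness is reduced to a set of hex-encoded elements. The names `Classify`, `Direction`, `toSet` and `diff` are hypothetical and are not taken from the suite; the real comparison may work per category (state, codes, headers) rather than on one flat set.

```go
package main

import (
	"fmt"
	"sort"
)

// Direction classifies how a produced witness differs from the expected one.
type Direction string

const (
	Match          Direction = "match"
	More           Direction = "produced has more elements"
	Fewer          Direction = "produced has fewer elements"
	ContentDiffers Direction = "same count, content differs"
)

// toSet turns a list of hex-encoded witness elements into a set.
func toSet(elems []string) map[string]struct{} {
	s := make(map[string]struct{}, len(elems))
	for _, e := range elems {
		s[e] = struct{}{}
	}
	return s
}

// diff returns the elements present in a but absent from b, sorted.
func diff(a, b map[string]struct{}) []string {
	var out []string
	for e := range a {
		if _, ok := b[e]; !ok {
			out = append(out, e)
		}
	}
	sort.Strings(out)
	return out
}

// Classify compares produced and expected witness elements as sets and
// returns the direction plus the extra and missing elements.
func Classify(produced, expected []string) (Direction, []string, []string) {
	p, e := toSet(produced), toSet(expected)
	extra, missing := diff(p, e), diff(e, p)
	switch {
	case len(extra) == 0 && len(missing) == 0:
		return Match, nil, nil
	case len(p) > len(e):
		return More, extra, missing
	case len(p) < len(e):
		return Fewer, extra, missing
	default:
		return ContentDiffers, extra, missing
	}
}

func main() {
	dir, extra, missing := Classify(
		[]string{"0xaa", "0xbb", "0xcc"},
		[]string{"0xaa", "0xbb"},
	)
	fmt.Println(dir, extra, missing)
}
```

Reporting the extra and missing elements alongside the direction is what makes a set-membership mismatch distinguishable from a serialization bug: the latter would show up as equal counts with differing content.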

Independent of the Amsterdam EIP work in progress elsewhere — would persist after that lands.

Open blocker

As wired, this fails repo-wide CI:

  1. execution-eest-zkevm is in test-all-erigon-race.yml's auto-matrix (required race-tests gate). Fixtures download → ~7k real failures fail the gate.
  2. test-all-erigon.yml runs go test ./... without fixtures → t.Fatalf → fails the required tests gate and local make test-short.

Resolutions (human decision; agent can't pick without violating the no-mute rule):

  • de-gate: move to a dedicated non-required workflow, drop from auto-matrix, exclude from blanket go test ./..., file a tracking issue for the witness-builder finding;
  • fix the witness builder first, keep gating;
  • file issues + add tracked human-authored skips.

awskii added 12 commits May 27, 2026 16:58
… queries (Task 8)

Full corpus run: 15425 PASS / 7107 FAIL subtests across 356/2871 fixture
files. Suite is red by design; failures recorded for human triage, not muted.

Runner fix found during the run: invalid-block fixtures (expectException)
carry an executionWitness but no canonical blocknumber, so debug_executionWitness
can't be queried for them. Capture expectException per block and skip the
witness query for rejected blocks (RunWithTester already asserts rejection).
This removed 972 false "no parseable block number" failures.

Dominant remaining signal: debug_executionWitness returns a strictly smaller
witness (state/codes/headers) than EEST's canonical stateless witness across
nearly every feature - one-directional shortfall, likely a single root cause
in the witness builder. Plus EIP-7702 stateless re-exec rejecting delegated
EOA senders. Recorded under Task 8 results for triage.
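
The runner fix described in this commit message can be sketched roughly as follows. This is an illustration only: `fixtureBlock` and `blocksToQuery` are hypothetical names, and the actual runner records `expectException` through its own parser types.

```go
package main

import "fmt"

// fixtureBlock is a hypothetical, trimmed-down view of one fixture block.
type fixtureBlock struct {
	Number          string // empty for invalid blocks with no canonical number
	ExpectException string // non-empty when the block must be rejected
}

// blocksToQuery returns the indices of blocks whose witness should be
// fetched. Blocks that are expected to be rejected never become canonical,
// so there is nothing to query for them.
func blocksToQuery(blocks []fixtureBlock) []int {
	var idx []int
	for i, b := range blocks {
		if b.ExpectException != "" {
			continue // rejection is asserted elsewhere; skip the witness query
		}
		idx = append(idx, i)
	}
	return idx
}

func main() {
	blocks := []fixtureBlock{
		{Number: "0x1"},
		{ExpectException: "some-exception"}, // placeholder value
		{Number: "0x2"},
	}
	fmt.Println(blocksToQuery(blocks))
}
```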
…ing, add witness parser unit test

- tools/test-groups: document the greedy-partition ordering invariant
  (subset groups must precede broader ones or resolve empty)
- execution/tests/testutil: add fixture-independent unit test for the
  WitnessBlockTest parser accessors (NumBlocks, ExpectedWitnessForBlock,
  BlockNumberForBlock, BlockExpectsException), covering them when the
  downloaded corpus is absent
- test-all-erigon-race.yml: trim over-verbose cache comment per comment policy
AskAlexSharov pushed a commit that referenced this pull request May 31, 2026
Return `headers` as RLP-encoded bytes instead of JSON header objects,
matching the canonical witness format a stateless verifier consumes.
Field type `[]map[string]any` → `[]hexutil.Bytes`; drops
`marshalWitnessHeader`.

Diverges from Geth's `debug_executionWitness` (header objects) —
deliberate, the target is the canonical format.

Corpus (zkevm@v0.4.0): header hashes match (RLP round-trips), 2846/2881,
no regressions. Suite's `rpcHeaderHashes` switches to RLP decode with
#21487.

Stacked on #21532. Refs #20534.
Comment thread docs/plans/completed/20260527-eest-zkevm-witness.md

@awskii awskii left a comment

Member Author

Added a bunch of PRs that raise the pass rate to 99.4%:
Merged to main: #21518 #21529 #21537

Opened: #21531, #21532, #21539

…ness

# Conflicts:
#	execution/tests/testutil/block_test_util.go

awskii commented Jun 1, 2026

Member Author

Also, #21555 was a wrong fix: it made the zkevm suite pass, but the zkevm corpus is out of sync with the Amsterdam test suite, which is correct and follows the EIP.

awskii added 2 commits June 1, 2026 14:49
main (#21532) changed ExecutionWitnessResult.Headers to []hexutil.Bytes
(RLP-encoded). Update rpcHeaderHashes to RLP-decode like expectedHeaderHashes
instead of reading a map["hash"] field.
zkevm@v0.4.0 charges EIP-8037 AUTH_BASE on 7702 clears of an undelegated
authority; current spec and tests-bal@v7.2.0 refund it. SkipLoad the 21
affected for_amsterdam subtests until the corpus is bumped. Tracking: #21563.

awskii commented Jun 1, 2026

Member Author

Status (branch 007f8dd7c5)

Pass rate: 2845/2871 files (99.1%), up from ~88% at open — via the merged witness PRs #21518, #21529, #21531, #21532, #21539, #21543, this branch's main merge + RLP-header consumer fix (#21532 follow-up), and skipping the stale-fixture gas cases below.

Skipped (stale fixtures, not erigon): 21 EIP-7702 subtests where zkevm@v0.4.0 charges EIP-8037 AUTH_BASE on undelegated clears but current spec (and tests-bal@v7.2.0) refunds it. Tracked in #21563; remove on corpus bump.

Remaining 25 failing files:

  • 22 × eip8025_optional_proofs/witness_validation_* — stateless-verifier negative tests ("removing/adding X should fail/validate"). The stored executionWitness is a deliberately-mutated witness; erigon's produced witness is correct, so the suite's producer-comparison diverges by exactly the mutated item. These need a stateless-verify consumer mode (or a skip), not producer changes.
  • 3 genuine producer gaps:
    • eip7928.../bal_account_touch_system_address — witness missing the SYSTEM_ADDRESS account proof node (−1).
    • 2 × witness_state_*_insert_before_delete_order — witness carries 1 extra node; erigon's post-state-root replay uses key-sorted order instead of insert-before-delete.

…fixtures

The witness_validation_* fixtures store deliberately mutated executionWitnesses
(stateless-verifier negative tests), so producer comparison always diverges -
erigon's produced witness is the canonical one. SkipLoad the 22 affected files
until a stateless-verify consumer mode exists. Tracking: #21566.
Sahil-4555 pushed a commit to Sahil-4555/erigon that referenced this pull request Jun 2, 2026
… touches it (erigontech#21565)

`debug_executionWitness` dropped the system address (`0xff..fe`) from
the witness unless it had a state change — the rationale being it's only
ever the `msg.sender` of begin/end-block system calls. But a user
transaction can access it via an account-accessing opcode
(`BALANCE`/`EXTCODESIZE`/`EXTCODEHASH`/`EXTCODECOPY`/`CALL`/`STATICCALL`);
per EIP-7928 that's an account-only entry the witness must carry. Such a
read usually hits the state-object cache (the account was loaded during
the system call), so the `RecordingState` reader never sees it.

Fix: detect the access via the per-transaction access set
(`ibs.AccessedAddresses()`) and keep the system address when a user tx
touched it.
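
A rough sketch of the keep/drop decision described in the fix above. Everything here is a simplified assumption: `Address`, `systemAddress` and `keepInWitness` are hypothetical, and the access set is modelled as a plain map rather than whatever `ibs.AccessedAddresses()` actually returns.

```go
package main

import "fmt"

// Address is a 20-byte account address (stand-in for the client's own type).
type Address [20]byte

// systemAddress builds 0xff..fe, the sender of begin/end-block system calls.
func systemAddress() Address {
	var a Address
	for i := range a {
		a[i] = 0xff
	}
	a[len(a)-1] = 0xfe
	return a
}

// keepInWitness decides whether addr stays in the witness. The rule only
// concerns the system address; every other address passes through unchanged.
func keepInWitness(addr Address, stateChanged bool, accessedByUserTx map[Address]struct{}) bool {
	if addr != systemAddress() {
		return true
	}
	if stateChanged {
		return true
	}
	// A cached read is invisible to the state reader, so consult the
	// per-transaction access set instead.
	_, touched := accessedByUserTx[addr]
	return touched
}

func main() {
	sys := systemAddress()
	fmt.Println(keepInWitness(sys, false, nil))
	fmt.Println(keepInWitness(sys, false, map[Address]struct{}{sys: {}}))
}
```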

Verified against the EEST zkevm witness suite:
`bal_account_touch_system_address` (all 6 opcode variants:
balance/call/extcodesize/extcodehash/extcodecopy/staticcall) goes green,
full `for_amsterdam` corpus shows no regressions. The zkevm suite lives
on erigontech#21487 until merged, so its CI coverage of this fixture arrives once
that lands. `make lint` clean.
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request Jun 2, 2026
…rom execution witness (erigontech#21569)

## Problem

`debug_executionWitness` emits one extra state node for blocks that
delete and re-insert keys under the same trie branch.

The canonical witness follows insert-before-delete ordering, under which
such a branch never collapses. Erigon's post-state commitment replays
the block's updates in a different order that can transiently reduce the
branch to a single child; the collapse tracer then touches the surviving
sibling into the witness. The block's net change leaves the branch with
>=2 children, so that sibling is not part of the canonical witness.

Surfaced by EEST fixtures
`eip8025_optional_proofs/witness_state_replay_order/*`, which are
constructed to fail a replay that is not insert-before-delete.

## Fix

This does not change the replay order. It restores canonical node
membership with a post-pass filter:

- `CollapseTracer` carries the collapsing branch's nibble prefix
alongside the sibling path.
- After the post-state commitment, `detectCollapseSiblings` reads each
candidate branch from the in-memory commitment domain and drops the
sibling when the branch ended with >=2 children (transient collapse);
siblings whose branch genuinely collapsed are kept.
- Adds `BranchData.ChildCount()` and
`SharedDomainsCommitmentContext.BranchChildCount()`.

Producer-only; no consumer changes, no state-root impact.
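
The post-pass filter described above can be sketched as follows, under simplifying assumptions: `collapseCandidate` and `filterTransientCollapses` are hypothetical names, paths are plain strings, and the child count is supplied by a callback standing in for the lookup against the in-memory commitment domain.

```go
package main

import "fmt"

// collapseCandidate pairs a sibling recorded by the tracer with the nibble
// prefix of the branch that appeared to collapse.
type collapseCandidate struct {
	BranchPrefix string
	SiblingPath  string
}

// filterTransientCollapses keeps only siblings whose branch genuinely
// collapsed, i.e. ended the block with fewer than two children.
func filterTransientCollapses(cands []collapseCandidate, childCount func(prefix string) int) []string {
	var keep []string
	for _, c := range cands {
		if childCount(c.BranchPrefix) >= 2 {
			continue // transient collapse: the branch survived the block
		}
		keep = append(keep, c.SiblingPath)
	}
	return keep
}

func main() {
	// Final child counts per branch prefix after the post-state commitment.
	counts := map[string]int{"0a": 2, "0b": 1}
	cands := []collapseCandidate{
		{BranchPrefix: "0a", SiblingPath: "0a1"},
		{BranchPrefix: "0b", SiblingPath: "0b7"},
	}
	kept := filterTransientCollapses(cands, func(p string) int { return counts[p] })
	fmt.Println(kept)
}
```

Filtering after the commitment, rather than changing the replay order, keeps the state-root computation untouched, which matches the "producer-only" scope stated in the PR.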

## Testing

Verified against the EEST zkevm witness suite (erigontech#21487, which carries the
fixtures): the two `witness_state_replay_order` fixtures pass and the
full `for_amsterdam` run shows no new failures.

Part of erigontech#21307.

awskii commented Jun 5, 2026

Member Author

Finished by #21629

@awskii awskii closed this Jun 5, 2026
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request Jun 5, 2026
…21629)

Adds the zkEVM (canonical) conformance runner for
`debug_executionWitness`, the legacy-mode field population, the optional
`mode` request param, and a format spec.

## Changes

- **`zkevmtest` runner** — a `cmd/evm` subcommand (mirroring
`staterunner`/`blockrunner`/`enginexrunner`) that checks
`debug_executionWitness` (canonical mode) against the
`ethereum/execution-spec-tests` zkevm@v0.4.0 corpus (~22.5k tests). Runs
as the `zkevm-witness-race` shard in the `eest-spec-tests` workflow, not
under `go test`. Known divergences (erigontech#21563, erigontech#21566) are absorbed in the
runner so the shard budget stays 0.
- **legacy `keys`/`codes`** — `keys` holds the 20B address / 32B
storage-slot preimages; `codes` holds pre-state bytecode incl. EIP-7702
designators and the empty-code entry.
- **`mode` request param** (`legacy`/`canonical`).
- **Spec** at `docs/plans/witness-legacy-mode-spec.md`, re-verified
against code.

## erigontech#21487

Carries erigontech#21487's suite plus the fixes that make it pass; erigontech#21487 is
redundant once CI is green here. The open question from erigontech#21487 — whether
a fixture-gated suite belongs in the required race-gate — is resolved:
it's now a `zkevmtest` runner shard in the `eest-spec-tests` workflow
(the erigontech#21092 pattern), not a `go test` package.

## Mainnet legacy soak vs reth-legacy

Rolling soak near tip (block ~`25.25M`, ~3.7k blocks checked):

- 0 health failures — no 500s, no root mismatches; every witness passes
`verifyWitnessStateless`.
- 90.6% strict set-equal on `{keys, codes, state}`.
- Divergences are almost all over-inclusion (204 blocks `keys`, 86
`codes`, 62 `state`) — a few extra nodes per block, still re-executing
to the correct root. 3 blocks show minor `keys`/`codes` under-inclusion,
none affecting `verifyWitnessStateless`.

Development

Successfully merging this pull request may close these issues.

zkevm@v0.4.0 fixtures: stale EIP-8037 AUTH_BASE gas on EIP-7702 undelegated clears
