
feat(tests): add filler_to_python converter and port static tests #2563

Merged
marioevz merged 33 commits into ethereum:forks/amsterdam from leolara:new-static-port
Mar 31, 2026

Conversation

@leolara

@leolara leolara commented Mar 26, 2026

Member

🗒️ Description

New filler_to_python script that converts static YAML/JSON filler files
directly into Python test files, replacing the previous
fixture_to_python.py approach that required compiled fixtures as an
intermediate step.

Key improvement: Single source of truth. The new script works from
filler files only — no compiled fixtures needed. It reuses the existing
Pydantic models from execution_testing.specs.static_state for parsing,
tag resolution, and code compilation, then generates Python tests via a
Jinja2 template.

Results:

  • 2,151 fillers → 2,151 Python test files (100% generation)
  • 7,792 fixture files match 100% on full JSON content (including
    postVerifications) between static fill and Python fill
  • 1 filler cannot be converted due to a framework limitation
    (emptyBlobhashListTransaction rejects empty
    blob_versioned_hashes)
  • All generated code passes tox -e static

New files

| File | Purpose |
| --- | --- |
| scripts/filler_to_python/__init__.py | Package init |
| scripts/filler_to_python/__main__.py | CLI + pipeline + post-formatting |
| scripts/filler_to_python/analyzer.py | Filler model → IR (tags, bytecode, params) |
| scripts/filler_to_python/ir.py | Intermediate Representation dataclasses |
| scripts/filler_to_python/render.py | Jinja2 env + custom filters + render_test() |
| scripts/filler_to_python/templates/state_test.py.j2 | Single template for the full test file |

Modified files

| File | Change |
| --- | --- |
| packages/testing/.../expect_section.py | Added resolve_expect_post() and resolve_expect_post_fork() runtime helpers |
| scripts/compare_fixtures.py | Rewritten for full JSON comparison with cross-category matching |
| tests/ported_static/ | Regenerated from fillers (2,151 test files) |

🔗 Related Issues or PRs

Depends on #2552 (--post-verifications flag).

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails:
    uvx tox -e static
  • All: PR title adheres to the repo standard.
  • All: Considered updating the online docs.
  • All: Set appropriate labels.
  • Ported Tests: All converted tests have @ported_from marker.

Cute Animal Picture

cute animal

@codecov

codecov Bot commented Mar 27, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.24%. Comparing base (c90d117) to head (e5fde5f).
⚠️ Report is 12 commits behind head on forks/amsterdam.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           forks/amsterdam    #2563      +/-   ##
===================================================
- Coverage            86.35%   86.24%   -0.12%     
===================================================
  Files                  599      599              
  Lines                36904    36984      +80     
  Branches              3771     3795      +24     
===================================================
+ Hits                 31868    31895      +27     
- Misses                4485     4525      +40     
- Partials               551      564      +13     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 86.24% <ø> (-0.12%) ⬇️ |


@leolara leolara changed the title from "New static test porting system" to "feat(scripts): add filler_to_python converter for static test porting" Mar 27, 2026
@leolara

leolara commented Mar 27, 2026

Member Author

filler_to_python — Feature Summary

Core Pipeline

  • Converts 2,151 static YAML/JSON filler files to Python test files
    (single source of truth: filler only, no compiled fixtures needed)
  • 4 Python modules + 1 Jinja2 template:
    analyzer.py, ir.py, render.py, __main__.py,
    templates/state_test.py.j2
  • CLI: python -m scripts.filler_to_python --fillers ... --output ...
    with --single, --filter, --dry-run options
  • Post-formatting with ruff format and ruff check --fix
  • Automatic # noqa: E501 and # noqa: F841 for generated code

Filler Analysis

  • Reuses Pydantic models from execution_testing.specs.static_state
    directly — no reimplementation of parsing, tag resolution, or
    compilation
  • Tag resolution via PreInFiller.setup() with filler-derived
    deterministic addresses (contract_address_from_hash,
    eoa_from_hash)
  • Bytecode compilation via CodeInFiller.compiled(tags) for all
    source formats (LLL, Yul, ABI, raw hex)
  • Bytecode to Op expression conversion via process_evm_bytes_string()
    with roundtrip verification (falls back to bytes.fromhex() on
    mismatch)
  • Source comment extraction and classification (yul, lll, abi, raw, hex)
  • Fork range detection from expect section network constraints with
    chronological sorting
  • Parameter matrix (d × g × v) matching fill_function() logic
  • Expect entry resolution with CreateTag address derivation
    (compute_create_address, compute_create2_address)
  • Storage ANY marker preservation via set_expect_any()
  • Per-data access list handling (different access lists per data index)
  • Empty access list ([]) vs absent (None) distinction preserved
    (affects transaction type: EIP-2930 vs legacy)
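
To illustrate the d × g × v parameter matrix mentioned above, here is a minimal sketch that enumerates every index triple with itertools.product. The function name and the id scheme are assumptions made for illustration (the generated sample in this thread uses ids such as "d0"); this is not the converter's actual code.

```python
from itertools import product


def build_param_matrix(
    n_data: int, n_gas: int, n_value: int
) -> list[tuple[int, int, int, str]]:
    """Enumerate every (d, g, v) index triple plus a readable id."""
    params = []
    for d, g, v in product(range(n_data), range(n_gas), range(n_value)):
        # Hypothetical id scheme: only mention the axes that actually vary.
        parts = []
        if n_data > 1:
            parts.append(f"d{d}")
        if n_gas > 1:
            parts.append(f"g{g}")
        if n_value > 1:
            parts.append(f"v{v}")
        params.append((d, g, v, "-".join(parts) or "single"))
    return params


# Two data entries, one gas entry, one value entry: two cases in total.
matrix = build_param_matrix(n_data=2, n_gas=1, n_value=1)
```

With two data entries and a single gas and value entry, the matrix has exactly two rows, one per data index.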

Generated Test Structure

  • Single-case tests: inline values, no parametrize
  • Multi-case tests: TX_DATA[], TX_GAS[], TX_VALUE[] module-level
    arrays with @pytest.mark.parametrize("d, g, v", [...])
  • Runtime post-state resolution:
    • resolve_expect_post(entries, d, g, v, fork) for multi-case
    • resolve_expect_post_fork(entries, fork) for fork-dependent
      single-case
  • All contracts deployed with explicit filler-derived addresses via
    deploy_contract(address=Address("0x...")) to match static fill
  • Oversized contracts (>24576 bytes) use pre[addr] = Account(code=...)
    instead of deploy_contract to bypass max_code_size check
  • Tagged EOAs (including with code/storage) handled correctly as
    address constants
  • Senders not in pre-state use pre.fund_eoa(amount=0) matching
    static fill's setup() step 7
  • Transaction fields omit defaults (nonce=0, gas_limit=21000,
    value=0, data=b"")
  • Compound exceptions rendered as Python lists, not strings

Markers and Metadata

  • @pytest.mark.ported_from([filler_path]) on every test
  • @pytest.mark.valid_from / @pytest.mark.valid_until from fork range
  • @pytest.mark.slow from _info.pytest_marks
  • @pytest.mark.exception_test on single-case exception tests only
    (multi-case uses per-param marks)
  • @pytest.mark.pre_alloc_mutable on all tests
  • Module and function docstrings from _info.comment with D400/D403/D404
    compliance

Verification

  • 7,730 non-slow + 62 slow = 7,792 fixture files match 100% on full
    JSON content (stripping _info, sorting keys) including
    postVerifications
  • 0 mismatches, 0 unpaired Python-only fixtures
  • 4 unpaired static-only fixtures (emptyBlobhashList × 4 forks) —
    framework limitation where Transaction rejects empty
    blob_versioned_hashes
  • scripts/compare_fixtures.py updated for full JSON comparison with
    cross-category matching
  • All generated code passes tox -e static (codespell, ruff, mypy,
    ethereum-spec-lint)

Runtime Support

  • resolve_expect_post() added to
    specs/static_state/expect_section.py — matches (d, g, v, fork)
    against materialized expect entries at fill time
  • resolve_expect_post_fork() added for single-case fork-dependent
    tests
  • _storage_with_any() helper generated when expect entries use
    Storage.set_expect_any() for ANY-valued keys
  • _tx_data(d) / _tx_access_list(d) helpers for multi-case
    transaction field lookup
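
A simplified sketch of how an expect entry might be matched against (d, g, v, fork) is shown below. It is not the code of resolve_expect_post(): the helper names, the fork list, and the supported network syntax (plain names and ">=Fork" only) are assumptions, and the -1 wildcard is inferred from the "indexes" dictionaries in the generated tests.

```python
from typing import Optional, Union

# Illustrative subset of forks, oldest first (assumption for this sketch).
FORK_ORDER = ["Cancun", "Prague", "Osaka"]


def index_matches(selector: Union[int, list[int]], index: int) -> bool:
    """-1 matches every index; otherwise the index must be listed."""
    if selector == -1:
        return True
    allowed = selector if isinstance(selector, list) else [selector]
    return index in allowed


def network_matches(constraints: list[str], fork: str) -> bool:
    """Handle plain fork names and '>=Fork' lower bounds only."""
    position = FORK_ORDER.index(fork)
    for constraint in constraints:
        if constraint.startswith(">="):
            if position >= FORK_ORDER.index(constraint[2:]):
                return True
        elif constraint == fork:
            return True
    return False


def pick_expect_entry(
    entries: list[dict], d: int, g: int, v: int, fork: str
) -> Optional[dict]:
    """Return the first entry whose indexes and network accept the case."""
    for entry in entries:
        idx = entry["indexes"]
        if (
            index_matches(idx["data"], d)
            and index_matches(idx["gas"], g)
            and index_matches(idx["value"], v)
            and network_matches(entry["network"], fork)
        ):
            return entry
    return None


entries = [
    {
        "indexes": {"data": -1, "gas": -1, "value": -1},
        "network": [">=Cancun"],
        "result": "post-state placeholder",
    }
]
chosen = pick_expect_entry(entries, d=1, g=0, v=0, fork="Prague")
```

The real helper also returns the expected exception alongside the post-state; that part is left out here.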

@leolara leolara marked this pull request as ready for review March 27, 2026 15:30
@leolara

leolara commented Mar 27, 2026

Member Author

This is how tests/ported_static/stBugs/test_staticcall_createfails.py looks now:

"""
Test_staticcall_createfails.

Ported from:
state_tests/stBugs/staticcall_createfailsFiller.json
"""

import pytest
from execution_testing import (
    EOA,
    Account,
    Address,
    Alloc,
    Environment,
    StateTestFiller,
    Transaction,
)
from execution_testing.forks import Fork
from execution_testing.specs.static_state.expect_section import (
    resolve_expect_post,
)
from execution_testing.vm import Op

REFERENCE_SPEC_GIT_PATH = "N/A"

REFERENCE_SPEC_VERSION = "N/A"

TX_DATA = [
    "000000000000000000000000c94f5374fce5edbc8e2a8697c15331677e6ebf0b",
    "000000000000000000000000d94f5374fce5edbc8e2a8697c15331677e6ebf0b",
]
TX_GAS = [120000]
TX_VALUE = [0]


def _tx_data(d: int) -> bytes:
    """Convert TX_DATA[d] hex string to bytes."""
    return bytes.fromhex(TX_DATA[d])


@pytest.mark.ported_from(
    ["state_tests/stBugs/staticcall_createfailsFiller.json"],
)
@pytest.mark.valid_from("Cancun")
@pytest.mark.parametrize(
    "d, g, v",
    [
        pytest.param(
            0,
            0,
            0,
            id="d0",
        ),
        pytest.param(
            1,
            0,
            0,
            id="d1",
        ),
    ],
)
@pytest.mark.pre_alloc_mutable
def test_staticcall_createfails(
    state_test: StateTestFiller,
    pre: Alloc,
    fork: Fork,
    d: int,
    g: int,
    v: int,
) -> None:
    """Test_staticcall_createfails."""
    coinbase = Address("0x1000000000000000000000000000000000000000")
    contract_0 = Address("0xb94f5374fce5edbc8e2a8697c15331677e6ebf0b")
    contract_1 = Address("0xc94f5374fce5edbc8e2a8697c15331677e6ebf0b")
    contract_2 = Address("0xd94f5374fce5edbc8e2a8697c15331677e6ebf0b")
    sender = EOA(
        key=0x45A915E4D060149EB4365960E6A7A45F334393093061116B197E3240065FF2D8
    )

    env = Environment(
        fee_recipient=coinbase,
        number=1,
        timestamp=1000,
        prev_randao=0x20000,
        difficulty=0x20000,
        base_fee_per_gas=10,
        gas_limit=23826461031063688,
    )

    pre[sender] = Account(balance=0x38BEEC8FEECA2598)
    # Source: lll
    # { [[1]] (STATICCALL 70000 (CALLDATALOAD 0) 0 0 0 0) }
    contract_0 = pre.deploy_contract(  # noqa: F841
        code=Op.SSTORE(
            key=0x1,
            value=Op.STATICCALL(
                gas=0x11170,
                address=Op.CALLDATALOAD(offset=0x0),
                args_offset=0x0,
                args_size=0x0,
                ret_offset=0x0,
                ret_size=0x0,
            ),
        )
        + Op.STOP,
        storage={1: 1},
        nonce=63,
        address=Address("0xb94f5374fce5edbc8e2a8697c15331677e6ebf0b"),  # noqa: E501
    )
    # Source: lll
    # { (MSTORE 1 1) [[2]] (CREATE 1 1 1) }
    contract_1 = pre.deploy_contract(  # noqa: F841
        code=Op.MSTORE(offset=0x1, value=0x1)
        + Op.SSTORE(key=0x2, value=Op.CREATE(value=0x1, offset=0x1, size=0x1))
        + Op.STOP,
        nonce=63,
        address=Address("0xc94f5374fce5edbc8e2a8697c15331677e6ebf0b"),  # noqa: E501
    )
    # Source: raw
    # 0x60006000f0
    contract_2 = pre.deploy_contract(  # noqa: F841
        code=Op.PUSH1[0x0] * 2 + Op.CREATE,
        nonce=63,
        address=Address("0xd94f5374fce5edbc8e2a8697c15331677e6ebf0b"),  # noqa: E501
    )

    expect_entries_: list[dict] = [
        {
            "indexes": {"data": -1, "gas": -1, "value": -1},
            "network": [">=Cancun"],
            "result": {
                contract_0: Account(storage={1: 0}),
                Address(
                    "0x1d0384eb7c2b1a9d9862c8e180f9e4d1696a2a8e"
                ): Account.NONEXISTENT,
            },
        },
    ]

    post, _exc = resolve_expect_post(expect_entries_, d, g, v, fork)

    tx = Transaction(
        sender=sender,
        to=contract_0,
        data=_tx_data(d),
        gas_limit=TX_GAS[g],
        gas_price=10,
        error=_exc,
    )

    state_test(env=env, pre=pre, post=post, tx=tx)

@leolara leolara requested review from marioevz and spencer-tb March 27, 2026 15:32
@leolara

leolara commented Mar 28, 2026

Member Author

The CI fails on the blob_versioned_hashes test that I am asking for feedback about.

Comment thread scripts/filler_to_python/templates/state_test.py.j2 Outdated

@spencer-tb spencer-tb left a comment

Contributor


The slow pytest marker was not being applied. This seems to fix it on my end!

Comment thread scripts/filler_to_python/analyzer.py
Comment thread scripts/filler_to_python/analyzer.py Outdated
Comment thread scripts/filler_to_python/analyzer.py Outdated

@spencer-tb spencer-tb left a comment

Contributor


Skip difficulty when prev_randao is set (post-merge forks) and omit the default gas_price=10, as it's what we set in the framework (less noise). Matches the original script.

Comment thread scripts/filler_to_python/templates/state_test.py.j2 Outdated
Comment thread scripts/filler_to_python/templates/state_test.py.j2 Outdated
@spencer-tb spencer-tb added C-feat Category: an improvement or new feature P-high A-tests Area: Consensus tests. labels Mar 30, 2026
@spencer-tb spencer-tb changed the title from "feat(scripts): add filler_to_python converter for static test porting" to "feat(tests): add filler_to_python converter and port static tests" Mar 30, 2026
@leolara

leolara commented Mar 31, 2026

Member Author

> The slow pytest marker was not being applied. This seems to fix it on my end!

Is that for a specific test? In my code, it does add it for example to tests/ported_static/stAttackTest/test_contract_creation_spam.py

@spencer-tb

Contributor

> > The slow pytest marker was not being applied. This seems to fix it on my end!

> Is that for a specific test? In my code, it does add it for example to tests/ported_static/stAttackTest/test_contract_creation_spam.py

Ahh these were added by us manually from last script! They are safe to add as slow imo!

@leolara

leolara commented Mar 31, 2026

Member Author

Commit Review: new-static-port New Commits

Commit 1: 9ffbe1e — Update template

View commit

Author: Spencer

Fix is correct — changes blob_versioned_hashes check from truthiness
to is not none, so empty lists are no longer silently dropped. However,
it doesn't solve the emptyBlobhashList issue since the framework's
Transaction model still rejects blob_versioned_hashes=[] on
construction.
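
The difference between the two checks can be shown with a small sketch. The function names are hypothetical and the real check lives in the Jinja2 template, but the Python semantics are the same: an empty list is falsy, yet it is not None.

```python
from typing import Optional


def render_hashes_truthy(hashes: Optional[list[str]]) -> str:
    """Truthiness check: an empty list is falsy, so the field is dropped."""
    return f"blob_versioned_hashes={hashes!r}" if hashes else ""


def render_hashes_not_none(hashes: Optional[list[str]]) -> str:
    """Explicit None check: an empty list is still rendered."""
    return f"blob_versioned_hashes={hashes!r}" if hashes is not None else ""


dropped = render_hashes_truthy([])
kept = render_hashes_not_none([])
absent = render_hashes_not_none(None)
```

Here `dropped` is an empty string while `kept` still carries the empty list, which is the distinction the template fix restores.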

Commit 2: b8a0874 — Update analyzer

View commit

Author: Spencer

Adds SLOW_CATEGORIES set (stQuadraticComplexityTest, stStaticCall,
stTimeConsuming) but doesn't wire it into is_slow yet — dead code at
this point, likely completed in a subsequent commit.

Needed because stQuadraticComplexityTest and stStaticCall fillers
have empty pytest_marks in _info — only stTimeConsuming carries
the slow marker. The category-based fallback catches the other two.

Commit 3: e64d397 — Update analyzer

View commit

Author: Spencer

Fixes category extraction: filler_path.parts[0] → filler_path.parent.name.
For nested paths like Cancun/stTimeConsuming/..., parts[0] returns
Cancun (wrong) while parent.name returns stTimeConsuming (correct).
Essential for SLOW_CATEGORIES to work on fork-prefixed filler paths.
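
The difference between the two expressions is easy to check with pathlib. The paths below are assumed to be relative to the fillers root, and the nested file name is made up for illustration.

```python
from pathlib import PurePosixPath

# Flat layout: the category is the first and only directory.
flat = PurePosixPath("stBugs/staticcall_createfailsFiller.json")

# Fork-prefixed layout (hypothetical file name).
nested = PurePosixPath("Cancun/stTimeConsuming/exampleFiller.yml")

flat_by_parts = flat.parts[0]        # first path component
flat_by_parent = flat.parent.name    # name of the containing directory
nested_by_parts = nested.parts[0]
nested_by_parent = nested.parent.name
```

Both expressions agree on the flat path, but only parent.name yields the category for the fork-prefixed one.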

Commit 4: 713c4fa — Update analyzer

View commit

Author: Spencer

Completes commit 2: wires SLOW_CATEGORIES into is_slow with an or
— test is slow if _info.pytest_marks contains "slow" or category is
in SLOW_CATEGORIES. Commits 2+3+4 form a logical unit.
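
Taken together, the slow-marker logic from commits 2 to 4 might look roughly like this. The set contents come from the review above; the function signature is an assumption, since the real is_slow reads from the analyzer's filler model.

```python
# Categories treated as slow regardless of the filler's own marks.
SLOW_CATEGORIES = {
    "stQuadraticComplexityTest",
    "stStaticCall",
    "stTimeConsuming",
}


def is_slow(pytest_marks: list[str], category: str) -> bool:
    """Slow if the filler says so, or if its category is known to be slow."""
    return "slow" in pytest_marks or category in SLOW_CATEGORIES


by_mark = is_slow(["slow"], "stAttackTest")
by_category = is_slow([], "stStaticCall")
neither = is_slow([], "stBugs")
```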

Commit 5: ba11f38 — Update template

View commit

Author: Spencer

Suppresses gas_price=10 in generated code since it's the
TransactionDefaults.gas_price default. Consistent with existing
default-omission pattern for nonce=0, gas_limit=21000, value=0.

Commit 6: 7ee9d49 — Update template

View commit

Author: Spencer

Suppresses difficulty when prev_randao is set. These are mutually
exclusive (pre-Merge vs post-Merge). Post-Merge forks zero out
difficulty, so emitting both is redundant. Correct fix.

Commit 7: 72dfd55 — Fix trailing whitespace and indentation in is_slow

View commit

Author: Spencer

Cosmetic: fixes indentation (6→4 spaces) and trailing whitespace in
is_slow from commit 4, adds missing blank line before class. Pure
style cleanup.

Commit 8: 2caa1ab — Use Hash, Address without strings

View commit

Author: Mario Vega

Substantial change to how tx data and addresses are rendered:

  • TX data entries now use typed constructors based on compiled byte length:
    32 bytes → Hash(0x...), 20 bytes → Address(0x...), other → Bytes("...")
  • Removes _tx_data() helper — TX_DATA[d] used directly
  • Drops string wrapping on Address: Address("0x...") → Address(0x...)
    with leading zeros stripped
  • Adds Bytes import, needs_bytes_import IR field
  • Reorders template: single_post checked before multi_case/fork_dependent
  • Render: single_post now computed when exactly 1 expect entry (not just
    non-multi-case)

The 32/20 byte heuristic is cosmetic — Hash, Address, Bytes all
serialize identically, so fixtures match regardless. Produces cleaner
generated code.
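
The length-based choice of constructor can be sketched as below. The helper name is hypothetical and the exact text emitted by the real renderer may differ; only the 32/20/other split is taken from the commit description.

```python
def render_tx_data(data: bytes) -> str:
    """Pick a constructor name from the byte length of the compiled data."""
    if len(data) == 32:
        # hex() on an int drops leading zeros.
        return f"Hash({hex(int.from_bytes(data, 'big'))})"
    if len(data) == 20:
        return f"Address({hex(int.from_bytes(data, 'big'))})"
    return f'Bytes("{data.hex()}")'


# First TX_DATA entry from the sample test earlier in this thread.
word = bytes.fromhex(
    "000000000000000000000000c94f5374fce5edbc8e2a8697c15331677e6ebf0b"
)
as_hash = render_tx_data(word)
as_address = render_tx_data(word[12:])
as_bytes = render_tx_data(bytes.fromhex("60006000f0"))
```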

Commit 9: 67ef8e2 — Make tx variables local

View commit

Author: Mario Vega

Moves TX_DATA, TX_GAS, TX_VALUE, TX_ACCESS_LISTS and helpers from
module-level to local variables inside the test function. Renames to
lowercase (tx_data, tx_gas, etc.). Removes _tx_access_list() helper,
uses tx_access_lists.get(d) inline. Also fixes REFERENCE_SPEC_VERSION
placement. Makes tests more self-contained.

Commit 10: a6781bf — Typing

View commit

Author: Mario Vega

Large mechanical refactor: addr_to_var dict changes from
dict[str, str] to dict[Address | EOA, str] — native objects as keys
instead of hex strings. Eliminates most _addr_hex() calls. AccountIR.address
becomes Address | None. Adds safety assertion
assert not var_name.startswith("0x"). No behavior change — type-safe
version of existing logic.

Commit 11: a7fbdfc — Match tx data with contract addresses

View commit

Author: Mario Vega

Extracts _build_tx_arrays() function. When compiled tx data contains
bytes that match a known contract address (from addr_to_var), uses the
variable name instead of raw hex. 20-byte match → addr_var, 32-byte
match → Hash(addr_var, left_padding=True). Also adds address-tag
sanitization for tags that look like 0x... addresses.
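
A minimal sketch of the address matching follows. It uses raw bytes as dictionary keys for simplicity (the real addr_to_var is keyed by Address or EOA objects), and it assumes that a 32-byte match means a left-zero-padded address; the helper name is hypothetical.

```python
from typing import Optional


def render_known_address(
    word: bytes, addr_to_var: dict[bytes, str]
) -> Optional[str]:
    """Return a variable-based expression if the word is a known address."""
    if len(word) == 20 and word in addr_to_var:
        return addr_to_var[word]
    # Assumption: a 32-byte match is 12 zero bytes followed by the address.
    if len(word) == 32 and word[:12] == bytes(12) and word[12:] in addr_to_var:
        return f"Hash({addr_to_var[word[12:]]}, left_padding=True)"
    return None  # caller falls back to the raw representation


contract_1 = bytes.fromhex("c94f5374fce5edbc8e2a8697c15331677e6ebf0b")
known = {contract_1: "contract_1"}

short_form = render_known_address(contract_1, known)
padded_form = render_known_address(bytes(12) + contract_1, known)
no_match = render_known_address(bytes(32), known)
```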

Commit 13: 6c22010 — Use compute_create_address

View commit

Author: Mario Vega

Major improvement: introduces _resolve_address() which checks if an
address is the result of compute_create_address(address=var, nonce=N)
from a known account (scanning nonces 0-255). Generates
compute_create_address() calls in expect entries instead of hardcoded
addresses. Also introduces ImportsIR dataclass to centralize import
tracking (replaces scattered needs_* booleans), and uses
dataclasses.asdict() to merge imports into the template context.
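
The nonce scan can be sketched as follows. The derivation function here is a stand-in built on SHA-256 so that the example runs with the standard library only; real CREATE addresses come from keccak256 over the RLP encoding of sender and nonce, which is what the framework's compute_create_address() computes. The function and parameter names are assumptions.

```python
import hashlib
from typing import Callable, Optional


def fake_create_address(creator: bytes, nonce: int) -> bytes:
    """Stand-in only: NOT the real CREATE address derivation."""
    digest = hashlib.sha256(creator + nonce.to_bytes(8, "big")).digest()
    return digest[12:]  # keep 20 bytes, like an address


def resolve_address(
    target: bytes,
    known: dict[bytes, str],
    derive: Callable[[bytes, int], bytes] = fake_create_address,
    max_nonce: int = 255,
) -> Optional[str]:
    """Scan nonces 0..max_nonce of every known account for a match."""
    for creator, var_name in known.items():
        for nonce in range(max_nonce + 1):
            if derive(creator, nonce) == target:
                return (
                    f"compute_create_address(address={var_name}, nonce={nonce})"
                )
    return None  # not derivable: keep the hardcoded address


contract_0 = bytes.fromhex("b94f5374fce5edbc8e2a8697c15331677e6ebf0b")
known = {contract_0: "contract_0"}
created = fake_create_address(contract_0, 63)
expression = resolve_address(created, known)
```

Passing the derivation function as a parameter keeps the scan itself independent of the hashing details.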

Commit 15: 6f79210 — Smarter decode of tx data

View commit

Author: Mario Vega

Enhances tx data decoding to parse ABI-like encoded data: splits data
into 4-byte selector + 32-byte words when len % 32 in (0, 4). Each
word is decoded independently via _decode_tx_data_word() (address
matching, Hash/Address/Bytes typing). Words joined with +. Falls
back to single-word decode for non-ABI-structured data.
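
The splitting rule can be sketched as below, assuming that a remainder of 4 means a leading selector and a remainder of 0 means no selector. The helper name and the handling of empty data are assumptions.

```python
def split_abi_like(data: bytes) -> list[bytes]:
    """Split into an optional 4-byte selector plus 32-byte words."""
    remainder = len(data) % 32
    if not data or remainder not in (0, 4):
        return [data]  # not ABI-shaped: treat as a single chunk
    chunks = []
    body = data
    if remainder == 4:
        chunks.append(data[:4])  # function selector
        body = data[4:]
    chunks.extend(body[i : i + 32] for i in range(0, len(body), 32))
    return chunks


selector = bytes.fromhex("a9059cbb")
call_data = selector + bytes(31) + b"\x01" + bytes(31) + b"\x02"
pieces = split_abi_like(call_data)
```

Each resulting piece would then be decoded on its own and the rendered expressions joined with +.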

Commit 17: ca71ccc — Remove unused tx_value = [0]

View commit

Author: Mario Vega

Template cleanup: suppresses tx_value = [0] when value is the default
(all zeros). Matches the existing tx_value != [0] check already used
for the Transaction's value= parameter. Also fixes whitespace around
tx_access_lists block.

Commit 19: 6d96606 — Parse bytecode when tx.to==None

View commit

Author: Mario Vega

For contract creation transactions (to=None), treats tx data as init
code and converts to Op expressions via _bytes_to_op_expr() instead of
raw hex. Uses probably_bytecode flag passed through the decode chain.
Produces much more readable generated code for creation transactions.

Commit 21: a1fe2b5 — Pass variable definitions to evm_bytes

View commit

Author: Mario Vega

Extends process_evm_bytes_string() and process_evm_bytes() in
cli/evm_bytes.py to accept int_definitions: dict[int, str] — maps
integer values to variable names. When a PUSH operand matches a known
address, the Op expression uses the variable name instead of the raw hex.
_bytes_to_op_expr() now builds these definitions from addr_to_var.
Roundtrip check updated to pass variable definitions to eval().
Currently only used for tx data bytecode (creation txs); contract code
has a TODO noting dependency order issues.
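
The idea behind int_definitions can be illustrated with a toy renderer that only understands PUSH1 to PUSH32. It is not the code in cli/evm_bytes.py: the function name is hypothetical and non-PUSH opcodes are emitted as a plain placeholder string rather than real Op names.

```python
def render_pushes(code: bytes, int_definitions: dict[int, str]) -> list[str]:
    """Render PUSH instructions, replacing known operands with names."""
    rendered = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if 0x60 <= opcode <= 0x7F:  # PUSH1 .. PUSH32
            size = opcode - 0x5F
            operand = int.from_bytes(code[i + 1 : i + 1 + size], "big")
            # Use the variable name when the operand is a known value.
            shown = int_definitions.get(operand, hex(operand))
            rendered.append(f"Op.PUSH{size}[{shown}]")
            i += 1 + size
        else:
            rendered.append(f"<opcode 0x{opcode:02x}>")  # placeholder only
            i += 1
    return rendered


address = bytes.fromhex("b94f5374fce5edbc8e2a8697c15331677e6ebf0b")
definitions = {int.from_bytes(address, "big"): "contract_0"}
# PUSH20 <address>, PUSH1 0x00, CREATE (0xf0)
bytecode = bytes([0x73]) + address + bytes([0x60, 0x00, 0xF0])
lines = render_pushes(bytecode, definitions)
```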

@leolara

leolara commented Mar 31, 2026

Member Author

Confirmed that the non-slow tests match the fixtures. Running the slow ones now.

@leolara

leolara commented Mar 31, 2026

Member Author

With the slow tests included as well, the fixtures match 100%.

@leolara

leolara commented Mar 31, 2026

Member Author

> > > The slow pytest marker was not being applied. This seems to fix it on my end!

> > Is that for a specific test? In my code, it does add it for example to tests/ported_static/stAttackTest/test_contract_creation_spam.py

> Ahh these were added by us manually from last script! They are safe to add as slow imo!

OK, I thought that only the tests marked as slow in the test YAML/JSON itself should be marked as such, but it seems these other categories were previously hardcoded as always slow?

@leolara

leolara commented Mar 31, 2026

Member Author

I was wrong to blindly believe the AI agent about the reason emptyBlobhashList was failing. The simple fix from Spencer fixed it.

@leolara

leolara commented Mar 31, 2026

Member Author

It seems that everything is ok now, with the advantage of having a script that is easier to improve without introducing problems in other tests in unexpected ways.

@leolara leolara requested a review from spencer-tb March 31, 2026 13:04

@marioevz marioevz left a comment

Member


Amazing work! The resulting tests keep improving and are more readable now. Thanks!

@marioevz marioevz merged commit f5b6d10 into ethereum:forks/amsterdam Mar 31, 2026
21 checks passed

Labels

A-tests Area: Consensus tests. C-feat Category: an improvement or new feature P-high


3 participants