Skip to content

feat(specs,tests): EIP-8037 state gas ordering and clarifications#2526

Merged
spencer-tb merged 7 commits into
ethereum:eips/amsterdam/eip-8037from
spencer-tb:eips/amsterdam/eip-8037
Mar 28, 2026
Merged

feat(specs,tests): EIP-8037 state gas ordering and clarifications#2526
spencer-tb merged 7 commits into
ethereum:eips/amsterdam/eip-8037from
spencer-tb:eips/amsterdam/eip-8037

Conversation

@spencer-tb

@spencer-tb spencer-tb commented Mar 18, 2026

Copy link
Copy Markdown
Contributor

🗒️ Description

Note: these changes are already added to devnets/bal/3, and this PR is to highlight key changes to the EIP.

Fix EIP-8037 state gas charging order and add detection/clarification tests. Includes forks/amsterdam rebase changes.

Spec fix: When an opcode charges both regular gas and state gas, regular gas must be charged first, EIPs PR: ethereum/EIPs#11421. If regular gas OOGs, state gas is not consumed. The previous ordering (state before regular) allowed state gas spilled into gas_left to be lifted into the parent's reservoir via incorporate_child_on_error, inflating it and changing the transaction's effective gas consumption.

This was identified by @rakita (reth/revm) who found a failing test fixture on the tests-bal@v5.3.0 tag. The revm implementation already charges regular gas before state gas across all opcodes.

Opcodes fixed (state gas moved after regular gas):

  • SSTORE -- storage set (zero to non-zero)
  • CALL -- value transfer to non-existent account
  • SELFDESTRUCT -- dead beneficiary with non-zero balance

CREATE, CREATE2, and code deposit already had the correct order.

CALL memory expansion double-charge fix: CALL with value transfer to a new account was double-charging memory expansion costs — once in regular gas and again in state gas. Fixed to only charge memory costs in regular gas.

1D to 2D gas accounting fix (test_sstore_oog_no_reservoir_inflation in test_state_gas_create.py): The original test_max_initcode_size_gas_metering_via_create from tests-bal@v5.3.0 used 1D gas accounting (all gas through CALL, zero reservoir). This test uses the same pattern with reservoir=0, verifying the gas accounting is correct in the fixture for both exact_gas and short_one_gas cases.

Detection tests (test_state_gas_ordering.py): Each test gives a child frame 1 gas less than needed, then probes the parent's reservoir with SSTOREs. Clients with wrong ordering produce different storage values (not just different gas accounting), causing fixture assertion failures.

Edge case tests for ethereum/EIPs#11414 (EIP-8037 text clarifications by @nerolation ):

  • SSTORE stipend check uses gas_left only, excluding reservoir
  • CREATE does not double-charge GAS_NEW_ACCOUNT

Edge case test from @jwasinger (via @MariusVanDerWijden):

  • Calldata floor exceeding TX_MAX_GAS_LIMIT rejects the transaction at validation

Fixture format divergence fix: The blocks fixture in tests/prague/eip7002_el_triggerable_withdrawals/conftest.py mutated WithdrawalRequestContract.tx_gas_limit in-place (+=) to add EIP-8037 state gas costs. Since pytest shares parameter objects across fixture format runs, the gas cost compounded, added once for blockchain_test, then again for blockchain_test_engine on the already-modified object, producing different transactions and block hashes between formats. Fixed by deep copying the parameters before mutation. Reported by @jangko (nimbus).

🔗 Related Issues or PRs

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@spencer-tb spencer-tb added A-spec-specs Area: Specification—The Ethereum specification itself (eg. `src/ethereum/*`) C-feat Category: an improvement or new feature A-tests Area: Consensus tests. labels Mar 18, 2026
@spencer-tb spencer-tb force-pushed the eips/amsterdam/eip-8037 branch from f0813e0 to d871d66 Compare March 18, 2026 16:10
@spencer-tb spencer-tb requested a review from fselmo March 18, 2026 16:17
@spencer-tb

Copy link
Copy Markdown
Contributor Author

Tagging @fselmo re-STEEL call. This will be added to devnets/bal/3 branch nonetheless for now (unless anything major caught).

@codecov

codecov Bot commented Mar 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (eips/amsterdam/eip-8037@e4ec313). Learn more about missing BASE report.

Additional details and impacted files
@@                    Coverage Diff                     @@
##             eips/amsterdam/eip-8037    #2526   +/-   ##
==========================================================
  Coverage                           ?   83.63%           
==========================================================
  Files                              ?      642           
  Lines                              ?    39708           
  Branches                           ?     4058           
==========================================================
  Hits                               ?    33210           
  Misses                             ?     5798           
  Partials                           ?      700           
Flag Coverage Δ
unittests 83.63% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@spencer-tb spencer-tb force-pushed the eips/amsterdam/eip-8037 branch 2 times, most recently from 3a5d42e to b557be3 Compare March 18, 2026 17:02

@fselmo fselmo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one bug and one possible improvement, narrowing the calldata floor gap. I put them as commits here but feel free to drop them if I missed something, add on to them, etc. It just felt easier to sanity check the bug with a test, pairing with Claude, see that it failed and then the change itself was a one-liner... so I feel like we can use this test 👀.

Also a nit on aligning comments / logic design across similar opcodes for consistency.

Lmk your thoughts :)

value,
gas,
Uint(evm.gas_left),
extend_memory.cost,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We charge the mem cost now on line 456 so we should zero this out, as we did with extra_gas here, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up comment got lost here I think. I put this as two separate commits here. The first is a test I worked with Claude on to prove this was a bug and debug it - this did indeed fail to fill. The next commit was the fix to zero this out and it fills after this change. Lmk your thoughts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx so much for this!

gas limit cap, the transaction must be rejected at validation —
even though the regular intrinsic gas may be within the cap.

at_cap: calldata floor exactly equals the cap — transaction accepted.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is sort of a nit but we need to be more clear here regardless so also not really a nit. This isn't exactly at the cap because it's not divisible enough to be exact. I asked Claude to see what the tightest gap here can be and I got this:

The tightest gaps achievable:                                                                                                                                                                                  
                                                                       
  ┌──────────────────────────────┬───────────┬────────────┬─────┐                                                                                                                                                
  │           Approach           │  Tokens   │   Floor    │ Gap │      
  ├──────────────────────────────┼───────────┼────────────┼─────┤                                                                                                                                                
  │ All nonzero (current test)   │ 1,675,620 │ 16,777,200 │ -16 │      
  ├──────────────────────────────┼───────────┼────────────┼─────┤                                                                                                                                                
  │ Mixed (nonzero + zero bytes) │ 1,675,621 │ 16,777,210 │ -6  │                                                                                                                                                
  ├──────────────────────────────┼───────────┼────────────┼─────┤                                                                                                                                                
  │ Next token up                │ 1,675,622 │ 16,777,220 │ +4  │                                                                                                                                                
  └──────────────────────────────┴───────────┴────────────┴─────┘                                                                                                                                                
                                                                       
  So the current all-nonzero approach gives a gap of 16. Using mixed bytes could tighten it to 6. But exact boundary is unreachable — this is the tightest we can get with only nonzero bytes.                   
                                                                       
  The docstring should acknowledge this: the at_cap case is "largest calldata floor that fits within the cap" and exceeds_cap is "smallest calldata floor that exceeds the cap," not "exactly equals" and        
  "exceeds by 1."

It would be nice to tighten this gap to be as close as possible to the cap and still pass.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this ended up being trivial to tighten up so I'm putting it in a commit here. Feel free to drop any of these commits too or build on top. It's just a bit easier to do for these smaller changes.


transfer_gas_cost = Uint(0) if value == 0 else GAS_CALL_VALUE

# check static gas before state access

@fselmo fselmo Mar 18, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed is that these messages are inconsistent atm across the opcodes that introduced this concept with BALs. I think it would be good not only to align on the design pattern for these similar opcodes, but also this is maybe a chance to make things more clear now that we have 2D gas.

A Claude sweep analysis on this thought:

  • Current — inconsistent wording (selfdestruct), no mention of "regular" gas:
    • call() L428: # check static gas before state access
    • callcode() L548: # check static gas before state access
    • selfdestruct() L646: # check access gas cost before state access
    • delegatecall() L725: # check static gas before state access
    • staticcall() L822: # check static gas before state access

Maybe all these comments can align on "regular" gas wording? I'm not sure what we're calling this... "regular" seems weird so maybe we can say something like:

# ensure enough non-state gas before accessing state

Still not sure if that's super clear... but this is just a nit to make specs a bit more readable with 2D gas.


Another catch that Claude caught that was unrelated to my ask but fair enough since it aligns with consistent messaging across opcodes:

Current — exists in sstore (L148-150) and selfdestruct (L659-661):
 # Charge regular gas before state gas so that a regular-gas OOG
 # does not consume state gas that would inflate the parent's
 # reservoir on frame failure.

 Missing from — call() (L456), create() (L189), create2() (L241).

This is for the charge_gas() call in each opcode.

Thoughts on this? 👀. Simple one-liner comments but helps to align the design on the specs side since they are bits of logic that are trying to achieve the same goal.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very fair! We should add these. I think the whole 8037 EIP needs a full review before we merge into forks/amsterdam in the future, there will be more things like this I'm sure! :D

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense to get the logic working first. Can skip this for this PR, take or leave :)

@fselmo fselmo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to approve here @spencer-tb. Since I added my proposed fixes, let me know if you agree or not, etc. If you want another review with any more changes / dropped commits, feel free to re-request and I'll try to prioritize this and quickly get another review in but lgtm!

code_hash = get_account(tx_state, code_address).code_hash
code = get_code(tx_state, code_hash)

charge_gas(evm, extra_gas + extend_memory.cost)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an issue with the design to charge_gas multiple times. There's a bit of context here from implementing EIP-7928. Some of this pertains to EVM trace specification where we should only charge it once. Perhaps there is more nuance there too.

I believe we can account for this gas, and pass it to calculate_message_call_gas, without calling charge_gas more than once. We can use check_gas and we can do proper accounting within the logic and pass this accurate value to the calculate_message_call_gas().

Curious on your thoughts here.

cc: @gurukamath

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a todo.

spencer-tb and others added 6 commits March 28, 2026 17:01
…L, SELFDESTRUCT

When an opcode charges both regular and state gas, regular gas must be
charged first. If regular gas OOGs, state gas is not consumed. This
prevents the parent's reservoir from being inflated on frame failure
when state gas spills into gas_left.

Fixes SSTORE, CALL, and SELFDESTRUCT ordering to match reth (revm).
CREATE, CREATE2, and code deposit already had the correct order.

Adds detection tests for all opcodes that charge both gas types.
Each test gives a child frame 1 gas less than needed, then probes
the parent's reservoir to detect inflation from incorrectly consumed
state gas.
Add tests for EIP-8037 behaviors clarified in ethereum/EIPs#11414:

- SSTORE stipend check uses gas_left only, excluding the reservoir
- CREATE does not double-charge GAS_NEW_ACCOUNT (regular and state
  gas are independent charges)
- Calldata floor exceeding TX_MAX_GAS_LIMIT rejects the transaction
Fix line length violations in forks.py and test files, remove unused
import and unused variable.
- Specs were not zeroing out mem expansion cost when
  calculating the call message gas even though this
  gas was already charged.
@spencer-tb spencer-tb force-pushed the eips/amsterdam/eip-8037 branch from 0bebc8b to 006c51e Compare March 28, 2026 17:16
Add a TODO noting that separate charge_gas + charge_state_gas calls
may produce duplicate EVM trace entries. Flagged by fselmo in ethereum#2526.
@spencer-tb spencer-tb merged commit 181cd1c into ethereum:eips/amsterdam/eip-8037 Mar 28, 2026
7 checks passed
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 1, 2026
marioevz pushed a commit that referenced this pull request Apr 9, 2026
marioevz pushed a commit that referenced this pull request Apr 10, 2026
felix314159 pushed a commit to felix314159/execution-specs that referenced this pull request Apr 14, 2026
felix314159 pushed a commit that referenced this pull request Apr 16, 2026
spencer-tb added a commit that referenced this pull request Apr 17, 2026
marioevz pushed a commit that referenced this pull request Apr 20, 2026
spencer-tb added a commit that referenced this pull request Apr 20, 2026
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 21, 2026
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request Apr 21, 2026
spencer-tb added a commit that referenced this pull request Apr 21, 2026
fselmo added a commit that referenced this pull request May 5, 2026
spencer-tb added a commit to spencer-tb/execution-specs that referenced this pull request May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-spec-specs Area: Specification—The Ethereum specification itself (eg. `src/ethereum/*`) A-tests Area: Consensus tests. C-feat Category: an improvement or new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants