build: add Dockerfile. #8

Merged
juanbono merged 1 commit into main from add-docker on Jun 5, 2024

Conversation

@mpaulucci

Collaborator

Description

  • Added Dockerfile
  • Added tracing
  • Cleaned up dependencies
  • Added job to check docker build

Closes #7

@mpaulucci mpaulucci requested a review from a team as a code owner June 5, 2024 15:34
@juanbono juanbono merged commit d19a445 into main Jun 5, 2024
@juanbono juanbono deleted the add-docker branch June 5, 2024 15:54
@fedacking fedacking mentioned this pull request Feb 5, 2026
1 task
azteca1998 added a commit that referenced this pull request Mar 19, 2026
…-limb values

Store each U256 limb individually instead of using a struct assignment or
copy_nonoverlapping. This prevents LLVM from grouping limbs[1..3] into a
[24 x i8] stack alloca that then requires a memset + memcpy round-trip to
reach the EVM stack slot.

After the fix, LLVM sees all 4 limbs as independent i64 scalars and can
prove that the upper limbs are zero constants for PUSH1-PUSH31, reducing
the EVM stack write from 5 memory ops (3 spills + 2 reloads) to 2 stp
instructions using the hardware zero register.

Before (PUSH1 fast path):
  str   x9, [x11]           ; limb[0]
  ldur  q0, [sp, #8]        ; reload zeros from frame
  stur  q0, [x11, #8]       ; limbs[1..2]
  ldr   x9, [sp, #24]       ; reload zeros from frame
  str   x9, [x11, #24]      ; limb[3]

After (PUSH1 fast path):
  stp   x9, xzr, [x11]      ; limb[0] + limb[1]
  stp   xzr, xzr, [x11, #16] ; limb[2] + limb[3]
github-merge-queue Bot pushed a commit that referenced this pull request Mar 26, 2026
## Summary

This PR applies two complementary assembly-level optimizations to the
PUSH1–PUSH32 opcode handlers, identified by analyzing the generated
aarch64 and x86_64 assembly output.

### Change 1: Use const-generic big-endian conversion (`push.rs`)

Replaces `U256::from_big_endian(&data[..N])` (runtime slice) with
`u256_from_big_endian_const::<N>(buf)` (const-generic fixed-size array).

Because `N` is a compile-time constant at each monomorphized
`OpPushHandler<N>`, the compiler can:
- Compute the padding offset `32 - N` at compile time
- Operate on a fixed-size `[u8; N]` buffer instead of a runtime slice,
enabling better autovectorization of the byte copy
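A minimal sketch of what a const-generic conversion of this kind could look like. It is not the code from the commit: the body is an assumption, and a plain `[u64; 4]` (least-significant limb first) stands in for the real `U256` type. The point it illustrates is that `32 - N` and the copy length are constants in each monomorphized instance.

```rust
/// Stand-in for a 256-bit integer: four u64 limbs, least-significant first.
/// Assumption: used here only for illustration, instead of the real `U256`.
type Limbs = [u64; 4];

/// Converts `N` big-endian bytes (N <= 32) into four limbs.
/// The name mirrors the helper mentioned above; the body is a sketch.
fn u256_from_big_endian_const<const N: usize>(buf: [u8; N]) -> Limbs {
    // `N` is a constant per instantiation, so this check folds away.
    assert!(N <= 32, "at most 32 bytes fit in 256 bits");

    // Left-pad to 32 bytes; the offset `32 - N` is known at compile time.
    let mut padded = [0u8; 32];
    padded[32 - N..].copy_from_slice(&buf);

    let mut limbs = [0u64; 4];
    for (i, limb) in limbs.iter_mut().enumerate() {
        // Bytes 0..8 of `padded` are the most significant, so they feed limb 3.
        let start = 8 * (3 - i);
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(&padded[start..start + 8]);
        *limb = u64::from_be_bytes(chunk);
    }
    limbs
}
```

For example, `u256_from_big_endian_const::<1>([0x2a])` yields `[0x2a, 0, 0, 0]`: only the lowest limb is non-zero, which is the shape the PUSH1 fast path relies on.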

### Change 2: Eliminate stack-frame spill in `Stack::push`
(`call_frame.rs`)

Replaces the previous `copy_nonoverlapping` (and later direct struct
assignment) with individual stores for each of the 4 U256 limbs.

**Root cause of the spill:** LLVM's SROA pass decomposes a `U256` value
into `limb[0]` (scalar i64) and `limbs[1..3]` (a `[24 x i8]` stack
alloca). A struct assignment or `copy_nonoverlapping` still uses
`memcpy` for the alloca portion, causing a `memset(alloca, 0) +
memcpy(alloca → slot)` round-trip.

By storing each limb individually, LLVM treats all 4 as independent i64
scalars, proves the upper limbs are zero constants for PUSH1–PUSH31, and
eliminates the alloca entirely.

**Before** (PUSH1 fast path, 5 memory ops):
```asm
str   x9, [x11]            ; limb[0]
ldur  q0, [sp, #8]         ; reload zeros from frame  ← wasted
stur  q0, [x11, #8]        ; limbs[1..2]
ldr   x9, [sp, #24]        ; reload zeros from frame  ← wasted
str   x9, [x11, #24]       ; limb[3]
```

**After** (PUSH1 fast path, 2 memory ops, no stack frame):
```asm
stp   x9, xzr, [x11]       ; limb[0] + limb[1]
stp   xzr, xzr, [x11, #16] ; limb[2] + limb[3]
```

The same spill/reload pattern was confirmed on x86_64 (using
`xorps`/`movaps`/`movq` through the stack frame).
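The per-limb store pattern can be sketched as follows. This is an illustration under stated assumptions, not the code in `call_frame.rs`: the `U256` and `Stack` definitions, the `len` field, the `STACK_LIMIT` constant, and the `StackOverflow` error are hypothetical stand-ins, and whether a given compiler version emits the `stp` sequence shown above is not guaranteed by this snippet.

```rust
/// Hypothetical stand-in for the real `U256`: four u64 limbs, low limb first.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U256([u64; 4]);

/// Hypothetical error type for a full stack.
#[derive(Debug, PartialEq, Eq)]
struct StackOverflow;

/// Assumed capacity for this sketch (the EVM stack limit is 1024 items).
const STACK_LIMIT: usize = 1024;

/// Simplified stack; the real layout in `call_frame.rs` may differ.
struct Stack {
    values: Vec<U256>,
    len: usize,
}

impl Stack {
    fn new() -> Self {
        Self { values: vec![U256([0; 4]); STACK_LIMIT], len: 0 }
    }

    fn push(&mut self, value: U256) -> Result<(), StackOverflow> {
        let slot = self.values.get_mut(self.len).ok_or(StackOverflow)?;
        // One store per limb instead of `*slot = value` or a byte copy, so the
        // optimizer can treat each limb as an independent scalar.
        slot.0[0] = value.0[0];
        slot.0[1] = value.0[1];
        slot.0[2] = value.0[2];
        slot.0[3] = value.0[3];
        self.len += 1;
        Ok(())
    }
}
```

The observable behaviour is identical to a whole-struct assignment; only the shape of the stores presented to the optimizer changes.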

## Test plan

- [x] `cargo check -p ethrex-levm`
- [x] `cargo test -p ethrex-levm --release`
- [x] `cargo test -p ethrex --release`
edg-l pushed a commit that referenced this pull request Mar 26, 2026
edg-l added a commit that referenced this pull request May 5, 2026
Run #8 still failed at switch_block+2 even with the cache-aware backend
landed in f10fb7f: each post-switch block reading an account modified
at a prior post-switch block (but not the latest) returned the MPT-base
nonce, off-by-1.

Root cause: `BinaryMerkleizer::new` started from `BinaryTrieState::new()`
(empty), ignoring `_parent_root` and the `provider` (marked dead_code).
Symmetric `MptMerkleizer::new` opens 16 shard workers each rooted at
`parent_state_root` and lazy-loads via the provider, so the merkleizer's
trie at the new root is the FULL post-parent state. The binary side was
producing diffs from empty — so the in-memory trie at root R(N) contained
a path only to accounts modified at block N. Read-path gates that walk
`state.trie_get` (notably `stem_has_basic_data`) returned false for any
account not in the latest block, and `TransitionBackend::account` fell
through to the MPT base.

Make BinaryMerkleizer mirror MptMerkleizer:

- `BinaryTrieProvider::open_state()`: new trait method returning a
  `BinaryTrieState`. Default impl returns `BinaryTrieState::new()` (empty,
  for tests + `EmptyBinaryTrieProvider` + genesis bootstrap).
- `StoreBinaryTrieProvider::open_state` overrides: opens against
  `CacheAwareTrieBackend`, so the trie is rooted at the live binary head
  (cache + disk), matching MPT's `MptTrieWrapper(state_root, trie_cache,
  db, last_written)` pattern.
- `BinaryMerkleizer::{new,new_bal}` open via `provider.open_state()`
  instead of `BinaryTrieState::new()`. The `provider` field loses its
  `#[allow(dead_code)]` — load-bearing now.
- `EmptyTrieBackend` added to `binary-trie::db` for symmetry with
  `EmptyBinaryTrieProvider` (used by future test paths that want the
  default no-op `open_state`).

After this fix, each block's merkleizer trie contains parent_state +
this block's writes. `state.trie_get` works for cross-block reads.
Layer-cache + on-disk fallback flows identically to the MPT pipeline.

ethrex-binary-trie 143/143, ethrex-storage 49/49, ethrex-blockchain 7/7
all pass. fmt + clippy clean on touched code.
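
The shape of the fix (a trait method with an empty default, overridden by the store-backed provider, and a merkleizer that opens its state through the provider) can be sketched as below. The type names follow the commit message, but every definition here is a simplified assumption: a `BTreeMap` stands in for the trie, `persisted` is a hypothetical field, and `write`/`trie_get` on the merkleizer are illustrative helpers rather than the real API.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the binary trie state (assumption: a flat map).
#[derive(Default)]
struct BinaryTrieState {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

trait BinaryTrieProvider {
    /// Default: an empty state, as used by tests and genesis bootstrap.
    fn open_state(&self) -> BinaryTrieState {
        BinaryTrieState::default()
    }
}

/// Provider that keeps the default no-op `open_state`.
struct EmptyBinaryTrieProvider;
impl BinaryTrieProvider for EmptyBinaryTrieProvider {}

/// Provider backed by previously written state (hypothetical `persisted` field).
struct StoreBinaryTrieProvider {
    persisted: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl BinaryTrieProvider for StoreBinaryTrieProvider {
    fn open_state(&self) -> BinaryTrieState {
        // Start from the parent state rather than from an empty trie.
        BinaryTrieState { entries: self.persisted.clone() }
    }
}

struct BinaryMerkleizer {
    state: BinaryTrieState,
}

impl BinaryMerkleizer {
    fn new(provider: &dyn BinaryTrieProvider) -> Self {
        // Before the fix this was effectively an empty state, ignoring the provider.
        Self { state: provider.open_state() }
    }

    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.state.entries.insert(key.to_vec(), value.to_vec());
    }

    fn trie_get(&self, key: &[u8]) -> Option<&[u8]> {
        self.state.entries.get(key).map(|v| v.as_slice())
    }
}
```

With a store-backed provider, a key written in an earlier block stays visible after later writes; with the empty provider, only the current block's writes are present, which is the symptom described above.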
Development

Successfully merging this pull request may close these issues.

Add Dockerfile