execution/types, engineapi, cl: avoid RLP re-encode of payload txs#21546

Merged
domiwei merged 6 commits into main from yperbasis/perf-rawbody-no-reencode on Jun 3, 2026

@yperbasis commented Jun 1, 2026 (Member)

Summary

When a block enters the execution layer — via engine_newPayload from an external CL, or via Caplin's internal-CL insert paths — each transaction arrives as its binary (canonical EIP-2718) encoding. We decode those to build a *types.Block, then a few stack frames later Block.RawBody() re-encodes every transaction with rlp.EncodeToBytes to form the body for insertion — redundant work, since the bytes were already in hand at entry.

This PR caches the binary encodings on the *types.Block and reuses them:

  • New binaryTransactions BinaryTransactions field on Block (nil = no cache; existing call sites unchanged).
  • New constructor NewBlockFromStorageWithBinaryTxs, used at the engine_newPayload call site and Caplin's ExecutionClientDirect + historical block_collector. The cache aliases the caller's existing buffers — no copy.
  • Block.RawBody() rebuilds each transaction's canonical block-body RLP from the cached bytes — re-wrapping typed txs in their EIP-2718 RLP-string envelope, legacy txs unchanged. Byte-identical to the rlp.EncodeToBytes output it replaces, but skips the per-tx struct walk; covered by TestBlockRawBodyFromBinaryTxsMatchesEncoded.

Performance

Matched-pair A/B between this PR's head (28e99358) and main pinned at a5a464c8132c, both fed the same mainnet payloads via engine-mux. Each EL was pinned to 4 P-cores / 8 hyperthreads on an i9-13900KS. The sample is n=201 matched engine_newPayloadV4 pairs in which both ELs returned VALID, taken from a steady-state tip window:

| EL | n | mean | p50 | p90 | p99 |
|---|---|---|---|---|---|
| main a5a464c8 | 201 | 89.78 | 90.52 | 134.94 | 165.13 |
| this PR 28e99358 | 201 | 89.20 | 89.61 | 132.63 | 163.59 |
| Δ = main − PR | | +0.58 | +0.91 | +2.31 | +1.54 |
  • p50: +1.01% PR-faster (main/pr at p50 = 1.010×)
  • p90: +1.7% PR-faster
  • p99: +0.9% PR-faster
  • RAM (peak VmRSS): main 40.3 GB vs PR 41.3 GB — +1 GB on the PR side, within this rig's run-to-run RAM variance.
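The deltas and percentages above can be recomputed from the two table rows. The sketch below assumes the percentage is taken relative to main; the PR does not state the denominator, and using the PR row instead shifts the results only slightly.

```go
package main

import "fmt"

// pctFaster returns how much lower the PR value is, as a percentage of main.
// Using main as the denominator is an assumption; the source does not say.
func pctFaster(mainV, prV float64) float64 {
	return (mainV - prV) / mainV * 100
}

func main() {
	labels := []string{"mean", "p50", "p90", "p99"}
	mainRow := []float64{89.78, 90.52, 134.94, 165.13} // main a5a464c8
	prRow := []float64{89.20, 89.61, 132.63, 163.59}   // this PR 28e99358

	for i, l := range labels {
		fmt.Printf("%-4s delta=%+.2f faster=%.2f%%\n",
			l, mainRow[i]-prRow[i], pctFaster(mainRow[i], prRow[i]))
	}
}
```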

Per-tx encode cost is small (~5–15 µs) but mainnet blocks carry ~200 txs, so the cumulative skip is consistent block over block.

…ad txs

Engine newPayload receives transactions as already-RLP-encoded bytes from
the CL. The current entry path:

  1. Decode the raw RLP bytes into types.Transaction values.
  2. Construct a *types.Block from those decoded transactions.
  3. Call HandleNewPayload -> InsertBlocksAndWaitWithAccessLists.
  4. Inside, blocksToRaw calls Block.RawBody(), which loops over
     b.transactions and calls rlp.EncodeToBytes(txn) for every tx.

Step 4 re-encodes bytes we already had at step 1. For a ~200-tx mainnet
block at ~5–15 µs per RLP encode that's ~1–3 ms of pure waste per block,
plus the alloc churn (one []byte per tx via rlp's bytes.Buffer pool).

Add Block.rawTransactions [][]byte as a caller-supplied cache, plus a
companion constructor NewBlockFromStorageWithRawTxs. When the cache is
populated (and length-matches the decoded txs), Block.RawBody() returns
the cached bytes directly. engineapi.newPayload wires the raw bytes
from req.Transactions through to the new constructor — both slices
reference the same underlying buffers, no copy.
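A minimal sketch of the cache-with-fallback shape this commit message describes. The types `txn` and `block`, the field `rawCached`, and the `encode` helper are hypothetical simplifications, not Erigon's definitions. The toy bytes are legacy-style RLP lists, for which returning the cached bytes verbatim is the correct body form.

```go
package main

import "fmt"

// txn and block are simplified, hypothetical stand-ins for types.Transaction
// and types.Block; only the caching logic is illustrated.
type txn struct{ payload []byte }

type block struct {
	txs       []txn
	rawCached [][]byte // nil means "no cache"
}

var encodeCalls int // counts how often the slow path runs

// encode stands in for rlp.EncodeToBytes; here it just copies the payload.
func encode(t txn) []byte {
	encodeCalls++
	return append([]byte(nil), t.payload...)
}

// rawBody returns per-tx bytes, preferring the cache when its length
// matches the decoded transactions, and re-encoding otherwise.
func (b *block) rawBody() [][]byte {
	if b.rawCached != nil && len(b.rawCached) == len(b.txs) {
		return b.rawCached
	}
	out := make([][]byte, len(b.txs))
	for i, t := range b.txs {
		out[i] = encode(t)
	}
	return out
}

func main() {
	wire := [][]byte{{0xc1, 0x01}, {0xc1, 0x02}} // bytes as received
	txs := []txn{{wire[0]}, {wire[1]}}

	cached := &block{txs: txs, rawCached: wire}
	body := cached.rawBody()
	// No encode calls, and the result shares the caller's buffers.
	fmt.Println(encodeCalls, &body[0][0] == &wire[0][0]) // 0 true

	plain := &block{txs: txs} // no cache: falls back to encoding each tx
	plain.rawBody()
	fmt.Println(encodeCalls) // 2
}
```

The length check is the only guard in this sketch; a cache whose length disagrees with the decoded transactions is ignored rather than trusted.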

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
yperbasis and others added 3 commits June 1, 2026 17:28
The rawTransactions cache added on this branch holds each tx's canonical
EIP-2718 binary encoding (the engine_newPayload wire form). RawBody()
returned those bytes verbatim, but rlp.EncodeToBytes — the path it
replaced — wraps a typed tx in an outer RLP string, so for typed txs the
cached output was shorter than before by that string prefix.

That under-counts the block size in RawBlock.ValidateMaxRlpSize, so a
block over the EIP-7934 RLP block-size limit validated as VALID instead
of being rejected (the EEST enginex RLP_BLOCK_LIMIT_EXCEEDED failures).
It also diverged the bytes persisted to kv.EthTx — copied verbatim into
snapshots — from every other ingestion path, which all store the wrapped
form.

Reconstruct the rlp.EncodeToBytes form from the cached bytes instead: a
cheap RLP-string re-wrap for typed txs, unchanged for legacy txs. The
encode-skip optimization is preserved (no per-tx field walk) while the
output is byte-identical to the non-cached path.
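To make the under-count concrete, the sketch below sums transaction bytes two ways: as the unwrapped binary forms and as the wrapped forms a block body carries. The helper names and toy byte slices are hypothetical, and `rlpStringHeaderLen` assumes inputs of at least two bytes, which holds for any typed tx.

```go
package main

import "fmt"

// rlpStringHeaderLen returns how many header bytes RLP prepends when a byte
// string of length n (assumed n >= 2) is encoded as an RLP string.
func rlpStringHeaderLen(n int) int {
	if n <= 55 {
		return 1 // single byte: 0x80 + n
	}
	h := 1 // marker byte: 0xb7 + length-of-length
	for v := n; v > 0; v >>= 8 {
		h++ // one byte per byte of the big-endian length
	}
	return h
}

// bodySizes sums the tx bytes two ways: the binary (unwrapped) forms as
// cached, and the wrapped forms that rlp.EncodeToBytes would produce.
func bodySizes(binTxs [][]byte) (unwrapped, wrapped int) {
	for _, b := range binTxs {
		unwrapped += len(b)
		wrapped += len(b)
		if len(b) > 0 && b[0] < 0xc0 { // typed tx: gains a string header
			wrapped += rlpStringHeaderLen(len(b))
		}
	}
	return
}

func main() {
	txs := [][]byte{
		append([]byte{0x02}, make([]byte, 299)...), // toy typed tx, 300 bytes
		append([]byte{0xc0}, make([]byte, 99)...),  // toy legacy-style bytes, 100 bytes
	}
	u, w := bodySizes(txs)
	fmt.Println(u, w, w-u) // 400 403 3
}
```

The gap is only a few bytes per typed tx, but a size check computed from the unwrapped total can pass a block that sits just over the limit.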

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nary", not "raw"

"raw" was ambiguous here: Block.RawBody() returns the RLP-string-wrapped
form, while the cache added on this branch holds the unwrapped form. Rename
the unwrapped-form identifiers to "binary", matching the existing
BinaryTransactions type:

  rawTransactions               -> binaryTransactions
  NewBlockFromStorageWithRawTxs -> NewBlockFromStorageWithBinaryTxs
  rlpFromCanonicalTxn           -> rlpFromBinaryTxn

No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Use the existing BinaryTransactions type for the cache field and the
NewBlockFromStorageWithBinaryTxs parameter instead of bare [][]byte, so the
type itself documents the encoding. [][]byte stays assignable to it, so the
engine_server.go caller is unchanged.
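The assignability claim follows from Go's rules: a value of the unnamed type `[][]byte` can be assigned to a named type with the same underlying type without a conversion. The sketch below assumes `BinaryTransactions` is declared as `[][]byte`, as the message implies; `newBlockWithBinaryTxs` is a hypothetical stand-in for the real constructor.

```go
package main

import "fmt"

// BinaryTransactions mirrors the shape implied by the commit message: a
// named type whose underlying type is [][]byte (assumed declaration).
type BinaryTransactions [][]byte

// newBlockWithBinaryTxs is a hypothetical constructor taking the named type.
func newBlockWithBinaryTxs(cache BinaryTransactions) int { return len(cache) }

func main() {
	wire := [][]byte{{0x02, 0x01}, {0xc1, 0x01}} // plain [][]byte from a caller
	fmt.Println(newBlockWithBinaryTxs(wire))     // 2; compiles with no conversion

	var cache BinaryTransactions = wire // assignment also needs no conversion
	cache[0][0] = 0x03                  // same backing arrays: no copy was made
	fmt.Println(wire[0][0] == 0x03)     // true
}
```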

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment (Contributor)

Pull request overview

Avoids redundant RLP re-encoding of transactions in engine_newPayload by threading the original raw tx byte slices from the CL through a new optional cache on *types.Block, which RawBody() consults before falling back to per-tx rlp.EncodeToBytes. Per the matched-pair table in the PR description, this takes about 1% off median engine_newPayloadV4 time on mainnet tip.

Changes:

  • Adds binaryTransactions BinaryTransactions field on Block plus NewBlockFromStorageWithBinaryTxs constructor; RawBody() uses cheap rlpFromBinaryTxn rewrap when the cache is populated.
  • Wires the engine API's newPayload to construct the Block with the raw txs slice as the cache.
  • Adds a unit test verifying RawBody() from the cache matches the encoded reference across legacy, AccessList, DynamicFee, Blob, and SetCode tx types.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| execution/types/block.go | New binaryTransactions cache, constructor, and RawBody() fast path via rlpFromBinaryTxn. |
| execution/types/block_test.go | Test ensures cached RawBody() matches the reference encoding for all tx types. |
| execution/engineapi/engine_server.go | Builds the block with the raw CL tx bytes attached as the cache. |

…t paths

The RawBody() encode-skip optimization previously reached only the engine API
path (engine_server.newPayload), so external CLs and the EnableEngineAPI
embedded Caplin benefited, but the default embedded Caplin did not:
ExecutionClientDirect.NewPayload and the historical block_collector both build
the block with plain NewBlockFromStorage and then re-encode every tx in
RawBody() during insertion.

Both already hold the binary tx encodings (body.Transactions, just decoded),
so pass them to NewBlockFromStorageWithBinaryTxs. Behavior-neutral: RawBody()
output is byte-identical (covered by TestBlockRawBodyFromBinaryTxsMatchesEncoded).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment (Contributor)

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@yperbasis changed the title from "execution/types, execution/engineapi: avoid RLP re-encode of newPayload txs" to "execution/types, engineapi, cl: avoid RLP re-encode of payload txs" on Jun 2, 2026
@yperbasis yperbasis marked this pull request as ready for review June 2, 2026 14:21
@yperbasis yperbasis requested review from domiwei and mh0lt as code owners June 2, 2026 14:21
@yperbasis yperbasis requested review from awskii and taratorio June 2, 2026 14:21
@domiwei domiwei added this pull request to the merge queue Jun 3, 2026
Merged via the queue into main with commit 5c9068a Jun 3, 2026
164 checks passed
@domiwei domiwei deleted the yperbasis/perf-rawbody-no-reencode branch June 3, 2026 05:23
sudeepdino008 added a commit that referenced this pull request Jun 3, 2026
- Merge `origin/main` (up to #21546) into the `performance` branch.
- Conflicts resolved by taking main's finalized form where the perf
branch was behind (`ExistenceFilterVersion`→1 per #21164,
`mdbx-go`→v0.40.1, `merge.go` `findMergeRangeInFiles` refactor, dropped
`erigon-snapshot` module dep, fusefilter deferred-close refactor,
`Versions.MustSupport`, atomic per-key prune throttle).
- Adopted main's collation-at-tip design (`CollateAndPrune` in the FCU
path, #21398/#21415) and removed the perf branch's older
`frozenBlocks`-gating (`SetFrozenBlocksProvider`/`MaxCollatableTxNum`,
`db/services/snapshot_progress.go`, its gating tests, and callers).
- Verified: `make erigon integration` build, `make lint` (clean), `make
test-short` (green).