Skip to content

caplin: unified Engine API client for standalone mode#20035

Merged
mh0lt merged 18 commits into
mainfrom
feat/caplin-engine-api-unified
Mar 26, 2026
Merged

caplin: unified Engine API client for standalone mode#20035
mh0lt merged 18 commits into
mainfrom
feat/caplin-engine-api-unified

Conversation

@mh0lt

@mh0lt mh0lt commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Inserts an ExecutionClientEngine that calls the Engine API directly over HTTP/JWT so that caplin can be run as an api complient external — the same interface external consensus clients use. .

The main motivation for this is to allow caplin to be used as a cl in assertor & hive tests - which excersize the standard test suites. It is also a precursor to using caplin as a zk proof varerified, which requires on EL.

It should also mean that caplin can be used with other EL's - but this has not been tested.

  • Block production fixes: Fixed payload attribute handling, withdrawal serialization, and execution request extraction for Caplin's builder path
  • Fork choice fixes: Caplin's GetHead() now returns the justified/finalized checkpoints needed by forkchoiceUpdated, fixing standalone mode FCU calls
  • Beacon API fixes: Block production endpoints work correctly in standalone mode
  • Embedded mode cleanup: Removed stale parent hash warnings, fixed Engine API mode detection, cleaned up unused gRPC dependencies
  • CI: Added Caplin Kurtosis assertoor suite and hive testing documentation
  • Test fixture fix: Updated Phase0 block fixture for version-aware Eth1Block serialization (correctly omits Deneb fields for pre-Deneb blocks)

Test plan

  • make lint — clean
  • make erigon integration — builds
  • make test-short — all pass (including beacon handler harness tests)
  • Hive engine + rpc-compat suites
  • Kurtosis caplin assertoor suite

@mh0lt

mh0lt commented Mar 20, 2026

Copy link
Copy Markdown
Contributor Author

Caplin Kurtosis Suite — blocked on upstream

The caplin-assertoor config (.github/workflows/kurtosis/caplin-assertoor.io) is included but not wired into CI yet. The ethereum-package doesn't support cl_type: caplin as a launcher — it only recognizes lighthouse, lodestar, nimbus, prysm, teku, grandine, consensoor.

What needs to happen upstream

  1. Add a caplin CL launcher to ethereum-package — similar to the existing lighthouse/teku launchers but pointing at the caplin binary from the same Docker image as erigon
  2. The launcher needs to handle:
    • JWT auth (shared secret with the EL)
    • Engine API endpoint configuration (pointing at the co-located erigon EL)
    • Beacon API port exposure (for assertoor to query)
    • Validator client support (the config uses use_separate_vc: true with a lighthouse VC)

Once upstream is ready

Re-enable the suite in .github/workflows/test-kurtosis-assertoor.yml:

- suite: caplin
  package_args: .github/workflows/kurtosis/caplin-assertoor.io
  docker_binaries: "erigon caplin"
  ethereum_package_branch: "<version-with-caplin-support>"

And restore the BINARIES build-arg and per-suite cache scoping needed for the multi-binary Docker build.

@mh0lt

mh0lt commented Mar 20, 2026

Copy link
Copy Markdown
Contributor Author

Upstream issue for caplin launcher support: ethpandaops/ethereum-package#1337

@yperbasis yperbasis added this to the 3.5.0 milestone Mar 21, 2026

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: IsCanonicalHash checks existence, not canonicality (remote mode)

Severity: High

func (cc *ExecutionClientEngine) IsCanonicalHash(ctx context.Context, hash common.Hash) (bool, error) {
...
if err := cc.rpcClient.CallContext(ctx, &header, "eth_getBlockByHash", hash, false); err != nil {
...
return header != nil, nil
}

eth_getBlockByHash returns any block with that hash, including uncles and stale/reorged blocks. The implementation in rpc/jsonrpc/eth_block.go calls blockByHashWithSenders without a canonicality check. So this
returns true for non-canonical blocks.

This is used in engine_server.go:getQuickPayloadStatusIfPossible to decide if a block is already validated. Incorrect canonicality could lead to wrong VALID responses for non-canonical blocks.

Fix: Either use eth_getBlockByNumber to read the canonical block at the same height and compare hashes, or add a dedicated eth_isCanonical check, or use the block number from the returned header to fetch the
canonical hash at that height.


Bug: ForkChoiceUpdate hardcodes ForkchoiceUpdatedV3

Severity: High

func (cc *ExecutionClientEngine) ForkChoiceUpdate(...) ([]byte, error) {
...
resp, err := cc.engine.ForkchoiceUpdatedV3(ctx, forkChoiceState, attributes)

The Engine API spec requires version-specific FCU calls:

  • V1 for Bellatrix
  • V2 for Capella
  • V3 for Deneb+

EngineServer.forkchoiceUpdated validates version against the timestamp (engine_server.go:714-727) and returns UnsupportedForkError if V3 is used with pre-Cancun blocks. In remote mode against a third-party EL,
this would also fail.

The PR already does correct version selection for NewPayload (switch on payload.Version()). ForkChoiceUpdate needs the same treatment — the caller needs to pass or derive the fork version. Currently the method
signature has no version parameter, so this requires an interface change or deriving the version from the head block's timestamp.


Memory leak: unrealizedJustifications / unrealizedFinalizations never cleaned up

Severity: High

// In ForkChoiceStore:
unrealizedJustifications sync.Map // blockRoot -> solid.Checkpoint
unrealizedFinalizations sync.Map // blockRoot -> solid.Checkpoint

These are populated in OnBlock with every new block but have no deletion path. The existing sync.Map fields (childrens, checkpointStates) are cleaned in onNewFinalized() (in forkchoice/utils.go:62-78) when
blocks get finalized. These new maps need the same treatment.

Growth: ~350 bytes/entry × 7,200 blocks/day = ~2.5 MB/day, ~900 MB/year.

Fix: Add cleanup in onNewFinalized() that deletes entries for blocks at or before the finalized epoch's start slot, matching the pattern used for childrens.


Bug: EngineAPIRPCClient missing ForkchoiceUpdatedV4

Severity: Medium (compile error if the interface requires it)

The EngineAPI interface in execution/engineapi/interface.go may require ForkchoiceUpdatedV4. The EngineAPIRPCClient only implements V1-V3. If the interface was updated to include V4 (for Fulu/Osaka), this would
be a compile error. Verify against the current interface definition.


Correctness: getQuickPayloadStatusIfPossible behavior change

Severity: Medium

// Old:
if currentHeader != nil && blockHash != currentHeader.Hash() && header != nil && isCanonical {
// New:
if newPayload && currentHeader != nil && header != nil && isCanonical {

The old code had blockHash != currentHeader.Hash() to avoid short-circuiting when FCU points to the current head (it needs to go through HandleForkChoice to call writeForkChoiceHashes). The new code uses
newPayload flag instead, which is cleaner. But verify: for the FCU case, the old code would still return VALID early if blockHash == currentHeader.Hash() was false — now it never returns early for FCU. This
means FCU always goes through the full pipeline even for already-canonical non-head blocks. This is correct but slower.

Also, the re-fetch of parent inside waitForResponse:
if parent == nil {
parent = s.chainRW.GetHeaderByHash(ctx, parentHash)
}
This fixes a real bug where a stale parent captured before an uncommitted write tx would cause false SYNCING responses. Good fix.


Design: HasBlock remote mode also uses eth_getBlockByHash

Severity: Low (same root issue as IsCanonicalHash)

func (cc *ExecutionClientEngine) HasBlock(ctx context.Context, hash common.Hash) (bool, error) {
...
return header != nil, nil
}

HasBlock checking existence via eth_getBlockByHash is semantically correct here (it asks "do you have this block?", not "is it canonical?"). This is fine.


Design: writeGenesisBeaconBlock — well-reasoned but fragile

Severity: Low

The genesis beacon block reconstruction from state is carefully documented and the assumption (empty txs/withdrawals → HashSSZ() == HeaderHashSSZ()) is correct for genesis. However, the comment notes this
invariant explicitly, which is good.

One concern: if a non-standard genesis (e.g., a devnet with genesis transactions) breaks this assumption, body_root would mismatch silently. Consider adding a verification:
bodyRoot, _ := body.HashSSZ()
if bodyRoot != header.BodyRoot {
return fmt.Errorf("genesis body root mismatch: computed %x, expected %x", bodyRoot, header.BodyRoot)
}


Correctness: Eth1Block.MarshalJSON version-gating

Severity: Low (good fix)

The old code always emitted blob_gas_used and excess_blob_gas (as "0") for pre-Deneb blocks. The new code correctly version-gates:

  • Bellatrix: base fields only
  • Capella: + withdrawals
  • Deneb+: + withdrawals + blob gas fields

This matches the Beacon API spec. The test fixture update (block_1.json) follows correctly.


Nit: Typo in engine_api_rpc_client.go

isHTTPpecified := strings.HasPrefix(addr, "http") // typo: missing 'S'
isHTTPSpecified := strings.HasPrefix(addr, "https")

The variable isHTTPpecified should be isHTTPSpecified (or better: hasScheme). Also, the logic is subtly wrong — strings.HasPrefix(addr, "http") matches both http:// and https://, so isHTTPpecified is always
true when isHTTPSpecified is true. The protocol selection still works correctly due to the else if !isHTTPpecified guard, but the names are confusing.


Nit: Duplicate comments in get_head.go

Lines 3506-3508:
// Use per-block unrealized justifications (spec: store.unrealized_justifications[block_root])
// Fall back to realized checkpoints if unrealized not available
// Use per-block unrealized justifications (spec: store.unrealized_justifications[block_root])
// Fall back to realized checkpoints if unrealized not available

The comment block is duplicated.


Nit: getAssembledBlockV4 and V5 are identical

Both methods have exactly the same body — they both parse ExecutionRequests the same way. They could share a helper, or V5 could delegate to V4 (assuming the response format is the same for both API versions).


Summary

  │                            Issue                             │ Severity │     Type      │                                                                                                                       
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤
  │ IsCanonicalHash checks existence not canonicality            │ High     │ Bug           │                                                                                                                       
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤
  │ ForkChoiceUpdate hardcodes V3                                │ High     │ Bug           │                                                                                                                       
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤
  │ unrealizedJustifications/unrealizedFinalizations memory leak │ High     │ Memory leak   │                                                                                                                       
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ EngineAPIRPCClient possibly missing ForkchoiceUpdatedV4      │ Medium   │ Compile error │
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ getQuickPayloadStatusIfPossible behavior change              │ Medium   │ Verify        │
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ Genesis beacon block body root not verified                  │ Low      │ Robustness    │
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ Eth1Block.MarshalJSON version-gating                         │ Low      │ Good fix      │
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ URL parsing typo / confusing variable names                  │ Nit      │ Style         │
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ Duplicate comments in get_head.go                            │ Nit      │ Style         │
  ├──────────────────────────────────────────────────────────────┼──────────┼───────────────┤                                                                                                                       
  │ getAssembledBlockV4/V5 duplicate bodies                      │ Nit      │ DRY           │
  └──────────────────────────────────────────────────────────────┴──────────┴───────────────┘                                                                                                                       

The three high-severity issues (IsCanonicalHash semantics, FCU version hardcoding, sync.Map leak) should be addressed before merge. The architectural approach — unified ExecutionClientEngine with local/remote
modes, using EngineAPI as the transport boundary — is well-designed and a clear improvement over the previous panicking ExecutionClientRPC.

mh0lt added 7 commits March 21, 2026 22:36
Replace the broken ExecutionClientRPC (8 panicking methods, HTTP overhead
for internal use) with a unified ExecutionClientEngine that works in two
modes:

- Local: calls EngineServer methods directly (zero HTTP/JWT/JSON overhead)
  for --caplin.use-engine-api internal mode
- Remote: uses EngineAPIRPCClient over HTTP JSON-RPC with JWT auth for
  standalone cmd/caplin

The EngineAPI interface serves as the transport boundary — EngineServer
satisfies it in-process, EngineAPIRPCClient satisfies it over HTTP.
…one mode

Fix multiple bugs preventing Caplin from producing blocks and finalizing
in standalone mode (with external VC). Verified with kurtosis devnet
(assertoor synchronized-check + block-proposal-check both pass) and
hive engine tests (401/403 pass, 2 known Cancun failures).

Bugs fixed:
- Genesis block root mismatch (stored under content hash vs state root)
- GetPayload version routing (was always V4, now version-aware)
- Missing withdrawals/transactions in executionPayloadToEth1Block
- Parent root mismatch in produced blocks (HashSSZ vs HeadRoot)
- Fulu NewPayload routing (added FuluVersion case)
- EL NewPayload quick-reject dropping valid blocks (re-fetch parent
  inside waitForResponse closure)
- Fork choice filter: sync.Map type mismatch ([32]byte vs common.Hash)
- Fork choice filter: missing unrealized finalized checkpoints causing
  leaf rejection at epoch boundaries
- Beacon API: withdrawals field omitted from execution_payload JSON
  (omitempty on nil pointer); now version-gated MarshalJSON
Add caplin assertoor test suite to kurtosis CI matrix with per-suite
docker_binaries and ethereum_package_branch support. Document the full
hive and kurtosis test setup for caplin+erigon pairs, covering local
runs, CI integration, and the embedded engine API mode.
- Add internalCL field to EngineServer to suppress "no CL requests"
  warning when any embedded CL is active (including --caplin.use-engine-api)
- Re-fetch parent header inside waitForResponse closure to avoid false
  SYNCING responses when the parent is in an uncommitted write tx
…ization

The Eth1Block.MarshalJSON change correctly omits Deneb-specific fields
(blob_gas_used, excess_blob_gas) for pre-Deneb blocks. Update the test
fixture to match.
- Add missing `consuming` parameter to NewEngineServer calls in tests
- Update produceBeaconBody call to use new (slot, root) signature
The caplin kurtosis suite requires cl_type support in ethereum-package
which doesn't exist yet. Revert workflow changes to match main and keep
the caplin config file for future use.
@mh0lt mh0lt force-pushed the feat/caplin-engine-api-unified branch from 57da39c to ce72ee0 Compare March 21, 2026 22:38
mh0lt added 4 commits March 23, 2026 16:54
- IsCanonicalHash: verify canonicality via eth_getBlockByNumber cross-check
  instead of just checking block existence with eth_getBlockByHash
- ForkChoiceUpdate: add version parameter to select V1-V4 based on fork,
  matching the existing NewPayload version selection pattern
- ForkchoiceUpdatedV4: add to EngineAPI interface and EngineAPIRPCClient
- forkchoice: clean up unrealizedJustifications/unrealizedFinalizations
  sync.Maps in onNewFinalized to prevent unbounded memory growth
- clstages: add genesis beacon block body root verification
- Fix isHTTPpecified typo, rename to hasScheme/isHTTPS for clarity
- Remove duplicate comments in get_head.go
- Extract getAssembledBlockFromResponse shared helper for V4/V5
In standalone caplin mode, the execution block reader (eth1Getter) was nil,
causing the beacon API to return null for execution_payload.transactions
when reading blocks. This made assertoor's transaction inclusion tests fail.

Add ExecutionEngineReader that implements ExecutionBlockReaderByNumber by
fetching block bodies from the EL via engine_getPayloadBodiesByRangeV1.
Wire it up in standalone caplin's main.go.
The beacon API spec defines previous_epoch_participation and
current_epoch_participation as List[ParticipationFlags] — a JSON array
of decimal strings (e.g. ["7", "7", "0"]).

Caplin was serializing them as hex-encoded SSZ bytes ("0x070700..."),
which caused assertoor to fail with "illegal base64 data" when trying
to load the validator set, breaking client tracking and causing the
transaction inclusion test to time out.

UnmarshalJSON supports both the new spec-compliant array format and
the legacy hex format for backwards compatibility.
The beacon DB stores blinded blocks (no transactions). When the beacon
API reads a block, it fetches transactions from the EL via
engine_getPayloadBodiesByRangeV1. But for recently produced blocks,
the EL hasn't committed yet — causing the API to return 0 transactions.

Add an in-memory body cache to ExecutionEngineReader. Block production
populates the cache immediately, so the beacon API can return
transactions before the EL commit completes.

This fixes the assertoor eoa-transactions-test which checks block
proposals via the beacon API for transaction inclusion.

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Issues Claude Found:

Medium: ExecutionClientRpc.GetAssembledBlock still panics

cl/phase1/execution_client/execution_client_rpc.go:288 — The old ExecutionClientRPC still has panic("unimplemented") in GetAssembledBlock. While this path shouldn't be hit if users are migrated to the new
ExecutionClientEngine, leaving a panic in production code is risky. Either implement it or return a clear error.

Medium: ExecutionEngineReader.NewExecutionEngineReader accepts nil engine

cl/persistence/format/snapshot_format/getters/execution_rpc.go — NewExecutionEngineReader is called with executionEngine in cmd/caplin/main.go:121, which could be nil if cfg.RunEngineAPI is false. The reader
will then panic on r.engine.GetBodiesByRange() when Transactions() or Withdrawals() is called and there's no cache hit. Either guard for nil engine or don't create the reader when engine is nil.

Low: CacheBody eviction is O(n)

getters/execution_rpc.go:85-90 — The eviction loop iterates over the entire map. With the cap at 64 entries this is fine in practice, but a minKey tracking approach would be cleaner. Not blocking.

Low: Block production caches transactions but not withdrawals

cl/beacon/handler/block_production.go:795 — CacheBlockBody is called with nil withdrawals in the builder body, while storeBlockAndBlobs (line 1349) also passes nil for withdrawals. If the block reader needs
withdrawals before the EL commits, this will fall through to the engine call and potentially fail. The execution payload has withdrawals available — consider passing them.

Low: ReadBlockByRoot nil check removed

db/snapshotsync/freezeblocks/beacon_block_reader.go:212 — The r.eth1Getter == nil guard was removed from ReadBlockByRoot. If eth1Getter is nil and the code path reaches snapshot deserialization that needs
execution data, this could panic. The new nil check in snapshot_format/blocks.go:117 mitigates this for the snapshot read path, but verify all callers are covered.

Nit: Copyright year

New files have Copyright 2024 but this is 2026 code.

Verdict

The core architecture is solid and the major review feedback has been addressed. The remaining issues are medium/low severity. I'd recommend addressing the nil-engine guard and the panic in GetAssembledBlock
before merge. The PR is close to ready.

- ExecutionClientRpc: replace panic("unimplemented") with ErrNotSupported
  error returns for all stub methods
- ExecutionEngineReader: guard for nil engine (returns empty when no
  engine available instead of panicking), add highestCached for O(1)
  eviction threshold
- Block production: cache both transactions and withdrawals (not just
  transactions) via shared cacheExecutionBody helper
- Fix copyright year to 2026 in new file
yperbasis
yperbasis previously approved these changes Mar 24, 2026

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, wrong PR

@yperbasis yperbasis dismissed their stale review March 24, 2026 12:43

sorry, I confused PRs

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity

  1. executionPayloadToEth1Block does not set version or beaconCfg on returned Eth1Block

cl/phase1/execution_client/execution_client_engine.go ~line 3582 in the diff:

block := &cltypes.Eth1Block{
ParentHash: ep.ParentHash,
...
}

The version and beaconCfg fields on Eth1Block are unexported and default to zero/nil. This means:

  • block.Version() always returns 0 (Phase0), regardless of actual fork
  • MarshalJSON() will never emit withdrawals, blob_gas_used, or excess_blob_gas
  • EncodingSizeSSZ() / getSchema() will undercount / omit fields
  • PayloadHeader() won't compute withdrawalsRoot or set blob gas fields
  • Any method accessing b.beaconCfg (e.g., SSZ encode/decode) will nil-panic

This only affects the remote EL path (standalone caplin). The local path goes through chainRW.GetAssembledBlock which returns properly versioned blocks. But for the stated goal of enabling external EL usage,
this is a real bug.

Fix: Use cltypes.NewEth1Block(version, beaconCfg) then set fields, or add a package-level constructor. The version is available from the clparams.StateVersion parameter already threaded through
GetAssembledBlock.

  1. FCU version mapping sends V4 for Electra/Fulu — should be V3

cl/phase1/execution_client/execution_client_engine.go FCU switch:
default: // Electra, Fulu, Gloas+
resp, err = cc.engine.ForkchoiceUpdatedV4(ctx, ...)

Per the Engine API spec:

  • forkchoiceUpdatedV3 is valid for Cancun and Prague (Deneb + Electra)
  • forkchoiceUpdatedV4 is introduced in Amsterdam (Gloas)

Sending V4 for Electra causes the server to process it as GloasVersion (see engine_api_methods.go:142), which could mis-validate payload attributes. The same issue exists in execution_client_rpc.go where
Electra also maps to V4.

Against a non-erigon EL, this would fail — a spec-compliant EL will reject forkchoiceUpdatedV4 before Amsterdam activates.


Medium Severity

  1. NewPayload does not nil-check payloadStatus before dereferencing

execution_client_engine.go ~line 3328:
if payloadStatus.Status == engine_types.AcceptedStatus {

If the engine returns (nil, nil), this panics. The existing ExecutionClientRpc avoids this by pre-allocating the status struct. Low probability in practice but worth a nil guard.

  1. Memory leak in unrealized checkpoint maps

cl/phase1/forkchoice/utils.go cleanup in onNewFinalized:

The cleanup only deletes entries where GetHeader returns true and header.Slot <= finalizedSlot. If the fork graph has already pruned a header, GetHeader returns false and the map entry is never deleted. Over a
long-running node, orphaned entries accumulate. Fix:

if !has || header.Slot <= finalizedSlot {
f.unrealizedJustifications.Delete(k)
}

  1. unrealizedFinalizations map is not in the consensus spec

cl/phase1/forkchoice/forkchoice.go: The spec defines store.unrealized_justifications but not store.unrealized_finalizations. The spec's filter_block_tree uses the block's realized finalized checkpoint for the
finalized check. Using unrealized finalized checkpoints is a spec deviation. In practice this should be safe (unrealized >= realized), but it should be verified against edge cases at epoch boundaries.

  1. GetBlobs always uses V1 in remote mode

execution_client_engine.go ~line 3673:
result, err := cc.engine.GetBlobsV1(ctx, versionedHashes)

The EngineAPI interface has GetBlobsV1, V2, V3 and the constants are defined, but GetBlobs never dispatches to V2/V3. For Fulu+, GetBlobsV2 returns CellProofs []hexutil.Bytes instead of a single Proof, so using
V1 would return the wrong proof structure.

  1. No Close() method on ExecutionClientEngine

The rpc.Client created in NewExecutionClientEngineRPC holds an HTTP connection pool and goroutines. There's no cleanup path when the engine is shut down. The other implementations have the same gap, but this
adds another one.


Low Severity

  1. Hardcoded maxWithdrawals = 16

executionPayloadToEth1Block:
block.Withdrawals = solid.NewStaticListSSZ[*cltypes.Withdrawal](16, 44)

Mainnet MaxWithdrawalsPerPayload is 16, but this should use the beacon config value for correctness on testnets/custom chains. Same issue in getters/execution_rpc.go.

  1. Hardcoded DenebVersion in block collector FCU

cl/phase1/execution_client/block_collector/persistent_block_collector.go:280:
p.engine.ForkChoiceUpdate(ctx, ..., clparams.DenebVersion)

The block's version is already decoded at line 223 (version := clparams.StateVersion(v[0])). Could propagate it instead of hardcoding.

  1. Dead constants

rpc_helper/methods.go: EngineNewPayloadV5, EngineGetBlobsV2, EngineGetBlobsV3 are defined but never referenced. Should be removed or documented as future-use.

  1. writeGenesisBeaconBlock doesn't handle Electra+ fields

cl/phase1/stages/clstages.go: The synthetic genesis block reconstruction handles fields up to Deneb. If genesis is at Electra version, body.ExecutionRequests may need initialization for the body root to match.

  1. Anchor block root not stored in unrealized maps

forkchoice.go:291-294: NewForkChoiceStore sets the atomic checkpoint values but never populates unrealizedJustifications.Store(anchorRoot, ...). Mitigated by the fallback to fork graph lookups.


Summary

The PR's core architecture is good, but issues #1 and #2 are blockers — the missing version/beaconCfg on Eth1Block will cause panics or wrong serialization in remote mode, and the FCU version mismatch will
break against non-erigon ELs. Issue #4 (memory leak) should also be fixed before merge. The rest can be addressed in follow-ups.

HIGH:
- executionPayloadToEth1Block: use NewEth1Block(version, beaconCfg) so
  the returned block has correct version and config for serialization,
  SSZ encoding, and payload header computation in remote mode
- FCU version mapping: V3 covers Deneb + Electra + Fulu (Prague), V4
  is only for Gloas+ (Amsterdam). Previously Electra/Fulu sent V4 which
  would be rejected by spec-compliant non-erigon ELs

MEDIUM:
- NewPayload: add nil check for payloadStatus before dereferencing
- Unrealized checkpoint cleanup: also delete entries whose headers have
  been pruned from the fork graph (!has), preventing orphaned map entries
- Thread beaconCfg through ExecutionClientEngine constructors and
  SetBeaconChainConfig for deferred initialization

LOW:
- Use beaconCfg.MaxWithdrawalsPerPayload instead of hardcoded 16
- Improve block collector FCU comment documenting V3 coverage
- GetBlobs: add version-aware dispatch — V1 for Deneb/Electra (single
  proof), V2 for Fulu+ (cell proofs). Adds version parameter to
  ExecutionEngine.GetBlobs interface
- ExecutionClientEngine: add Close() to release HTTP connections
- Remove dead EngineNewPayloadV5 constant from rpc_helper/methods.go
- Store anchor block root in unrealizedJustifications/Finalizations
  maps during ForkChoiceStore initialization
- writeGenesisBeaconBlock: no change needed — NewSignedBeaconBlock
  already initializes ExecutionRequests for Electra+ via NewBeaconBody

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH

  1. ExecutionClientRpc.NewPayload missing Fulu case (execution_client_rpc.go)

The old ExecutionClientRpc.NewPayload has // TODO: Add Fulu case but no actual Fulu handling — it falls through to return PayloadStatusNone, errors.New("invalid payload version"). The new ExecutionClientEngine
correctly handles Fulu (uses V4 for Electra+Fulu), but if any code path still constructs an ExecutionClientRpc, Fulu payloads will fail. If this type is now dead code, it should be removed or clearly marked as
deprecated. If it's still reachable, the TODO needs to be resolved.

  1. GetBlobs silently swallows errors (execution_client_engine.go)

Both local and remote paths return nil, nil on error:
result, err := cc.engine.GetBlobsV1(ctx, versionedHashes)
if err != nil {
return nil, nil
}
This makes "no blobs available" indistinguishable from "network failure" to callers. For blob availability checks this could mask real problems. The interface signature (([][]byte, [][][]byte) with no error)
forces this, but it's worth considering whether the interface should return an error.

  1. onNewFinalized uint64 underflow (forkchoice/utils.go)

slotToPrune := ((newFinalized.Epoch - 3) * f.beaconCfg.SlotsPerEpoch) - 1

If newFinalized.Epoch < 3 (which happens during the first 3 epochs after genesis), this wraps around to ^uint64(0). Pre-existing bug, but this PR adds genesis-aware startup code that will hit this path. Needs a
guard:
if newFinalized.Epoch <= 3 {
return // or slotToPrune = 0
}

MEDIUM

  1. ExecutionClientRpc should be removed or clearly deprecated

The old type now returns ErrNotSupported for most methods instead of panicking (improvement), but it's effectively dead. ExecutionClientEngine fully supersedes it in both local and remote modes. Keeping it
around adds confusion about which implementation is canonical.

  1. CacheBlockBody relies on concrete type assertion (beacon_block_reader.go:81-82)

if c, ok := r.eth1Getter.(*getters.ExecutionEngineReader); ok {
c.CacheBody(blockNumber, transactions, withdrawals)
}

This silently no-ops if eth1Getter is any other type. The CacheBlockBody method is on the BeaconSnapshotReader interface but only works with one implementation. Consider adding CacheBody to the
ExecutionBlockReaderByNumber interface instead.

  1. writeGenesisBeaconBlock is fragile across forks (clstages.go)

The genesis block reconstruction copies Eth1Header fields into Eth1Block one by one. Any new field added in a future fork will be silently missed, causing a body root mismatch that fails with a cryptic error.
Consider adding a constructor like NewEth1BlockFromHeader(header *Eth1Header) to keep this in sync.

  1. Per-block unrealized checkpoint maps: key type (forkchoice.go:109-110)

unrealizedJustifications sync.Map // blockRoot -> solid.Checkpoint
unrealizedFinalizations sync.Map // blockRoot -> solid.Checkpoint

The commit message mentions fixing a sync.Map type mismatch ([32]byte vs common.Hash). The code stores/loads common.Hash consistently now, which is correct. The earlier fix was critical — glad it's resolved.


LOW

  1. Copyright years — execution_client_engine.go and engine_api_rpc_client.go are new files but have Copyright 2024. Should be 2026 per the project convention (as execution_rpc.go correctly uses).

  2. FCU version mapping is correct and well-commented — V3 covers Deneb through Fulu (Prague), V4 only for Gloas+ (Amsterdam). Matches the Engine API spec.

  3. ParticipationBitList JSON fix is spec-compliant — The old hex-encoded SSZ was wrong per the beacon API spec. The new MarshalJSON produces ["7","7","0"] format, and UnmarshalJSON accepts both formats for
    backwards compat. Good.

  4. Eth1Block.MarshalJSON version-gated output — Correctly omits blob_gas_used/excess_blob_gas for pre-Deneb and ensures withdrawals is never null for Capella+. This fixes the omitempty nil-pointer issue.

Verdict

Solid work that addresses a real gap (caplin couldn't function as a standalone CL). The main concerns are: (1) cleaning up the now-superseded ExecutionClientRpc, (2) the uint64 underflow in onNewFinalized, and
(3) silent error swallowing in GetBlobs. The rest is clean, well-documented, and battle-tested against assertoor.

mh0lt added 2 commits March 25, 2026 09:55
HIGH:
- Remove ExecutionClientRpc entirely — fully superseded by
  ExecutionClientEngine. Move shared checkPayloadStatus and
  DefaultRPCHTTPTimeout to execution_client_engine.go.
- Fix uint64 underflow in onNewFinalized when epoch < 3: guard
  against wraparound during first 3 epochs after genesis.

MEDIUM:
- Move CacheBody to ExecutionBlockReaderByNumber interface instead
  of relying on concrete type assertion. All implementations
  (ExecutionEngineReader, ExecutionSnapshotReader) now satisfy it.
- Add NewEth1BlockFromExecutionHeader constructor that keeps
  header→block field mapping in sync across forks. Simplifies
  writeGenesisBeaconBlock from 25 lines to 1.

LOW:
- Fix copyright years on new files (2024 → 2026)
The GetBlobs interface previously returned (blobs, proofs) with no error,
forcing implementations to silently swallow network failures and return
nil — making "no blobs available" indistinguishable from "network error".

Add error return to ExecutionEngine.GetBlobs. The caller in on_block.go
now logs warnings on failure instead of silently treating errors as
missing blobs.

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH

  1. ExecutionClientEngine.Close() is not called on the local path (execution_client_engine.go:109-111)

  2. Close() only closes cc.rpcClient and is a no-op for local mode, which is fine for now. But ExecutionClientEngine is also not closed on the remote path in cmd/caplin/main.go -- there's no defer cc.Close(). If
    the caplin process exits ungracefully, HTTP connections won't be cleaned up. Consider adding defer cc.Close() in main.go after construction.

  3. NewExecutionClientEngineRPC passes nil beaconCfg (cmd/caplin/main.go:80)

cc, err := execution_client.NewExecutionClientEngineRPC(cfg.JwtSecret, cfg.EngineAPIAddr, cfg.EngineAPIPort, nil)

  1. beaconCfg is nil at construction time. It's set later via SetBeaconChainConfig, but if any method on ExecutionClientEngine is called before that (e.g., GetAssembledBlock which calls
    executionPayloadToEth1Block which uses beaconCfg), you'd get a nil pointer dereference. This is a latent crash risk. Consider either:
    - Requiring beaconCfg at construction (preferred)
    - Adding nil-guards in executionPayloadToEth1Block

  2. FCU version mapping may reject future forks (execution_client_engine.go:216-224)

  3. The default case in ForkChoiceUpdate sends V4 for anything >= Gloas. If a new fork is added but the Engine API adds V5, the default would silently use V4. This is acceptable for now but should be revisited
    when new forks are defined.

MEDIUM

  1. ExecutionEngineReader.cache has no upper bound enforcement guarantee (getters/execution_rpc.go:92-97)

  2. The eviction logic only triggers when len(r.cache) > 64, but it deletes entries based on r.highestCached - 64. If block numbers are sparse (e.g., blocks cached for numbers 1, 100, 200), the threshold-based
    eviction won't clean entries efficiently. For the expected use case (sequential block production) this is fine, but worth noting.

  3. Race between CacheBody and Transactions/Withdrawals (getters/execution_rpc.go)

  4. The getCached method uses RLock and CacheBody uses full Lock, which is correct. However, the pattern of "check cache, then fall through to engine" could return stale data if a block is re-orged and
    re-produced at the same number. Given the 64-block window and the fact this is for beacon API reads, the practical risk is very low.

  5. ParticipationBitList.MarshalJSON allocates per-element strings (solid/participation_bitlist.go:180-184)

  6. For large validator sets (1M+ validators), this creates 1M+ string allocations. Consider using a strings.Builder or pre-allocated byte buffer. Not a blocker, but worth noting for mainnet performance.

LOW

  1. EngineAPIRPCClient URL construction (engine_api_rpc_client.go:41-49)

  2. The scheme detection logic is correct but slightly convoluted. hasScheme checks http which would also match https, then isHTTPS refines it. The empty string for protocol when hasScheme is true works because
    the address already contains the scheme. This is fine but could benefit from a brief comment.

  3. buildExecutionPayload reverses BaseFeePerGas manually (execution_client_engine.go:117-121)

  4. This same byte-reversal pattern appears in executionPayloadToEth1Block and in eth1_block.go. Consider extracting a helper to avoid the repeated reversal logic (3 occurrences). Not critical for this PR.

  5. Test fixture (block_1.json) is 1974 lines of added JSON (cl/beacon/handler/test_data/block_1.json)

  6. This is a very large inline test fixture. It's a common pattern in Go tests, but the diff is hard to review meaningfully. The fixture itself looks properly structured.

  7. Documentation file references local paths (.github/docs/caplin-hive-testing.md)

  8. The doc references /erigon/mark/hive/ which is a local developer path. This is fine as internal documentation, but should be called out as non-portable.

Summary

This is a well-executed PR that enables a significant new capability (standalone caplin mode). The architecture is sound, the bug fixes are well-motivated by real test failures, and the code is clean. The main
risk areas are around the deferred beaconCfg initialization (issue #2) and missing Close() call (issue #1). The rest of the issues are minor or informational. The PR is ready for merge after addressing the
high-priority items.

- Add defer cc.Close() in standalone caplin main.go to clean up HTTP
  connections on exit (issue #1)
- Add nil-guard for beaconCfg in getAssembledBlockV3 and
  getAssembledBlockFromResponse — returns clear error if
  SetBeaconChainConfig was not called before block production (issue #2)

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but a follow-up created: #20151

@mh0lt mh0lt added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit 082697d Mar 26, 2026
114 of 125 checks passed
@mh0lt mh0lt deleted the feat/caplin-engine-api-unified branch March 26, 2026 00:06
sudeepdino008 added a commit that referenced this pull request May 21, 2026
…21310)

## Summary

Two spec-deviations in `getFilterBlockTree` compound to reject valid
current-epoch leaves at every epoch boundary, causing Caplin's `GetHead`
to fall back to the justified checkpoint root (a ~30-50 slot regression)
and triggering execution-side unwinds.

## Observed symptom (from #21301)

On bloatnet, once execution catches up to chain tip, the node enters a
steady-state cycle of **22-36 block execution unwinds every ~6 minutes**
— one per epoch boundary. Histogram from one ~3h run (110 unwind
events):

```
   1 size=11
   1 size=14
   4 size=15
   1 size=16
   4 size=18
   2 size=19
   3 size=20
   5 size=21
   9 size=22
  14 size=23     ← most common
   3 size=24
   5 size=25
   4 size=26
   4 size=27
   8 size=28
   1 size=30
   2 size=31
   4 size=32
   3 size=33
   4 size=34
   ...many size=35-36
```

The unwinds are on the **same chain** (no real reorg) — execution is
forced to roll back to an older head because Caplin regresses its
`GetHead()` return value:

```
08:38:45 Caplin FCU: headSlot=652928  currentSlot=652988  lagSlots=60   eth1Head=0xe0f3...
08:39:36 Caplin FCU: headSlot=652993  currentSlot=652993  lagSlots=0    eth1Head=0x263b... ← at tip
08:44:00 Caplin FCU: headSlot=652959  currentSlot=653015  lagSlots=56   eth1Head=0x7674... ← regressed 34 slots
```

The 08:44:00 FCU triggers a 23-block exec unwind:

```
[forkchoice] entering unwind path  fcuNum=24701328  canonHash=0x7674... fcuHash=0x7674...
                                   hashesDiffer=false  finishProgress=24701350  executionAhead=true
Unwind Execution from=24701350 to=24701327
```

`hashesDiffer=false` confirms same-chain. `executionAhead=true` because
exec raced ahead of Caplin's regressing head.

## Root cause: filter rejects mid-epoch leaves

Tracing inside `GetHead`:

```
[GetHead] cache miss, rebuilding from justifiedCheckpoint  justifiedEpoch=20410
[filterTree] reject leaf: checkpoint mismatch  blockRoot=0xe644... slot=653150
            justifiedOk=false  finalizedOk=false
            storeJustEpoch=20410  blockJustEpoch=20411
            storeFinalEpoch=20409  blockFinalEpoch=20410
[GetHead] filtered tree size  viableBlocks=0  totalHeads=1
[GetHead] rebuilt head  headHash=0x... headSlot=653119  ← fallback to justified root
Caplin is sending forkchoice  headSlot=653119  currentSlot=653174  lagSlots=55
```

Every leaf in the tree gets filtered because its
`unrealizedJustifications[blockRoot].epoch = N+1` but the store's
`justifiedCheckpoint.epoch = N` (store's realized lags by 1 epoch
mid-epoch). The walk falls back to the justified root → 50+ slot
regression.

## Spec deviations (the fix)

### 1. Voting-source selection used unrealized unconditionally

Spec
[`get_voting_source`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/fork-choice.md#get_voting_source)
picks unrealized only for **prior-epoch** blocks (pull-up justification
view); current-epoch blocks should use the block's realized state
checkpoint:

```python
def get_voting_source(store: Store, block_root: Root) -> Checkpoint:
    block = store.blocks[block_root]
    current_epoch = get_current_store_epoch(store)
    block_epoch = compute_epoch_at_slot(block.slot)
    if current_epoch > block_epoch:
        return store.unrealized_justifications[block_root]
    else:
        return store.block_states[block_root].current_justified_checkpoint
```

Erigon's code:

```go
// Use per-block unrealized justifications (spec: store.unrealized_justifications[block_root])
// Fall back to realized checkpoints if unrealized not available
currentJustifiedCheckpoint, has := f.getUnrealizedJustification(blockRoot)
if !has {
    currentJustifiedCheckpoint, has = f.forkGraph.GetCurrentJustifiedCheckpoint(blockRoot)
    ...
}
```

→ Always picks unrealized when available. The "fall back" comment
misreads the spec — spec is a conditional branch, not a fallback.

### 2. Justified/finalized check was strict equality

Spec
[`filter_block_tree`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/fork-choice.md#filter_block_tree)
has a lenient `+2 >= current_epoch` lookback:

```python
correct_justified = (
    store.justified_checkpoint.epoch == GENESIS_EPOCH
    or voting_source.epoch == store.justified_checkpoint.epoch
    or voting_source.epoch + 2 >= current_epoch  # ← lenient lookback
)
```

Erigon's code only allowed the first two clauses (strict `Equal()`).

## Regression provenance

The deviations were introduced in #20035 (`caplin: unified Engine API
client for standalone mode`, Mar 25 2026) by Mark Holt:

```
082697d  cl/phase1/forkchoice/get_head.go  +Use per-block unrealized justifications
```

Before that PR, `getFilterBlockTree` used
`f.forkGraph.GetCurrentJustifiedCheckpoint(blockRoot)` directly — which
returns the block's **realized** justified checkpoint. Block.realized
matched store.realized (both on the same epoch-boundary timeline), so
the equality check passed.

PR #20035 introduced the per-block unrealized lookup
(`store.unrealized_justifications` per the spec), but missed the
conditional branching (spec deviation #1) and didn't add the +2 lookback
(spec deviation #2). The result: block.unrealized goes ahead by 1 epoch
mid-epoch, and store.realized doesn't catch up until `OnTick`'s
epoch-boundary path runs.

## Fix

Restore spec-faithful behavior in `getFilterBlockTree`:

1. Branch the voting source selection on `currentEpoch > blockEpoch`:
   - Prior-epoch → `store.unrealized_justifications[block_root]`
- Current/future-epoch → block's realized `current_justified_checkpoint`

In both branches, return `false` (reject the leaf) if the lookup is
absent. `OnBlock` populates `unrealizedJustifications` for every
imported block above the finalized slot, so a missing entry indicates
either an invariant violation or a leaf below finalized — neither should
produce a viable head.

2. Replace the strict `Equal()` check on the justified side with the
spec's three flat disjuncts: `store.justified_checkpoint.epoch ==
GENESIS_EPOCH || voting_source.epoch == store.justified_checkpoint.epoch
|| voting_source.epoch + 2 >= current_epoch`.

3. Replace the finalized-side checkpoint-equality check with the spec's
ancestor-descent rule: `store.finalized_checkpoint.root ==
get_checkpoint_block(block_root, finalized.epoch)`, implemented via
`f.Ancestor(blockRoot, finalizedSlot)`.

4. Snapshot `currentEpoch` once at the start of the walk (in
`getFilteredBlockTree`) and thread it through the recursion so the whole
rebuild uses a single store-epoch value — `OnTick` updates `f.time`
without holding `f.mu`, so per-leaf `f.Slot()` reads can mix epochs
across leaves around a slot boundary.

## Validation

Tested on bloatnet at chain tip, on `performance` HEAD + this fix:

- **Before fix**: 22-36 block unwinds every ~6 min, one per epoch
boundary, persisting indefinitely. 110 events in 3h.
- **After fix**: zero unwinds at epoch boundaries observed across 30+
minutes (~5 epoch boundaries crossed). `[filterTree] reject` no longer
fires.

Effects:

- **CPU waste eliminated**: ~30 blocks of execution work were being
discarded and re-done every cycle.
- **db_size**: unchanged (was already not affected since rolled-back
state was in MemoryMutation overlay, not MDBX).
- **Logs**: clean — no more spurious `Unwind Execution` lines at tip.

## Test plan

- [x] Run on bloatnet to chain tip, observe no large unwinds at epoch
boundaries
- [x] Confirm no head regression in `Caplin is sending forkchoice` logs
(lagSlots stays ~0 at tip)
- [ ] Spectest still passes (`make spectest`)

## Refs

Full diagnosis trail: #21301
yperbasis added a commit that referenced this pull request May 22, 2026
…bitlist.go

Restores cl/cltypes/eth1_block.go and cl/cltypes/solid/participation_bitlist.go
to their pre-#20035 versions to test whether the perf regression for
#21008 lives in these serialization changes. eth1_block.go is on the
per-block deserialization path; participation_bitlist.go is touched on
every attestation.

This is on top of the prior forkchoice/* revert which alone did not
restore mainnet sync performance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yperbasis added a commit that referenced this pull request May 26, 2026
Restores #20035's versions of beacon_block_reader.go, snapshot_format/blocks.go,
execution_snapshot.go, test_util.go, and the new execution_rpc.go file.

If perf regresses with this batch, regression is here. If clean, it's elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request May 28, 2026
…mit (erigontech#21444)

## Summary

`ExecModule.UpdateForkChoice` spawns the forkchoice work in a goroutine
and waited on a `done` channel that closes only after that goroutine
**fully returns** — i.e. after flush + commit + prune. For a
merge-extending fork at tip the forkchoice result is already sent on
`outcomeCh` *before* the commit (`ParallelStateFlushing` is on by
default), so the `<-done` wait threw that early-send away and blocked
the consensus client for the entire EL commit.

This removes the `<-done` wait: `UpdateForkChoice` returns as soon as
the result lands on `outcomeCh`.

## Live-tip A/B (this branch vs main)

Probe at the `ExecutionClientDirect.ForkChoiceUpdate` boundary (what
Caplin's clstages loop blocks on), live mainnet tip, same
datadir/machine/instrumentation:

| FCU-response wall (ms) | n | min | p50 | p90 | max | mean |
|------------------------|--:|----:|----:|----:|----:|-----:|
| **main** | 100 | 96.7 | **260.2** | 386.3 | 12160.8 | 390.1 |
| **this branch** | 82 | 0.7 | **1.2** | 3.0 | 4.3 | 1.7 |

~217× lower p50; distributions don't overlap (branch max 4.3 ms ≪ main
min 96.7 ms). This matches release/3.4's ~1.4 ms at the same boundary.
The `Timings: Forkchoice commit=…ms` line is unchanged — the commit
still costs the same, it just no longer blocks the consensus loop.
safe/finalized cadence is identical between branch and main, and wall
does not correlate with safe/finalized changes, so this is not about
forkchoice args.

Both A/B binaries are post-erigontech#21327, so the comparison isolates only the
`<-done` change — the gap is not confounded by the erigontech#21008/erigontech#20035
regression (both sides already have its fix).

## Scope / relationship to erigontech#21008

This is an **independent FCU-response-latency improvement**, not the fix
for the erigontech#21008 in-sync regression. The bisection attributes erigontech#21008 to
erigontech#20035, which is entirely CL-side (`cl/phase1/forkchoice/*`) and touches
no `execution/execmodule/` code — its `getFilterBlockTree`
spec-deviation regressed `GetHead` at epoch boundaries and is itself
already fixed on main by erigontech#21310/erigontech#21327. The `<-done` wait predates the
bisection's good commit `19f44b15cd` (it was added in erigontech#19981, and that
commit measured 100% in-sync), so the EL-side blocking this PR removes
is **not** what caused the 100→3% collapse. It is a real, separate cost
worth removing on its own merits: at tip it returns the consensus loop
to ~1 ms instead of ~260 ms (and avoids multi-second blocks when the
commit overlaps a background merge/prune burst).

## Why it's safe

The `<-done` was added (erigontech#19981) so a follow-up `AssembleBlock` could
safely acquire the EL semaphore. That guarantee is preserved without it:

- In `updateForkChoice`, the `defer e.semaphore.Release(1)` is
registered **first**, so it runs **last** — after every cleanup defer
(`ResetPendingUpdates`, `ClearWithUnwind`, overlay `Close`). Any op that
subsequently acquires the semaphore necessarily observes fully-settled
state.
- `ExecutionClientDirect.ForkChoiceUpdate` already retries
`AssembleBlock` (30× 200 ms) on semaphore contention, so a proposer FCU
racing an in-flight commit resolves itself.
- release/3.4 has no `<-done` and is the proven-good baseline.
- Non-merge FCUs send their result at the end of `updateForkChoice`
anyway, so their timing is unchanged.

## Test plan

- [x] `execution/engineapi` `TestEngineApi*` block-production suite (the
FCU→AssembleBlock path `<-done` guarded)
- [x] `execution/execmodule/...`
- [x] `cl/beacon/handler` `TestCaplinBlockProduction*`
- [x] `make lint` clean
- [x] Live-tip A/B (this branch vs main): FCU-response wall p50 260 ms →
1.2 ms, 0 gap errors over an epoch

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants