On glamsterdam-devnet-5, the erigon EL on teku-erigon-1 repeatedly rejected canonical, network-finalized blocks as invalid with block access list mismatch while batch-executing during catch-up after a fresh resync. Critically, the computed BAL hash differed across re-execution attempts of the same block — the parallel executor's BAL computation is non-deterministic. Erigon's spurious INVALID verdicts propagated to teku over the engine API; teku invalidated the corresponding fork-choice branches, and once the network justified a checkpoint inside an invalidated branch, teku's fork choice wedged fatally (ProtoArray: Finalized node is unknown crash loop). The node has been dead at slot 7865 since ~16:31 UTC on 2026-06-05 (3,800+ slots behind by now), validators offline. The rest of the network was healthy (~85% participation, finalizing normally), and the sibling erigon nodes (lighthouse/lodestar/prysm/nimbus-erigon-1) executed the same blocks without BAL errors.
Timeline (UTC, 2026-06-05)
(before 12:59:59) | Node redeployed with the new build; erigon datadir wiped (eth_syncing startingBlock=0x0, OtterSync) → full resync; same deployment wave as #21650
12:59:59 | First BAL validation failure during parallel batch execution near the tip: block 6292, plus exec loop error … parallel exec loop exited with 28 block(s) still pending in pe.blockExecutors (reason=ctx-done-drain: no more pending results)
12:59–16:32 | Unwind → re-execute → fail loop, one failure every ~2 min: 104 BAL mismatch errors across 35 distinct canonical blocks (range ~5545–6320), with the failing block and the computed hash varying per attempt
~15:05 | teku: Payload … marked as invalid by Execution Client → Will run fork choice because head block … was invalid; also Unable to import blocks: DATA_NOT_AVAILABLE every slot
15:13 | teku head freezes at slot 7865 (epoch 245)
16:13:24 | teku FatalServiceFailureException: Invalid or unknown justified root: 0xec7020d2… (the network-justified checkpoint, epoch 244, was inside a branch teku had invalidated on the EL's verdict)
The computed BALs also consistently contain more accounts than the stored sidecar BAL (e.g. block 6079: computed accounts=221/220 across attempts vs stored accounts=219; block 6093: computed 30 vs stored 23; block 6120: computed 26 vs stored 19), with extra precompile-range addresses appearing in the computed set.
All of these blocks executed fine on the first pass through this range (the node had reached ~6292 before the first failure), and are canonical — the network finalized well past them.
Consequence on the CL (teku)
2026-06-05 15:05:53.774 WARN - Payload for node ForkChoiceNode[blockRoot=0x46f18929…, payloadStatus=PAYLOAD_STATUS_EMPTY] marked as invalid by Execution Client
2026-06-05 15:05:53.774 WARN - Will run fork choice because head block 0x46f18929… was invalid
2026-06-05 16:13:24 … FatalServiceFailureException: Invalid or unknown justified root: 0xec7020d2142c0e663d0941332c48b2cc16bedf54e39e7336667569dfd866d0c4
2026-06-05 16:31:09 ERROR - Job DEFAULT.Timer-N threw an unhandled Exception: … IllegalArgumentException: ProtoArray: Finalized node is unknown ForkChoiceNode[blockRoot=0xec7020d2…, payloadStatus=PAYLOAD_STATUS_PE…
A spurious INVALID is the worst engine API answer an EL can give: the CL prunes canonical branches and, as seen here, can wedge unrecoverably.
Analysis / pointers
Validation + logging path: execution/stagedsync/bal_create.go (ProcessBAL). Its comment asserts "the BalancePath cross-check in VersionMap.validateRead ensures deterministic parallel execution" — violated here.
This is the bug class TestEngineApiBALParallelConsistencyStress (execution/engineapi/engine_api_bal_test.go) was written to surface: parallel-executor BAL diverging from the assembler/serial BAL under concurrent write pressure — "If this test flakes, it's the same class of bug that makes the glamsterdam assertoor suite fail."
The field signature — first mismatch at the tip during a parallel batch, then re-exec attempts after unwind failing at random earlier blocks with varying computed hashes — points at per-block BAL accumulation state not being correctly reset/isolated across conflict re-runs, retries, or unwinds in the parallel executor.
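To make the suspected failure mode concrete, here is a toy sketch in Go. It is not erigon's code: `accessSet`, `txAttempt`, `runShared` and `runIsolated` are hypothetical names, and the addresses are made up. It only illustrates how an access set shared across speculative attempts can keep addresses that the final, validated attempt never touched, which would be consistent with the computed BALs containing more accounts than the stored ones.

```go
package main

import (
	"fmt"
	"sort"
)

// accessSet is a toy stand-in for per-block access-list state (hypothetical).
type accessSet map[string]struct{}

// sorted returns the addresses in a deterministic order.
func (s accessSet) sorted() []string {
	out := make([]string, 0, len(s))
	for addr := range s {
		out = append(out, addr)
	}
	sort.Strings(out)
	return out
}

// txAttempt lists the addresses touched by one speculative execution attempt.
type txAttempt []string

// runShared records every attempt into one set, so addresses touched only by
// an aborted attempt leak into the final result.
func runShared(attempts []txAttempt) []string {
	acc := accessSet{}
	for _, a := range attempts {
		for _, addr := range a {
			acc[addr] = struct{}{}
		}
	}
	return acc.sorted()
}

// runIsolated gives each attempt its own scratch set and keeps only the last
// attempt, modelling a re-run that discards the state of earlier attempts.
func runIsolated(attempts []txAttempt) []string {
	var final accessSet
	for _, a := range attempts {
		scratch := accessSet{}
		for _, addr := range a {
			scratch[addr] = struct{}{}
		}
		final = scratch
	}
	return final.sorted()
}

func main() {
	// First attempt read stale state and touched an extra address; the re-run did not.
	attempts := []txAttempt{{"0xaa", "0x09"}, {"0xaa"}}
	fmt.Println("shared accumulator:  ", runShared(attempts))
	fmt.Println("isolated accumulator:", runIsolated(attempts))
}
```

In this toy model the shared variant's result depends on which attempts happened to be aborted, so it varies with scheduling, while the isolated variant depends only on the final attempt.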
Expected behavior
BAL computation must be deterministic and match the serial/assembler result; canonical blocks must never be rejected as INVALID. When a BAL mismatch is detected, failing loudly is correct only if the computation is trustworthy. A non-deterministic checker converts an internal race into consensus-level self-destruction.
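One way to read this requirement is that a parallel-path mismatch should be confirmed against a serial recomputation before the block is reported INVALID. The Go sketch below shows that reading only. `classify`, the `status` values and the `serial` callback are hypothetical and do not correspond to erigon or engine API identifiers.

```go
package main

import "fmt"

// status is a toy verdict type (hypothetical, not an engine API type).
type status string

const (
	valid   status = "VALID"
	invalid status = "INVALID"
)

// classify compares the header's BAL hash with the hash from the parallel
// executor. On a mismatch it recomputes via the serial path before deciding.
// The boolean reports that the parallel result diverged from the serial one,
// which is an executor bug rather than a property of the block.
func classify(headerHash, parallelHash string, serial func() string) (status, bool) {
	if parallelHash == headerHash {
		return valid, false
	}
	if serial() == headerHash {
		// Block is fine; only the parallel computation was wrong.
		return valid, true
	}
	return invalid, false
}

func main() {
	verdict, diverged := classify("0xabc", "0xdef", func() string { return "0xabc" })
	fmt.Println(verdict, "parallel executor diverged:", diverged)
}
```

The trade-off is an extra serial execution on every mismatch, which is cheap when mismatches are rare and keeps a parallel-executor race from being reported as a consensus failure.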
Notes
Mitigation for devnet nodes until fixed: EXEC3_PARALLEL=false.
Recovery of teku-erigon-1 likely needs a teku DB wipe/resync — its protoarray has invalidated canonical branches — plus letting erigon finish (or redo) its sync.
Full debug report (raw Dora/ClickHouse/RPC evidence with re-derivation commands) available on request.
Environment
EL: glamsterdam-devnet-5, commit 1ca634d4b094f6b3932ab27227a1fa34895753b1 (erigon/3.5.0/linux-amd64/go1.25.11); same build as #21650 ([glamsterdam-devnet-5] no change sets for unwinding after initial sync causes node to get stuck)
CL: teku/v26.4.0+137-g766dcdaefe
Network: glamsterdam-devnet-5 (12s slots, genesis 2026-06-04 13:00:00 UTC, GLOAS_FORK_EPOCH=30)
Node: teku-erigon-1
Last timeline entry: ProtoArray: Finalized node is unknown (166k+ log lines); engine API traffic stops, erigon parked mid-sync (Execution stage 4041, Headers 6320)
Key logs (erigon)
First failure:
Non-determinism evidence
Same block, same expected header BAL hash, different computed hash on each re-execution attempt (after unwinds):
Expected header BAL hash | Computed BAL hash (time of attempt)
0xff6f4970… | 0x5aec9998… (13:11:19), 0x19932689… (13:13:07, 13:23:08, 13:36:30)
0x127c2929… | 0x99d0958b… (13:07:43), 0xdfffbe15… (13:28:31), 0xc5346d98… (13:56:56)
0xecae7d49… | 0xd239753d… (13:21:19), 0x7e1a11c8… (13:49:44), 0xd81162c8… (13:51:33), 0x72764c55… (14:02:21)
0xc7a34401… | 0xddd4c662… (14:00:31, 14:04:07), 0x23a4418f… (14:11:17)
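The check behind this evidence can be sketched as a small Go program that groups logged mismatches by expected hash and reports every expected hash seen with more than one distinct computed hash. The `attempt` type and `divergent` function are hypothetical names for illustration. The sample input reuses the truncated hashes from the first two rows above, and how the mismatch lines would be parsed out of erigon's logs is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// attempt is one logged BAL validation failure (field names are illustrative).
type attempt struct {
	expected string // BAL hash from the block header
	computed string // BAL hash produced by this execution attempt
}

// divergent maps each expected hash to its sorted set of distinct computed
// hashes, keeping only entries with at least two distinct values.
func divergent(attempts []attempt) map[string][]string {
	seen := map[string]map[string]struct{}{}
	for _, a := range attempts {
		if seen[a.expected] == nil {
			seen[a.expected] = map[string]struct{}{}
		}
		seen[a.expected][a.computed] = struct{}{}
	}
	out := map[string][]string{}
	for exp, set := range seen {
		if len(set) < 2 {
			continue // a single repeated wrong hash is not evidence of non-determinism
		}
		hashes := make([]string, 0, len(set))
		for h := range set {
			hashes = append(hashes, h)
		}
		sort.Strings(hashes)
		out[exp] = hashes
	}
	return out
}

func main() {
	attempts := []attempt{
		{"0xff6f4970…", "0x5aec9998…"},
		{"0xff6f4970…", "0x19932689…"},
		{"0xff6f4970…", "0x19932689…"},
		{"0x127c2929…", "0x99d0958b…"},
		{"0x127c2929…", "0xdfffbe15…"},
		{"0x127c2929…", "0xc5346d98…"},
	}
	for exp, hashes := range divergent(attempts) {
		fmt.Printf("expected %s: %d distinct computed hashes %v\n", exp, len(hashes), hashes)
	}
}
```

A deterministic but wrong executor would yield exactly one computed hash per expected hash, so any entry returned by `divergent` points at non-determinism rather than a consistent logic error.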
Related: … nimbus-erigon-1, unwind/changesets -38006 Too deep reorg wedge).