[glamsterdam-devnet-5] no change sets for unwinding after initial sync causes node to get stuck #21650

## Summary

On `glamsterdam-devnet-5`, the erigon EL on `nimbus-erigon-1` got **permanently stuck** at block 6137 after a small (4-block) reorg orphaned its initial-sync tip. Following the canonical chain required unwinding to block 6133, but erigon had no unwind data below its batch-executed tip (`minUnwindableBlock=6137`), so it rejected every `engine_forkchoiceUpdatedV4` with `-38006 "Too deep reorg"`: **3,876 times over ~16h**, with no recovery. The CL (nimbus) kept following the chain head, but with `is_optimistic=true` the entire time, so the node's validators stopped attesting. The rest of the network was healthy (~90% participation, finalizing normally).

## Environment

- erigon: branch `glamsterdam-devnet-5`, commit `1ca634d4b094f6b3932ab27227a1fa34895753b1` (`erigon/3.5.0/linux-amd64/go1.25.11`)
- CL: `Nimbus/v26.5.0-05f88a`
- Network: `glamsterdam-devnet-5` (12s slots, genesis 2026-06-04 13:00:00 UTC)
- Node: `nimbus-erigon-1`

## Timeline (UTC; 2026-06-05 unless a date is given)

| Time | Event |
|---|---|
| Jun 4 13:00 | Devnet genesis |
| 08:13 | Node comes online ~19h after genesis; erigon starts OtterSync from scratch |
| 09:59:17 | Erigon shuts down **gracefully** (`Exiting Engine...`) — external restart by deployment tooling, not a crash |
| 09:59:27 | After restart (head=320), nimbus FCU targets block 6137 `0x137faa…`; erigon backward-downloads and batch-executes 321→6137 (~20 blk/s) |
| 10:04:53 | `head updated number=6137 … age=12m29s` — **the sync target was already a stale tip** |
| 10:05:14 | Canonical branch arrives via `NewPayloadTrigger` (fork point: common ancestor 6133); first `no unwindable block found from changesets` |
| 10:29:40 | First `engine_forkchoiceUpdatedV4 err="Too deep reorg"` — repeats on every FCU (~10–20s) thereafter |
| Jun 6 02:10 | Still stuck at 6137; canonical head 10545; CL optimistic for ~16h |
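
The rounded figures in the timeline can be cross-checked with a little arithmetic. The sketch below uses only timestamps and counts quoted in this issue; it assumes the rejection window ends at the Jun 6 02:10 observation, which is simply when the node was last checked.

```python
from datetime import datetime

# Timestamps and counts quoted in this issue (UTC).
batch_start = datetime(2026, 6, 5, 9, 59, 27)    # FCU targets block 6137 after restart
batch_end = datetime(2026, 6, 5, 10, 4, 53)      # "head updated number=6137"
first_reject = datetime(2026, 6, 5, 10, 29, 40)  # first "Too deep reorg"
last_seen = datetime(2026, 6, 6, 2, 10, 0)       # still stuck (assumed end of window)

blocks_executed = 6137 - 320  # head=320 after restart, so blocks 321..6137
rejections = 3876

stuck_seconds = (last_seen - first_reject).total_seconds()
exec_rate = blocks_executed / (batch_end - batch_start).total_seconds()
stuck_hours = stuck_seconds / 3600
mean_fcu_gap = stuck_seconds / rejections

print(f"batch execution: {exec_rate:.1f} blk/s")
print(f"stuck window: {stuck_hours:.1f} h")
print(f"mean gap between rejected FCUs: {mean_fcu_gap:.1f} s")
```

This gives about 17.8 blk/s, 15.7 h, and 14.6 s per rejected FCU, in line with the "~20 blk/s", "~16h", and "~10–20s" figures above.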

## Key logs

Erigon (repeating on every FCU):

```
[WARN] no unwindable block found from changesets, falling back to latest with commitment block=6137 err=nil
[WARN] reorg target below minimum unwindable block unwindTarget=6133 minUnwindableBlock=6137
[WARN] [rpc] served method=engine_forkchoiceUpdatedV4 err="Too deep reorg"
```
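
To reproduce the rejection count and confirm the unwind bounds from a log file, a scan along these lines should work. It is a minimal sketch: the patterns match substrings of the excerpts above, the real erigon log lines carry additional fields, and `summarize_unwind_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns match substrings of the excerpts quoted above; real log lines
# carry extra fields (timestamps, etc.), so nothing is anchored to line start.
TOO_DEEP = re.compile(r'method=engine_forkchoiceUpdatedV\d+ err="Too deep reorg"')
BOUNDS = re.compile(r"unwindTarget=(\d+) minUnwindableBlock=(\d+)")


def summarize_unwind_failures(lines):
    """Count rejected FCUs and tally (unwindTarget, minUnwindableBlock) pairs."""
    rejected = 0
    bounds = Counter()
    for line in lines:
        if TOO_DEEP.search(line):
            rejected += 1
        match = BOUNDS.search(line)
        if match:
            bounds[(int(match.group(1)), int(match.group(2)))] += 1
    return rejected, bounds


sample = [
    "[WARN] no unwindable block found from changesets, falling back to latest with commitment block=6137 err=nil",
    "[WARN] reorg target below minimum unwindable block unwindTarget=6133 minUnwindableBlock=6137",
    '[WARN] [rpc] served method=engine_forkchoiceUpdatedV4 err="Too deep reorg"',
]
rejected, bounds = summarize_unwind_failures(sample)
print(rejected, dict(bounds))
```

For a real log, pass an open file handle instead of `sample`; the function only iterates over lines.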

Nimbus side:

```
DBG Failed EL Request topics="elman" requestName=forkchoiceUpdated statusCode=0 err="{\"code\":-38006,\"message\":\"Too deep reorg\"}"
```

The engine block downloader did attempt the canonical branch once and gave up:

```
[INFO] [backward-block-downloader] starting forward downloading of blocks count=44 fromNum=6134 fromHash=0x89137b50… toNum=6177
[WARN] [EngineBlockDownloader] could not process backward download request hash=0x7bcc9321… trigger=NewPayloadTrigger chainTipNum=6178
```

## Fork proof (cross-node RPC)

| Block | nimbus-erigon-1 (stuck) | teku-nethermind-1 (canonical) | Match |
|---|---|---|---|
| 6133 | `0x995491e501e77fadb4d4cdb79748c47d098e865bf8e90c3028bac9a921bbf4e8` | `0x995491e501e77fadb4d4cdb79748c47d098e865bf8e90c3028bac9a921bbf4e8` | ✅ common ancestor |
| 6137 | `0x137faa67d6e4a99dd5b9667a6cfc77bf24390f1b54b4be5d0126d84437804e6b` | `0x754487678a35269b90b30da13d7bc7520598327961ee8ab7cce29f90ffe3865b` | ❌ orphaned tip |

The orphaned block 6137 has an EL timestamp 120s (10 slots at 12s) later than the canonical block 6137, which indicates a minority-fork branch with more empty slots. The optimistic CL supplied this branch's tip as the FCU target during initial sync.
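
The comparison in the table above can be scripted. The sketch below is an assumption-laden illustration: `eth_getBlockByNumber` is the standard JSON-RPC method, but `rpc_block_hash` and `compare_heights` are hypothetical helper names, and no node URL is taken from this issue. The demonstration runs offline against the two heights quoted in the table.

```python
import json
import urllib.request


def rpc_block_hash(url, number):
    """Fetch a block hash via the standard eth_getBlockByNumber JSON-RPC call."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": [hex(number), False],  # False: omit full transaction bodies
    }).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        block = json.load(response)["result"]
    return block["hash"] if block else None


def compare_heights(hash_a, hash_b, heights):
    """Return (height, hash on A, hash on B, match) for each height."""
    rows = []
    for height in heights:
        a, b = hash_a(height), hash_b(height)
        rows.append((height, a, b, a is not None and a == b))
    return rows


# Offline demonstration using only the hashes from the table above.
stuck = {
    6133: "0x995491e501e77fadb4d4cdb79748c47d098e865bf8e90c3028bac9a921bbf4e8",
    6137: "0x137faa67d6e4a99dd5b9667a6cfc77bf24390f1b54b4be5d0126d84437804e6b",
}
canonical = {
    6133: "0x995491e501e77fadb4d4cdb79748c47d098e865bf8e90c3028bac9a921bbf4e8",
    6137: "0x754487678a35269b90b30da13d7bc7520598327961ee8ab7cce29f90ffe3865b",
}
rows = compare_heights(stuck.get, canonical.get, [6133, 6137])
for height, a, b, match in rows:
    print(height, "match" if match else "DIVERGED")
```

Against live nodes, each lookup would be something like `lambda n: rpc_block_hash(node_url, n)`, with `node_url` set to that node's RPC endpoint.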

## Expected behavior

When an FCU requires unwinding below `minUnwindableBlock` (that is, into state with no unwind history, e.g. right after initial-cycle batch execution), erigon should have a recovery path, such as backward-downloading the canonical branch from the common ancestor and re-executing it, instead of permanently rejecting every FCU with `-38006`. As-is, any small reorg that orphans the initial-sync tip bricks the node until a manual datadir wipe + resync.
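
One way to picture the requested behavior is as a three-way decision on each FCU that needs the head moved back. This is illustrative Python, not erigon code: `plan_reorg`, the `Action` names, and the `can_reexecute` flag are all hypothetical, and the re-execution branch is only the recovery path suggested above, not an existing erigon feature.

```python
from enum import Enum


class Action(Enum):
    UNWIND = "unwind using stored change sets"
    REEXECUTE_FROM_ANCESTOR = "download canonical branch and re-execute from common ancestor"
    REJECT_TOO_DEEP = 'reject with -38006 "Too deep reorg"'


def plan_reorg(unwind_target, min_unwindable_block, can_reexecute):
    """Pick how to serve an FCU that needs the head moved back to unwind_target.

    can_reexecute stands for "state at or before unwind_target is reachable
    by some other means" (hypothetical; how erigon would establish that is
    out of scope for this sketch).
    """
    if unwind_target >= min_unwindable_block:
        return Action.UNWIND
    if can_reexecute:
        return Action.REEXECUTE_FROM_ANCESTOR
    return Action.REJECT_TOO_DEEP


# Values from this incident: target 6133, minimum unwindable block 6137.
print(plan_reorg(6133, 6137, can_reexecute=False))  # outcome seen in the logs
print(plan_reorg(6133, 6137, can_reexecute=True))   # outcome requested here
```

The point of the sketch is that rejection becomes the last resort rather than the only outcome once the target falls below `minUnwindableBlock`.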

## Notes

- The failure needed three things to coincide: an external mid-sync restart, the CL optimistically targeting a soon-to-be-orphaned tip, and a reorg landing inside the just-executed batch range.
- Full debug report (raw Dora/ClickHouse/RPC evidence with re-derivation commands) available on request.

