Pull from go-ethereum up to 26aea73 (7 Feb 2019) by AlexeyAkhunov · Pull Request #1 · erigontech/erigon

AlexeyAkhunov · 2019-05-28T18:36:27Z

No description provided.

* node: close AccountsManager in new Close method
* p2p/simulations, p2p/simulations/adapters: handle node close on shutdown
* node: move node ephemeralKeystore cleanup to stop method
* node: call Stop in Node.Close method
* cmd/geth: close node.Node created with makeFullNode in cli commands
* node: close Node instances in tests
* cmd/geth, node: minor code style fixes
* cmd, console, miner, mobile: proper node Close() termination

Circle CI

Add Go package option

the ankr's implementation of erigon for bsc

Bor consensus implementation

* progress #1
* progress #2
* proper file naming
* more mature memory mutation

* rpc: add eth_callMany (#1)
* clean the repo
* clean style
* remove unwanted err check
* fix header bug
* Add RPC `debug_traceCallMany` (#4)
* update submodule
* fix error msg

* experiment #1
* experiment #2
* experiment #3
* experiment 4

feat(ci): run hive tests as part of CI

Upstream changes

Upgrade shell script

- Remove dead LightCollector.DeleteAccount IncarnationPath emit (concern #3): IBS.Selfdestruct's versionWritten already publishes IncarnationPath via the version map; CollectorWrites is not the rawWrites source.
- Add MIRROR-OF tag on versionedWriteCollector.DeleteAccount cross-referencing LightCollector.DeleteAccount and noting that IncarnationPath is emitted via IBS.Selfdestruct, not either DeleteAccount (concern #7).
- Rewrite TestSDOfPreExistingContract_FullPipeline to populate vm with IBS.Selfdestruct's versionWritten emits; assert BalancePath=0 comes from vm.Read (not preBlockBalance from stateReader fallback) and that the trie sees a leaf with {Balance=0, Nonce=preBlock, CodeHash=preBlock} through the default FlushToUpdates branch (concern #1).
- Add TestSDStorageCascade_EmitsPerSlotDeletes locking in the storage-cascade load-bearing invariant: normalizeWriteSet's vm.StorageKeys loop emits StoragePath=0 entries that overwrite cs.storageState's pre-SD slot values, so each slot emits DeleteUpdate (concern #6).
- Update defensive-test docstrings to spell out unreachability from real production writesets and cross-link to the realistic-pipeline tests that cover the actual fix mechanism (concern #4).
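The storage-cascade invariant described above depends on write ordering. The Go sketch below models it under stated assumptions: `writeKind`, `write`, `slotState`, `applyWrites`, and `flush` are hypothetical single-address stand-ins, not Erigon's `calc_state` types. It shows that a self-destruct only marks slots dirty, and that the later `StoragePath=0` writes are what turn each slot into a delete instead of an update carrying the pre-SD value.

```go
package main

import "fmt"

// writeKind is a hypothetical stand-in for the writeset path tags; it is not
// Erigon's actual type.
type writeKind int

const (
	selfDestructPath writeKind = iota // SelfDestructPath=true for the address
	storagePath                       // StoragePath write for one slot
)

// write is one entry of a simplified, single-address writeset.
type write struct {
	kind writeKind
	slot string
	val  uint64
}

// slotState models only the storage side of a calc-state entry.
type slotState struct {
	storage map[string]uint64 // last known value per slot (starts at pre-SD values)
	dirty   map[string]bool   // slots to emit at flush time
}

func newSlotState(pre map[string]uint64) *slotState {
	return &slotState{storage: pre, dirty: map[string]bool{}}
}

// applyWrites consumes writes in order. The self-destruct only marks every
// tracked slot dirty; the StoragePath=0 writes arriving afterwards are what
// zero the values.
func (s *slotState) applyWrites(ws []write) {
	for _, w := range ws {
		switch w.kind {
		case selfDestructPath:
			for slot := range s.storage {
				s.dirty[slot] = true
			}
		case storagePath:
			s.storage[w.slot] = w.val
			s.dirty[w.slot] = true
		}
	}
}

// flush emits "delete" for dirty slots that are zero and "update=<v>" otherwise.
func (s *slotState) flush() map[string]string {
	out := make(map[string]string, len(s.dirty))
	for slot := range s.dirty {
		if v := s.storage[slot]; v == 0 {
			out[slot] = "delete"
		} else {
			out[slot] = fmt.Sprintf("update=%d", v)
		}
	}
	return out
}

func main() {
	// SD followed by one StoragePath=0 per tracked slot: every slot is deleted.
	withCascade := newSlotState(map[string]uint64{"s1": 7, "s2": 9})
	withCascade.applyWrites([]write{
		{kind: selfDestructPath},
		{kind: storagePath, slot: "s1", val: 0},
		{kind: storagePath, slot: "s2", val: 0},
	})
	fmt.Println(withCascade.flush()["s1"], withCascade.flush()["s2"]) // delete delete

	// Without the trailing zero writes, the pre-SD value leaks out as an update.
	noCascade := newSlotState(map[string]uint64{"s1": 7})
	noCascade.applyWrites([]write{{kind: selfDestructPath}})
	fmt.Println(noCascade.flush()["s1"]) // update=7
}
```

If the zero writes ever arrived before the self-destruct marker, or were dropped, the last case would surface as a stale storage update, which is exactly what the test named above is meant to catch.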

…tyRemoval (erigontech#21032)

## Summary

Fixes a wrong trie-root in the parallel commitment calculator when post-tx writesets are indistinguishable between two cases:

1. **SELFDESTRUCT of a pre-existing contract** — serial keeps the leaf with `incarnation>0`, zero balance/nonce/empty codeHash. Parallel was emitting `DeleteUpdate` and removing the leaf.
2. **EIP-161 emptyRemoval** of a touched-empty account (e.g. `0xff…fe` after the EIP-4788 system call) — serial emits `DeleteUpdate`. Parallel was emitting a zero-account UPDATE.

The discriminator is the **pre-block incarnation**, which the writeset alone didn't carry. Fix wires it through `LightCollector.DeleteAccount` → `IncarnationPath` write → `calcAccountState.Incarnation` → 3-way branch in `FlushToUpdates`.

## Dependency direction

This PR is a **precursor for erigontech#21017** (the CI matrix that runs every test under both `serial` and `parallel` exec modes). Without this fix, parallel-mode tests on hive `rpc-compat`, `engine api/cancun/withdrawals`, and the eest selfdestruct sub-suites all fail with wrong-trie-root errors at SD/empty-removal blocks. **erigontech#21017 cannot land until this PR lands.**

The matrix in erigontech#21017 will validate this PR end-to-end via the hive parallel sub-suites — meaning this PR's parallel-exec changes don't run in *this* PR's own CI, only in erigontech#21017's. The rebased erigontech#21017 is the meaningful go/no-go signal.

## Changes

* `LightCollector.DeleteAccount` now emits `IncarnationPath` alongside `SelfDestructPath=true` when `original.Incarnation > 0`. Calc receives the pre-deletion incarnation through the same channel as every other write — no exec-side state leakage.
* `calc_state.ApplyWrites` consumes `IncarnationPath` into `calcAccountState.Incarnation` via direct type-assertion (panic on mismatch — see *Concern 3* below).
* `calc_state.FlushToUpdates` switches on a 3-way `Deleted` branch with `isAllZero` defense-in-depth:
  * `Deleted && Incarnation>0 && all-zero` → zero-account UPDATE (matches serial's `DomainDel`-leaves-post-CREATE-encoding for SD'd contracts)
  * `Deleted && all-zero && Incarnation==0` → `DeleteUpdate` (matches serial's `DomainDel`-truly-empties for EIP-161 emptyRemoval)
  * `Deleted` with retained non-zero values → regular UPDATE — defensive coverage for OOG-CREATE2-with-retained-balance and any future write-ordering race
* `SelfDestructPath` also marks all tracked storage slots dirty so `FlushToUpdates` emits per-slot updates alongside the account reset. Load-bearing invariant: `normalizeWriteSet`'s `vm.StorageKeys(addr)` loop emits `StoragePath=0` entries that arrive in `ApplyWrites` after `SelfDestructPath`, overwriting the marked slots' values to zero so they emit `DeleteUpdate` not `StorageUpdate(pre-SD value)`. Inline comment in `calc_state.go` spells this out — see *Concern 2* below.

## Earlier draft snags (resolved)

The first draft also added an `IncarnationPath > 0` exclusion to `normalizeWriteSet`'s empty-account check. This was **redundant** (the empty-check already requires `Nonce == 0`, which excludes successful CREATE/CREATE2) and **broke OOG-during-CREATE2 cases** (which leave `Nonce=0/Balance=0/Code=empty/Incarnation=1` and *must* still be deleted). Removed in `9539998f14`. The `exec3_parallel.go` diff in this PR is now comments-only.

## Reviewer concerns addressed

### Concern 1 (yperbasis): PR description was stale

This body. ✓

### Concern 2 (yperbasis + Copilot): SD storage-cascade load-bearing invariant

Inline comment in `calc_state.go`'s `SelfDestructPath` case now spells out the dependency on `normalizeWriteSet`'s `vm.StorageKeys(addr)` loop. ✓

### Concern 3 (yperbasis + Copilot): IncarnationPath guarded type-assertion

Changed to direct `w.Val.(uint64)` to match the other paths. Silent zero would route a real SD into the EIP-161 branch and reproduce the very wrong-root bug — better to panic at the source. ✓

### Concern 4 (yperbasis): `TestFlushToUpdates_DeletedWithRetainedBalance_EmitsRegularUpdate` docstring

Updated to clarify: this is **defensive coverage** for the third `FlushToUpdates` branch in isolation, NOT a direct repro of the eest_devnet OOG path. The actual OOG fix is the removal of the redundant `IncarnationPath > 0` clause from `normalizeWriteSet` (the OOG writeset has `Nonce=0` → empty-account → `DeleteUpdate`, not `Deleted+RetainedBalance`). End-to-end coverage of that path lives in the eest_devnet suite, surfaced via erigontech#21017's matrix. ✓

### Concern 5 (yperbasis): `versionedWriteCollector.DeleteAccount` asymmetry — *intentional non-fix*

Decision: **keep the asymmetry, document why.** Inline comment added on `versionedWriteCollector.DeleteAccount` explaining:

* It's wired only into `txResult.finalize` (fee calc + post-Cancun system calls).
* Neither path SDs a pre-existing contract today, so the SD-with-incarnation differentiator is unreachable from here.
* If a future caller ever does emit `DeleteAccount` on a pre-existing contract through this collector, the comment flags that this code should mirror `LightCollector.DeleteAccount`'s `IncarnationPath` emit.

Adding the emit speculatively was rejected because: (a) it changes the writeset shape for paths that today don't need it, (b) any test exercising the new emit would be vacuous since no production caller hits the `original.Incarnation > 0` branch, and (c) the comment is enough to attribute the bug at first sight if someone *does* reach that code path in the future.

## Intentional non-fixes

* **Concern 5 above** — `versionedWriteCollector.DeleteAccount` left without the `IncarnationPath` emit (rationale above).
* **Defensive `TestFlushToUpdates_DeletedWithRetainedBalance` test kept** despite the state being unreachable from real LightCollector writesets today — protects the FlushToUpdates branch in isolation against future ApplyWrites refactors that might drop the `BalancePath`-clears-`Deleted` invariant.

## Test plan

- [x] All 6 unit tests in `calc_state_test.go` pass (`TestFlushToUpdates_DeletedWithIncarnation_EmitsZeroAccountUpdate`, `TestFlushToUpdates_DeletedWithoutIncarnation_EmitsDelete`, `TestFlushToUpdates_DeletedWithRetainedBalance_EmitsRegularUpdate`, `TestFlushToUpdates_LiveAccount_EmitsFullUpdate`, `TestApplyWrites_IncarnationPath`, `TestApplyWrites_BalancePathClearsDeleted`)
- [x] eest_devnet `for_amsterdam/constantinople/eip1052_extcodehash/extcodehash/extcodehash_subcall_create2_oog` all 6 variants pass locally
- [x] Full `for_amsterdam/constantinople` eest_devnet suite passes
- [x] `make lint` clean
- [x] CI on `9539998f14` was green

End-to-end validation comes via erigontech#21017's CI matrix once it rebases on top of this PR.
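The 3-way `Deleted` branch in `FlushToUpdates` can be summarized as a small decision function. This is a minimal sketch that assumes a reduced account struct: `calcAccount`, `isAllZero`, `classify`, and the `updateKind` labels are hypothetical names, and the live-account case is folded into the default branch for brevity. The real code emits commitment updates rather than strings.

```go
package main

import "fmt"

// updateKind labels what a flush emits for one account. These names are
// illustrative, not Erigon's commitment types.
type updateKind string

const (
	zeroAccountUpdate updateKind = "zero-account UPDATE"
	deleteUpdate      updateKind = "DeleteUpdate"
	regularUpdate     updateKind = "regular UPDATE"
)

// calcAccount is a hypothetical reduction of calcAccountState to the fields
// the branch inspects.
type calcAccount struct {
	Deleted     bool
	Incarnation uint64 // pre-block incarnation, delivered via IncarnationPath
	Balance     uint64
	Nonce       uint64
	HasCode     bool // true when a non-empty code hash is retained
}

// isAllZero is the defense-in-depth check: nothing survived the deletion.
func (a calcAccount) isAllZero() bool {
	return a.Balance == 0 && a.Nonce == 0 && !a.HasCode
}

// classify mirrors the 3-way Deleted branch described in the PR.
func classify(a calcAccount) updateKind {
	switch {
	case a.Deleted && a.isAllZero() && a.Incarnation > 0:
		return zeroAccountUpdate // SD of a pre-existing contract: keep the leaf
	case a.Deleted && a.isAllZero():
		return deleteUpdate // EIP-161 emptyRemoval: drop the leaf
	default:
		return regularUpdate // Deleted with retained values, or a live account
	}
}

func main() {
	fmt.Println(classify(calcAccount{Deleted: true, Incarnation: 1})) // zero-account UPDATE
	fmt.Println(classify(calcAccount{Deleted: true}))                 // DeleteUpdate
	fmt.Println(classify(calcAccount{Deleted: true, Balance: 5}))     // regular UPDATE
}
```

The order of the cases matters: the incarnation test must come first, otherwise an SD'd contract would fall into the EIP-161 branch, which is the wrong-root bug the PR fixes.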

…21310)

## Summary

Two spec-deviations in `getFilterBlockTree` compound to reject valid current-epoch leaves at every epoch boundary, causing Caplin's `GetHead` to fall back to the justified checkpoint root (a ~30-50 slot regression) and triggering execution-side unwinds.

## Observed symptom (from #21301)

On bloatnet, once execution catches up to chain tip, the node enters a steady-state cycle of **22-36 block execution unwinds every ~6 minutes** — one per epoch boundary. Histogram from one ~3h run (110 unwind events):

```
 1 size=11
 1 size=14
 4 size=15
 1 size=16
 4 size=18
 2 size=19
 3 size=20
 5 size=21
 9 size=22
14 size=23   ← most common
 3 size=24
 5 size=25
 4 size=26
 4 size=27
 8 size=28
 1 size=30
 2 size=31
 4 size=32
 3 size=33
 4 size=34
...many size=35-36
```

The unwinds are on the **same chain** (no real reorg) — execution is forced to roll back to an older head because Caplin regresses its `GetHead()` return value:

```
08:38:45 Caplin FCU: headSlot=652928 currentSlot=652988 lagSlots=60 eth1Head=0xe0f3...
08:39:36 Caplin FCU: headSlot=652993 currentSlot=652993 lagSlots=0 eth1Head=0x263b...   ← at tip
08:44:00 Caplin FCU: headSlot=652959 currentSlot=653015 lagSlots=56 eth1Head=0x7674...  ← regressed 34 slots
```

The 08:44:00 FCU triggers a 23-block exec unwind:

```
[forkchoice] entering unwind path fcuNum=24701328 canonHash=0x7674... fcuHash=0x7674... hashesDiffer=false finishProgress=24701350 executionAhead=true
Unwind Execution from=24701350 to=24701327
```

`hashesDiffer=false` confirms same-chain. `executionAhead=true` because exec raced ahead of Caplin's regressing head.

## Root cause: filter rejects mid-epoch leaves

Tracing inside `GetHead`:

```
[GetHead] cache miss, rebuilding from justifiedCheckpoint justifiedEpoch=20410
[filterTree] reject leaf: checkpoint mismatch blockRoot=0xe644... slot=653150 justifiedOk=false finalizedOk=false storeJustEpoch=20410 blockJustEpoch=20411 storeFinalEpoch=20409 blockFinalEpoch=20410
[GetHead] filtered tree size viableBlocks=0 totalHeads=1
[GetHead] rebuilt head headHash=0x... headSlot=653119  ← fallback to justified root
Caplin is sending forkchoice headSlot=653119 currentSlot=653174 lagSlots=55
```

Every leaf in the tree gets filtered because its `unrealizedJustifications[blockRoot].epoch = N+1` but the store's `justifiedCheckpoint.epoch = N` (store's realized lags by 1 epoch mid-epoch). The walk falls back to the justified root → 50+ slot regression.

## Spec deviations (the fix)

### 1. Voting-source selection used unrealized unconditionally

Spec [`get_voting_source`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/fork-choice.md#get_voting_source) picks unrealized only for **prior-epoch** blocks (pull-up justification view); current-epoch blocks should use the block's realized state checkpoint:

```python
def get_voting_source(store: Store, block_root: Root) -> Checkpoint:
    block = store.blocks[block_root]
    current_epoch = get_current_store_epoch(store)
    block_epoch = compute_epoch_at_slot(block.slot)
    if current_epoch > block_epoch:
        return store.unrealized_justifications[block_root]
    else:
        return store.block_states[block_root].current_justified_checkpoint
```

Erigon's code:

```go
// Use per-block unrealized justifications (spec: store.unrealized_justifications[block_root])
// Fall back to realized checkpoints if unrealized not available
currentJustifiedCheckpoint, has := f.getUnrealizedJustification(blockRoot)
if !has {
    currentJustifiedCheckpoint, has = f.forkGraph.GetCurrentJustifiedCheckpoint(blockRoot)
    ...
}
```

→ Always picks unrealized when available. The "fall back" comment misreads the spec — spec is a conditional branch, not a fallback.

### 2. Justified/finalized check was strict equality

Spec [`filter_block_tree`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/fork-choice.md#filter_block_tree) has a lenient `+2 >= current_epoch` lookback:

```python
correct_justified = (
    store.justified_checkpoint.epoch == GENESIS_EPOCH
    or voting_source.epoch == store.justified_checkpoint.epoch
    or voting_source.epoch + 2 >= current_epoch  # ← lenient lookback
)
```

Erigon's code only allowed the first two clauses (strict `Equal()`).

## Regression provenance

The deviations were introduced in #20035 (`caplin: unified Engine API client for standalone mode`, Mar 25 2026) by Mark Holt:

```
082697d cl/phase1/forkchoice/get_head.go +Use per-block unrealized justifications
```

Before that PR, `getFilterBlockTree` used `f.forkGraph.GetCurrentJustifiedCheckpoint(blockRoot)` directly — which returns the block's **realized** justified checkpoint. Block.realized matched store.realized (both on the same epoch-boundary timeline), so the equality check passed.

PR #20035 introduced the per-block unrealized lookup (`store.unrealized_justifications` per the spec), but missed the conditional branching (spec deviation 1) and didn't add the +2 lookback (spec deviation 2). The result: block.unrealized goes ahead by 1 epoch mid-epoch, and store.realized doesn't catch up until `OnTick`'s epoch-boundary path runs.

## Fix

Restore spec-faithful behavior in `getFilterBlockTree`:

1. Branch the voting source selection on `currentEpoch > blockEpoch`:
   - Prior-epoch → `store.unrealized_justifications[block_root]`
   - Current/future-epoch → block's realized `current_justified_checkpoint`

   In both branches, return `false` (reject the leaf) if the lookup is absent. `OnBlock` populates `unrealizedJustifications` for every imported block above the finalized slot, so a missing entry indicates either an invariant violation or a leaf below finalized — neither should produce a viable head.
2. Replace the strict `Equal()` check on the justified side with the spec's three flat disjuncts: `store.justified_checkpoint.epoch == GENESIS_EPOCH || voting_source.epoch == store.justified_checkpoint.epoch || voting_source.epoch + 2 >= current_epoch`.
3. Replace the finalized-side checkpoint-equality check with the spec's ancestor-descent rule: `store.finalized_checkpoint.root == get_checkpoint_block(block_root, finalized.epoch)`, implemented via `f.Ancestor(blockRoot, finalizedSlot)`.
4. Snapshot `currentEpoch` once at the start of the walk (in `getFilteredBlockTree`) and thread it through the recursion so the whole rebuild uses a single store-epoch value — `OnTick` updates `f.time` without holding `f.mu`, so per-leaf `f.Slot()` reads can mix epochs across leaves around a slot boundary.

## Validation

Tested on bloatnet at chain tip, on `performance` HEAD + this fix:

- **Before fix**: 22-36 block unwinds every ~6 min, one per epoch boundary, persisting indefinitely. 110 events in 3h.
- **After fix**: zero unwinds at epoch boundaries observed across 30+ minutes (~5 epoch boundaries crossed). `[filterTree] reject` no longer fires.

Effects:

- **CPU waste eliminated**: ~30 blocks of execution work were being discarded and re-done every cycle.
- **db_size**: unchanged (was already not affected since rolled-back state was in MemoryMutation overlay, not MDBX).
- **Logs**: clean — no more spurious `Unwind Execution` lines at tip.

## Test plan

- [x] Run on bloatnet to chain tip, observe no large unwinds at epoch boundaries
- [x] Confirm no head regression in `Caplin is sending forkchoice` logs (lagSlots stays ~0 at tip)
- [ ] Spectest still passes (`make spectest`)

## Refs

Full diagnosis trail: #21301
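The two spec rules the fix restores for the justified side can be sketched in Go. This is an illustration only: `blockInfo`, `votingSource`, and `correctJustified` are hypothetical, the `Has*` flags stand in for map lookups such as `store.unrealized_justifications[block_root]`, and the finalized-side ancestor check (item 3) and epoch snapshotting (item 4) are not modeled. The usage in `main` shows a prior-epoch leaf whose unrealized justification is one epoch ahead of the store: strict equality rejects it, while the `+2` lookback accepts it.

```go
package main

import "fmt"

// genesisEpoch mirrors the spec's GENESIS_EPOCH.
const genesisEpoch uint64 = 0

type checkpoint struct {
	Epoch uint64
	Root  string
}

// blockInfo holds the per-block data the spec consults. Field names are
// hypothetical; Erigon reads these through its fork graph and unrealized map.
type blockInfo struct {
	Epoch                uint64
	Unrealized           checkpoint // store.unrealized_justifications[root]
	HasUnrealized        bool
	RealizedJustified    checkpoint // block state's current_justified_checkpoint
	HasRealizedJustified bool
}

// votingSource follows get_voting_source: unrealized only for prior-epoch
// blocks, the block's realized checkpoint otherwise. A missing lookup rejects
// the leaf (ok == false) instead of falling back to the other source.
func votingSource(b blockInfo, currentEpoch uint64) (checkpoint, bool) {
	if currentEpoch > b.Epoch {
		return b.Unrealized, b.HasUnrealized
	}
	return b.RealizedJustified, b.HasRealizedJustified
}

// correctJustified is the spec's three-way disjunction from filter_block_tree.
func correctJustified(storeJustified, source checkpoint, currentEpoch uint64) bool {
	return storeJustified.Epoch == genesisEpoch ||
		source.Epoch == storeJustified.Epoch ||
		source.Epoch+2 >= currentEpoch
}

func main() {
	const current uint64 = 11
	store := checkpoint{Epoch: 10}

	// Prior-epoch leaf whose unrealized justification has already moved to 11.
	prior := blockInfo{
		Epoch:                10,
		Unrealized:           checkpoint{Epoch: 11},
		HasUnrealized:        true,
		RealizedJustified:    checkpoint{Epoch: 9},
		HasRealizedJustified: true,
	}
	src, ok := votingSource(prior, current)
	fmt.Println(ok, src.Epoch)                         // true 11
	fmt.Println(src.Epoch == store.Epoch)              // false: strict equality rejects
	fmt.Println(correctJustified(store, src, current)) // true: +2 lookback accepts
}
```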

Findings from Copilot (3) and yperbasis (8). The Copilot findings on Peers routing, empty NodeInfo, and close(statusReady) are real; yperbasis identified the matching doc/code mismatches and a few polish items.

1. Peers() / message routing (Copilot #1). The multi-sentry client uses Peers() to decide which sentry owns each peer and routes SendMessageById via that sentry's gRPC. With the previous reporter-only gating, every peer mapped to Servers[0] (which is the *lowest* configured protocol, ETH69 in the default — yperbasis #1 noted the comment said "highest"). Servers[0]'s goodPeers doesn't have eth/70 or eth/71 peers, so SendMessageById silently no-op'd. Fix: each GrpcServer.Peers() now reports its own goodPeers, filtered to skip entries where both protocol and witProtocol are zero. Each peer ends up in exactly one eth-sentry's goodPeers (the negotiated version) plus, at most, the sentry hosting the wit sideprotocol, so admin_peers aggregation is naturally non-duplicating and routing is correct.

2. NodeInfo() (Copilot #3). Non-reporters returning empty replies polluted admin_nodeInfo with blank entries that sorted first. With the shared p2p.Server every sentry has the same Node ID and the same enode, so they now return identical NodeInfo. node/eth.NodesInfo deduplicates by Enode before sorting.

3. SetStatus's close(ss.statusReady) (Copilot #2) panicked for callers that construct GrpcServer outside NewGrpcServer (existing TestSentryServerImpl_* does this). Guarded the close with a nil check; awaitStatus tolerates a nil channel via the select's other cases.

4. SetP2PServer returns an error instead of panicking on double-call (yperbasis #3). The "ownership decided up front" invariant is still enforced, just propagated up the Provider.Initialize path cleanly.

5. SimplePeerCount filters protocol=0 ghosts (yperbasis #4). With wit/0 deduped to one sentry, peers that negotiate eth/N on a different sentry end up as protocol=0/witProtocol=0 ghosts on the wit-hosting sentry. Counting them would emit a bogus eth.ProtocolToString[0] bucket in the GoodPeers log. The Peers() filter already drops them from admin_peers; this aligns SimplePeerCount.

6. awaitStatus logs a Debug line when its timeout fires (yperbasis #5) so operators can tell "core didn't send status in time" from "core never tried" when the caller disconnects the peer with PeerErrorLocalStatusNeeded. Doc comment clarifies the ctx.Done case too (yperbasis #7).

Drops the reportsPeers flag and IsPeerReporter accessor — no longer needed once per-sentry goodPeers replaces the reporter gating. SetP2PServer signature loses its second argument.

Tests updated for the new shape; new TestGrpcServer_PeersReturnsPerSentryGoodPeers (per-sentry view + ghost-entry filter) and TestGrpcServer_SetStatus_NilStatusReadyIsSafe (close-nil guard).

Deferred to follow-up:
- BootstrapNodes/DNS resolution helper to dedupe logic between makeP2PServer and startSharedP2PServer (yperbasis #6).
- End-to-end Provider.Initialize test in local mode (yperbasis #8).
- [r3.4] backport PR (yperbasis #9).

Co-Authored-By: Claude
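The ghost-entry filter and the enode deduplication are both simple list passes. The sketch below assumes a hypothetical `peerEntry` shape and represents each `NodeInfo` reply by its enode string only; `reportedPeers` and `dedupeByEnode` are illustrative names, and sorting by enode is an assumption about the final ordering rather than what `node/eth.NodesInfo` actually sorts on.

```go
package main

import (
	"fmt"
	"sort"
)

// peerEntry is a hypothetical view of one goodPeers entry on a sentry.
type peerEntry struct {
	ID          string
	Protocol    uint // negotiated eth/N on this sentry, 0 if none
	WitProtocol uint // negotiated wit/N on this sentry, 0 if none
}

// reportedPeers drops ghost entries (both versions zero) before a sentry
// reports its goodPeers from Peers().
func reportedPeers(good []peerEntry) []peerEntry {
	out := make([]peerEntry, 0, len(good))
	for _, p := range good {
		if p.Protocol == 0 && p.WitProtocol == 0 {
			continue // ghost: this peer negotiated eth/N on another sentry
		}
		out = append(out, p)
	}
	return out
}

// dedupeByEnode keeps one reply per enode, then sorts the survivors.
func dedupeByEnode(enodes []string) []string {
	seen := make(map[string]bool, len(enodes))
	out := make([]string, 0, len(enodes))
	for _, e := range enodes {
		if !seen[e] {
			seen[e] = true
			out = append(out, e)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	witSentry := []peerEntry{{ID: "a"}, {ID: "b", WitProtocol: 1}}
	fmt.Println(len(reportedPeers(witSentry))) // 1

	// Every sentry shares one p2p.Server, so all report the same enode.
	fmt.Println(dedupeByEnode([]string{"enode://x", "enode://x", "enode://x"})) // [enode://x]
}
```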

…rgs) (erigontech#21211)

## Summary

First incremental cut toward erigontech#21138's structural goal: **one finalize function per parallel-exec result, with `IntraBlockState` used nowhere outside workers**. This PR removes two finalize variants that are already unreachable from production:

| Function | LOC | Production callers on main |
|---|---|---|
| `finalizeWithIBS` (full IBS reconstruction, BAL-compat path) | ~120 | 0 |
| `finalizeTx` (delta-args variant, direct fee-balance path) | ~250 | 0 (only `TestFinalizeTx_AllScenarios`) |

Plus the test suite that exclusively exercised the delta-args path (`TestFinalizeTx_*`, fixture builders `coinbaseIsRecipientScenario` / `selfTransferScenario`, helpers `hasCoinbaseDelta` / `adjustForTransferDelta` / `buildWriteMap` / `fmtWriteVal` / `extractBalanceReads`) and one stale comment in `engine_api_bal_test.go`. Net: **-690 lines**, **+1 line**, no semantic change.

## Why now

The parallel-exec correctness stack landed in erigontech#21153 (merged 2026-05-15). The combined effect of that PR plus erigontech#21177 routed all production finalize flows through `finalizeTxSimple` — these two functions became unreachable. Removing them shrinks `exec3_parallel.go` from 3640 → 3268 lines, making subsequent IBS-dependency drains easier to review.

The next steps in the erigontech#21138 sequence:

- **PR 2** — drain IBS dependency 1 (SD address lookup): `LogSelfDestructedAccounts` consumes `result.SelfDestructedWithBalance` only, no `ibs.GetRemovedAccountsWithBalance()` call.
- **PR 3** — drain IBS dependencies 2 (`AddLog` → return logs) and 3 (`AddBalance` bookkeeping → already on `CollectorWrites`); `finalizeTxSimple` becomes IBS-free.
- Later — `normalizeWriteSet` → `filterWritesByVersionMap`; `calcState.ApplyWrites` → `VersionedWrites.TouchUpdates`; move EIP-7002/7251 syscall execution into the worker pool.

End state: one `finalizeTx`, no IBS outside workers.

## Test plan

- [x] `make lint` clean
- [x] `make test-short` (full `execution/stagedsync`, `execution/state`, `execution/tests`, `rpc/jsonrpc` packages) green under `EXEC3_PARALLEL=true`
- [x] BAL family (`TestEngineApiBAL*`) 8/8 parallel
- [x] `TestEIP7708BurnLogWhenCoinbaseSelfDestructs` green
- [x] Surviving `TestFinalizeTxSimple_*` family green
- [ ] CI: race-tests, kurtosis, hive matrix legs green on both serial and parallel

## Related

- erigontech#21017 — serial/parallel CI matrix that surfaces parallel-leg failures (now rebuilt on post-erigontech#21153 main; CI fresh-running)
- erigontech#21153 — parallel-exec correctness stack (merged)
- erigontech#21138 — heuristic-removal / IBS-dependency-removal tracker (the parent)

…ash LRU

Two surgical commits bundled (both touch the code-read hot path):

1. IntraBlockState.GetCodeSize now loads the full bytes via stateReader.ReadAccountCode on first touch and populates stateObject.code, so subsequent same-addr EXTCODESIZE / EXTCODEHASH / CALL within the tx are in-struct slice-len calls (~50 ns), not full reader round-trips. Mirrors geth's pattern at core/state/state_object.go ~Code() — pay one read per addr per tx, free for the rest.

2. CodeCache.addrToHash switched from a no-op-when-full maphash.Map[versionedAddressID] to an LRU lru.Cache[[20]byte, versionedAddressID] (hashicorp/golang-lru/v2, already imported elsewhere). Cap derived from the existing byte budget at ~28 bytes/entry (~580 k entries for the 16 MB default). Fresh-address workloads (mainnet thousands of new addrs per block) now warm up the addr layer over time instead of silently dropping new entries forever; matches geth's lru.Cache at core/state/database_code.go.

The hashToCode layer is unchanged (content-addressed bytes, immutable, byte-capped with new-entry no-op when full — the same semantic as before since code bytes by codeHash never change).

Bench on the EXTCODESIZE-EXISTING_CONTRACT-30M family: 62.34 mgas/s (was 61.50). The marginal gain is small on this bench because BAL prefetch already populates the cache layers; neither lever fires heavily. The expected wins are on non-BAL workloads where EXTCODESIZE-loop patterns repeat within a tx (#1) and fresh-address-churn mainnet blocks fill the addr layer (#2).

Updated TestCodeCache_AddrCapacityLimit to assert LRU eviction (was asserting no-op-when-full); the prior behaviour was the bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
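The behavioral change in point 2, from ignoring new entries when full to evicting the least recently used one, can be shown with a minimal stdlib-only LRU built on `container/list`. This is not the `hashicorp/golang-lru/v2` implementation the commit uses; `addrLRU`, `newAddrLRU`, and the `[32]byte` value type are hypothetical stand-ins, and only the capacity formula (byte budget divided by ~28 bytes per entry) comes from the commit message.

```go
package main

import (
	"container/list"
	"fmt"
)

// bytesPerEntry is the per-entry estimate quoted in the commit message.
const bytesPerEntry = 28

type entry struct {
	addr [20]byte
	hash [32]byte
}

// addrLRU is a minimal LRU keyed by address. The list front holds the most
// recently used entry; the back is the next eviction candidate.
type addrLRU struct {
	capacity int
	order    *list.List
	items    map[[20]byte]*list.Element
}

// newAddrLRU derives the entry capacity from a byte budget.
func newAddrLRU(byteBudget int) *addrLRU {
	n := byteBudget / bytesPerEntry
	if n < 1 {
		n = 1
	}
	return &addrLRU{capacity: n, order: list.New(), items: make(map[[20]byte]*list.Element)}
}

// Get returns the cached hash and marks the entry as recently used.
func (c *addrLRU) Get(addr [20]byte) ([32]byte, bool) {
	if el, ok := c.items[addr]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).hash, true
	}
	return [32]byte{}, false
}

// Add inserts or refreshes addr. When full it evicts the least recently used
// entry rather than silently dropping the new one.
func (c *addrLRU) Add(addr [20]byte, hash [32]byte) {
	if el, ok := c.items[addr]; ok {
		el.Value.(*entry).hash = hash
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).addr)
	}
	c.items[addr] = c.order.PushFront(&entry{addr: addr, hash: hash})
}

func main() {
	c := newAddrLRU(2 * bytesPerEntry) // budget for exactly two entries
	a, b, d := [20]byte{1}, [20]byte{2}, [20]byte{3}
	c.Add(a, [32]byte{0xa})
	c.Add(b, [32]byte{0xb})
	c.Get(a)                // touching a makes b the least recently used
	c.Add(d, [32]byte{0xd}) // full: evicts b instead of dropping d
	_, okA := c.Get(a)
	_, okB := c.Get(b)
	_, okD := c.Get(d)
	fmt.Println(okA, okB, okD) // true false true
}
```

Under the old no-op-when-full policy the same sequence would have kept `a` and `b` and never cached `d`, which is the "silently dropping new entries forever" failure mode on fresh-address workloads.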

…ckly (erigontech#21483)

## Problem

When a merge-queue run has a hive-eest shard fail, the failing job calls `gh run cancel ${{ github.run_id }}` (added in erigontech#21445). That sends SIGTERM to all in-flight matrix siblings, but the Docker-bound hive simulators take ~20 minutes to actually drain. `ci-gate` is `if: always()` and waits for every `needs` job to reach a terminal state, so the broken PR sits at `AWAITING_CHECKS` for the full drain time — blocking the head of the merge queue.

Concrete example from today (PR erigontech#21470 at position 1 in the queue):

- 08:29:57 — `hive-eest / test-hive-eest (paris+shanghai, serial)` fails, calls `gh run cancel 26562610423`, emits the "Merge-queue root-cause failure" annotation from erigontech#21445.
- 08:48 (~19 min later) — paris+shanghai-parallel, prague-serial/parallel, cancun-serial/parallel, osaka-parallel, rlp-serial/parallel, and glamsterdam-devnet-parallel were all still `in_progress`.

Every other ci-gate child (tests, race-tests, eest-spec-tests, kurtosis, hive, lint, bench, repro, sonar, caplin) had already completed. The bottleneck was specifically the hive-eest matrix siblings.

## Fix

```yaml
strategy:
  fail-fast: ${{ github.event_name == 'merge_group' }}
```

- **In `merge_group`**: first failed shard immediately cancels all siblings at the GitHub API layer — much faster than the `gh run cancel` → SIGTERM → runner-drain path. ci-gate's `needs` reach terminal state in seconds, ci-gate fails, the broken PR is evicted.
- **In PR runs**: stays `false`, so authors still see the full failure breakdown across every shard. No regression in PR feedback.

## What's left in place and why

The per-job `gh run cancel` step (test-hive-eest.yml lines 311-317) stays. Two reasons:

- Matrix `fail-fast` only cancels siblings **within the same matrix** — it doesn't cancel sibling reusable workflows. If a future failure pattern leaks across workflows, `gh run cancel` still covers it.
- ci-gate.yml's root-cause annotator (line 188) keys off "the leaf that ran `gh run cancel` successfully" to single out the true root cause among collateral cancellations. Removing the step would silently regress erigontech#21445's attribution.

## Scope choice

Only `test-hive-eest.yml` is changed. Other matrix-bearing reusable workflows (`test-all-erigon.yml`, `test-all-erigon-race.yml`, `test-eest-spec.yml`, `test-kurtosis-assertoor.yml`, `test-hive.yml`, `test-bench.yml`) all use `fail-fast: false` too, but none of them were the queue-blocking long pole in this incident. Keeping the patch minimal; we can generalize if another workflow becomes the bottleneck.

## Tradeoff to be aware of

Queue runs will now show siblings as `cancelled` instead of `failed` whenever any one shard fails. That's the correct tradeoff in `merge_group` — the goal is fast eviction, not detailed diagnostics; full per-shard breakdown remains available on the PR run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

profiling at tip (MDBX, 60s window) pinned bytes.Clone as the #1 single allocation source at 64.3 GB / 60s (~16.85% of total alloc traffic). 37.86 GB / 60s (~631 MB/sec) of that came from the defensive common.Copy at TrieContext.Branch. The clone was redundant: every downstream consumer either reads the slice inline (cell.fillFromFields copies into pre-allocated arrays; merger.Merge consumes and produces a fresh buffer; trie_reader parses bytes into cells; unfoldBranchNode similar) or clones it itself at the queue boundary (getDeferredUpdate clones both prefix and prev when storing in the deferred-update pool). Branch's clone was a third copy of bytes that nothing needed to retain. Document the new contract (borrowed slice valid for the current ComputeCommitment scope) and update the test that exercised the old "returns owned bytes" guarantee to verify the new aliasing guarantee instead. After-measurement on the rig is blocked by an unrelated stage-loop persistence inconsistency (chaindata head pointer ahead of state writes on restart) that's reproducing on every restart cycle today; the change is mathematically minimal (single Clone removal + test contract update) and unit tests + make lint pass, so shipping on the math. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
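The new contract (a borrowed slice, valid only for the current scope) can be illustrated with a toy owner and two consumers. In this sketch, `branchStore`, `Branch`, `reuse`, and `deferredUpdate` are hypothetical; it only shows that an inline reader sees the aliased bytes, while a consumer that retains them past the scope must clone at its own boundary, as the commit says `getDeferredUpdate` does.

```go
package main

import (
	"bytes"
	"fmt"
)

// branchStore is a hypothetical owner of branch bytes for the duration of
// one ComputeCommitment-like scope.
type branchStore struct {
	buf []byte
}

// Branch returns a borrowed slice: it aliases the store's buffer and is only
// valid until the buffer is next reused. No defensive copy is made.
func (s *branchStore) Branch() []byte { return s.buf }

// reuse simulates the owner overwriting its buffer for the next lookup.
func (s *branchStore) reuse(next []byte) { copy(s.buf, next) }

// deferredUpdate retains bytes past the current scope, so it owns a clone.
type deferredUpdate struct{ prev []byte }

func main() {
	s := &branchStore{buf: []byte{1, 2, 3}}

	borrowed := s.Branch()                                // read inline, no copy
	kept := deferredUpdate{prev: bytes.Clone(s.Branch())} // retained, so cloned once

	s.reuse([]byte{9, 9, 9})
	fmt.Println(borrowed[0], kept.prev[0]) // 9 1
}
```

The borrowed view changes under the reader once the owner reuses its buffer, which is harmless for consumers that finish reading inside the scope and is why only the queueing consumer pays for a copy.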

Addresses the CHANGES_REQUESTED review, verified against `main`:

database.md
- Receipts ARE stored: ReceiptDomain (required) + RCacheDomain (optional). Reword the "not stored / re-computed" claim to: compact per-txn receipt metadata persisted in a required domain, full receipts optionally cached, logs re-derived by re-execution only when not cached.
- snapshots/domain holds 6 domains (4 state + 2 receipt), not 4.
- Soften "rm -rf chaindata/": recoverable, but re-derives state from snapshots and resyncs the tip from the CL over the Engine API (devp2p block download removed in #21505) — not a quick rebuild.

architecture.md
- prune `full` keeps a rolling ~262k-block window (DefaultPruneDistance), not all post-Merge blocks; add the `blocks` prune mode to the table.
- Pipeline: Snapshots is stage #1; no separate Commitment stage (it runs inside Execution); add Senders.
- Mermaid: Downloader (BitTorrent) is independent of Sentry (devp2p); blocks arrive from Caplin via the Engine API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…#21579)

Two tweaks to how Claude Code operates in this repo.

## 1. Bare `#N` references in GitHub text

Claude has used a bare `#`+number to mean "point N" — e.g. writing "`#1`" for "the nit from point 1" — in PR descriptions, issue descriptions, and comments. GitHub auto-links that to the issue/PR with that number, so "`#1`" turns into a link to the repo's first-ever PR. This came up in a comment on PR 21510 (erigontech#21510) and has happened on other occasions.

`agents.md` now instructs Claude to write "point 1" / "item 1" / "the first nit" and reserve `#N` for genuine issue/PR references.

## 2. Claude attribution in commits, PRs, and issues

Stop adding `Co-Authored-By: Claude` and `🤖 Generated with Claude Code` lines. Enforced deterministically in `.claude/settings.json` rather than as a prose instruction, so it does not depend on the model remembering:

- `includeCoAuthoredBy: false` — the one switch the installed Claude Code (v2.1.160) honors across *both* the commit and PR code paths; it returns empty attribution for each.
- `attribution: { commit: "", pr: "" }` — the newer, forward-looking mechanism; empties the footer text.

**Why both:** `attribution` alone is insufficient today — the PR code path does a truthy check on `attribution.pr`, so an empty string is ignored and the "Generated with" footer still leaks into PR bodies. `includeCoAuthoredBy: false` is the reliable global off-switch.

`agents.md` documents the rule for human readers too.

---

Docs/config only — no Go changes, so build/lint are unaffected. This PR follows both rules itself: no attribution trailers on the commit, and the `#1` examples above are shown as code so they don't auto-link.

…LOAS (erigontech#21698)

## Problem

Lido Hoodi validators lost attestation score after switching to `release/3.5`. They missed **head votes** at ~30–60% per epoch (≈2× a random network sample) while **target and source votes were always correct**: the attestations were produced, published, and included on-chain on time, but voted for a **stale head**.

## Root cause

The GLOAS merge (erigontech#18956) added `indexedWeightStore` (`cl/phase1/forkchoice/weight_store_indexed.go`). It is instantiated unconditionally, and its `IndexVote`/`RemoveVote` are called per validator index on **every** attestation via `setLatestMessage`, regardless of fork. However, its results are only consumed by GLOAS `get_head`. Pre-GLOAS `get_head` (and `timing.go`) use the non-indexed `weightStore`, and `GetIndexedWeightStore()` has no callers. So on pre-GLOAS chains the index is maintained but never read.

On a high-validator-count network (Hoodi, ~1.2M active validators) this maintenance was the single largest CPU consumer (`RemoveVote`, ~15% of CPU, fresh slice allocation per call) plus a large GC load, all under the fork-choice write lock. The lock contention delayed `OnBlock` (block import) and `get_head` past the attestation deadline, so the head served to the validator client was stale → wrong head votes.

## Fix

Gate the indexed-store maintenance to the GLOAS vote path only, via an explicit `maintainIndexedVotes` flag on `setLatestMessage` (mirroring the existing `updateLatestMessages` pre-GLOAS/GLOAS dispatch).

## Validation (live Hoodi node, before vs after rebuild)

- Node CPU (operator Grafana): ~7.5% → <2%.
- 60s CPU profile: total samples 148.9% → 41.6%; `indexedWeightStore.RemoveVote` (previously the #1 hotspot) and the GC storm both gone.
- Head-at-attestation-deadline staleness: 64% → 4%.
- Validator head-vote misses: 40% → 0% in the first fully post-fix epoch (network sample ~23% in the same epoch); target/source unaffected throughout.

Affects all pre-GLOAS networks on this branch (mainnet/Sepolia/Gnosis), with impact scaling by validator count.

## Tests

- `TestPreGloasDoesNotMaintainIndexedWeightStore`: fails before the fix (the pre-GLOAS path populates the index), passes after.
- `TestGloasMaintainsIndexedWeightStore`: locks in that the GLOAS path still indexes votes.
- `go test ./cl/phase1/forkchoice/...`, `make lint`, `make erigon` all clean.

## Follow-up

`indexedWeightStore` is currently unused even on GLOAS (`get_head` uses the non-indexed store). The Caplin team should either wire it into GLOAS `get_head` or remove it; tracking separately.
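A minimal Go sketch of the gating described under Fix, assuming heavily simplified hypothetical types: `forkChoiceStore`, `newForkChoiceStore`, the map-based `indexedWeightStore`, string block roots and the `indexedVotes` helper are illustrative only. The names `setLatestMessage`, `IndexVote`, `RemoveVote` and `maintainIndexedVotes` come from the description, but the signatures and bodies here are not Caplin's code, and the non-indexed weight store that pre-GLOAS `get_head` relies on is left out.

```go
package main

import "fmt"

// indexedWeightStore is a hypothetical, simplified stand-in: for each block
// root it keeps the set of validator indices whose latest vote points there.
type indexedWeightStore struct {
	byRoot map[string]map[uint64]struct{}
}

// IndexVote adds validator to the set of voters for root.
func (s *indexedWeightStore) IndexVote(validator uint64, root string) {
	set, ok := s.byRoot[root]
	if !ok {
		set = make(map[uint64]struct{})
		s.byRoot[root] = set
	}
	set[validator] = struct{}{}
}

// RemoveVote drops validator from the set of voters for root.
func (s *indexedWeightStore) RemoveVote(validator uint64, root string) {
	if set, ok := s.byRoot[root]; ok {
		delete(set, validator)
		if len(set) == 0 {
			delete(s.byRoot, root)
		}
	}
}

// indexedVotes counts the validator entries currently held in the index.
func (s *indexedWeightStore) indexedVotes() int {
	n := 0
	for _, set := range s.byRoot {
		n += len(set)
	}
	return n
}

// forkChoiceStore keeps each validator's latest vote plus the optional index.
type forkChoiceStore struct {
	latest  map[uint64]string
	indexed *indexedWeightStore
}

func newForkChoiceStore() *forkChoiceStore {
	return &forkChoiceStore{
		latest:  make(map[uint64]string),
		indexed: &indexedWeightStore{byRoot: make(map[string]map[uint64]struct{})},
	}
}

// setLatestMessage records a validator's latest vote. The index is touched
// only when maintainIndexedVotes is true (the GLOAS path); pre-GLOAS callers
// pass false and skip the per-attestation RemoveVote/IndexVote work.
func (f *forkChoiceStore) setLatestMessage(validator uint64, root string, maintainIndexedVotes bool) {
	prev, hadPrev := f.latest[validator]
	f.latest[validator] = root
	if !maintainIndexedVotes {
		return
	}
	if hadPrev {
		f.indexed.RemoveVote(validator, prev)
	}
	f.indexed.IndexVote(validator, root)
}

func main() {
	preGloas, gloas := newForkChoiceStore(), newForkChoiceStore()
	for v := uint64(0); v < 4; v++ {
		preGloas.setLatestMessage(v, "0xaa", false)
		gloas.setLatestMessage(v, "0xaa", true)
	}
	gloas.setLatestMessage(0, "0xbb", true) // a later vote moves validator 0
	fmt.Println("pre-GLOAS indexed votes:", preGloas.indexed.indexedVotes())
	fmt.Println("GLOAS indexed votes:", gloas.indexed.indexedVotes())
}
```

In this sketch the pre-GLOAS store ends with an empty index while the GLOAS store holds one entry per validator; the early return skips the map churn entirely, which is the kind of per-attestation work the fix removes from pre-GLOAS chains.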

atsushi-ishibashi and others added 3 commits February 7, 2019 10:44

core/state: more memory efficient preimage allocation (#16663)

81801cc

Resolve

ea3155c

AlexeyAkhunov merged this pull request into master May 28, 2019

yperbasis added a commit that referenced this pull request Jun 3, 2019

Merge pull request #1 from ledgerwatch/master

e1fa134

Circle CI

AlexeyAkhunov added a commit that referenced this pull request Feb 14, 2021

Merge pull request #1 from ledgerwatch/go-package

fe462b7

Add Go package option

manuelvillar mentioned this pull request May 13, 2021

Chain stuck after Berlin hardfork #1929

Closed

MiraWells mentioned this pull request Aug 10, 2021

erigon panic at windows #2507

Closed

mfw78 mentioned this pull request Sep 16, 2021

Stuck with p2p GoodPeers log messages, failing to reach synced block height #2693

Closed

avocadochicken mentioned this pull request Dec 8, 2021

Crash - maybe a bug, maybe nothing at all... #3104

Closed

dmitry123 pushed a commit that referenced this pull request Jan 11, 2022

Merge pull request #1 from binance-chain/bsc-ankr

2345b38

the ankr's implementation of erigon for bsc

enriavil1 pushed a commit that referenced this pull request Jan 29, 2022

Merge pull request #1 from maticnetwork/krishna/bor-consensus

fe45265

Bor consensus implementation

Giulio2002 added a commit that referenced this pull request Jun 14, 2022

progress #1

519fd20

Giulio2002 added a commit that referenced this pull request Jun 14, 2022

Made in memory mutation compatible with all buckets (#4454)

ff5cbcb

* progress #1
* progress #2
* proper file naming
* more mature memory mutation

Giulio2002 added a commit that referenced this pull request Jul 3, 2022

experiment #1

0eb0260

Giulio2002 added a commit that referenced this pull request Jul 3, 2022

Fixed hive test on invalid transition payload (#4618)

b980280

* experiment #1
* experiment #2
* experiment #3
* experiment 4

revitteth referenced this pull request in revitteth/erigon Jul 13, 2022

Merge pull request #1 from revittm/docker/hive-ci-tests

3ae19b4

feat(ci): run hive tests as part of CI

BlinkyStitt referenced this pull request in llamanodes/erigon Jan 3, 2023

Merge pull request #1 from ledgerwatch/devel

b513a03

Upstream changes

somnergy mentioned this pull request Jul 12, 2023

Sync from scratch is broken for Gnosis mainnet #7881

Closed

Brando753 mentioned this pull request Sep 14, 2023

OOM kills occurring #8189

Closed

zzif mentioned this pull request Sep 14, 2023

Recurring out of memory kills #8193

Closed

battlmonstr pushed a commit that referenced this pull request Sep 14, 2023

Merge pull request #1 from ledgerwatch/on_peer_connect

4df3c6b

manav2401 mentioned this pull request Dec 1, 2023

eth/stagedsync: fixes for mining on devnet #8874

Closed

zelonH mentioned this pull request Jun 27, 2024

Error when deploying a node using snapshot，dbx_env_open: MDBX_CORRUPTED，Looks like there's enough space #10814

Closed

taratorio pushed a commit that referenced this pull request Jul 23, 2024

Merge pull request #1 from quickchase/upgrade-shell-script

b44ae5a

Upgrade shell script

JkLondon mentioned this pull request Nov 21, 2024

TxLookup index per-txn-granularity #12424

Merged

dvovk mentioned this pull request Apr 2, 2025

[diag] Expose Transactions from Newly Received Blocks #14399

Closed

bonze82 mentioned this pull request Apr 5, 2025

Crashing debian journald! #14454

Closed

yperbasis mentioned this pull request May 7, 2026

[r3.4] docs: add llms.txt generator script and update root llms.txt #21000

Merged

17 tasks

mh0lt mentioned this pull request May 7, 2026

execution/stagedsync: don't flag maxBlockNum on partial-batch apply-loop exit #21039

Merged

5 tasks

yperbasis mentioned this pull request May 8, 2026

state, stagedsync: parallel-exec wrong-root fix for SD vs EIP-161 emptyRemoval #21032

Merged

5 tasks

Copilot AI mentioned this pull request May 11, 2026

[r3.4] ci(docs): auto-update hardware requirements disk sizes from sync CI #21030

Merged

This was referenced May 12, 2026

Parallel-exec: residual functional failures skipped to land the exec-mode CI matrix #21136

Open

Parallel-exec: route the per-tx writeset through one faithful path; remove normalizeWriteSet/calcState; carry serial-finalize signals on ExecutionResult #21138

Open

This was referenced May 13, 2026

commitment/nibbles: add V2 key encoder/decoder #21146

Draft

execution: fix unwind edge cases for parallel exec and add benchmark-parallel exec shards to CI #21163

Merged

This was referenced May 15, 2026

ci: matrix-test serial vs parallel exec across the test workflows #21017

Merged

execution/stagedsync: drop dead finalizeWithIBS + finalizeTx (delta-args) #21211

Merged

sudeepdino008 mentioned this pull request May 20, 2026

cl/phase1/forkchoice: align getFilterBlockTree with consensus spec #21310

Merged

3 tasks

mh0lt mentioned this pull request May 26, 2026

BAL-driven parallel commitment (PR #5 of the perf stack) #21416

Open

This was referenced May 27, 2026

ci: bump assertoor v0.0.17→v0.1.2 + lighthouse/teku for gloas-spec compatibility #21449

Draft

ci: fail-fast hive-eest matrix on merge_group so broken PRs evict quickly #21483

Merged

taratorio mentioned this pull request May 29, 2026

execution/state, execution/protocol: avoid heap escape of traced uint256 values #21510

Merged

yperbasis mentioned this pull request Jun 1, 2026

docs(fundamentals): add Architecture and Database pages [main] #21500

Merged

taratorio mentioned this pull request Jun 9, 2026

State Cache Consolidation (PR #1 of the perf stack) #21380

Open

lystopad mentioned this pull request Jun 9, 2026

cl/phase1/forkchoice: don't maintain GLOAS indexed weight store pre-GLOAS #21698

Merged


Pull from go-ethereum up to 26aea73 (7 Feb 2019) #1

AlexeyAkhunov merged 3 commits into master from upstream_1

AlexeyAkhunov commented May 28, 2019


3 participants
