Pull from go-ethereum up to 26aea73 (7 Feb 2019) #1
Merged
* node: close AccountsManager in new Close method
* p2p/simulations, p2p/simulations/adapters: handle node close on shutdown
* node: move node ephemeralKeystore cleanup to stop method
* node: call Stop in Node.Close method
* cmd/geth: close node.Node created with makeFullNode in cli commands
* node: close Node instances in tests
* cmd/geth, node: minor code style fixes
* cmd, console, miner, mobile: proper node Close() termination
dmitry123
pushed a commit
that referenced
this pull request
Jan 11, 2022
the ankr's implementation of erigon for bsc
enriavil1
pushed a commit
that referenced
this pull request
Jan 29, 2022
Bor consensus implementation
Giulio2002
added a commit
that referenced
this pull request
Jun 14, 2022
Giulio2002
added a commit
that referenced
this pull request
Jul 3, 2022
revitteth
referenced
this pull request
in revitteth/erigon
Jul 13, 2022
feat(ci): run hive tests as part of CI
Closed
battlmonstr
pushed a commit
that referenced
this pull request
Sep 14, 2023
mh0lt
added a commit
that referenced
this pull request
May 8, 2026
- Remove dead LightCollector.DeleteAccount IncarnationPath emit (concern #3): IBS.Selfdestruct's versionWritten already publishes IncarnationPath via the version map; CollectorWrites is not the rawWrites source.
- Add MIRROR-OF tag on versionedWriteCollector.DeleteAccount cross-referencing LightCollector.DeleteAccount and noting that IncarnationPath is emitted via IBS.Selfdestruct, not either DeleteAccount (concern #7).
- Rewrite TestSDOfPreExistingContract_FullPipeline to populate vm with IBS.Selfdestruct's versionWritten emits; assert BalancePath=0 comes from vm.Read (not preBlockBalance from stateReader fallback) and that the trie sees a leaf with {Balance=0, Nonce=preBlock, CodeHash=preBlock} through the default FlushToUpdates branch (concern #1).
- Add TestSDStorageCascade_EmitsPerSlotDeletes locking in the storage-cascade load-bearing invariant: normalizeWriteSet's vm.StorageKeys loop emits StoragePath=0 entries that overwrite cs.storageState's pre-SD slot values, so each slot emits DeleteUpdate (concern #6).
- Update defensive-test docstrings to spell out unreachability from real production writesets and cross-link to the realistic-pipeline tests that cover the actual fix mechanism (concern #4).
Sahil-4555
pushed a commit
to Sahil-4555/erigon
that referenced
this pull request
May 9, 2026
…tyRemoval (erigontech#21032) ## Summary Fixes a wrong trie-root in the parallel commitment calculator when post-tx writesets are indistinguishable between two cases: 1. **SELFDESTRUCT of a pre-existing contract** — serial keeps the leaf with `incarnation>0`, zero balance/nonce/empty codeHash. Parallel was emitting `DeleteUpdate` and removing the leaf. 2. **EIP-161 emptyRemoval** of a touched-empty account (e.g. `0xff…fe` after the EIP-4788 system call) — serial emits `DeleteUpdate`. Parallel was emitting a zero-account UPDATE. The discriminator is the **pre-block incarnation**, which the writeset alone didn't carry. Fix wires it through `LightCollector.DeleteAccount` → `IncarnationPath` write → `calcAccountState.Incarnation` → 3-way branch in `FlushToUpdates`. ## Dependency direction This PR is a **precursor for erigontech#21017** (the CI matrix that runs every test under both `serial` and `parallel` exec modes). Without this fix, parallel-mode tests on hive `rpc-compat`, `engine api/cancun/withdrawals`, and the eest selfdestruct sub-suites all fail with wrong-trie-root errors at SD/empty-removal blocks. **erigontech#21017 cannot land until this PR lands.** The matrix in erigontech#21017 will validate this PR end-to-end via the hive parallel sub-suites — meaning this PR's parallel-exec changes don't run in *this* PR's own CI, only in erigontech#21017's. The rebased erigontech#21017 is the meaningful go/no-go signal. ## Changes * `LightCollector.DeleteAccount` now emits `IncarnationPath` alongside `SelfDestructPath=true` when `original.Incarnation > 0`. Calc receives the pre-deletion incarnation through the same channel as every other write — no exec-side state leakage. * `calc_state.ApplyWrites` consumes `IncarnationPath` into `calcAccountState.Incarnation` via direct type-assertion (panic on mismatch — see *Concern 3* below). 
* `calc_state.FlushToUpdates` switches on a 3-way `Deleted` branch with `isAllZero` defense-in-depth: * `Deleted && Incarnation>0 && all-zero` → zero-account UPDATE (matches serial's `DomainDel`-leaves-post-CREATE-encoding for SD'd contracts) * `Deleted && all-zero && Incarnation==0` → `DeleteUpdate` (matches serial's `DomainDel`-truly-empties for EIP-161 emptyRemoval) * `Deleted` with retained non-zero values → regular UPDATE — defensive coverage for OOG-CREATE2-with-retained-balance and any future write-ordering race * `SelfDestructPath` also marks all tracked storage slots dirty so `FlushToUpdates` emits per-slot updates alongside the account reset. Load-bearing invariant: `normalizeWriteSet`'s `vm.StorageKeys(addr)` loop emits `StoragePath=0` entries that arrive in `ApplyWrites` after `SelfDestructPath`, overwriting the marked slots' values to zero so they emit `DeleteUpdate` not `StorageUpdate(pre-SD value)`. Inline comment in `calc_state.go` spells this out — see *Concern 2* below. ## Earlier draft snags (resolved) The first draft also added an `IncarnationPath > 0` exclusion to `normalizeWriteSet`'s empty-account check. This was **redundant** (the empty-check already requires `Nonce == 0`, which excludes successful CREATE/CREATE2) and **broke OOG-during-CREATE2 cases** (which leave `Nonce=0/Balance=0/Code=empty/Incarnation=1` and *must* still be deleted). Removed in `9539998f14`. The `exec3_parallel.go` diff in this PR is now comments-only. ## Reviewer concerns addressed ### erigontech#1 (yperbasis): PR description was stale This body. ✓ ### erigontech#2 (yperbasis + Copilot): SD storage-cascade load-bearing invariant Inline comment in `calc_state.go`'s `SelfDestructPath` case now spells out the dependency on `normalizeWriteSet`'s `vm.StorageKeys(addr)` loop. ✓ ### erigontech#3 (yperbasis + Copilot): IncarnationPath guarded type-assertion Changed to direct `w.Val.(uint64)` to match the other paths. 
Silent zero would route a real SD into the EIP-161 branch and reproduce the very wrong-root bug — better to panic at the source. ✓ ### erigontech#4 (yperbasis): `TestFlushToUpdates_DeletedWithRetainedBalance_EmitsRegularUpdate` docstring Updated to clarify: this is **defensive coverage** for the third `FlushToUpdates` branch in isolation, NOT a direct repro of the eest_devnet OOG path. The actual OOG fix is the removal of the redundant `IncarnationPath > 0` clause from `normalizeWriteSet` (the OOG writeset has `Nonce=0` → empty-account → `DeleteUpdate`, not `Deleted+RetainedBalance`). End-to-end coverage of that path lives in the eest_devnet suite, surfaced via erigontech#21017's matrix. ✓ ### erigontech#5 (yperbasis): `versionedWriteCollector.DeleteAccount` asymmetry — *intentional non-fix* Decision: **keep the asymmetry, document why.** Inline comment added on `versionedWriteCollector.DeleteAccount` explaining: * It's wired only into `txResult.finalize` (fee calc + post-Cancun system calls). * Neither path SDs a pre-existing contract today, so the SD-with-incarnation differentiator is unreachable from here. * If a future caller ever does emit `DeleteAccount` on a pre-existing contract through this collector, the comment flags that this code should mirror `LightCollector.DeleteAccount`'s `IncarnationPath` emit. Adding the emit speculatively was rejected because: (a) it changes the writeset shape for paths that today don't need it, (b) any test exercising the new emit would be vacuous since no production caller hits the `original.Incarnation > 0` branch, and (c) the comment is enough to attribute the bug at first sight if someone *does* reach that code path in the future. ## Intentional non-fixes * **Concern erigontech#5 above** — `versionedWriteCollector.DeleteAccount` left without the `IncarnationPath` emit (rationale above). 
* **Defensive `TestFlushToUpdates_DeletedWithRetainedBalance` test kept** despite the state being unreachable from real LightCollector writesets today — protects the FlushToUpdates branch in isolation against future ApplyWrites refactors that might drop the `BalancePath`-clears-`Deleted` invariant. ## Test plan - [x] All 6 unit tests in `calc_state_test.go` pass (`TestFlushToUpdates_DeletedWithIncarnation_EmitsZeroAccountUpdate`, `TestFlushToUpdates_DeletedWithoutIncarnation_EmitsDelete`, `TestFlushToUpdates_DeletedWithRetainedBalance_EmitsRegularUpdate`, `TestFlushToUpdates_LiveAccount_EmitsFullUpdate`, `TestApplyWrites_IncarnationPath`, `TestApplyWrites_BalancePathClearsDeleted`) - [x] eest_devnet `for_amsterdam/constantinople/eip1052_extcodehash/extcodehash/extcodehash_subcall_create2_oog` all 6 variants pass locally - [x] Full `for_amsterdam/constantinople` eest_devnet suite passes - [x] `make lint` clean - [x] CI on `9539998f14` was green End-to-end validation comes via erigontech#21017's CI matrix once it rebases on top of this PR.
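The three-way `Deleted` branch described under "Changes" can be sketched roughly as follows. This is a minimal illustration built only from the PR description, not Erigon's actual code: the `accountState` struct, its field names, the `updateKind` type, and the `classify` function are all hypothetical stand-ins for `calcAccountState` and `FlushToUpdates`.

```go
package main

import "fmt"

// accountState is a hypothetical stand-in for calcAccountState.
// Field names are assumptions made for this sketch.
type accountState struct {
	Deleted     bool
	Incarnation uint64 // pre-block incarnation carried via IncarnationPath
	Balance     uint64
	Nonce       uint64
	HasCode     bool
}

// updateKind names the three outcomes listed in the PR description.
type updateKind string

const (
	zeroAccountUpdate updateKind = "zero-account UPDATE"
	deleteUpdate      updateKind = "DeleteUpdate"
	regularUpdate     updateKind = "regular UPDATE"
)

// isAllZero reports whether no balance, nonce, or code is retained.
func (a accountState) isAllZero() bool {
	return a.Balance == 0 && a.Nonce == 0 && !a.HasCode
}

// classify picks the update kind for one account.
func classify(a accountState) updateKind {
	switch {
	case !a.Deleted:
		return regularUpdate // live account
	case !a.isAllZero():
		return regularUpdate // deleted but values retained: defensive branch
	case a.Incarnation > 0:
		return zeroAccountUpdate // SELFDESTRUCT of a pre-existing contract
	default:
		return deleteUpdate // EIP-161 empty removal
	}
}

func main() {
	fmt.Println(classify(accountState{Deleted: true, Incarnation: 1}))
	fmt.Println(classify(accountState{Deleted: true}))
	fmt.Println(classify(accountState{Deleted: true, Balance: 7}))
}
```

The point of the sketch is that the pre-block incarnation is the only input that separates the first two deleted cases, which is why the description says the writeset alone could not distinguish them.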
This was referenced May 13, 2026
This was referenced May 15, 2026
sudeepdino008
added a commit
that referenced
this pull request
May 21, 2026
…21310) ## Summary Two spec-deviations in `getFilterBlockTree` compound to reject valid current-epoch leaves at every epoch boundary, causing Caplin's `GetHead` to fall back to the justified checkpoint root (a ~30-50 slot regression) and triggering execution-side unwinds. ## Observed symptom (from #21301) On bloatnet, once execution catches up to chain tip, the node enters a steady-state cycle of **22-36 block execution unwinds every ~6 minutes** — one per epoch boundary. Histogram from one ~3h run (110 unwind events): ``` 1 size=11 1 size=14 4 size=15 1 size=16 4 size=18 2 size=19 3 size=20 5 size=21 9 size=22 14 size=23 ← most common 3 size=24 5 size=25 4 size=26 4 size=27 8 size=28 1 size=30 2 size=31 4 size=32 3 size=33 4 size=34 ...many size=35-36 ``` The unwinds are on the **same chain** (no real reorg) — execution is forced to roll back to an older head because Caplin regresses its `GetHead()` return value: ``` 08:38:45 Caplin FCU: headSlot=652928 currentSlot=652988 lagSlots=60 eth1Head=0xe0f3... 08:39:36 Caplin FCU: headSlot=652993 currentSlot=652993 lagSlots=0 eth1Head=0x263b... ← at tip 08:44:00 Caplin FCU: headSlot=652959 currentSlot=653015 lagSlots=56 eth1Head=0x7674... ← regressed 34 slots ``` The 08:44:00 FCU triggers a 23-block exec unwind: ``` [forkchoice] entering unwind path fcuNum=24701328 canonHash=0x7674... fcuHash=0x7674... hashesDiffer=false finishProgress=24701350 executionAhead=true Unwind Execution from=24701350 to=24701327 ``` `hashesDiffer=false` confirms same-chain. `executionAhead=true` because exec raced ahead of Caplin's regressing head. ## Root cause: filter rejects mid-epoch leaves Tracing inside `GetHead`: ``` [GetHead] cache miss, rebuilding from justifiedCheckpoint justifiedEpoch=20410 [filterTree] reject leaf: checkpoint mismatch blockRoot=0xe644... 
slot=653150 justifiedOk=false finalizedOk=false storeJustEpoch=20410 blockJustEpoch=20411 storeFinalEpoch=20409 blockFinalEpoch=20410 [GetHead] filtered tree size viableBlocks=0 totalHeads=1 [GetHead] rebuilt head headHash=0x... headSlot=653119 ← fallback to justified root Caplin is sending forkchoice headSlot=653119 currentSlot=653174 lagSlots=55 ``` Every leaf in the tree gets filtered because its `unrealizedJustifications[blockRoot].epoch = N+1` but the store's `justifiedCheckpoint.epoch = N` (store's realized lags by 1 epoch mid-epoch). The walk falls back to the justified root → 50+ slot regression. ## Spec deviations (the fix) ### 1. Voting-source selection used unrealized unconditionally Spec [`get_voting_source`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/fork-choice.md#get_voting_source) picks unrealized only for **prior-epoch** blocks (pull-up justification view); current-epoch blocks should use the block's realized state checkpoint: ```python def get_voting_source(store: Store, block_root: Root) -> Checkpoint: block = store.blocks[block_root] current_epoch = get_current_store_epoch(store) block_epoch = compute_epoch_at_slot(block.slot) if current_epoch > block_epoch: return store.unrealized_justifications[block_root] else: return store.block_states[block_root].current_justified_checkpoint ``` Erigon's code: ```go // Use per-block unrealized justifications (spec: store.unrealized_justifications[block_root]) // Fall back to realized checkpoints if unrealized not available currentJustifiedCheckpoint, has := f.getUnrealizedJustification(blockRoot) if !has { currentJustifiedCheckpoint, has = f.forkGraph.GetCurrentJustifiedCheckpoint(blockRoot) ... } ``` → Always picks unrealized when available. The "fall back" comment misreads the spec — spec is a conditional branch, not a fallback. ### 2. 
Justified/finalized check was strict equality Spec [`filter_block_tree`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/fork-choice.md#filter_block_tree) has a lenient `+2 >= current_epoch` lookback: ```python correct_justified = ( store.justified_checkpoint.epoch == GENESIS_EPOCH or voting_source.epoch == store.justified_checkpoint.epoch or voting_source.epoch + 2 >= current_epoch # ← lenient lookback ) ``` Erigon's code only allowed the first two clauses (strict `Equal()`). ## Regression provenance The deviations were introduced in #20035 (`caplin: unified Engine API client for standalone mode`, Mar 25 2026) by Mark Holt: ``` 082697d cl/phase1/forkchoice/get_head.go +Use per-block unrealized justifications ``` Before that PR, `getFilterBlockTree` used `f.forkGraph.GetCurrentJustifiedCheckpoint(blockRoot)` directly — which returns the block's **realized** justified checkpoint. Block.realized matched store.realized (both on the same epoch-boundary timeline), so the equality check passed. PR #20035 introduced the per-block unrealized lookup (`store.unrealized_justifications` per the spec), but missed the conditional branching (spec deviation #1) and didn't add the +2 lookback (spec deviation #2). The result: block.unrealized goes ahead by 1 epoch mid-epoch, and store.realized doesn't catch up until `OnTick`'s epoch-boundary path runs. ## Fix Restore spec-faithful behavior in `getFilterBlockTree`: 1. Branch the voting source selection on `currentEpoch > blockEpoch`: - Prior-epoch → `store.unrealized_justifications[block_root]` - Current/future-epoch → block's realized `current_justified_checkpoint` In both branches, return `false` (reject the leaf) if the lookup is absent. `OnBlock` populates `unrealizedJustifications` for every imported block above the finalized slot, so a missing entry indicates either an invariant violation or a leaf below finalized — neither should produce a viable head. 2. 
Replace the strict `Equal()` check on the justified side with the spec's three flat disjuncts: `store.justified_checkpoint.epoch == GENESIS_EPOCH || voting_source.epoch == store.justified_checkpoint.epoch || voting_source.epoch + 2 >= current_epoch`. 3. Replace the finalized-side checkpoint-equality check with the spec's ancestor-descent rule: `store.finalized_checkpoint.root == get_checkpoint_block(block_root, finalized.epoch)`, implemented via `f.Ancestor(blockRoot, finalizedSlot)`. 4. Snapshot `currentEpoch` once at the start of the walk (in `getFilteredBlockTree`) and thread it through the recursion so the whole rebuild uses a single store-epoch value — `OnTick` updates `f.time` without holding `f.mu`, so per-leaf `f.Slot()` reads can mix epochs across leaves around a slot boundary. ## Validation Tested on bloatnet at chain tip, on `performance` HEAD + this fix: - **Before fix**: 22-36 block unwinds every ~6 min, one per epoch boundary, persisting indefinitely. 110 events in 3h. - **After fix**: zero unwinds at epoch boundaries observed across 30+ minutes (~5 epoch boundaries crossed). `[filterTree] reject` no longer fires. Effects: - **CPU waste eliminated**: ~30 blocks of execution work were being discarded and re-done every cycle. - **db_size**: unchanged (was already not affected since rolled-back state was in MemoryMutation overlay, not MDBX). - **Logs**: clean — no more spurious `Unwind Execution` lines at tip. ## Test plan - [x] Run on bloatnet to chain tip, observe no large unwinds at epoch boundaries - [x] Confirm no head regression in `Caplin is sending forkchoice` logs (lagSlots stays ~0 at tip) - [ ] Spectest still passes (`make spectest`) ## Refs Full diagnosis trail: #21301
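The first two fixes (conditional voting-source selection and the three-disjunct justified check) can be transcribed from the quoted spec snippets into Go roughly as below. This is a sketch under stated assumptions: the `checkpoint` type and the `votingSource` and `correctJustified` helpers are hypothetical names, not Caplin's API, and the epoch numbers in `main` are illustrative values chosen to resemble the log excerpt.

```go
package main

import "fmt"

// checkpoint is a hypothetical, reduced stand-in for a consensus checkpoint.
type checkpoint struct {
	Epoch uint64
}

const genesisEpoch = 0

// votingSource follows the spec's get_voting_source: prior-epoch blocks use
// the unrealized justification, current-epoch blocks use the realized one.
func votingSource(currentEpoch, blockEpoch uint64, unrealized, realized checkpoint) checkpoint {
	if currentEpoch > blockEpoch {
		return unrealized
	}
	return realized
}

// correctJustified follows the three flat disjuncts of filter_block_tree.
func correctJustified(storeJustifiedEpoch, votingSourceEpoch, currentEpoch uint64) bool {
	return storeJustifiedEpoch == genesisEpoch ||
		votingSourceEpoch == storeJustifiedEpoch ||
		votingSourceEpoch+2 >= currentEpoch // lenient lookback
}

func main() {
	// Illustrative values: the store's justified epoch lags the block's view by one.
	storeJustified := uint64(20410)
	source := votingSource(20411, 20410, checkpoint{Epoch: 20411}, checkpoint{Epoch: 20410})

	strict := source.Epoch == storeJustified
	lenient := correctJustified(storeJustified, source.Epoch, 20411)
	fmt.Println(strict, lenient)
}
```

With these values the strict equality check rejects the leaf while the spec's lookback clause accepts it, which is the behavioural difference the fix restores.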
lystopad
added a commit
that referenced
this pull request
May 21, 2026
Findings from Copilot (3) and yperbasis (8). The Copilot findings on Peers routing, empty NodeInfo, and close(statusReady) are real; yperbasis identified the matching doc/code mismatches and a few polish items.

1. Peers() / message routing (Copilot #1). The multi-sentry client uses Peers() to decide which sentry owns each peer and routes SendMessageById via that sentry's gRPC. With the previous reporter-only gating, every peer mapped to Servers[0] (which is the *lowest* configured protocol, ETH69 in the default; yperbasis #1 noted the comment said "highest"). Servers[0]'s goodPeers doesn't have eth/70 or eth/71 peers, so SendMessageById silently no-op'd. Fix: each GrpcServer.Peers() now reports its own goodPeers, filtered to skip entries where both protocol and witProtocol are zero. Each peer ends up in exactly one eth-sentry's goodPeers (the negotiated version) plus, at most, the sentry hosting the wit sideprotocol, so admin_peers aggregation is naturally non-duplicating and routing is correct.
2. NodeInfo() (Copilot #3). Non-reporters returning empty replies polluted admin_nodeInfo with blank entries that sorted first. With the shared p2p.Server every sentry has the same Node ID and the same enode, so they now return identical NodeInfo. node/eth.NodesInfo deduplicates by Enode before sorting.
3. SetStatus's close(ss.statusReady) (Copilot #2) panicked for callers that construct GrpcServer outside NewGrpcServer (existing TestSentryServerImpl_* does this). Guarded the close with a nil check; awaitStatus tolerates a nil channel via the select's other cases.
4. SetP2PServer returns an error instead of panicking on double-call (yperbasis #3). The "ownership decided up front" invariant is still enforced, just propagated up the Provider.Initialize path cleanly.
5. SimplePeerCount filters protocol=0 ghosts (yperbasis #4). With wit/0 deduped to one sentry, peers that negotiate eth/N on a different sentry end up as protocol=0/witProtocol=0 ghosts on the wit-hosting sentry. Counting them would emit a bogus eth.ProtocolToString[0] bucket in the GoodPeers log. The Peers() filter already drops them from admin_peers; this aligns SimplePeerCount.
6. awaitStatus logs a Debug line when its timeout fires (yperbasis #5) so operators can tell "core didn't send status in time" from "core never tried" when the caller disconnects the peer with PeerErrorLocalStatusNeeded. Doc comment clarifies the ctx.Done case too (yperbasis #7).

Drops the reportsPeers flag and IsPeerReporter accessor, which are no longer needed once per-sentry goodPeers replaces the reporter gating. SetP2PServer signature loses its second argument.

Tests updated for the new shape; new TestGrpcServer_PeersReturnsPerSentryGoodPeers (per-sentry view + ghost-entry filter) and TestGrpcServer_SetStatus_NilStatusReadyIsSafe (close-nil guard).

Deferred to follow-up:
- BootstrapNodes/DNS resolution helper to dedupe logic between makeP2PServer and startSharedP2PServer (yperbasis #6).
- End-to-end Provider.Initialize test in local mode (yperbasis #8).
- [r3.4] backport PR (yperbasis #9).

Co-Authored-By: Claude
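The nil-channel guard from item 3 relies on two Go rules: closing a nil channel panics, and receiving from a nil channel in a `select` never becomes ready. A minimal sketch, assuming a hypothetical `server` type in place of `GrpcServer`; the `sync.Once` is an addition of this sketch (to make repeated calls safe) and is not something the commit message mentions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// server is a hypothetical stand-in for GrpcServer.
type server struct {
	statusReady chan struct{} // nil when built without the constructor
	once        sync.Once     // assumption of this sketch: guards double close
}

func newServer() *server {
	return &server{statusReady: make(chan struct{})}
}

// setStatus signals readiness. close(nil) would panic, so a nil channel
// is skipped entirely.
func (s *server) setStatus() {
	if s.statusReady == nil {
		return
	}
	s.once.Do(func() { close(s.statusReady) })
}

// awaitStatus waits for readiness or the timeout. A nil statusReady never
// fires, so the select falls through to the timeout case.
func (s *server) awaitStatus(timeout time.Duration) bool {
	select {
	case <-s.statusReady:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	bare := &server{} // constructed without newServer, as some tests do
	bare.setStatus()  // does not panic
	fmt.Println(bare.awaitStatus(10 * time.Millisecond))

	ready := newServer()
	ready.setStatus()
	fmt.Println(ready.awaitStatus(time.Second))
}
```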
AskAlexSharov
pushed a commit
to HoustonOla35/erigon
that referenced
this pull request
May 22, 2026
…rgs) (erigontech#21211) ## Summary First incremental cut toward [erigontech#21138](erigontech#21138 structural goal: **one finalize function per parallel-exec result, with `IntraBlockState` used nowhere outside workers**. This PR removes two finalize variants that are already unreachable from production: | Function | LOC | Production callers on main | |---|---|---| | `finalizeWithIBS` (full IBS reconstruction, BAL-compat path) | ~120 | 0 | | `finalizeTx` (delta-args variant, direct fee-balance path) | ~250 | 0 (only `TestFinalizeTx_AllScenarios`) | Plus the test suite that exclusively exercised the delta-args path (`TestFinalizeTx_*`, fixture builders `coinbaseIsRecipientScenario` / `selfTransferScenario`, helpers `hasCoinbaseDelta` / `adjustForTransferDelta` / `buildWriteMap` / `fmtWriteVal` / `extractBalanceReads`) and one stale comment in `engine_api_bal_test.go`. Net: **-690 lines**, **+1 line**, no semantic change. ## Why now The parallel-exec correctness stack landed in erigontech#21153 (merged 2026-05-15). The combined effect of that PR plus erigontech#21177 routed all production finalize flows through `finalizeTxSimple` — these two functions became unreachable. Removing them shrinks `exec3_parallel.go` from 3640 → 3268 lines, making subsequent IBS-dependency drains easier to review. The next steps in the erigontech#21138 sequence: - **PR 2** — drain IBS dependency erigontech#1 (SD address lookup): `LogSelfDestructedAccounts` consumes `result.SelfDestructedWithBalance` only, no `ibs.GetRemovedAccountsWithBalance()` call. - **PR 3** — drain IBS deps erigontech#2 (`AddLog` → return logs) and erigontech#3 (`AddBalance` bookkeeping → already on `CollectorWrites`); `finalizeTxSimple` becomes IBS-free. - Later — `normalizeWriteSet` → `filterWritesByVersionMap`; `calcState.ApplyWrites` → `VersionedWrites.TouchUpdates`; move EIP-7002/7251 syscall execution into the worker pool. End state: one `finalizeTx`, no IBS outside workers. 
## Test plan - [x] `make lint` clean - [x] `make test-short` (full `execution/stagedsync`, `execution/state`, `execution/tests`, `rpc/jsonrpc` packages) green under `EXEC3_PARALLEL=true` - [x] BAL family (`TestEngineApiBAL*`) 8/8 parallel - [x] `TestEIP7708BurnLogWhenCoinbaseSelfDestructs` green - [x] Surviving `TestFinalizeTxSimple_*` family green - [ ] CI: race-tests, kurtosis, hive matrix legs green on both serial and parallel ## Related - erigontech#21017 — serial/parallel CI matrix that surfaces parallel-leg failures (now rebuilt on post-erigontech#21153 main; CI fresh-running) - erigontech#21153 — parallel-exec correctness stack (merged) - erigontech#21138 — heuristic-removal / IBS-dependency-removal tracker (the parent)
mh0lt
pushed a commit
that referenced
this pull request
May 24, 2026
…ash LRU Two surgical commits bundled (both touch the code-read hot path): 1. IntraBlockState.GetCodeSize now loads the full bytes via stateReader.ReadAccountCode on first touch and populates stateObject.code, so subsequent same-addr EXTCODESIZE / EXTCODEHASH / CALL within the tx are in-struct slice-len calls (~50 ns), not full reader round-trips. Mirrors geth's pattern at core/state/state_object.go ~Code() — pay one read per addr per tx, free for the rest. 2. CodeCache.addrToHash switched from a no-op-when-full maphash.Map[versionedAddressID] to an LRU lru.Cache[[20]byte, versionedAddressID] (hashicorp/golang-lru/v2, already imported elsewhere). Cap derived from the existing byte budget at ~28 bytes/entry (~580 k entries for the 16 MB default). Fresh-address workloads (mainnet thousands of new addrs per block) now warm up the addr layer over time instead of silently dropping new entries forever; matches geth's lru.Cache at core/state/database_code.go. The hashToCode layer is unchanged (content-addressed bytes, immutable, byte-capped with new-entry no-op when full — the same semantic as before since code bytes by codeHash never change). Bench on the EXTCODESIZE-EXISTING_CONTRACT-30M family: 62.34 mgas/s (was 61.50). The marginal gain is small on this bench because BAL prefetch already populates the cache layers; neither lever fires heavily. The expected wins are on non-BAL workloads where EXTCODESIZE-loop patterns repeat within a tx (#1) and fresh-address-churn mainnet blocks fill the addr layer (#2). Updated TestCodeCache_AddrCapacityLimit to assert LRU eviction (was asserting no-op-when-full); the prior behaviour was the bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
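The difference between the old "no-op when full" address layer and an LRU can be shown with a small self-contained sketch. This does not use `hashicorp/golang-lru/v2`; it is a hand-rolled stand-in built on the standard library's `container/list`, and the `cappedMap`, `lruCache`, and their methods are hypothetical names for illustration only.

```go
package main

import (
	"container/list"
	"fmt"
)

// cappedMap models the old behaviour: once full, new keys are dropped.
type cappedMap struct {
	capacity int
	items    map[string]int
}

func newCappedMap(capacity int) *cappedMap {
	return &cappedMap{capacity: capacity, items: make(map[string]int)}
}

func (c *cappedMap) Put(key string, val int) {
	if _, ok := c.items[key]; !ok && len(c.items) >= c.capacity {
		return // full: the new entry is silently ignored
	}
	c.items[key] = val
}

func (c *cappedMap) Get(key string) (int, bool) {
	v, ok := c.items[key]
	return v, ok
}

type lruEntry struct {
	key string
	val int
}

// lruCache evicts the least recently used entry when full.
type lruCache struct {
	capacity int
	order    *list.List // front is the most recently used entry
	items    map[string]*list.Element
}

func newLRU(capacity int) *lruCache {
	if capacity < 1 {
		capacity = 1
	}
	return &lruCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *lruCache) Put(key string, val int) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).val = val
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, val: val})
}

func (c *lruCache) Get(key string) (int, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruEntry).val, true
}

func main() {
	old, lru := newCappedMap(2), newLRU(2)
	for i, addr := range []string{"a", "b", "c"} {
		old.Put(addr, i)
		lru.Put(addr, i)
	}
	_, oldHasC := old.Get("c")
	_, lruHasC := lru.Get("c")
	_, lruHasA := lru.Get("a")
	fmt.Println(oldHasC, lruHasC, lruHasA) // false true false
}
```

With a capacity of two, the capped map never learns about the third address, while the LRU admits it and evicts the oldest one, which matches the "warm up over time" behaviour the commit describes.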
mh0lt
pushed a commit
that referenced
this pull request
May 25, 2026
…ash LRU Two surgical commits bundled (both touch the code-read hot path): 1. IntraBlockState.GetCodeSize now loads the full bytes via stateReader.ReadAccountCode on first touch and populates stateObject.code, so subsequent same-addr EXTCODESIZE / EXTCODEHASH / CALL within the tx are in-struct slice-len calls (~50 ns), not full reader round-trips. Mirrors geth's pattern at core/state/state_object.go ~Code() — pay one read per addr per tx, free for the rest. 2. CodeCache.addrToHash switched from a no-op-when-full maphash.Map[versionedAddressID] to an LRU lru.Cache[[20]byte, versionedAddressID] (hashicorp/golang-lru/v2, already imported elsewhere). Cap derived from the existing byte budget at ~28 bytes/entry (~580 k entries for the 16 MB default). Fresh-address workloads (mainnet thousands of new addrs per block) now warm up the addr layer over time instead of silently dropping new entries forever; matches geth's lru.Cache at core/state/database_code.go. The hashToCode layer is unchanged (content-addressed bytes, immutable, byte-capped with new-entry no-op when full — the same semantic as before since code bytes by codeHash never change). Bench on the EXTCODESIZE-EXISTING_CONTRACT-30M family: 62.34 mgas/s (was 61.50). The marginal gain is small on this bench because BAL prefetch already populates the cache layers; neither lever fires heavily. The expected wins are on non-BAL workloads where EXTCODESIZE-loop patterns repeat within a tx (#1) and fresh-address-churn mainnet blocks fill the addr layer (#2). Updated TestCodeCache_AddrCapacityLimit to assert LRU eviction (was asserting no-op-when-full); the prior behaviour was the bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 27, 2026
Sahil-4555
pushed a commit
to Sahil-4555/erigon
that referenced
this pull request
May 29, 2026
…ckly (erigontech#21483) ## Problem When a merge-queue run has a hive-eest shard fail, the failing job calls `gh run cancel ${{ github.run_id }}` (added in erigontech#21445). That sends SIGTERM to all in-flight matrix siblings, but the Docker-bound hive simulators take ~20 minutes to actually drain. `ci-gate` is `if: always()` and waits for every `needs` job to reach a terminal state, so the broken PR sits at `AWAITING_CHECKS` for the full drain time — blocking the head of the merge queue. Concrete example from today (PR erigontech#21470 at position erigontech#1): - 08:29:57 — `hive-eest / test-hive-eest (paris+shanghai, serial)` fails, calls `gh run cancel 26562610423`, emits the "Merge-queue root-cause failure" annotation from erigontech#21445. - 08:48 (~19 min later) — paris+shanghai-parallel, prague-serial/parallel, cancun-serial/parallel, osaka-parallel, rlp-serial/parallel, and glamsterdam-devnet-parallel were all still `in_progress`. Every other ci-gate child (tests, race-tests, eest-spec-tests, kurtosis, hive, lint, bench, repro, sonar, caplin) had already completed. The bottleneck was specifically the hive-eest matrix siblings. ## Fix ```yaml strategy: fail-fast: ${{ github.event_name == 'merge_group' }} ``` - **In `merge_group`**: first failed shard immediately cancels all siblings at the GitHub API layer — much faster than the `gh run cancel` → SIGTERM → runner-drain path. ci-gate's `needs` reach terminal state in seconds, ci-gate fails, the broken PR is evicted. - **In PR runs**: stays `false`, so authors still see the full failure breakdown across every shard. No regression in PR feedback. ## What's left in place and why The per-job `gh run cancel` step (test-hive-eest.yml lines 311-317) stays. Two reasons: - Matrix `fail-fast` only cancels siblings **within the same matrix** — it doesn't cancel sibling reusable workflows. If a future failure pattern leaks across workflows, `gh run cancel` still covers it. 
- ci-gate.yml's root-cause annotator (line 188) keys off "the leaf that ran `gh run cancel` successfully" to single out the true root cause among collateral cancellations. Removing the step would silently regress erigontech#21445's attribution. ## Scope choice Only `test-hive-eest.yml` is changed. Other matrix-bearing reusable workflows (`test-all-erigon.yml`, `test-all-erigon-race.yml`, `test-eest-spec.yml`, `test-kurtosis-assertoor.yml`, `test-hive.yml`, `test-bench.yml`) all use `fail-fast: false` too, but none of them were the queue-blocking long pole in this incident. Keeping the patch minimal; we can generalize if another workflow becomes the bottleneck. ## Tradeoff to be aware of Queue runs will now show siblings as `cancelled` instead of `failed` whenever any one shard fails. That's the correct tradeoff in `merge_group` — the goal is fast eviction, not detailed diagnostics; full per-shard breakdown remains available on the PR run. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yperbasis
added a commit
that referenced
this pull request
May 29, 2026
Profiling at tip (MDBX, 60s window) pinned bytes.Clone as the #1 single allocation source at 64.3 GB / 60s (~16.85% of total alloc traffic). 37.86 GB / 60s (~631 MB/sec) of that came from the defensive common.Copy at TrieContext.Branch.

The clone was redundant: every downstream consumer either reads the slice inline (cell.fillFromFields copies into pre-allocated arrays; merger.Merge consumes and produces a fresh buffer; trie_reader parses bytes into cells; unfoldBranchNode similar) or clones it itself at the queue boundary (getDeferredUpdate clones both prefix and prev when storing in the deferred-update pool). Branch's clone was a third copy of bytes that nothing needed to retain.

Document the new contract (borrowed slice valid for the current ComputeCommitment scope) and update the test that exercised the old "returns owned bytes" guarantee to verify the new aliasing guarantee instead.

After-measurement on the rig is blocked by an unrelated stage-loop persistence inconsistency (chaindata head pointer ahead of state writes on restart) that's reproducing on every restart cycle today; the change is mathematically minimal (single Clone removal + test contract update) and unit tests + make lint pass, so shipping on the math.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
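The contract change from "owned bytes" to "borrowed slice" can be illustrated with a short sketch. The `store` type and its two methods are hypothetical and only model the aliasing behaviour; they are not the `TrieContext` API.

```go
package main

import (
	"bytes"
	"fmt"
)

// store is a hypothetical holder of a reusable internal buffer.
type store struct {
	buf []byte
}

// branchOwned returns a private copy: safe to retain, but allocates per call.
func (s *store) branchOwned() []byte {
	return bytes.Clone(s.buf)
}

// branchBorrowed returns the internal buffer itself: no allocation, but the
// caller must not retain it past the scope in which the buffer is stable.
func (s *store) branchBorrowed() []byte {
	return s.buf
}

func main() {
	s := &store{buf: []byte{1, 2, 3}}
	owned := s.branchOwned()
	borrowed := s.branchBorrowed()

	s.buf[0] = 9 // the store later reuses its buffer

	fmt.Println(owned[0], borrowed[0]) // 1 9
}
```

A caller that needs the bytes beyond the borrow scope clones them itself, which is the pattern the commit message attributes to the deferred-update queue boundary.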
bloxster
added a commit
that referenced
this pull request
Jun 1, 2026
Addresses the CHANGES_REQUESTED review, verified against `main`:

database.md
- Receipts ARE stored: ReceiptDomain (required) + RCacheDomain (optional). Reword the "not stored / re-computed" claim to: compact per-txn receipt metadata persisted in a required domain, full receipts optionally cached, logs re-derived by re-execution only when not cached.
- snapshots/domain holds 6 domains (4 state + 2 receipt), not 4.
- Soften "rm -rf chaindata/": recoverable, but re-derives state from snapshots and resyncs the tip from the CL over the Engine API (devp2p block download removed in #21505); not a quick rebuild.

architecture.md
- prune `full` keeps a rolling ~262k-block window (DefaultPruneDistance), not all post-Merge blocks; add the `blocks` prune mode to the table.
- Pipeline: Snapshots is stage #1; no separate Commitment stage (it runs inside Execution); add Senders.
- Mermaid: Downloader (BitTorrent) is independent of Sentry (devp2p); blocks arrive from Caplin via the Engine API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sahil-4555
pushed a commit
to Sahil-4555/erigon
that referenced
this pull request
Jun 3, 2026
…#21579) Two tweaks to how Claude Code operates in this repo. ## 1. Bare `#N` references in GitHub text Claude has used a bare `#`+number to mean "point N" — e.g. writing "`erigontech#1`" for "the nit from point 1" — in PR descriptions, issue descriptions, and comments. GitHub auto-links that to the issue/PR with that number, so "`erigontech#1`" turns into a link to the repo's first-ever PR. This came up in [PR 21510](erigontech#21510 (comment)) and has happened on other occasions. `agents.md` now instructs Claude to write "point 1" / "item 1" / "the first nit" and reserve `#N` for genuine issue/PR references. ## 2. Claude attribution in commits, PRs, and issues Stop adding `Co-Authored-By: Claude` and `🤖 Generated with Claude Code` lines. Enforced deterministically in `.claude/settings.json` rather than as a prose instruction, so it does not depend on the model remembering: - `includeCoAuthoredBy: false` — the one switch the installed Claude Code (v2.1.160) honors across *both* the commit and PR code paths; it returns empty attribution for each. - `attribution: { commit: "", pr: "" }` — the newer, forward-looking mechanism; empties the footer text. **Why both:** `attribution` alone is insufficient today — the PR code path does a truthy check on `attribution.pr`, so an empty string is ignored and the "Generated with" footer still leaks into PR bodies. `includeCoAuthoredBy: false` is the reliable global off-switch. `agents.md` documents the rule for human readers too. --- Docs/config only — no Go changes, so build/lint are unaffected. This PR follows both rules itself: no attribution trailers on the commit, and the `erigontech#1` examples above are shown as code so they don't auto-link.
yperbasis
pushed a commit
to fr0mano/erigon
that referenced
this pull request
Jun 9, 2026
…LOAS (erigontech#21698)

## Problem

Lido Hoodi validators lost attestation score after switching to `release/3.5`. They missed **head votes** at ~30–60% per epoch (≈2× a random network sample) while **target and source votes were always correct**: the attestations were produced, published, and included on-chain on time, but voted for a **stale head**.

## Root cause

The GLOAS merge (erigontech#18956) added `indexedWeightStore` (`cl/phase1/forkchoice/weight_store_indexed.go`). It is instantiated unconditionally, and its `IndexVote`/`RemoveVote` are called per validator index on **every** attestation via `setLatestMessage`, regardless of fork. However, its results are only consumed by GLOAS `get_head`. Pre-GLOAS `get_head` (and `timing.go`) use the non-indexed `weightStore`, and `GetIndexedWeightStore()` has no callers. So on pre-GLOAS chains the index is maintained but never read.

On a high-validator-count network (Hoodi, ~1.2M active validators) this maintenance was the single largest CPU consumer (`RemoveVote`, ~15% of CPU, fresh slice allocation per call) plus a large GC load, all under the fork-choice write lock. The lock contention delayed `OnBlock` (block import) and `get_head` past the attestation deadline, so the head served to the validator client was stale, which produced wrong head votes.

## Fix

Gate the indexed-store maintenance to the GLOAS vote path only, via an explicit `maintainIndexedVotes` flag on `setLatestMessage` (mirroring the existing `updateLatestMessages` pre-GLOAS/GLOAS dispatch).

## Validation (live Hoodi node, before vs after rebuild)

- Node CPU (operator Grafana): ~7.5% → <2%.
- 60s CPU profile: total samples 148.9% → 41.6%; `indexedWeightStore.RemoveVote` (was the #1 hotspot) and the GC storm both gone.
- Head-at-attestation-deadline staleness: 64% → 4%.
- Validator head-vote misses: 40% → 0% in the first fully post-fix epoch (network sample ~23% in the same epoch); target/source unaffected throughout.

Affects all pre-GLOAS networks on this branch (mainnet/Sepolia/Gnosis), with impact scaling by validator count.

## Tests

- `TestPreGloasDoesNotMaintainIndexedWeightStore`: fails before the fix (the pre-GLOAS path populates the index), passes after.
- `TestGloasMaintainsIndexedWeightStore`: locks in that the GLOAS path still indexes votes.
- `go test ./cl/phase1/forkchoice/...`, `make lint`, `make erigon` all clean.

## Follow-up

`indexedWeightStore` is currently unused even on GLOAS (`get_head` uses the non-indexed store). The Caplin team should either wire it into GLOAS `get_head` or remove it; tracking separately.
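The gating described under "Fix" can be illustrated with a minimal, self-contained sketch. It is not the Caplin code: the `forkChoice`, `indexedVotes`, `root`, and `newForkChoice` names and the map-based storage are hypothetical stand-ins, and only `setLatestMessage`, `IndexVote`, `RemoveVote`, and the `maintainIndexedVotes` flag are names taken from the description above. Their real signatures may differ.

```go
package main

import "fmt"

// root is a hypothetical stand-in for a 32-byte block root.
type root [32]byte

// indexedVotes is a hypothetical stand-in for indexedWeightStore.
// The real store is more elaborate; a map is enough to show the gating.
type indexedVotes struct {
	byValidator map[uint64]root
}

func (s *indexedVotes) IndexVote(validator uint64, r root) { s.byValidator[validator] = r }
func (s *indexedVotes) RemoveVote(validator uint64)        { delete(s.byValidator, validator) }

// forkChoice is a hypothetical, heavily reduced fork-choice store.
type forkChoice struct {
	latest  map[uint64]root // always maintained, on every fork
	indexed *indexedVotes   // only needed by the GLOAS vote path
}

func newForkChoice() *forkChoice {
	return &forkChoice{
		latest:  make(map[uint64]root),
		indexed: &indexedVotes{byValidator: make(map[uint64]root)},
	}
}

// setLatestMessage records the vote and touches the indexed store only
// when the caller asks for it (assumed here to be the GLOAS path).
func (f *forkChoice) setLatestMessage(validator uint64, r root, maintainIndexedVotes bool) {
	f.latest[validator] = r
	if !maintainIndexedVotes {
		return // pre-GLOAS: skip the per-validator index work entirely
	}
	f.indexed.RemoveVote(validator)
	f.indexed.IndexVote(validator, r)
}

func main() {
	fc := newForkChoice()

	// Pre-GLOAS style call: the latest message is stored, the index is untouched.
	fc.setLatestMessage(7, root{0x01}, false)
	fmt.Println(len(fc.latest), len(fc.indexed.byValidator)) // 1 0

	// GLOAS style call: the index is maintained as well.
	fc.setLatestMessage(7, root{0x02}, true)
	fmt.Println(len(fc.latest), len(fc.indexed.byValidator)) // 1 1
}
```

Passing the flag explicitly, rather than checking the fork inside the store, keeps the decision at the call site, which matches the description's note that it mirrors the existing pre-GLOAS/GLOAS dispatch.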
No description provided.