Skip to content

cl/phase1/forkchoice: don't maintain GLOAS indexed weight store pre-GLOAS#21698

Merged
domiwei merged 1 commit into
mainfrom
feature/lystopad/caplin-gate-gloas-indexed-weightstore
Jun 9, 2026
Merged

cl/phase1/forkchoice: don't maintain GLOAS indexed weight store pre-GLOAS#21698
domiwei merged 1 commit into
mainfrom
feature/lystopad/caplin-gate-gloas-indexed-weightstore

Conversation

@lystopad

@lystopad lystopad commented Jun 9, 2026

Copy link
Copy Markdown
Member

Problem

Lido Hoodi validators lost attestation score after switching to release/3.5. They missed head votes at ~30–60% per epoch (≈2× a random network sample) while target and source votes were always correct — the attestations were produced, published, and included on-chain on time, but voted for a stale head.

Root cause

The GLOAS merge (#18956) added indexedWeightStore (cl/phase1/forkchoice/weight_store_indexed.go). It is instantiated unconditionally, and its IndexVote/RemoveVote are called per validator index on every attestation via setLatestMessage, regardless of fork.

However its results are only consumed by GLOAS get_head. Pre-GLOAS get_head (and timing.go) use the non-indexed weightStore, and GetIndexedWeightStore() has no callers. So on pre-GLOAS chains the index is maintained but never read.

On a high-validator-count network (Hoodi, ~1.2M active validators) this maintenance was the single largest CPU consumer (RemoveVote, ~15% of CPU, fresh slice allocation per call) plus a large GC load — all under the fork-choice write lock. The lock contention delayed OnBlock (block import) and get_head past the attestation deadline, so the head served to the validator client was stale → wrong head votes.

Fix

Gate the indexed-store maintenance to the GLOAS vote path only, via an explicit maintainIndexedVotes flag on setLatestMessage (mirroring the existing updateLatestMessages pre-GLOAS/GLOAS dispatch).

Validation (live Hoodi node, before vs after rebuild)

  • Node CPU (operator Grafana): ~7.5% → <2%.
  • 60s CPU profile: total samples 148.9% → 41.6%; indexedWeightStore.RemoveVote (was Pull from go-ethereum up to 26aea73 (7 Feb 2019) #1 hotspot) and the GC storm both gone.
  • Head-at-attestation-deadline staleness: 64% → 4%.
  • Validator head-vote misses: 40% → 0% in the first fully post-fix epoch (network sample ~23% in the same epoch); target/source unaffected throughout.

Affects all pre-GLOAS networks on this branch (mainnet/Sepolia/Gnosis), with impact scaling by validator count.

Tests

  • TestPreGloasDoesNotMaintainIndexedWeightStore — fails before the fix (the pre-GLOAS path populates the index), passes after.
  • TestGloasMaintainsIndexedWeightStore — locks in that the GLOAS path still indexes votes.
  • go test ./cl/phase1/forkchoice/..., make lint, make erigon all clean.

Follow-up

indexedWeightStore is currently unused even on GLOAS (get_head uses the non-indexed store). The Caplin team should either wire it into GLOAS get_head or remove it; tracking separately.

…LOAS

The indexedWeightStore (new in the GLOAS merge) was instantiated
unconditionally and its IndexVote/RemoveVote were called per validator
index on every attestation via setLatestMessage, regardless of fork. Its
results are only consumed by GLOAS get_head; pre-GLOAS get_head uses the
non-indexed weightStore, so this was pure overhead — a top CPU hotspot
(RemoveVote) plus heavy allocation/GC, all under the fork-choice write
lock. On high-validator-count networks this stalls block import and head
updates past the attestation deadline, causing stale head votes.

Gate the index maintenance on the GLOAS vote path only.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces pre-GLOAS fork-choice CPU/GC overhead by avoiding maintenance of the GLOAS-only indexedWeightStore on the pre-GLOAS attestation vote path, while keeping indexed vote tracking enabled for the GLOAS path.

Changes:

  • Add an explicit maintainIndexedVotes flag to setLatestMessage and disable indexed vote maintenance in updateLatestMessagesPreGloas.
  • Keep indexed vote maintenance enabled in updateLatestMessagesGloas.
  • Add tests to assert the pre-GLOAS path does not populate the indexed store, while the GLOAS path does.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
cl/phase1/forkchoice/on_attestation.go Gates indexedWeightStore maintenance behind a flag so pre-GLOAS attestations don’t update the indexed store.
cl/phase1/forkchoice/weight_store_indexed_test.go Adds regression tests covering pre-GLOAS vs GLOAS indexed-store maintenance behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +155 to +156
// The indexed weight store is only read by GLOAS get_head; maintaining it on
// the pre-GLOAS path is wasted per-vote work under the fork-choice lock.
@domiwei domiwei added this pull request to the merge queue Jun 9, 2026
domiwei pushed a commit that referenced this pull request Jun 9, 2026
…e pre-GLOAS (#21699)

Cherry-pick of #21698 to release/3.5.

Fixes the Caplin pre-GLOAS attestation head-vote regression: the GLOAS
`indexedWeightStore` was maintained on every vote regardless of fork
(top CPU hotspot + GC pressure under the fork-choice lock) but never
read pre-GLOAS, delaying head updates past the attestation deadline.
Validated on a live Hoodi node (node CPU ~7.5%->`<2%`, head-vote misses
40%->0%).

Clean cherry-pick; `on_attestation.go` is identical between `main` and
`release/3.5`.
Merged via the queue into main with commit bda1652 Jun 9, 2026
91 of 92 checks passed
@domiwei domiwei deleted the feature/lystopad/caplin-gate-gloas-indexed-weightstore branch June 9, 2026 13:37
yperbasis added a commit that referenced this pull request Jun 9, 2026
Resolves the on_attestation.go conflict by taking #21698's GLOAS-gated index
maintenance (the stale IndexVote comment this branch fixed was removed there).

Since #21698 gates indexed-vote maintenance to the GLOAS vote path, the
differential test's pre-GLOAS fixtures no longer populate the index, so
buildExAnteStore now mirrors latestMessages into the index directly.
yperbasis added a commit that referenced this pull request Jun 9, 2026
RemoveVote allocated a fresh slice per call to filter out a validator's entry.
On the GLOAS vote path it runs per attestation, so on high-validator-count
networks it is a top CPU consumer and GC source under the fork-choice lock
(the cost #21698 removed pre-GLOAS but which returns under GLOAS now that
getHeadGloas reads the index). Compact the entry out in place instead.
yperbasis added a commit that referenced this pull request Jun 10, 2026
getHeadGloas reads the indexedWeightStore, but index maintenance is GLOAS-gated
(#21698): pre-GLOAS attestations fill latestMessages without touching the index.
At the pre-GLOAS->GLOAS boundary GetHead flips from the full-scan store (which
reads all of latestMessages) to the index, still empty — so attestation weight
is understated until validators re-attest over ~1 epoch, risking a wrong head if
a fork straddles the transition.

The spec keeps store.latest_messages as a single dict never reset across forks,
and Prysm/Lighthouse keep one continuous per-validator vote store; Erigon already
maintains latestMessages continuously, so make the derived index match it at
first use: headWeightStore imports the full latestMessages snapshot into the
index once (clear-then-rebuild, so early GLOAS votes are not double-counted).

The differential test now starts from a cold index and asserts headWeightStore
seeds it to equal the full-scan store; a new test covers seed idempotency.

Interim fix; superseded by the delta-propagation rework tracked in #21704.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Caplin Caplin: Consensus Layer, Beacon API performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants