node/app: address PR 20450 review comments#20583
Merged
Merged
Conversation
This was referenced Apr 15, 2026
mh0lt
added a commit
that referenced
this pull request
Apr 15, 2026
… commit
The glamsterdam suite was tracking upstream
ethpandaops/ethereum-package@main unpinned, while other suites
(regular, pectra) pin to a specific version. Upstream commit 835dd9b
("feat: support gpu ere prover in zkboost", 2026-04-15) introduced an
undefined GpuConfig reference in zkboost_launcher.star:272, breaking
every Erigon PR that ran glamsterdam from that point on — regardless
of what the PR changed. Six unrelated PRs (#20471, #20472, #20526,
#20583, #20584, #20585) all failed identically.
Pin to e07503d16b (2026-04-13, the commit before the break) to stop
the bleeding. This was the last state under which our main successfully
ran glamsterdam. Revisit once upstream stabilises, or switch to a
tagged release that supports gloas_fork_epoch / fulu_fork_epoch which
our glamsterdam.io config requires.
2 tasks
AskAlexSharov
approved these changes
Apr 16, 2026
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 16, 2026
…20584) ## Summary Follow-up from #20583 — addresses the deferred "Publish → atomic.Pointer" suggestion from the #20450 review. Replaces the `sync.RWMutex` + clone-on-publish model with copy-on-write on `atomic.Pointer[handlerMap]`: - **Publishers** (`Publish`, `HasCallback`) load the pointer atomically with no lock and no clone — the loaded map is used as an immutable snapshot. - **Writers** (`Subscribe`/`Unsubscribe`) serialize on a new `writerLock` to compose the mutation, clone the current root, mutate the clone, and `Store` the new root. - **Once-handler cleanup** after `Publish` uses a `CompareAndSwap` loop so concurrent writers are not lost. ## Why - Eliminates the `RLock`/`Lock` dance in every `Publish` call. `Publish` is the hot path — previously it took the lock to clone the entire map just to avoid holding the lock across handler execution. - Makes re-entrant `Publish` from inside handlers safe without any locking. Previously async handlers used `RLock` to read `handler.bus`, which conflicted with any concurrent `Subscribe`/`Unsubscribe`. ## Incidental `eventHandler.bus` is now `atomic.Pointer[eventBus]` so async callbacks can observe `Unsubscribe` (which calls `bus.Store(nil)`) without any lock. Added a `newEventHandler` helper so the atomic pointer is always initialized consistently at construction. ## Test plan - [x] `go test ./node/app/event/...` - [x] `go test -race ./node/app/event/...` — clean - [ ] CI passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 16, 2026
## Summary Follow-up from #20583 — addresses the deferred "relations → []\*component" suggestion from the #20450 review. Changes the internal `relations` type from `[]relation` (interface slice) to `[]*component`. The `relation` interface is still used as the public parameter type for `Add`/`Remove`/`Contains`/`WithDependencies`/`WithDependent` — callers pass any `relation` and we unwrap to `*component` at the boundary via the existing `asComponent` helper. ## Why - Pointer identity comparison becomes a direct `*component == *component`, eliminating the `uintptr(unsafe.Pointer(...))` dance via the `relationPtr` helper. - Avoids the interface-header allocation that happens every time `*component` was converted to `relation` for storage. - `cmp`/`samePtr` now take `*component` directly, making the intent explicit. ## Scope Intentionally narrow: the `relation` interface itself is not changed, so there's no churn in method signatures that take `relation` as a parameter. Only the internal storage type changes. ## Test plan - [x] `go test ./node/app/...` - [ ] CI passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
Fixes seven unresolved review comments from PR #20450 (componentization framework). All fixes are focused and localized; two larger refactors (Publish → atomic.Pointer[handlerMap] and relations → []*component) are deferred to separate follow-up PRs because they touch many call sites and benefit from independent review. - event/eventbus.go: HasCallback uses RLock instead of Lock. It only reads handlerMap and a full write lock blocks concurrent Publish/Subscribe unnecessarily. - component/componentdomain.go: Guard nil dereference of opts.execPoolSize in the root-domain fallback path. Also return false from the WithDependentDomain option callback when the passed ComponentDomain is not a *componentDomain, instead of panicking on the type assertion. - event/managedeventbus.go: Reject non-pointer inputs to Register and UnregisterAll with a descriptive error instead of panicking inside reflect.Value.Pointer(). - event/servicebus.go: Add objectPointer helper that validates reflect.Value kind. Register/Unregister/UnregisterAll/Registrations all use it. Registrations now returns a copy of the internal slice so callers cannot mutate shared state without the lock. - util/compare.go: Equal guards against == on uncomparable types (slices, maps, funcs). Falls back to reflect.DeepEqual when either operand has a non-comparable type, preventing runtime panics. - typeid.go: Initialize LocalTypeDomain to a real named domain at package init instead of leaving it as a nil TODO. Previously TypeIdOf / newIdFor would build IDs with a nil domain. All node/app tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Five unprotected .state field reads on foreign components (not self)
caused data races detected by -race under concurrent deactivation.
The component.State() method is thread-safe (acquires RLock), but
several call sites bypassed it by reading the .state field directly
on other component instances:
- addDependent (line 1240-1250): read dependent.state without lock
- onComponentStateChanged (lines 1325, 1329): read dependency.state
and dependent.state without lock, while deactivateProvider was
writing .state on a separate goroutine via setState
Fix: replace all direct .state reads on foreign components with the
thread-safe .State() accessor. Self-reads under self-lock (c.state
inside c.RLock) are unchanged.
Race manifested as TestAddRemoveDeps expecting Deactivated (8) but
getting Deactivating (7) — the deactivation cascade hadn't completed
because onComponentStateChanged saw stale state and skipped the
dependency-deactivated signal.
Verified: go test -race -run TestAddRemoveDeps -count=20 — all clean.
Fixes the deadlock where concurrent activate/deactivate operations caused
lock-ordering violations between background goroutines fighting for the
component's RWMutex.
Design: each component has an inbox channel processed by a single actor
goroutine (runActor). All state-mutating lifecycle operations are
serialized through the inbox:
- activate → sends msgActivate, blocks on reply for synchronous
configure/initialize errors. Dependency activation is fire-and-forget
within the actor; completion arrives via stateChanged events.
- deactivate → sends msgDeactivate, returns immediately. Provider
deactivation and dependency cascade run in the actor.
- onComponentStateChanged → sends msgStateChanged. The event bus
callback no longer runs state logic directly; it routes through
the actor so state checks and transitions don't interleave with
activate/deactivate.
Key changes from previous design:
- Removed go func() from activate (lines 868/873) and deactivate
(line 1034) — these spawned background goroutines that caused the
deadlock. The actor goroutine replaces them.
- Removed errgroup.Wait() from activateDependencies and
deactivateDependencies — the actor must not block on cross-component
operations or its inbox fills up. Dependency cascade is now fully
event-driven: fire activate/deactivate on dependencies, let
ComponentStateChanged events signal completion.
- activate returns synchronous errors (configure/initialize failures)
via a reply channel. This preserves the existing API contract where
Activate() returns an error if setup fails.
Fuzz tests:
- TestFuzzLifecycleConcurrent: UNSKIPPED, passes (no deadlock)
- TestFuzzLifecycleSerial: UNSKIPPED, passes
- TestFuzzAddRemoveConcurrent: still skipped (AddDependency not yet
routed through actor — separate follow-up)
Remaining: event bus has a data race in handlerMap.publish when called
from multiple actor goroutines simultaneously — that's #20584's scope,
not this commit.
57fcf84 to
b3e117b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up PR addressing seven unresolved review comments from #20450 (componentization framework). All fixes are focused and localized.
Fixes included
event/eventbus.go:HasCallbacknow usesRLock/RUnlockinstead ofLock/Unlock— it only readshandlerMapand a full write lock blocks concurrentPublish/Subscribeunnecessarily.component/componentdomain.go: Guard nil dereference ofopts.execPoolSizein the root-domain fallback path. Also returnfalsefrom theWithDependentDomainoption callback when the passedComponentDomainis not a*componentDomain, instead of panicking on the type assertion.event/managedeventbus.go: Reject non-pointer inputs toRegisterandUnregisterAllwith a descriptive error instead of panicking insidereflect.Value.Pointer().event/servicebus.go: AddobjectPointerhelper that validatesreflect.Valuekind.Register/Unregister/UnregisterAll/Registrationsall use it.Registrationsnow returns a copy of the internal slice so callers cannot mutate shared state without the lock.util/compare.go:Equalguards against==on uncomparable types (slices, maps, funcs). Falls back toreflect.DeepEqualwhen either operand has a non-comparable type, preventing runtime panics.typeid.go: InitializeLocalTypeDomainto a real named domain at package init instead of leaving it as a nilTODO. PreviouslyTypeIdOf/newIdForwould build IDs with a nil domain.Larger refactors split out
Two review suggestions warranted their own PRs because they touch many call sites and benefit from independent review:
Publish→atomic.Pointer[handlerMap]— lock-freePublish/HasCallback, copy-on-write writers. Eliminates theRWMutexand clone-on-publish entirely.relations→[]*component— removes interface-header allocation andunsafe.Pointerindirection in relations storage.Test plan
go build ./...go test ./node/app/...🤖 Generated with Claude Code