Skip to content

node/app: address PR 20450 review comments#20583

Merged
mh0lt merged 3 commits into
mainfrom
fix/pr-20450-review-comments-v2
Apr 16, 2026
Merged

node/app: address PR 20450 review comments#20583
mh0lt merged 3 commits into
mainfrom
fix/pr-20450-review-comments-v2

Conversation

@mh0lt

@mh0lt mh0lt commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Follow-up PR addressing seven unresolved review comments from #20450 (componentization framework). All fixes are focused and localized.

Fixes included

  • event/eventbus.go: HasCallback now uses RLock/RUnlock instead of Lock/Unlock — it only reads handlerMap and a full write lock blocks concurrent Publish/Subscribe unnecessarily.

  • component/componentdomain.go: Guard nil dereference of opts.execPoolSize in the root-domain fallback path. Also return false from the WithDependentDomain option callback when the passed ComponentDomain is not a *componentDomain, instead of panicking on the type assertion.

  • event/managedeventbus.go: Reject non-pointer inputs to Register and UnregisterAll with a descriptive error instead of panicking inside reflect.Value.Pointer().

  • event/servicebus.go: Add objectPointer helper that validates reflect.Value kind. Register/Unregister/UnregisterAll/Registrations all use it. Registrations now returns a copy of the internal slice so callers cannot mutate shared state without the lock.

  • util/compare.go: Equal guards against == on uncomparable types (slices, maps, funcs). Falls back to reflect.DeepEqual when either operand has a non-comparable type, preventing runtime panics.

  • typeid.go: Initialize LocalTypeDomain to a real named domain at package init instead of leaving it as a nil TODO. Previously TypeIdOf / newIdFor would build IDs with a nil domain.

Larger refactors split out

Two review suggestions warranted their own PRs because they touch many call sites and benefit from independent review:

Test plan

  • go build ./...
  • go test ./node/app/...
  • CI passes

🤖 Generated with Claude Code

@mh0lt mh0lt requested a review from taratorio April 15, 2026 15:18
mh0lt added a commit that referenced this pull request Apr 15, 2026
… commit

The glamsterdam suite was tracking upstream
ethpandaops/ethereum-package@main unpinned, while other suites
(regular, pectra) pin to a specific version. Upstream commit 835dd9b
("feat: support gpu ere prover in zkboost", 2026-04-15) introduced an
undefined GpuConfig reference in zkboost_launcher.star:272, breaking
every Erigon PR that ran glamsterdam from that point on — regardless
of what the PR changed. Six unrelated PRs (#20471, #20472, #20526,
#20583, #20584, #20585) all failed identically.

Pin to e07503d16b (2026-04-13, the commit before the break) to stop
the bleeding. This was the last state under which our main successfully
ran glamsterdam. Revisit once upstream stabilises, or switch to a
tagged release that supports gloas_fork_epoch / fulu_fork_epoch which
our glamsterdam.io config requires.
github-merge-queue Bot pushed a commit that referenced this pull request Apr 16, 2026
…20584)

## Summary

Follow-up from #20583 — addresses the deferred "Publish →
atomic.Pointer" suggestion from the #20450 review.

Replaces the `sync.RWMutex` + clone-on-publish model with copy-on-write
on `atomic.Pointer[handlerMap]`:
- **Publishers** (`Publish`, `HasCallback`) load the pointer atomically
with no lock and no clone — the loaded map is used as an immutable
snapshot.
- **Writers** (`Subscribe`/`Unsubscribe`) serialize on a new
`writerLock` to compose the mutation, clone the current root, mutate the
clone, and `Store` the new root.
- **Once-handler cleanup** after `Publish` uses a `CompareAndSwap` loop
so concurrent writers are not lost.

## Why

- Eliminates the `RLock`/`Lock` dance in every `Publish` call. `Publish`
is the hot path — previously it took the lock to clone the entire map
just to avoid holding the lock across handler execution.
- Makes re-entrant `Publish` from inside handlers safe without any
locking. Previously async handlers used `RLock` to read `handler.bus`,
which conflicted with any concurrent `Subscribe`/`Unsubscribe`.

## Incidental

`eventHandler.bus` is now `atomic.Pointer[eventBus]` so async callbacks
can observe `Unsubscribe` (which calls `bus.Store(nil)`) without any
lock. Added a `newEventHandler` helper so the atomic pointer is always
initialized consistently at construction.

## Test plan

- [x] `go test ./node/app/event/...`
- [x] `go test -race ./node/app/event/...` — clean
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
github-merge-queue Bot pushed a commit that referenced this pull request Apr 16, 2026
## Summary

Follow-up from #20583 — addresses the deferred "relations →
[]\*component" suggestion from the #20450 review.

Changes the internal `relations` type from `[]relation` (interface
slice) to `[]*component`. The `relation` interface is still used as the
public parameter type for
`Add`/`Remove`/`Contains`/`WithDependencies`/`WithDependent` — callers
pass any `relation` and we unwrap to `*component` at the boundary via
the existing `asComponent` helper.

## Why

- Pointer identity comparison becomes a direct `*component ==
*component`, eliminating the `uintptr(unsafe.Pointer(...))` dance via
the `relationPtr` helper.
- Avoids the interface-header allocation that happens every time
`*component` was converted to `relation` for storage.
- `cmp`/`samePtr` now take `*component` directly, making the intent
explicit.

## Scope

Intentionally narrow: the `relation` interface itself is not changed, so
there's no churn in method signatures that take `relation` as a
parameter. Only the internal storage type changes.

## Test plan

- [x] `go test ./node/app/...`
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
mh0lt and others added 3 commits April 16, 2026 11:48
Fixes seven unresolved review comments from PR #20450 (componentization
framework). All fixes are focused and localized; two larger refactors
(Publish → atomic.Pointer[handlerMap] and relations → []*component) are
deferred to separate follow-up PRs because they touch many call sites
and benefit from independent review.

- event/eventbus.go: HasCallback uses RLock instead of Lock. It only
  reads handlerMap and a full write lock blocks concurrent
  Publish/Subscribe unnecessarily.

- component/componentdomain.go: Guard nil dereference of
  opts.execPoolSize in the root-domain fallback path. Also return
  false from the WithDependentDomain option callback when the passed
  ComponentDomain is not a *componentDomain, instead of panicking on
  the type assertion.

- event/managedeventbus.go: Reject non-pointer inputs to Register and
  UnregisterAll with a descriptive error instead of panicking inside
  reflect.Value.Pointer().

- event/servicebus.go: Add objectPointer helper that validates
  reflect.Value kind. Register/Unregister/UnregisterAll/Registrations
  all use it. Registrations now returns a copy of the internal slice
  so callers cannot mutate shared state without the lock.

- util/compare.go: Equal guards against == on uncomparable types
  (slices, maps, funcs). Falls back to reflect.DeepEqual when either
  operand has a non-comparable type, preventing runtime panics.

- typeid.go: Initialize LocalTypeDomain to a real named domain at
  package init instead of leaving it as a nil TODO. Previously
  TypeIdOf / newIdFor would build IDs with a nil domain.

All node/app tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Five unprotected .state field reads on foreign components (not self)
caused data races detected by -race under concurrent deactivation.

The component.State() method is thread-safe (acquires RLock), but
several call sites bypassed it by reading the .state field directly
on other component instances:

  - addDependent (line 1240-1250): read dependent.state without lock
  - onComponentStateChanged (lines 1325, 1329): read dependency.state
    and dependent.state without lock, while deactivateProvider was
    writing .state on a separate goroutine via setState

Fix: replace all direct .state reads on foreign components with the
thread-safe .State() accessor. Self-reads under self-lock (c.state
inside c.RLock) are unchanged.

Race manifested as TestAddRemoveDeps expecting Deactivated (8) but
getting Deactivating (7) — the deactivation cascade hadn't completed
because onComponentStateChanged saw stale state and skipped the
dependency-deactivated signal.

Verified: go test -race -run TestAddRemoveDeps -count=20 — all clean.
Fixes the deadlock where concurrent activate/deactivate operations caused
lock-ordering violations between background goroutines fighting for the
component's RWMutex.

Design: each component has an inbox channel processed by a single actor
goroutine (runActor). All state-mutating lifecycle operations are
serialized through the inbox:

  - activate → sends msgActivate, blocks on reply for synchronous
    configure/initialize errors. Dependency activation is fire-and-forget
    within the actor; completion arrives via stateChanged events.
  - deactivate → sends msgDeactivate, returns immediately. Provider
    deactivation and dependency cascade run in the actor.
  - onComponentStateChanged → sends msgStateChanged. The event bus
    callback no longer runs state logic directly; it routes through
    the actor so state checks and transitions don't interleave with
    activate/deactivate.

Key changes from previous design:
  - Removed go func() from activate (lines 868/873) and deactivate
    (line 1034) — these spawned background goroutines that caused the
    deadlock. The actor goroutine replaces them.
  - Removed errgroup.Wait() from activateDependencies and
    deactivateDependencies — the actor must not block on cross-component
    operations or its inbox fills up. Dependency cascade is now fully
    event-driven: fire activate/deactivate on dependencies, let
    ComponentStateChanged events signal completion.
  - activate returns synchronous errors (configure/initialize failures)
    via a reply channel. This preserves the existing API contract where
    Activate() returns an error if setup fails.

Fuzz tests:
  - TestFuzzLifecycleConcurrent: UNSKIPPED, passes (no deadlock)
  - TestFuzzLifecycleSerial: UNSKIPPED, passes
  - TestFuzzAddRemoveConcurrent: still skipped (AddDependency not yet
    routed through actor — separate follow-up)

Remaining: event bus has a data race in handlerMap.publish when called
from multiple actor goroutines simultaneously — that's #20584's scope,
not this commit.
@mh0lt mh0lt force-pushed the fix/pr-20450-review-comments-v2 branch from 57fcf84 to b3e117b Compare April 16, 2026 11:50
@mh0lt mh0lt enabled auto-merge April 16, 2026 12:23
@mh0lt mh0lt added this pull request to the merge queue Apr 16, 2026
Merged via the queue into main with commit 32c6123 Apr 16, 2026
36 checks passed
@mh0lt mh0lt deleted the fix/pr-20450-review-comments-v2 branch April 16, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants