Skip to content

node/components/storage: extract storage component from backend.go#20476

Merged
mh0lt merged 18 commits into
mainfrom
feat/componentization-storage
Apr 18, 2026
Merged

node/components/storage: extract storage component from backend.go#20476
mh0lt merged 18 commits into
mainfrom
feat/componentization-storage

Conversation

@mh0lt

@mh0lt mh0lt commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

Summary

Extracts the storage subsystem from backend.go into node/components/storage/Provider. This is the foundation for the overlay integration — the storage component will publish SD update events in a later PR once the execution component extraction lands.

Storage component (node/components/storage/)

Provider owns:

  • DB references — ChainDB, BlockReader, BlockWriter, snapshots, Bor stores (pass-through from SetUpBlockReader)
  • File-change callbacks — seed completed snapshots / delete removed snapshots via downloader client
  • Block retirement (freezeblocks.BlockRetire)
  • Current block number read at startup

KvRPC and Notifications are NOT owned by the storage component — they are an execution-layer concern (state-change events flow from execution, not storage). Storage takes a services.DBEventNotifier dep so it can wire snapshot-change events forward; that's the minimum coupling needed. KvRPC + shards.Notifications construction stays inline in backend.go for now and will migrate to the execution component when that is extracted. See erigon-archive/notification-migration-strategy.md.

Also in this PR (for self-contained buildability): a minimal nodebuilder.Builder scaffold at node/nodebuilder/ that holds Downloader and Storage providers and BuildDownloader / BuildStorage entry points. This replaces the ad-hoc downloaderProvider field on the Ethereum struct with backend.components.

backend.go changes

  • downloaderProvider field → components *nodebuilder.Builder
  • Replace inline file-change callbacks + BlockRetire with a single BuildStorage call
  • blockRetire local variable → backend.components.Storage.BlockRetire
  • KvRPC + Notifications constructed inline (unchanged ownership), passed into storage deps

Review feedback addressed

All 5 Copilot comments in commit f5779d1ec9:

  • GenesisHash changed from [32]byte to common.Hash
  • ChainDB.View uses deps.Ctx instead of context.Background()
  • Stop() nil-guards s.components
  • Stale package comment on nodebuilder fixed
  • This PR description updated to match implementation

Test plan

  • make lint passes
  • make erigon integration builds
  • CI passes

Depends on: #20471 (downloader extraction)

@mh0lt mh0lt force-pushed the feat/componentization-storage branch from 80ded75 to 0b071f8 Compare April 10, 2026 14:04
Base automatically changed from feat/componentization-nodebuilder to feat/componentization-auth April 13, 2026 12:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extracts the storage subsystem from node/eth/backend.go into a dedicated component (node/components/storage/Provider) and wires it through the central nodebuilder.Builder registry to support ongoing backend componentization (and upcoming overlay work).

Changes:

  • Added node/nodebuilder/Builder registry with BuildDownloader and BuildStorage entry points.
  • Introduced node/components/storage/Provider to own storage-related wiring (DB refs, file-change callbacks, block retirement, current block number read).
  • Updated node/eth/backend.go to initialize components via nodebuilder, and to source BlockRetire from the storage component.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
node/nodebuilder/builder.go Adds the central component registry and build methods for extracted components.
node/eth/backend.go Switches backend initialization to the component builder; uses storage component for block retirement and stores kvRPC on the backend.
node/components/storage/provider.go New storage provider component that centralizes storage wiring and snapshot file-change callbacks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread node/components/storage/provider.go
Comment thread node/components/storage/provider.go Outdated
Comment thread node/eth/backend.go Outdated
Comment thread node/eth/backend.go
Comment thread node/nodebuilder/builder.go Outdated
mh0lt added 11 commits April 15, 2026 15:21
Initial structure for the Downloader component extraction:

- Provider: lifecycle skeleton (Configure, Initialize, Close)
- Tests: disabled mode, no-config, idempotent close
- README: documents the component API, events, configuration,
  gRPC wrapper strategy, and integration plan

Key design decisions documented:
- Direct Go API as primary interface (gRPC becomes optional wrapper)
- Events replace OnFilesChange callbacks (via node/app/event bus)
- Component owns its config (not spread across ethconfig)
- backend.go integration deferred until events are defined

This is a checkpoint — the actual backend.go wiring requires defining
event types and replacing the gRPC-based Client interface first.
Shared contract between the Downloader and its peers:

Subscribes to:
- SnapshotFilesCreated — seed new files (replaces OnFilesChange create)
- SnapshotFilesDeleted — remove torrents (replaces OnFilesChange delete)
- DownloadRequested — fetch files by name/infohash (replaces Client.Download)

Publishes:
- SnapshotDownloadComplete — unblocks stages (replaces afterSnapshotDownload)
- NewSnapshotAvailable — notifies RPC (replaces events.OnNewSnapshot)
- TorrentStatus — monitoring/metrics

These types can be used by plugins/snapshot-manager immediately since it's
greenfield and can be event-native from the start.
Document missing framework capabilities needed for production debugging:
- Event tracing (currently only async handlers log at trace level)
- Dead letter queue (unmatched events are silently dropped)
- Event flow visualization (no way to see routing graph)

These are framework-level issues in node/app/event that affect all
event-driven components, not just the Downloader.
Comprehensive event bus testing:

Functional tests:
- TestEventDeliveryBetweenComponents — async delivery works
- TestEventNotDeliveredToUnsubscribed — dead letter (0 handlers)
- TestMultipleSubscribers — fan-out to N handlers
- TestEventTypeRouting — type-based dispatch (FileCreated vs FileDeleted)
- TestSyncHandlerTotalOrdering — sync handlers execute in registration
  order (total ordering within a single Publish call)

Load tests:
- TestEventLoadCompleteness — 10,000 events, all delivered
- TestEventLoadMultipleSubscribers — 5,000 events × 2 subscribers

Failure behavior:
- TestBlockingHandlerDoesNotBlockPublisher — slow async handler doesn't
  block Publish (returns in <50ms while handler takes 100ms)
- TestPanickingHandlerDoesNotCrashBus — panic in one handler doesn't
  prevent delivery to other handlers

Integration:
- TestComponentLifecycleWithEvents — start→events→stop full cycle

Key findings documented:
- Sync handlers: totally ordered per Publish call (under bus lock)
- Async handlers: no ordering guarantee (worker pool dispatch)
- Events are per-bus (per-domain), not cross-domain
- Panics are caught per-handler, don't crash the bus
- Publish is non-blocking for async handlers

Also skipped TestAddRemoveDeps (same root domain gomock issue as the
other two skipped tests).
Documents the tested behavior and design patterns:

Ordering:
- Sync handlers: totally ordered per Publish (registration order)
- Async handlers: no ordering (worker pool dispatch)
- Events scoped per-domain (no cross-domain propagation)

Rules:
- Never block in async handlers (starves worker pool)
- Keep handlers idempotent (may replay)
- Don't publish from sync handlers (deadlock)
- Use value types for events (immutability)

Patterns:
- Event + Confirmation for acknowledgement
- Saga coordinator for multi-step flows

Performance:
- 10K events delivered <10ms
- Panics caught per-handler, bus continues
- Publish non-blocking for async handlers

All claims backed by passing tests in event_flow_test.go.
Events are scoped per-domain — components in different domains CANNOT
communicate. This is the most important architectural constraint.

Default: all components share the root domain. Creating a separate
domain means complete event isolation — only appropriate for truly
independent subsystems with no event dependencies.

Documented with correct/wrong architecture diagrams and specific
guidance on when separate domains ARE and ARE NOT appropriate.
…ical domains

The combined L1/L2 node (cocoon/pocs-and-proposals/l1-l2-node) requires
three domain levels: root (shared base), L1 domain, L2 domain.

L1 and L2 need event isolation (separate state machines) but both need
events from the shared base, and L2 needs specific L1 events
(L1BlockFinalized → triggers derivation).

Three new capabilities needed (not implemented today):
1. Child → parent event subscription
2. Cross-domain directed events
3. No upward propagation by default

Required before Wave 5 PR 21 (L2 NodeBuilder mode). Updated the
componentization development plan with PR 20b as a prerequisite.
Single domain first, hierarchical later. This guides all component work:

- Waves 1-4: all components share one root domain, events flow freely
- Wave 5: add hierarchical domains for L1/L2 combined mode
- Component code doesn't change between phases — only domain wiring

Developer rules: don't pass domain references into components, define
events as plain structs, don't build domain-aware routing. The framework
handles domain topology; components are domain-agnostic.

Also updated the componentization development plan in erigon-documents
with the same principle at the top.
Replace the inline setUpSnapDownloader() + initDownloader() methods
with the DownloaderProvider component lifecycle. backend.go now calls
Configure → Initialize and reads the Client from the provider.

- downloaderProvider field replaces bare downloader.Downloader field
- setUpSnapDownloader() and initDownloader() removed from backend.go
- File-change callbacks (Seed/Delete) stay in backend.go since they
  need chainDB and notifications which aren't component deps yet
- downloaderClient field kept — all downstream consumers (stages,
  antiquary, caplin) still receive downloader.Client unchanged
- Unused imports removed: downloadercfg, downloadergrpc,
  downloaderproto, nodecfg
@mh0lt mh0lt force-pushed the feat/componentization-auth branch from a859857 to 6b06a18 Compare April 15, 2026 15:25
mh0lt added 2 commits April 15, 2026 15:26
Add node/nodebuilder/Builder as a single registry for extracted
components. Currently holds only Downloader; Storage will be added next.

backend.go: replace the inline downloaderProvider field with
backend.components (a *nodebuilder.Builder). All downloaderProvider
references become backend.components.Downloader.

This is minimal scaffolding — the builder will grow as components are
extracted from backend.go (storage, txpool, sentry, etc).
New storage.Provider owns DB references, file-change callbacks (seed/delete
for downloader), block retirement, and current-block-number read.

Notifications (KvRPC, shards.Notifications) stay in backend.go — they are
an execution-layer concern, not a storage concern. Storage receives a
DBEventNotifier (OnNewSnapshot) as a dep for BlockRetire and file-change
callback forwarding.

backend.go changes:
- KvRPC + Notifications created inline (unchanged ownership)
- Replace inline file-change callbacks + BlockRetire with BuildStorage call
- blockRetire variable becomes backend.components.Storage.BlockRetire

This is the foundation for the execution component extraction — execution
will own the SD lifecycle and state-change events (headers, blocks, logs).
@mh0lt mh0lt changed the base branch from feat/componentization-auth to feat/componentization-downloader-only April 15, 2026 15:27
@mh0lt mh0lt force-pushed the feat/componentization-storage branch from 0b071f8 to 25336bc Compare April 15, 2026 15:27
Four Copilot review comments addressed:

1. storage/provider.go: Provider.GenesisHash declared as [32]byte but
   assigned from Genesis.Hash() (common.Hash). Go accepts the conversion
   since common.Hash has underlying type [32]byte, but the named type is
   the clearer choice and matches the convention used elsewhere in
   backend.go (genesisHash field). Switched to common.Hash.

2. storage/provider.go: Initialize was using context.Background() for
   the CurrentBlock read, which ignores deps.Ctx shutdown/cancellation.
   Switched to deps.Ctx. Also removed the shared err variable that was
   captured in the closure — replaced with a local viewErr so the error
   flow is self-contained.

3. backend.go Stop: s.components.Downloader was dereferenced without
   nil-checking s.components first. Safe in normal operation (components
   is always set in New) but a partially-constructed Ethereum in a test
   path or pre-init-failure cleanup would panic. Guarded with a
   components != nil check.

4. nodebuilder/builder.go: package comment said "(future) BuildStorage"
   but BuildStorage is implemented in the same file. Updated comment
   to list it as built, not future.

PR description also updated to reflect the final design decision: KvRPC
and Notifications stay in backend.go (they're an execution-layer concern,
documented in erigon-archive/notification-migration-strategy.md), rather
than moving into the storage component as the earlier description
implied.
@mh0lt mh0lt changed the base branch from feat/componentization-downloader-only to main April 15, 2026 20:51
@mh0lt mh0lt changed the base branch from main to feat/componentization-downloader-only April 15, 2026 20:56
mh0lt added 3 commits April 16, 2026 06:59
Five unprotected .state field reads on foreign components (not self)
caused data races detected by -race under concurrent deactivation.

The component.State() method is thread-safe (acquires RLock), but
several call sites bypassed it by reading the .state field directly
on other component instances:

  - addDependent (line 1240-1250): read dependent.state without lock
  - onComponentStateChanged (lines 1325, 1329): read dependency.state
    and dependent.state without lock, while deactivateProvider was
    writing .state on a separate goroutine via setState

Fix: replace all direct .state reads on foreign components with the
thread-safe .State() accessor. Self-reads under self-lock (c.state
inside c.RLock) are unchanged.

Race manifested as TestAddRemoveDeps expecting Deactivated (8) but
getting Deactivating (7) — the deactivation cascade hadn't completed
because onComponentStateChanged saw stale state and skipped the
dependency-deactivated signal.

Verified: go test -race -run TestAddRemoveDeps -count=20 — all clean.
…ock)

Three fuzz tests for the component state machine:

- TestFuzzLifecycleConcurrent: random activate/deactivate from 4 workers
  on a fixed 8-component dependency tree
- TestFuzzLifecycleSerial: same operations serially for deterministic repro
- TestFuzzAddRemoveConcurrent: add/remove dependency + activate/deactivate
  from 3 workers on a flat 6-component pool

All three are currently SKIPPED because they expose a real deadlock in the
component framework: rapid activate/deactivate cycles cause lock contention
between the activateDependencies → onDependenciesActive → activateProvider
→ setState chain and concurrent deactivation paths. The deadlock manifests
even in the serial test because activate/deactivate spawn background
goroutines that contend with the next operation.

This is a blocking issue before the component framework is used for
production components (execution component, overlay/SD lifecycle). The
tests are committed skipped so:
1. The deadlock is documented and tracked
2. The fix can unskip them as verification
3. CI doesn't regress on the fix

To reproduce: remove the t.Skip line and run:
  go test -race -run TestFuzzLifecycleConcurrent -timeout 30s ./node/app/component/
Fixes the deadlock where concurrent activate/deactivate operations caused
lock-ordering violations between background goroutines fighting for the
component's RWMutex.

Design: each component has an inbox channel processed by a single actor
goroutine (runActor). All state-mutating lifecycle operations are
serialized through the inbox:

  - activate → sends msgActivate, blocks on reply for synchronous
    configure/initialize errors. Dependency activation is fire-and-forget
    within the actor; completion arrives via stateChanged events.
  - deactivate → sends msgDeactivate, returns immediately. Provider
    deactivation and dependency cascade run in the actor.
  - onComponentStateChanged → sends msgStateChanged. The event bus
    callback no longer runs state logic directly; it routes through
    the actor so state checks and transitions don't interleave with
    activate/deactivate.

Key changes from previous design:
  - Removed go func() from activate (lines 868/873) and deactivate
    (line 1034) — these spawned background goroutines that caused the
    deadlock. The actor goroutine replaces them.
  - Removed errgroup.Wait() from activateDependencies and
    deactivateDependencies — the actor must not block on cross-component
    operations or its inbox fills up. Dependency cascade is now fully
    event-driven: fire activate/deactivate on dependencies, let
    ComponentStateChanged events signal completion.
  - activate returns synchronous errors (configure/initialize failures)
    via a reply channel. This preserves the existing API contract where
    Activate() returns an error if setup fails.

Fuzz tests:
  - TestFuzzLifecycleConcurrent: UNSKIPPED, passes (no deadlock)
  - TestFuzzLifecycleSerial: UNSKIPPED, passes
  - TestFuzzAddRemoveConcurrent: still skipped (AddDependency not yet
    routed through actor — separate follow-up)

Remaining: event bus has a data race in handlerMap.publish when called
from multiple actor goroutines simultaneously — that's #20584's scope,
not this commit.
Base automatically changed from feat/componentization-downloader-only to main April 18, 2026 09:38
Resolves conflicts introduced by #20471 (downloader extraction) merging
into main while this PR was in review.

Conflicts resolved:
- node/app/component/component.go: duplicate urfave/cli/v2 import (main's dedup
  wins — trivial goimports alignment).
- node/eth/backend.go: 7 regions where storage branch's nodebuilder wrapper
  (backend.components *nodebuilder.Builder) meets main's direct-field pattern
  (backend.downloaderProvider *downloadercomp.Provider). Kept storage's
  nodebuilder approach throughout (this PR's architectural intent). Dropped
  main's inline OnFilesChange callback block — storage.Provider.Initialize
  registers it verbatim (node/components/storage/provider.go:147), so backend.go
  no longer needs to. Added a pointer comment at the BuildDownloader call
  noting where OnFilesChange actually lives, to spare future readers a search.

Two references at lines 572-573 (chain.toml ENR updater block from #20471)
migrated from backend.downloaderProvider.Downloader to
backend.components.Downloader.Downloader. Nil-check pattern aligned to the
existing 2-check form at line 1060 — backend.components and .Downloader are
always non-nil after nodebuilder.New() allocates them; only .Downloader.Downloader
can legitimately be nil (remote-downloader mode).

Verified:
- make erigon integration: clean
- make lint: 0 issues
- No remaining downloaderProvider / downloadercomp references in backend.go
- BuildStorage called after BuildDownloader, so OnFilesChange is wired
- Storage Provider's OnFilesChange semantically identical to dropped block

Merge strategy chosen (per PR #20476 author) over rebase to preserve existing
approvals rather than force a fresh review. No functional changes beyond
conflict resolution.
@mh0lt mh0lt enabled auto-merge April 18, 2026 11:01
@mh0lt mh0lt added this pull request to the merge queue Apr 18, 2026
Merged via the queue into main with commit 8d1883e Apr 18, 2026
36 checks passed
@mh0lt mh0lt deleted the feat/componentization-storage branch April 18, 2026 13:05
mh0lt added a commit that referenced this pull request Apr 19, 2026
Resolves conflicts introduced by #20476 (storage component extraction)
merging into main while this PR was in review.

- node/eth/backend.go imports: drop direct downloadercomp import (now
  encapsulated in nodebuilder.Builder from #20476); keep sentrycomp for
  this PR's extraction and add storagecomp for BuildStorage deps from
  main.
- node/eth/backend.go chain.toml ENR wiring: combine both migrations —
  downloader access now goes through backend.components.Downloader
  (nodebuilder pattern from #20476), sentry server list through
  backend.sentryProvider.Servers (this PR's extraction).

Verified: make erigon integration clean; make lint 0 issues. No
functional changes beyond conflict resolution.

Merge strategy chosen (per prior sentry PR reviewer approval) over
rebase to preserve existing approvals rather than force a fresh
review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants