Skip to content

node/components: extract downloader from backend.go, add event guidelines#20471

Merged
mh0lt merged 17 commits into
mainfrom
feat/componentization-downloader-only
Apr 18, 2026
Merged

node/components: extract downloader from backend.go, add event guidelines#20471
mh0lt merged 17 commits into
mainfrom
feat/componentization-downloader-only

Conversation

@mh0lt

@mh0lt mh0lt commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

Summary

Extracts the downloader setup from backend.go into a lifecycle-managed component, and adds event bus documentation.

Downloader component (node/components/downloader/)

  • Provider wraps NewGrpcClient / BuildTorrentClient into a component with Initialize/Start lifecycle
  • Event types for download progress, completion, verification
  • Event bus observability TODOs for future metrics integration
  • backend.go reduced by ~93 lines of inline downloader setup
  • Design doc and README

Event bus guidelines (node/app/event/GUIDELINES.md)

  • Developer guidelines for event bus usage patterns
  • Domain architecture guidance (isolation, hierarchy)
  • Migration strategy from callbacks → events
  • L1/L2 combined mode documented as use case

Component tests

  • Event flow and load tests (event_flow_test.go)

Test plan

  • make lint passes
  • make erigon builds
  • go test ./node/components/downloader/... ./node/app/... — all pass
  • CI passes

Depends on: #20450 (component framework)

Base automatically changed from feat/componentization-framework to main April 15, 2026 08:37
mh0lt added 11 commits April 15, 2026 15:21
Initial structure for the Downloader component extraction:

- Provider: lifecycle skeleton (Configure, Initialize, Close)
- Tests: disabled mode, no-config, idempotent close
- README: documents the component API, events, configuration,
  gRPC wrapper strategy, and integration plan

Key design decisions documented:
- Direct Go API as primary interface (gRPC becomes optional wrapper)
- Events replace OnFilesChange callbacks (via node/app/event bus)
- Component owns its config (not spread across ethconfig)
- backend.go integration deferred until events are defined

This is a checkpoint — the actual backend.go wiring requires defining
event types and replacing the gRPC-based Client interface first.
Shared contract between the Downloader and its peers:

Subscribes to:
- SnapshotFilesCreated — seed new files (replaces OnFilesChange create)
- SnapshotFilesDeleted — remove torrents (replaces OnFilesChange delete)
- DownloadRequested — fetch files by name/infohash (replaces Client.Download)

Publishes:
- SnapshotDownloadComplete — unblocks stages (replaces afterSnapshotDownload)
- NewSnapshotAvailable — notifies RPC (replaces events.OnNewSnapshot)
- TorrentStatus — monitoring/metrics

These types can be used by plugins/snapshot-manager immediately since it's
greenfield and can be event-native from the start.
Document missing framework capabilities needed for production debugging:
- Event tracing (currently only async handlers log at trace level)
- Dead letter queue (unmatched events are silently dropped)
- Event flow visualization (no way to see routing graph)

These are framework-level issues in node/app/event that affect all
event-driven components, not just the Downloader.
Comprehensive event bus testing:

Functional tests:
- TestEventDeliveryBetweenComponents — async delivery works
- TestEventNotDeliveredToUnsubscribed — dead letter (0 handlers)
- TestMultipleSubscribers — fan-out to N handlers
- TestEventTypeRouting — type-based dispatch (FileCreated vs FileDeleted)
- TestSyncHandlerTotalOrdering — sync handlers execute in registration
  order (total ordering within a single Publish call)

Load tests:
- TestEventLoadCompleteness — 10,000 events, all delivered
- TestEventLoadMultipleSubscribers — 5,000 events × 2 subscribers

Failure behavior:
- TestBlockingHandlerDoesNotBlockPublisher — slow async handler doesn't
  block Publish (returns in <50ms while handler takes 100ms)
- TestPanickingHandlerDoesNotCrashBus — panic in one handler doesn't
  prevent delivery to other handlers

Integration:
- TestComponentLifecycleWithEvents — start→events→stop full cycle

Key findings documented:
- Sync handlers: totally ordered per Publish call (under bus lock)
- Async handlers: no ordering guarantee (worker pool dispatch)
- Events are per-bus (per-domain), not cross-domain
- Panics are caught per-handler, don't crash the bus
- Publish is non-blocking for async handlers

Also skipped TestAddRemoveDeps (same root domain gomock issue as the
other two skipped tests).
Documents the tested behavior and design patterns:

Ordering:
- Sync handlers: totally ordered per Publish (registration order)
- Async handlers: no ordering (worker pool dispatch)
- Events scoped per-domain (no cross-domain propagation)

Rules:
- Never block in async handlers (starves worker pool)
- Keep handlers idempotent (may replay)
- Don't publish from sync handlers (deadlock)
- Use value types for events (immutability)

Patterns:
- Event + Confirmation for acknowledgement
- Saga coordinator for multi-step flows

Performance:
- 10K events delivered <10ms
- Panics caught per-handler, bus continues
- Publish non-blocking for async handlers

All claims backed by passing tests in event_flow_test.go.
Events are scoped per-domain — components in different domains CANNOT
communicate. This is the most important architectural constraint.

Default: all components share the root domain. Creating a separate
domain means complete event isolation — only appropriate for truly
independent subsystems with no event dependencies.

Documented with correct/wrong architecture diagrams and specific
guidance on when separate domains ARE and ARE NOT appropriate.
…ical domains

The combined L1/L2 node (cocoon/pocs-and-proposals/l1-l2-node) requires
three domain levels: root (shared base), L1 domain, L2 domain.

L1 and L2 need event isolation (separate state machines) but both need
events from the shared base, and L2 needs specific L1 events
(L1BlockFinalized → triggers derivation).

Three new capabilities needed (not implemented today):
1. Child → parent event subscription
2. Cross-domain directed events
3. No upward propagation by default

Required before Wave 5 PR 21 (L2 NodeBuilder mode). Updated the
componentization development plan with PR 20b as a prerequisite.
Single domain first, hierarchical later. This guides all component work:

- Waves 1-4: all components share one root domain, events flow freely
- Wave 5: add hierarchical domains for L1/L2 combined mode
- Component code doesn't change between phases — only domain wiring

Developer rules: don't pass domain references into components, define
events as plain structs, don't build domain-aware routing. The framework
handles domain topology; components are domain-agnostic.

Also updated the componentization development plan in erigon-documents
with the same principle at the top.
Replace the inline setUpSnapDownloader() + initDownloader() methods
with the DownloaderProvider component lifecycle. backend.go now calls
Configure → Initialize and reads the Client from the provider.

- downloaderProvider field replaces bare downloader.Downloader field
- setUpSnapDownloader() and initDownloader() removed from backend.go
- File-change callbacks (Seed/Delete) stay in backend.go since they
  need chainDB and notifications which aren't component deps yet
- downloaderClient field kept — all downstream consumers (stages,
  antiquary, caplin) still receive downloader.Client unchanged
- Unused imports removed: downloadercfg, downloadergrpc,
  downloaderproto, nodecfg
@mh0lt mh0lt force-pushed the feat/componentization-downloader-only branch from 79f210b to 3dc03e1 Compare April 15, 2026 15:24
mh0lt added a commit that referenced this pull request Apr 15, 2026
… commit

The glamsterdam suite was tracking upstream
ethpandaops/ethereum-package@main unpinned, while other suites
(regular, pectra) pin to a specific version. Upstream commit 835dd9b
("feat: support gpu ere prover in zkboost", 2026-04-15) introduced an
undefined GpuConfig reference in zkboost_launcher.star:272, breaking
every Erigon PR that ran glamsterdam from that point on — regardless
of what the PR changed. Six unrelated PRs (#20471, #20472, #20526,
#20583, #20584, #20585) all failed identically.

Pin to e07503d16b (2026-04-13, the commit before the break) to stop
the bleeding. This was the last state under which our main successfully
ran glamsterdam. Revisit once upstream stabilises, or switch to a
tagged release that supports gloas_fork_epoch / fulu_fork_epoch which
our glamsterdam.io config requires.
AskAlexSharov and others added 3 commits April 16, 2026 15:00
…r refs to Provider

After the merge with main (PR #20526 added chain.toml P2P discovery), two
cleanup items were missed:

1. setUpSnapDownloader and initDownloader functions remained in backend.go
   but were never called — their work is done by
   node/components/downloader/Provider since PR #20471.

2. The new P2P/chain.toml wiring in backend.go (ENR updater, node source,
   chain.toml publish, peer manager) referenced the deleted backend.downloader
   field. These calls now go through backend.downloaderProvider.Downloader.

All 7 method accesses are hoisted via a local alias to reduce line noise,
and guarded by a nil check on both the Provider and its embedded
*Downloader. Remote-downloader mode has a nil Downloader field, so the
entire chain.toml block is skipped — remote downloaders run their own
P2P stack.

Lint: clean. Build: erigon + integration OK.
…ap.p2p-manifest

Fold in the fix from #20615 — the chain.toml v2 publish/seed stack runs
unconditionally on main and collides with the downloaded chain.toml.torrent
during post-OtterSync AddTorrentsFromDisk, aborting fresh mainnet sync with
"snapshot exists with a different name: chain.toml".

Gate the two call sites we still have after the downloader extraction:

1. node/eth/backend.go: wrap the ENR updater + NodeSource + delayed
   PublishLocalChainToml + StartChainTomlDiscovery + StartTorrentPeerManager
   block in --snap.p2p-manifest. Default syncs skip the whole v2 stack.

2. db/downloader/downloader.go:810: gate the post-download republish using
   d.manifestReady != nil (the internal signal that EnableP2PManifest was
   invoked by the backend). Covers the loop-internal publish in
   chainTomlDiscoveryLoop too.

The third call site from #20615 (initDownloader's initial publish at
backend.go:1461 on main) doesn't exist in our branch — initDownloader moved
to node/components/downloader/Provider.initDownloader as part of the
extraction, and that function doesn't call PublishLocalChainToml.

Default --snap.p2p-manifest=false ⇒ zero v2 behaviour ⇒ collision avoided.
Opt-in --snap.p2p-manifest=true ⇒ full v2 publish + discover + peer-manager
flow preserved.

Ref: #20615
@AskAlexSharov

Copy link
Copy Markdown
Collaborator

Lint fix needed: two extra blank lines before SetUpBlockReader at line 1384 of node/eth/backend.go. gofmt expects only one blank line between functions.

-	return nodesInfo, nil
-}
-
-
-
-func SetUpBlockReader(
+	return nodesInfo, nil
+}
+
+func SetUpBlockReader(

@mh0lt mh0lt enabled auto-merge April 18, 2026 07:47
@mh0lt mh0lt added this pull request to the merge queue Apr 18, 2026
Merged via the queue into main with commit 56ea4a2 Apr 18, 2026
36 checks passed
@mh0lt mh0lt deleted the feat/componentization-downloader-only branch April 18, 2026 09:38
mh0lt added a commit that referenced this pull request Apr 18, 2026
Resolves conflicts introduced by #20471 (downloader extraction) merging
into main while this PR was in review.

Conflicts resolved:
- node/app/component/component.go: duplicate urfave/cli/v2 import (main's dedup
  wins — trivial goimports alignment).
- node/eth/backend.go: 7 regions where storage branch's nodebuilder wrapper
  (backend.components *nodebuilder.Builder) meets main's direct-field pattern
  (backend.downloaderProvider *downloadercomp.Provider). Kept storage's
  nodebuilder approach throughout (this PR's architectural intent). Dropped
  main's inline OnFilesChange callback block — storage.Provider.Initialize
  registers it verbatim (node/components/storage/provider.go:147), so backend.go
  no longer needs to. Added a pointer comment at the BuildDownloader call
  noting where OnFilesChange actually lives, to spare future readers a search.

Two references at lines 572-573 (chain.toml ENR updater block from #20471)
migrated from backend.downloaderProvider.Downloader to
backend.components.Downloader.Downloader. Nil-check pattern aligned to the
existing 2-check form at line 1060 — backend.components and .Downloader are
always non-nil after nodebuilder.New() allocates them; only .Downloader.Downloader
can legitimately be nil (remote-downloader mode).

Verified:
- make erigon integration: clean
- make lint: 0 issues
- No remaining downloaderProvider / downloadercomp references in backend.go
- BuildStorage called after BuildDownloader, so OnFilesChange is wired
- Storage Provider's OnFilesChange semantically identical to dropped block

Merge strategy chosen (per PR #20476 author) over rebase to preserve existing
approvals rather than force a fresh review. No functional changes beyond
conflict resolution.
github-merge-queue Bot pushed a commit that referenced this pull request Apr 18, 2026
…20476)

## Summary

Extracts the storage subsystem from backend.go into
`node/components/storage/Provider`. This is the foundation for the
overlay integration — the storage component will publish SD update
events in a later PR once the execution component extraction lands.

### Storage component (`node/components/storage/`)

Provider owns:
- **DB references** — ChainDB, BlockReader, BlockWriter, snapshots, Bor
stores (pass-through from `SetUpBlockReader`)
- **File-change callbacks** — seed completed snapshots / delete removed
snapshots via downloader client
- **Block retirement** (`freezeblocks.BlockRetire`)
- **Current block number** read at startup

**KvRPC and Notifications are NOT owned by the storage component** —
they are an execution-layer concern (state-change events flow from
execution, not storage). Storage takes a `services.DBEventNotifier` dep
so it can wire snapshot-change events forward; that's the minimum
coupling needed. KvRPC + `shards.Notifications` construction stays
inline in backend.go for now and will migrate to the execution component
when that is extracted. See
`erigon-archive/notification-migration-strategy.md`.

Also in this PR (for self-contained buildability): a minimal
`nodebuilder.Builder` scaffold at `node/nodebuilder/` that holds
`Downloader` and `Storage` providers and `BuildDownloader` /
`BuildStorage` entry points. This replaces the ad-hoc
`downloaderProvider` field on the Ethereum struct with
`backend.components`.

### backend.go changes

- `downloaderProvider` field → `components *nodebuilder.Builder`
- Replace inline file-change callbacks + BlockRetire with a single
`BuildStorage` call
- `blockRetire` local variable →
`backend.components.Storage.BlockRetire`
- KvRPC + Notifications constructed inline (unchanged ownership), passed
into storage deps

### Review feedback addressed

All 5 Copilot comments in commit `f5779d1ec9`:
- `GenesisHash` changed from `[32]byte` to `common.Hash`
- `ChainDB.View` uses `deps.Ctx` instead of `context.Background()`
- `Stop()` nil-guards `s.components`
- Stale package comment on `nodebuilder` fixed
- This PR description updated to match implementation

## Test plan

- [x] `make lint` passes
- [x] `make erigon integration` builds
- [ ] CI passes

**Depends on:** #20471 (downloader extraction)
github-merge-queue Bot pushed a commit that referenced this pull request Apr 19, 2026
)

## Summary

Extracts the sentry / P2P stack from \`node/eth/backend.go\` into
\`node/components/sentry/Provider\`, following the downloader component
pattern. Step 4 of the componentization push plan.

Stacked on #20471 (downloader extraction). Merge order: downloader →
sentry.

## What moves

From backend.go into \`node/components/sentry/Provider\`:
- P2P config assembly (per-protocol port allocation, DiscoveryDNS
wiring).
- External-sentry mode (dial each \`--sentry.api.addr\`).
- Local-sentry mode (per-protocol \`sentry.NewGrpcServer\`,
direct-client wrapping).
- \`StatusDataProvider\` construction.
- Shared \`libsentry.NewSentryMultiplexer\` (one instance, exposed as
\`Provider.Multiplexer\`; replaces the two \`sentryMux(sentries)\` call
sites).
- Execution-P2P layer (\`MessageListener\`, \`PeerPenalizer\`,
\`PeerTracker\`, \`MessageSender\`, \`Publisher\`).
- \`sentry_multi_client.NewMultiClient\` (late-binding — takes engine +
max-peers callback via \`Provider.BuildMultiClient\`).
- Stream loops + \`StatusDataProvider.Run\` + execution-P2P \`.Run\`
goroutines.
- Peer-count logger (was inline in sentry construction; now
\`runPeerCountLogger\` on Provider).
- \`sentryServer.Close()\` loop (moved into \`Provider.Close\`).
- \`splitAddrIntoHostAndPort\` + \`checkPortIsFree\` utility functions
(sentry was the only consumer).

## What stays

- Chain.toml ENR-update block (still backend.go; reads from
\`backend.sentryProvider.Servers\`). Migrates in a later step when
chain.toml V2 lands.
- \`maxBlockBroadcastPeers\` (engine-aware — depends on
\`bor.Bor.IsValidator\`; sentry shouldn't know about consensus engines).
- \`BackwardBlockDownloader\` + \`executionFetcher\` chain
(execution-side consumer of the exec-P2P layer; \`tmpdir\` is a node
concern).
- \`sentryMcDisableBlockDownload\` (file-scope const; still read by the
mined-blocks prefetch goroutine).

## What's deferred

- **Event wiring** (step 7 of the plan). The \`P2PReady\`,
\`PeerListChanged\`, \`TorrentPortBound\` events have no consumers on
this branch. Per the [snapshot-flow.md simplicity
bias](https://github.com/erigontech/erigon-documents/blob/snapshot-distribution-shape/ethereum/design/erigon-archive/snapshot-flow.md),
wire when step 5 (initial P2P snapshots) needs them.
- **Smoke test** (step 10). Build + lint + unit tests pass; local
\`--chain=dev\` run pending before marking ready for review.

## Commit breakdown

Six commits matching the plan's incremental steps, each lint-clean and
buildable:

1. \`094ac2c2d6\` — scaffold empty Provider with lifecycle API
2. \`4370280cc1\` — move P2P config + server construction
3. \`8df0105b62\` — move StatusDataProvider + execution-P2P layer
4. \`e51bdeae7b\` — move MultiClient construction (\`BuildMultiClient\`)
5. \`48ac128a46\` — move lifecycle (Start + Close)
6. \`5f05906203\` — migrate consumer field accesses to Provider-direct
reads

## Diff shape

- \`node/components/sentry/\` — 576 LOC new (Provider + port_util)
- \`node/eth/backend.go\` — net -99 LOC (construction block + lifecycle
+ pass-through fields all removed)

## Test plan

- [x] \`make lint\` — 0 issues across all commits
- [x] \`make erigon integration\` — both binaries build at every commit
- [x] \`go test -short ./node/eth/...\` — passes
- [ ] Local \`--chain=dev\` smoke test — pending
- [ ] CI passes

## Design contract

This PR implements the sentry-component responsibilities from
[snapshot-flow.md
§2.3](https://github.com/erigontech/erigon-documents/blob/snapshot-distribution-shape/ethereum/design/erigon-archive/snapshot-flow.md#23-sentry-component-nodecomponentssentry).
Event publish/subscribe hooks described there land in step 5 (initial
P2P snapshots) when the event bus has consumers.

---------

Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants