node/components: extract downloader from backend.go, add event guidelines#20471
Merged
Conversation
3 tasks
Initial structure for the Downloader component extraction: - Provider: lifecycle skeleton (Configure, Initialize, Close) - Tests: disabled mode, no-config, idempotent close - README: documents the component API, events, configuration, gRPC wrapper strategy, and integration plan Key design decisions documented: - Direct Go API as primary interface (gRPC becomes optional wrapper) - Events replace OnFilesChange callbacks (via node/app/event bus) - Component owns its config (not spread across ethconfig) - backend.go integration deferred until events are defined This is a checkpoint — the actual backend.go wiring requires defining event types and replacing the gRPC-based Client interface first.
Shared contract between the Downloader and its peers: Subscribes to: - SnapshotFilesCreated — seed new files (replaces OnFilesChange create) - SnapshotFilesDeleted — remove torrents (replaces OnFilesChange delete) - DownloadRequested — fetch files by name/infohash (replaces Client.Download) Publishes: - SnapshotDownloadComplete — unblocks stages (replaces afterSnapshotDownload) - NewSnapshotAvailable — notifies RPC (replaces events.OnNewSnapshot) - TorrentStatus — monitoring/metrics These types can be used by plugins/snapshot-manager immediately since it's greenfield and can be event-native from the start.
Document missing framework capabilities needed for production debugging: - Event tracing (currently only async handlers log at trace level) - Dead letter queue (unmatched events are silently dropped) - Event flow visualization (no way to see routing graph) These are framework-level issues in node/app/event that affect all event-driven components, not just the Downloader.
Comprehensive event bus testing: Functional tests: - TestEventDeliveryBetweenComponents — async delivery works - TestEventNotDeliveredToUnsubscribed — dead letter (0 handlers) - TestMultipleSubscribers — fan-out to N handlers - TestEventTypeRouting — type-based dispatch (FileCreated vs FileDeleted) - TestSyncHandlerTotalOrdering — sync handlers execute in registration order (total ordering within a single Publish call) Load tests: - TestEventLoadCompleteness — 10,000 events, all delivered - TestEventLoadMultipleSubscribers — 5,000 events × 2 subscribers Failure behavior: - TestBlockingHandlerDoesNotBlockPublisher — slow async handler doesn't block Publish (returns in <50ms while handler takes 100ms) - TestPanickingHandlerDoesNotCrashBus — panic in one handler doesn't prevent delivery to other handlers Integration: - TestComponentLifecycleWithEvents — start→events→stop full cycle Key findings documented: - Sync handlers: totally ordered per Publish call (under bus lock) - Async handlers: no ordering guarantee (worker pool dispatch) - Events are per-bus (per-domain), not cross-domain - Panics are caught per-handler, don't crash the bus - Publish is non-blocking for async handlers Also skipped TestAddRemoveDeps (same root domain gomock issue as the other two skipped tests).
Documents the tested behavior and design patterns: Ordering: - Sync handlers: totally ordered per Publish (registration order) - Async handlers: no ordering (worker pool dispatch) - Events scoped per-domain (no cross-domain propagation) Rules: - Never block in async handlers (starves worker pool) - Keep handlers idempotent (may replay) - Don't publish from sync handlers (deadlock) - Use value types for events (immutability) Patterns: - Event + Confirmation for acknowledgement - Saga coordinator for multi-step flows Performance: - 10K events delivered <10ms - Panics caught per-handler, bus continues - Publish non-blocking for async handlers All claims backed by passing tests in event_flow_test.go.
Events are scoped per-domain — components in different domains CANNOT communicate. This is the most important architectural constraint. Default: all components share the root domain. Creating a separate domain means complete event isolation — only appropriate for truly independent subsystems with no event dependencies. Documented with correct/wrong architecture diagrams and specific guidance on when separate domains ARE and ARE NOT appropriate.
…ical domains The combined L1/L2 node (cocoon/pocs-and-proposals/l1-l2-node) requires three domain levels: root (shared base), L1 domain, L2 domain. L1 and L2 need event isolation (separate state machines) but both need events from the shared base, and L2 needs specific L1 events (L1BlockFinalized → triggers derivation). Three new capabilities needed (not implemented today): 1. Child → parent event subscription 2. Cross-domain directed events 3. No upward propagation by default Required before Wave 5 PR 21 (L2 NodeBuilder mode). Updated the componentization development plan with PR 20b as a prerequisite.
Single domain first, hierarchical later. This guides all component work: - Waves 1-4: all components share one root domain, events flow freely - Wave 5: add hierarchical domains for L1/L2 combined mode - Component code doesn't change between phases — only domain wiring Developer rules: don't pass domain references into components, define events as plain structs, don't build domain-aware routing. The framework handles domain topology; components are domain-agnostic. Also updated the componentization development plan in erigon-documents with the same principle at the top.
Replace the inline setUpSnapDownloader() + initDownloader() methods with the DownloaderProvider component lifecycle. backend.go now calls Configure → Initialize and reads the Client from the provider. - downloaderProvider field replaces bare downloader.Downloader field - setUpSnapDownloader() and initDownloader() removed from backend.go - File-change callbacks (Seed/Delete) stay in backend.go since they need chainDB and notifications which aren't component deps yet - downloaderClient field kept — all downstream consumers (stages, antiquary, caplin) still receive downloader.Client unchanged - Unused imports removed: downloadercfg, downloadergrpc, downloaderproto, nodecfg
79f210b to
3dc03e1
Compare
3 tasks
mh0lt
added a commit
that referenced
this pull request
Apr 15, 2026
… commit
The glamsterdam suite was tracking upstream
ethpandaops/ethereum-package@main unpinned, while other suites
(regular, pectra) pin to a specific version. Upstream commit 835dd9b
("feat: support gpu ere prover in zkboost", 2026-04-15) introduced an
undefined GpuConfig reference in zkboost_launcher.star:272, breaking
every Erigon PR that ran glamsterdam from that point on — regardless
of what the PR changed. Six unrelated PRs (#20471, #20472, #20526,
#20583, #20584, #20585) all failed identically.
Pin to e07503d16b (2026-04-13, the commit before the break) to stop
the bleeding. This was the last state under which our main successfully
ran glamsterdam. Revisit once upstream stabilises, or switch to a
tagged release that supports gloas_fork_epoch / fulu_fork_epoch which
our glamsterdam.io config requires.
2 tasks
…r refs to Provider After the merge with main (PR #20526 added chain.toml P2P discovery), two cleanup items were missed: 1. setUpSnapDownloader and initDownloader functions remained in backend.go but were never called — their work is done by node/components/downloader/Provider since PR #20471. 2. The new P2P/chain.toml wiring in backend.go (ENR updater, node source, chain.toml publish, peer manager) referenced the deleted backend.downloader field. These calls now go through backend.downloaderProvider.Downloader. All 7 method accesses are hoisted via a local alias to reduce line noise, and guarded by a nil check on both the Provider and its embedded *Downloader. Remote-downloader mode has a nil Downloader field, so the entire chain.toml block is skipped — remote downloaders run their own P2P stack. Lint: clean. Build: erigon + integration OK.
…ap.p2p-manifest Fold in the fix from #20615 — the chain.toml v2 publish/seed stack runs unconditionally on main and collides with the downloaded chain.toml.torrent during post-OtterSync AddTorrentsFromDisk, aborting fresh mainnet sync with "snapshot exists with a different name: chain.toml". Gate the two call sites we still have after the downloader extraction: 1. node/eth/backend.go: wrap the ENR updater + NodeSource + delayed PublishLocalChainToml + StartChainTomlDiscovery + StartTorrentPeerManager block in --snap.p2p-manifest. Default syncs skip the whole v2 stack. 2. db/downloader/downloader.go:810: gate the post-download republish using d.manifestReady != nil (the internal signal that EnableP2PManifest was invoked by the backend). Covers the loop-internal publish in chainTomlDiscoveryLoop too. The third call site from #20615 (initDownloader's initial publish at backend.go:1461 on main) doesn't exist in our branch — initDownloader moved to node/components/downloader/Provider.initDownloader as part of the extraction, and that function doesn't call PublishLocalChainToml. Default --snap.p2p-manifest=false ⇒ zero v2 behaviour ⇒ collision avoided. Opt-in --snap.p2p-manifest=true ⇒ full v2 publish + discover + peer-manager flow preserved. Ref: #20615
This was referenced Apr 17, 2026
AskAlexSharov
approved these changes
Apr 18, 2026
Collaborator
|
Lint fix needed: two extra blank lines before - return nodesInfo, nil
-}
-
-
-
-func SetUpBlockReader(
+ return nodesInfo, nil
+}
+
+func SetUpBlockReader( |
mh0lt
added a commit
that referenced
this pull request
Apr 18, 2026
Resolves conflicts introduced by #20471 (downloader extraction) merging into main while this PR was in review. Conflicts resolved: - node/app/component/component.go: duplicate urfave/cli/v2 import (main's dedup wins — trivial goimports alignment). - node/eth/backend.go: 7 regions where storage branch's nodebuilder wrapper (backend.components *nodebuilder.Builder) meets main's direct-field pattern (backend.downloaderProvider *downloadercomp.Provider). Kept storage's nodebuilder approach throughout (this PR's architectural intent). Dropped main's inline OnFilesChange callback block — storage.Provider.Initialize registers it verbatim (node/components/storage/provider.go:147), so backend.go no longer needs to. Added a pointer comment at the BuildDownloader call noting where OnFilesChange actually lives, to spare future readers a search. Two references at lines 572-573 (chain.toml ENR updater block from #20471) migrated from backend.downloaderProvider.Downloader to backend.components.Downloader.Downloader. Nil-check pattern aligned to the existing 2-check form at line 1060 — backend.components and .Downloader are always non-nil after nodebuilder.New() allocates them; only .Downloader.Downloader can legitimately be nil (remote-downloader mode). Verified: - make erigon integration: clean - make lint: 0 issues - No remaining downloaderProvider / downloadercomp references in backend.go - BuildStorage called after BuildDownloader, so OnFilesChange is wired - Storage Provider's OnFilesChange semantically identical to dropped block Merge strategy chosen (per PR #20476 author) over rebase to preserve existing approvals rather than force a fresh review. No functional changes beyond conflict resolution.
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 18, 2026
…20476) ## Summary Extracts the storage subsystem from backend.go into `node/components/storage/Provider`. This is the foundation for the overlay integration — the storage component will publish SD update events in a later PR once the execution component extraction lands. ### Storage component (`node/components/storage/`) Provider owns: - **DB references** — ChainDB, BlockReader, BlockWriter, snapshots, Bor stores (pass-through from `SetUpBlockReader`) - **File-change callbacks** — seed completed snapshots / delete removed snapshots via downloader client - **Block retirement** (`freezeblocks.BlockRetire`) - **Current block number** read at startup **KvRPC and Notifications are NOT owned by the storage component** — they are an execution-layer concern (state-change events flow from execution, not storage). Storage takes a `services.DBEventNotifier` dep so it can wire snapshot-change events forward; that's the minimum coupling needed. KvRPC + `shards.Notifications` construction stays inline in backend.go for now and will migrate to the execution component when that is extracted. See `erigon-archive/notification-migration-strategy.md`. Also in this PR (for self-contained buildability): a minimal `nodebuilder.Builder` scaffold at `node/nodebuilder/` that holds `Downloader` and `Storage` providers and `BuildDownloader` / `BuildStorage` entry points. This replaces the ad-hoc `downloaderProvider` field on the Ethereum struct with `backend.components`. ### backend.go changes - `downloaderProvider` field → `components *nodebuilder.Builder` - Replace inline file-change callbacks + BlockRetire with a single `BuildStorage` call - `blockRetire` local variable → `backend.components.Storage.BlockRetire` - KvRPC + Notifications constructed inline (unchanged ownership), passed into storage deps ### Review feedback addressed All 5 Copilot comments in commit `f5779d1ec9`: - `GenesisHash` changed from `[32]byte` to `common.Hash` - `ChainDB.View` uses `deps.Ctx` instead of `context.Background()` - `Stop()` nil-guards `s.components` - Stale package comment on `nodebuilder` fixed - This PR description updated to match implementation ## Test plan - [x] `make lint` passes - [x] `make erigon integration` builds - [ ] CI passes **Depends on:** #20471 (downloader extraction)
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Apr 19, 2026
) ## Summary Extracts the sentry / P2P stack from \`node/eth/backend.go\` into \`node/components/sentry/Provider\`, following the downloader component pattern. Step 4 of the componentization push plan. Stacked on #20471 (downloader extraction). Merge order: downloader → sentry. ## What moves From backend.go into \`node/components/sentry/Provider\`: - P2P config assembly (per-protocol port allocation, DiscoveryDNS wiring). - External-sentry mode (dial each \`--sentry.api.addr\`). - Local-sentry mode (per-protocol \`sentry.NewGrpcServer\`, direct-client wrapping). - \`StatusDataProvider\` construction. - Shared \`libsentry.NewSentryMultiplexer\` (one instance, exposed as \`Provider.Multiplexer\`; replaces the two \`sentryMux(sentries)\` call sites). - Execution-P2P layer (\`MessageListener\`, \`PeerPenalizer\`, \`PeerTracker\`, \`MessageSender\`, \`Publisher\`). - \`sentry_multi_client.NewMultiClient\` (late-binding — takes engine + max-peers callback via \`Provider.BuildMultiClient\`). - Stream loops + \`StatusDataProvider.Run\` + execution-P2P \`.Run\` goroutines. - Peer-count logger (was inline in sentry construction; now \`runPeerCountLogger\` on Provider). - \`sentryServer.Close()\` loop (moved into \`Provider.Close\`). - \`splitAddrIntoHostAndPort\` + \`checkPortIsFree\` utility functions (sentry was the only consumer). ## What stays - Chain.toml ENR-update block (still backend.go; reads from \`backend.sentryProvider.Servers\`). Migrates in a later step when chain.toml V2 lands. - \`maxBlockBroadcastPeers\` (engine-aware — depends on \`bor.Bor.IsValidator\`; sentry shouldn't know about consensus engines). - \`BackwardBlockDownloader\` + \`executionFetcher\` chain (execution-side consumer of the exec-P2P layer; \`tmpdir\` is a node concern). - \`sentryMcDisableBlockDownload\` (file-scope const; still read by the mined-blocks prefetch goroutine). ## What's deferred - **Event wiring** (step 7 of the plan). The \`P2PReady\`, \`PeerListChanged\`, \`TorrentPortBound\` events have no consumers on this branch. Per the [snapshot-flow.md simplicity bias](https://github.com/erigontech/erigon-documents/blob/snapshot-distribution-shape/ethereum/design/erigon-archive/snapshot-flow.md), wire when step 5 (initial P2P snapshots) needs them. - **Smoke test** (step 10). Build + lint + unit tests pass; local \`--chain=dev\` run pending before marking ready for review. ## Commit breakdown Six commits matching the plan's incremental steps, each lint-clean and buildable: 1. \`094ac2c2d6\` — scaffold empty Provider with lifecycle API 2. \`4370280cc1\` — move P2P config + server construction 3. \`8df0105b62\` — move StatusDataProvider + execution-P2P layer 4. \`e51bdeae7b\` — move MultiClient construction (\`BuildMultiClient\`) 5. \`48ac128a46\` — move lifecycle (Start + Close) 6. \`5f05906203\` — migrate consumer field accesses to Provider-direct reads ## Diff shape - \`node/components/sentry/\` — 576 LOC new (Provider + port_util) - \`node/eth/backend.go\` — net -99 LOC (construction block + lifecycle + pass-through fields all removed) ## Test plan - [x] \`make lint\` — 0 issues across all commits - [x] \`make erigon integration\` — both binaries build at every commit - [x] \`go test -short ./node/eth/...\` — passes - [ ] Local \`--chain=dev\` smoke test — pending - [ ] CI passes ## Design contract This PR implements the sentry-component responsibilities from [snapshot-flow.md §2.3](https://github.com/erigontech/erigon-documents/blob/snapshot-distribution-shape/ethereum/design/erigon-archive/snapshot-flow.md#23-sentry-component-nodecomponentssentry). Event publish/subscribe hooks described there land in step 5 (initial P2P snapshots) when the event bus has consumers. --------- Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Extracts the downloader setup from backend.go into a lifecycle-managed component, and adds event bus documentation.
Downloader component (
node/components/downloader/)ProviderwrapsNewGrpcClient/BuildTorrentClientinto a component with Initialize/Start lifecycleEvent bus guidelines (
node/app/event/GUIDELINES.md)Component tests
event_flow_test.go)Test plan
make lintpassesmake erigonbuildsgo test ./node/components/downloader/... ./node/app/...— all passDepends on: #20450 (component framework)