node/components/sentry: extract sentry component from backend.go#20629
Merged
Conversation
Initial structure for the Downloader component extraction: - Provider: lifecycle skeleton (Configure, Initialize, Close) - Tests: disabled mode, no-config, idempotent close - README: documents the component API, events, configuration, gRPC wrapper strategy, and integration plan Key design decisions documented: - Direct Go API as primary interface (gRPC becomes optional wrapper) - Events replace OnFilesChange callbacks (via node/app/event bus) - Component owns its config (not spread across ethconfig) - backend.go integration deferred until events are defined This is a checkpoint — the actual backend.go wiring requires defining event types and replacing the gRPC-based Client interface first.
Shared contract between the Downloader and its peers: Subscribes to: - SnapshotFilesCreated — seed new files (replaces OnFilesChange create) - SnapshotFilesDeleted — remove torrents (replaces OnFilesChange delete) - DownloadRequested — fetch files by name/infohash (replaces Client.Download) Publishes: - SnapshotDownloadComplete — unblocks stages (replaces afterSnapshotDownload) - NewSnapshotAvailable — notifies RPC (replaces events.OnNewSnapshot) - TorrentStatus — monitoring/metrics These types can be used by plugins/snapshot-manager immediately since it's greenfield and can be event-native from the start.
Document missing framework capabilities needed for production debugging: - Event tracing (currently only async handlers log at trace level) - Dead letter queue (unmatched events are silently dropped) - Event flow visualization (no way to see routing graph) These are framework-level issues in node/app/event that affect all event-driven components, not just the Downloader.
Comprehensive event bus testing: Functional tests: - TestEventDeliveryBetweenComponents — async delivery works - TestEventNotDeliveredToUnsubscribed — dead letter (0 handlers) - TestMultipleSubscribers — fan-out to N handlers - TestEventTypeRouting — type-based dispatch (FileCreated vs FileDeleted) - TestSyncHandlerTotalOrdering — sync handlers execute in registration order (total ordering within a single Publish call) Load tests: - TestEventLoadCompleteness — 10,000 events, all delivered - TestEventLoadMultipleSubscribers — 5,000 events × 2 subscribers Failure behavior: - TestBlockingHandlerDoesNotBlockPublisher — slow async handler doesn't block Publish (returns in <50ms while handler takes 100ms) - TestPanickingHandlerDoesNotCrashBus — panic in one handler doesn't prevent delivery to other handlers Integration: - TestComponentLifecycleWithEvents — start→events→stop full cycle Key findings documented: - Sync handlers: totally ordered per Publish call (under bus lock) - Async handlers: no ordering guarantee (worker pool dispatch) - Events are per-bus (per-domain), not cross-domain - Panics are caught per-handler, don't crash the bus - Publish is non-blocking for async handlers Also skipped TestAddRemoveDeps (same root domain gomock issue as the other two skipped tests).
Documents the tested behavior and design patterns: Ordering: - Sync handlers: totally ordered per Publish (registration order) - Async handlers: no ordering (worker pool dispatch) - Events scoped per-domain (no cross-domain propagation) Rules: - Never block in async handlers (starves worker pool) - Keep handlers idempotent (may replay) - Don't publish from sync handlers (deadlock) - Use value types for events (immutability) Patterns: - Event + Confirmation for acknowledgement - Saga coordinator for multi-step flows Performance: - 10K events delivered <10ms - Panics caught per-handler, bus continues - Publish non-blocking for async handlers All claims backed by passing tests in event_flow_test.go.
Events are scoped per-domain — components in different domains CANNOT communicate. This is the most important architectural constraint. Default: all components share the root domain. Creating a separate domain means complete event isolation — only appropriate for truly independent subsystems with no event dependencies. Documented with correct/wrong architecture diagrams and specific guidance on when separate domains ARE and ARE NOT appropriate.
…ical domains The combined L1/L2 node (cocoon/pocs-and-proposals/l1-l2-node) requires three domain levels: root (shared base), L1 domain, L2 domain. L1 and L2 need event isolation (separate state machines) but both need events from the shared base, and L2 needs specific L1 events (L1BlockFinalized → triggers derivation). Three new capabilities needed (not implemented today): 1. Child → parent event subscription 2. Cross-domain directed events 3. No upward propagation by default Required before Wave 5 PR 21 (L2 NodeBuilder mode). Updated the componentization development plan with PR 20b as a prerequisite.
Single domain first, hierarchical later. This guides all component work: - Waves 1-4: all components share one root domain, events flow freely - Wave 5: add hierarchical domains for L1/L2 combined mode - Component code doesn't change between phases — only domain wiring Developer rules: don't pass domain references into components, define events as plain structs, don't build domain-aware routing. The framework handles domain topology; components are domain-agnostic. Also updated the componentization development plan in erigon-documents with the same principle at the top.
Replace the inline setUpSnapDownloader() + initDownloader() methods with the DownloaderProvider component lifecycle. backend.go now calls Configure → Initialize and reads the Client from the provider. - downloaderProvider field replaces bare downloader.Downloader field - setUpSnapDownloader() and initDownloader() removed from backend.go - File-change callbacks (Seed/Delete) stay in backend.go since they need chainDB and notifications which aren't component deps yet - downloaderClient field kept — all downstream consumers (stages, antiquary, caplin) still receive downloader.Client unchanged - Unused imports removed: downloadercfg, downloadergrpc, downloaderproto, nodecfg
…r refs to Provider After the merge with main (PR #20526 added chain.toml P2P discovery), two cleanup items were missed: 1. setUpSnapDownloader and initDownloader functions remained in backend.go but were never called — their work is done by node/components/downloader/Provider since PR #20471. 2. The new P2P/chain.toml wiring in backend.go (ENR updater, node source, chain.toml publish, peer manager) referenced the deleted backend.downloader field. These calls now go through backend.downloaderProvider.Downloader. All 7 method accesses are hoisted via a local alias to reduce line noise, and guarded by a nil check on both the Provider and its embedded *Downloader. Remote-downloader mode has a nil Downloader field, so the entire chain.toml block is skipped — remote downloaders run their own P2P stack. Lint: clean. Build: erigon + integration OK.
…ap.p2p-manifest Fold in the fix from #20615 — the chain.toml v2 publish/seed stack runs unconditionally on main and collides with the downloaded chain.toml.torrent during post-OtterSync AddTorrentsFromDisk, aborting fresh mainnet sync with "snapshot exists with a different name: chain.toml". Gate the two call sites we still have after the downloader extraction: 1. node/eth/backend.go: wrap the ENR updater + NodeSource + delayed PublishLocalChainToml + StartChainTomlDiscovery + StartTorrentPeerManager block in --snap.p2p-manifest. Default syncs skip the whole v2 stack. 2. db/downloader/downloader.go:810: gate the post-download republish using d.manifestReady != nil (the internal signal that EnableP2PManifest was invoked by the backend). Covers the loop-internal publish in chainTomlDiscoveryLoop too. The third call site from #20615 (initDownloader's initial publish at backend.go:1461 on main) doesn't exist in our branch — initDownloader moved to node/components/downloader/Provider.initDownloader as part of the extraction, and that function doesn't call PublishLocalChainToml. Default --snap.p2p-manifest=false ⇒ zero v2 behaviour ⇒ collision avoided. Opt-in --snap.p2p-manifest=true ⇒ full v2 publish + discover + peer-manager flow preserved. Ref: #20615
AskAlexSharov
approved these changes
Apr 18, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
Extracts the Sentry / DevP2P stack wiring out of node/eth/backend.go into a dedicated lifecycle-managed component (node/components/sentry.Provider), aligning with the existing downloader componentization work.
Changes:
- Introduces
node/components/sentry.Providerto own sentry server/client setup, status-data provider, multiplexer, and execution-P2P layer construction and lifecycle. - Updates
node/eth/backend.goto configure/initialize/start/close the sentry component and to consume its exported fields instead of cachedEthereumfields. - Moves sentry-specific port/address helpers into
node/components/sentry/port_util.go.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| node/eth/backend.go | Replaces inline sentry/P2P stack construction + goroutines with sentrycomp.Provider lifecycle usage and direct field reads. |
| node/components/sentry/provider.go | New component implementing Configure/Initialize/BuildMultiClient/Start/Close for the sentry + exec-P2P stack. |
| node/components/sentry/port_util.go | New home for sentry-only address split + port probing utilities removed from backend. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+189
to
+209
| // Initialize builds the sentry stack. In external-sentry mode (P2P.SentryAddr | ||
| // set) it dials each external address via gRPC. In local mode it constructs | ||
| // one sentry.GrpcServer per requested protocol version, wraps each in a | ||
| // direct client, and appends them to Sentries. | ||
| // | ||
| // After this returns, p.Sentries is ready for multi-client construction and | ||
| // p.Servers is the list of local servers (empty in external mode). | ||
| // | ||
| // Initialize does NOT start background goroutines — call Start for that. | ||
| func (p *Provider) Initialize(ctx context.Context) error { | ||
| if len(p.cfg.P2P.SentryAddr) > 0 { | ||
| // External sentry: dial each address, collect the clients. | ||
| for _, addr := range p.cfg.P2P.SentryAddr { | ||
| sentryClient, err := sentry_multi_client.GrpcClient(p.cfg.SentryCtx, addr) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| p.Sentries = append(p.Sentries, sentryClient) | ||
| } | ||
| return nil | ||
| } |
| } | ||
| p.Sentries = append(p.Sentries, sentryClient) | ||
| } | ||
| return nil |
| } | ||
| if !checkPortIsFree(fmt.Sprintf("%s:%d", listenHost, pc)) { | ||
| p.logger.Warn("bind protocol to port has failed: port is busy", | ||
| "protocols", fmt.Sprintf("eth/%d", cfg.ProtocolVersion), "port", pc) |
Comment on lines
+79
to
+88
| // External-sentry mode is triggered when P2P.SentryAddr is non-empty: | ||
| // the Provider dials these addresses via gRPC instead of building | ||
| // local servers. All fields below are only consulted in local mode. | ||
|
|
||
| // ChainDB is queried by the readNodeInfo callback during ENR refresh, | ||
| // by StatusDataProvider during status-message construction, and by the | ||
| // multi-sentry client built via BuildMultiClient. It must satisfy | ||
| // kv.TemporalRoDB so the MultiClient can open temporal read transactions | ||
| // for header validation during backward download. | ||
| ChainDB kv.TemporalRoDB |
First commit of the sentry componentization (step 4 of the componentization plan). Establishes the Provider skeleton matching the downloader component's pattern: Configure → Initialize → Start → Close. All methods are no-ops in this commit. Subsequent commits migrate sentry setup out of node/eth/backend.go into the Provider piece by piece, each accompanied by a backend.go shrink. Public fields (Servers, Client, StatusDataProvider, Multiplexer, ExecutionP2P) will be added as each corresponding chunk of backend.go moves in. Deliberately no fields yet to keep this commit purely structural. Scaffold compiles and lints clean; no functional change.
…ovider Step 2 of the sentry componentization (see .claude/plans/sentry-componentization.md). Moves the external-sentry gRPC dial and local-sentry server construction out of node/eth/backend.go into sentrycomp.Provider.Initialize. What moves: - P2P config assembly + DiscoveryDNS wiring. - External-sentry path (dial each address in --sentry.api.addr). - Local-sentry path (per-protocol server build with port allocation, readNodeInfo closure, direct-client wrapping). - splitAddrIntoHostAndPort + checkPortIsFree utilities (moved to a new node/components/sentry/port_util.go since they're only used by sentry). What stays in backend.go for now (migrates in later commits): - backend.sentryServers is populated from sentryProvider.Servers so the rest of backend.go sees the same slice; callers migrate later. - The peer-count logger goroutine stays inline (step 5: move into Start). - StatusDataProvider, execution-P2P layer, MultiClient (steps 3–4). Behavioural changes: none. The Provider's Initialize executes the same logic in the same order with the same dependencies (SentryCtx, P2P config, ChainDB, ChainConfig, GenesisHash, NetworkID, EthDiscoveryURLs, ChainName, NodesDir, EnableWitProtocol). Net diff: backend.go -87 lines (construction block + two utility funcs), sentry package +196 lines. Lint clean, builds clean.
… into Provider Step 3 of the sentry componentization. Moves the StatusDataProvider construction, the shared sentry multiplexer, and the execution-P2P layer (MessageListener, PeerPenalizer, PeerTracker, MessageSender, Publisher) out of node/eth/backend.go into sentrycomp.Provider.Initialize. What moves: - sentry.NewStatusDataProvider construction — needs ChainDB, ChainConfig, Genesis, NetworkID, BlockReader. Added Genesis + BlockReader to sentrycomp.Config (ChainDB/Cfg/Hash/ID were already there for the readNodeInfo closure in step 2). - libsentry.NewSentryMultiplexer — one shared multiplexer, exposed as Provider.Multiplexer. Used by both the execution-P2P layer and by the polygon sync service; the second sentryMux(sentries) call site at the polygon sync service now reads backend.sentryProvider.Multiplexer. - execp2p.NewPeerPenalizer, NewMessageListener, NewPeerTracker, NewMessageSender, NewPublisher — all become public fields on Provider. - Removed the now-unused sentryMux helper from backend.go. What stays in backend.go: - Ethereum struct fields (backend.statusDataProvider, backend.executionP2PMessageListener, ...PeerTracker, ...Publisher) are populated by reading from the Provider; callers downstream migrate in a later step. This keeps the diff focused on construction, not consumer rewiring. - executionFetcher + BackwardBlockDownloader stay — they're execution-side consumers of the exec-P2P layer and depend on tmpdir, which is a node concern, not a sentry one. - Stageloop hook call sites still read the local statusDataProvider variable (populated from Provider); will migrate to read directly from backend.sentryProvider in a later step. Behavioural changes: none. Same types constructed in the same order. Lint clean, erigon + integration build clean.
…uildMultiClient Step 4 of the sentry componentization. Moves the sentry_multi_client NewMultiClient call from backend.go into a new BuildMultiClient method on the Provider. Why a separate method rather than extending Initialize: the MultiClient needs the consensus engine and a header-aware max-broadcast-peers callback, both of which are constructed between the sentry servers (early) and the MultiClient (later). Keeping those late-binding inputs in a dedicated MultiClientDeps struct avoids widening Config. What moves: - sentry_multi_client.NewMultiClient → Provider.BuildMultiClient. - Config.ChainDB widens from kv.RoDB to kv.TemporalRoDB (MultiClient needs BeginTemporalRo for header validation during backward download). backend.chainDB already satisfies the stronger interface, so the caller change is type-system-only. What stays: - sentryMcDisableBlockDownload promoted from a local const inside the removed block to a file-scope const, still consumed by the mined-blocks prefetch goroutine. - backend.sentriesClient is populated from backend.sentryProvider.Client (the Provider's new field). Downstream callers still read backend.sentriesClient; they migrate in a later step. - maxBlockBroadcastPeers computation stays in backend.go — it depends on bor.Bor.IsValidator and sentry shouldn't know about consensus engines. Behavioural changes: none. Same NewMultiClient call, same args, same order. Lint clean, erigon + integration build clean.
Steps 5+6 of the sentry componentization.
Start now:
- Runs MultiClient.StartStreamLoops(SentryCtx) when Client is present.
- Subscribes to AddHeaderSubscription + AddNewSnapshotSubscription on
the Events supplied via Config (nil-safe) and runs
StatusDataProvider.Run in the Provider's errgroup. Unsubscribe is
deferred inside the goroutine so teardown is symmetric.
- Launches MessageListener/PeerTracker/Publisher Run loops in the
errgroup. Each logs non-cancel errors and returns on ctx.Done()
(same behaviour as the legacy backend.go goroutines).
- Launches the peer-count logger (local-sentry mode only) — moved in
from the inline goroutine that used to live in the sentry
construction block in backend.go.
Start is idempotent via a started-flag guard. It ignores the caller
context and uses Config.SentryCtx for lifetime, matching the legacy
behaviour.
Close now:
- Calls Close() on each local sentry GrpcServer (moved from the
for-range loop in backend.go's Stop).
- Waits on the Provider's errgroup so background goroutines drain
before the caller's shutdown sequence continues.
- Does NOT cancel SentryCtx itself — that stays a caller responsibility
(backend.sentryCancel is the owner).
Config gains Events *shards.Events (optional; gate on nil).
backend.go changes:
- Ethereum.Start's ~40 lines of sentry/exec-P2P goroutine bookkeeping
replaced with a single `backend.sentryProvider.Start(backend.sentryCtx)`.
- Ethereum.Stop's `for _, s := range sentryServers { s.Close() }` loop
replaced with `backend.sentryProvider.Close()`.
- The inline peer-logger goroutine removed.
- Unused imports "strconv" and p2p/protocols/eth dropped.
Behavioural changes: none. Same goroutines, same contexts, same
lifecycle order. errgroup tracking moves from backend.bgComponentsEg
to a Provider-local errgroup; sentry-owned goroutines are now waited
on via Close rather than via backend's errgroup. Net shutdown-order
effect: sentry goroutines drain before the rest of backend.bgComponentsEg
(the existing Close call sits before the bgComponentsEg.Wait).
Lint clean, erigon + integration build clean.
…directly
Step 8 of the sentry componentization. Removes the cached fields on the
Ethereum struct that duplicated sentryProvider's public fields, and
migrates ~30 call sites to read directly from sentryProvider.
Ethereum struct shrinks: sentriesClient, sentryServers,
statusDataProvider, executionP2PMessageListener, executionP2PPeerTracker,
executionP2PPublisher all deleted. Only sentryProvider remains as the
single access point.
Callers migrated (mechanical rename, no logic changes):
- backend.sentriesClient → backend.sentryProvider.Client
- backend.sentryServers → backend.sentryProvider.Servers
- backend.statusDataProvider → backend.sentryProvider.StatusDataProvider
- backend.executionP2P{MessageListener,PeerTracker,Publisher} →
backend.sentryProvider.ExecutionP2P{MessageListener,PeerTracker,Publisher}
- Corresponding s.X forms in Ethereum methods.
Removed pass-through assignments that became tautological after the
rename (e.g. sentryProvider.Servers = sentryProvider.Servers). Dropped
the now-unused "p2p/sentry" import from backend.go (only sentrycomp
needs it; the Provider owns sentry-server types internally).
Rationale: with the Provider owning construction + lifecycle, caching
pointers on Ethereum adds a layer of indirection without value — readers
can't tell whether the field is still authoritative. Reading directly
from the Provider makes the source of truth obvious and avoids stale-
pointer hazards when the Provider mutates its own state (e.g. future
reset-on-reconnect, or test doubles that swap Client).
Behaviour: unchanged. Lint clean, erigon + integration build clean.
5f05906 to
be4252d
Compare
Resolves conflicts introduced by #20476 (storage component extraction) merging into main while this PR was in review. - node/eth/backend.go imports: drop direct downloadercomp import (now encapsulated in nodebuilder.Builder from #20476); keep sentrycomp for this PR's extraction and add storagecomp for BuildStorage deps from main. - node/eth/backend.go chain.toml ENR wiring: combine both migrations — downloader access now goes through backend.components.Downloader (nodebuilder pattern from #20476), sentry server list through backend.sentryProvider.Servers (this PR's extraction). Verified: make erigon integration clean; make lint 0 issues. No functional changes beyond conflict resolution. Merge strategy chosen (per prior sentry PR reviewer approval) over rebase to preserve existing approvals rather than force a fresh review.
Four Copilot review comments addressed:
1. provider.go:88 — Config comment group correction. The block comment
over ChainDB claimed "all fields below are only consulted in local
mode," but ChainDB / ChainConfig / GenesisHash / NetworkID / Genesis
/ BlockReader are all consumed by StatusDataProvider and the
MultiClient in both modes. Split the Config field block into
"required in both modes" and "local-mode only" groups with accurate
per-field commentary.
2. provider.go:208 — External-sentry mode early return. Initialize
returned before calling buildStatusAndExecutionP2P(), leaving
StatusDataProvider / Multiplexer / execution-P2P layer nil — which
breaks backend.go and MultiClient construction (both dereference
these fields, MultiClient uses StatusDataProvider.GetStatusData in
its stream pumps). Restructured so both modes reach
buildStatusAndExecutionP2P() before returning; local-mode code now
only runs when P2P.SentryAddr is empty.
3. provider.go:257 — Malformed port-busy log. Replaced
fmt.Sprintf("eth/%d", cfg.ProtocolVersion) — which logs
"%!d([]uint=...)" because ProtocolVersion is a slice — with
eth.ProtocolToString[protocol] so operators see which specific
protocol failed to bind.
4. provider_test.go — Basic unit tests added. Covers Configure
purity, Close idempotency (including pre-Initialize and double
Close), splitAddrIntoHostAndPort across ipv4/hostname/ipv6/error
cases, and checkPortIsFree with a live listener probe. Integration-
style coverage of Initialize itself (local-mode server construction,
BuildMultiClient wiring) is deferred — it requires real P2P stack
setup and belongs in a dedicated integration test if/when that
harness is built.
Build + make lint clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Extracts the sentry / P2P stack from `node/eth/backend.go` into `node/components/sentry/Provider`, following the downloader component pattern. Step 4 of the componentization push plan.
Stacked on #20471 (downloader extraction). Merge order: downloader → sentry.
What moves
From backend.go into `node/components/sentry/Provider`:
What stays
What's deferred
Commit breakdown
Six commits matching the plan's incremental steps, each lint-clean and buildable:
Diff shape
Test plan
Design contract
This PR implements the sentry-component responsibilities from snapshot-flow.md §2.3. Event publish/subscribe hooks described there land in step 5 (initial P2P snapshots) when the event bus has consumers.