node/eth, db/downloader: gate chain.toml v2 publish paths behind --snap.p2p-manifest #20615
…ap.p2p-manifest

The chain.toml v2 publish/seed stack was running unconditionally. On a fresh mainnet sync with the default --snap.p2p-manifest=false, initDownloader's PublishLocalChainToml() seeded the locally-computed chain.toml torrent into the anacrolix client. OtterSync then downloaded the authoritative chain.toml .torrent from peers and AddTorrentsFromDisk collided with the already-seeded entry: "snapshot exists with a different name: \"chain.toml\"". The execution service never started.

Gate the three unconditional v2 call sites on the existing --snap.p2p-manifest flag so default syncs keep pre-v2 behaviour:

- node/eth/backend.go: wrap the ENR updater / NodeSource / delayed publish / discovery / torrent-peer-manager setup block in the P2PManifest check.
- node/eth/backend.go: gate the initial publish in initDownloader().
- db/downloader/downloader.go: gate the post-download republish in DownloadSnapshots() using d.manifestReady (non-nil iff EnableP2PManifest was invoked by the backend). This also covers the loop-internal publish in chainTomlDiscoveryLoop, which only runs under P2P manifest mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
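The manifestReady gate described in the commit message above can be sketched roughly as follows. This is only an illustration of the "nil means the feature was never enabled" pattern, not the actual Erigon code: the `Downloader` struct here, the `published` counter, the lower-case helper names, and the assumption that `manifestReady` is a channel are all hypothetical. Only the names `manifestReady` and `EnableP2PManifest` come from the commit message.

```go
package main

import "fmt"

// Downloader is a hypothetical stand-in for the real type.
type Downloader struct {
	// manifestReady stays nil unless P2P manifest mode was enabled.
	// Its being a channel is an assumption made for this sketch.
	manifestReady chan struct{}
	// published counts publish calls, purely for illustration.
	published int
}

// EnableP2PManifest models the backend opting in to manifest mode.
func (d *Downloader) EnableP2PManifest() {
	if d.manifestReady == nil {
		d.manifestReady = make(chan struct{})
	}
}

// publishLocalChainToml is a placeholder for the real publish step.
func (d *Downloader) publishLocalChainToml() {
	d.published++
}

// afterDownload models the post-download hook: it republishes only
// when manifest mode was switched on beforehand.
func (d *Downloader) afterDownload() {
	if d.manifestReady == nil {
		return // default sync: skip the v2 publish entirely
	}
	d.publishLocalChainToml()
}

func main() {
	def := &Downloader{}
	def.afterDownload()

	optIn := &Downloader{}
	optIn.EnableP2PManifest()
	optIn.afterDownload()

	fmt.Println(def.published, optIn.published) // prints "0 1"
}
```

Using the presence of an internal field as the gate keeps the downloader independent of the CLI flag: it does not need to read configuration, it only needs to know whether the backend enabled the mode.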
…ap.p2p-manifest

Fold in the fix from #20615: the chain.toml v2 publish/seed stack runs unconditionally on main and collides with the downloaded chain.toml.torrent during post-OtterSync AddTorrentsFromDisk, aborting fresh mainnet sync with "snapshot exists with a different name: chain.toml".

Gate the two call sites we still have after the downloader extraction:

1. node/eth/backend.go: wrap the ENR updater + NodeSource + delayed PublishLocalChainToml + StartChainTomlDiscovery + StartTorrentPeerManager block in --snap.p2p-manifest. Default syncs skip the whole v2 stack.
2. db/downloader/downloader.go:810: gate the post-download republish using d.manifestReady != nil (the internal signal that EnableP2PManifest was invoked by the backend). Covers the loop-internal publish in chainTomlDiscoveryLoop too.

The third call site from #20615 (initDownloader's initial publish at backend.go:1461 on main) doesn't exist in our branch; initDownloader moved to node/components/downloader/Provider.initDownloader as part of the extraction, and that function doesn't call PublishLocalChainToml.

Default --snap.p2p-manifest=false ⇒ zero v2 behaviour ⇒ collision avoided.
Opt-in --snap.p2p-manifest=true ⇒ full v2 publish + discover + peer-manager flow preserved.

Ref: #20615
Folded the two applicable call sites into #20471 as commit 85b0dce: backend.go outer gate + downloader.go manifestReady gate. The third call site from this PR (the Whichever of #20471 and this PR merges first, the other can drop the overlap on rebase.
Pull request overview
This PR fixes a fresh mainnet sync regression caused by chain.toml v2 publishing/discovery running unconditionally, leading to a torrent-name collision during AddTorrentsFromDisk. It restores the pre-v2 default sync path by gating v2 chain.toml publishing/discovery/peer-management behind the existing opt-in flag --snap.p2p-manifest (default false).
Changes:
- Gate the chain.toml v2 ENR updater + discovery + torrent peer-manager wiring in `New()` behind `Snapshot.P2PManifest`.
- Gate the initial `PublishLocalChainToml()` call in `initDownloader()` behind `Snapshot.P2PManifest`.
- Gate the post-download `PublishLocalChainToml()` in `Downloader.DownloadSnapshots()` behind the internal `manifestReady` signal (set only when P2P manifest mode is enabled).
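As a rough illustration of the first two items in the list above, the sketch below shows an opt-in boolean deciding whether the v2 startup steps are wired at all. `SnapshotConfig`, `startupSteps`, and the step names are hypothetical and chosen only for this example; the real wiring lives in `New()` and `initDownloader()` and is not reproduced here.

```go
package main

import "fmt"

// SnapshotConfig is a hypothetical config struct; P2PManifest stands in
// for the value of the --snap.p2p-manifest flag (default false).
type SnapshotConfig struct {
	P2PManifest bool
}

// startupSteps returns the names of the components that would be started.
// The v2 steps are appended only when the opt-in flag is set.
func startupSteps(cfg SnapshotConfig) []string {
	steps := []string{"downloader"}
	if cfg.P2PManifest {
		steps = append(steps,
			"enr-updater",
			"initial-publish",
			"chain-toml-discovery",
			"torrent-peer-manager",
		)
	}
	return steps
}

func main() {
	fmt.Println(startupSteps(SnapshotConfig{}))
	fmt.Println(startupSteps(SnapshotConfig{P2PManifest: true}))
}
```

With the flag left at its default, only the baseline step is returned, which corresponds to the pre-v2 sync path.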
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| node/eth/backend.go | Gates chain.toml v2 publish/discovery startup and the initial publish to only run when --snap.p2p-manifest is enabled. |
| db/downloader/downloader.go | Prevents post-download chain.toml republish unless P2P manifest mode was enabled (via manifestReady). |
…ap.p2p-manifest (#20615)

## Summary

- Fix fresh-mainnet-sync regression introduced with the chain.toml v2 merge: the publish/seed stack was running unconditionally and collided with the downloaded `chain.toml.torrent` during post-OtterSync `AddTorrentsFromDisk`, aborting the execution service with `snapshot exists with a different name: "chain.toml"`.
- Gate the three unconditional v2 call sites on the existing `--snap.p2p-manifest` flag (default `false`) so default syncs stay on the pre-v2 path while opt-in users get the full v2 flow.

## Reproduction (before this change)

```
USE_STATE_CACHE=false ./build/bin/erigon \
  --datadir=<fresh> --chain=mainnet --prune.mode=minimal
```

Fails during stage `[1/6 OtterSync]` immediately after `Downloader completed remaining snapshots`:

```
[EROR] Could not start execution service err="[1/6 OtterSync] after snapshot download: adding torrents from disk: adding torrent for chain.toml.torrent: snapshot exists with a different name: \"chain.toml\""
```

Collision originates in `getExistingSnapshotTorrent` ([db/downloader/downloader.go:1683](https://github.com/erigontech/erigon/blob/main/db/downloader/downloader.go#L1683)): the locally-seeded `chain.toml` torrent is already registered under that info-hash when the disk-scanned `chain.toml.torrent` tries to add.

## Change

Three gated sites, all tied to the pre-existing `--snap.p2p-manifest` flag (usage: *"Discover snapshot manifest (chain.toml) from P2P peers via ENR instead of using centralized preverified.toml"*, default `false`):

1. [node/eth/backend.go:541](https://github.com/erigontech/erigon/blob/main/node/eth/backend.go#L541): wrap the whole ENR updater / `SetNodeSourceFn` / delayed `PublishLocalChainToml` / `StartChainTomlDiscovery` / `StartTorrentPeerManager` block in `&& backend.config.Snapshot.P2PManifest`.
2. [node/eth/backend.go:1461](https://github.com/erigontech/erigon/blob/main/node/eth/backend.go#L1461): gate the initial publish in `initDownloader()`.
3. [db/downloader/downloader.go:810](https://github.com/erigontech/erigon/blob/main/db/downloader/downloader.go#L810): gate the post-download republish using `d.manifestReady != nil`, which is the internal signal that `EnableP2PManifest()` was invoked by the backend. This also covers the loop-internal publish in `chainTomlDiscoveryLoop` (only runs under P2P manifest mode).

Default `--snap.p2p-manifest=false` ⇒ **zero v2 behaviour** ⇒ collision avoided.
Opt-in `--snap.p2p-manifest=true` ⇒ full v2 publish + discover + peer-manager flow preserved.

## Test plan

- [x] `make lint`: 0 issues
- [x] `make erigon`: builds clean
- [x] `go test -short ./db/downloader/...`: passes (`chaintoml_test.go`, `chaintoml_v2_test.go` call `PublishChainToml` directly, unaffected)
- [x] `go vet ./node/eth/... ./db/downloader/...`: clean
- [ ] Manual: fresh `--chain=mainnet` sync with default flags completes OtterSync and enters Headers stage without the collision error (reviewer to confirm on their setup; original reproducer took ~13 min to hit the crash point)
- [ ] Manual: fresh sync with `--snap.p2p-manifest=true` still performs chain.toml discovery / publishing as before (v2 behaviour preserved)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
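The collision described under Reproduction can be illustrated with a toy registry keyed by info-hash. This is a simplified sketch under stated assumptions, not the logic of `getExistingSnapshotTorrent`: the `registry` type, its `add` method, the placeholder hash `"abc123"`, and the assumption that the disk-scanned entry carries the name `chain.toml.torrent` are all hypothetical. Only the error text is taken from the log above.

```go
package main

import "fmt"

// registry is a hypothetical map from info-hash to the name under
// which a torrent was first registered.
type registry struct {
	byHash map[string]string
}

func newRegistry() *registry {
	return &registry{byHash: map[string]string{}}
}

// add registers name under infoHash. It fails if the same info-hash
// is already present under a different name.
func (r *registry) add(infoHash, name string) error {
	if existing, ok := r.byHash[infoHash]; ok && existing != name {
		return fmt.Errorf("snapshot exists with a different name: %q", existing)
	}
	r.byHash[infoHash] = name
	return nil
}

func main() {
	r := newRegistry()

	// Step 1 (pre-fix default sync): the locally computed manifest is seeded.
	_ = r.add("abc123", "chain.toml")

	// Step 2: the disk scan later tries to add the same info-hash
	// under another name and is rejected.
	err := r.add("abc123", "chain.toml.torrent")
	fmt.Println(err)
}
```

Skipping step 1 when manifest mode is off, as the three gates do, means the disk-scanned entry is the first registration for that info-hash and no conflict arises.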