snapcfg, downloader: lazy-parse registry, per-chain remote loading #19641
Merged
…ry, per-chain remote loading

- Move `cloudflareHeaders` + `InsertCloudflareHeaders` to `db/snapcfg/cloudflare.go` as the canonical location; remove duplicates from `db/downloader`.
- Make `preverifiedRegistry` lazy: store raw TOML bytes and parse on demand instead of parsing all chains at init.
- Add `LoadRemotePreverifiedForChain` to fetch snapshot hashes for a single chain instead of all chains.
- Use `LoadRemotePreverifiedForChain` in `downloadercfg.LoadSnapshotsHashes`.
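The lazy registry idea above (keep raw TOML bytes, parse a chain only when it is first requested) can be sketched as follows. This is a minimal illustration, not Erigon's `preverifiedRegistry`: the type `lazyRegistry`, its fields, and the toy `parseSimpleToml` (which only understands flat `key = "value"` lines, standing in for a real TOML decoder) are all hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// lazyRegistry is a hypothetical registry that stores each chain's raw TOML
// bytes and parses a chain only on first access, caching the result.
type lazyRegistry struct {
	mu     sync.Mutex
	raw    map[string][]byte            // chain name -> raw TOML bytes
	parsed map[string]map[string]string // chain name -> parsed entries (cache)
	parses int                          // number of parses performed, for illustration
}

func newLazyRegistry(raw map[string][]byte) *lazyRegistry {
	return &lazyRegistry{raw: raw, parsed: make(map[string]map[string]string)}
}

// parseSimpleToml is a toy stand-in for a real TOML decoder: it accepts only
// flat `key = "value"` lines, skipping blanks and `#` comments.
func parseSimpleToml(data []byte) (map[string]string, error) {
	out := make(map[string]string)
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		out[strings.Trim(strings.TrimSpace(k), `"`)] = strings.Trim(strings.TrimSpace(v), `"`)
	}
	return out, sc.Err()
}

// Get returns the parsed entries for one chain. Only the requested chain is
// parsed; other chains stay as raw bytes.
func (r *lazyRegistry) Get(chain string) (map[string]string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if p, ok := r.parsed[chain]; ok {
		return p, nil // already parsed: serve from cache
	}
	data, ok := r.raw[chain]
	if !ok {
		return nil, fmt.Errorf("unknown chain %q", chain)
	}
	p, err := parseSimpleToml(data)
	if err != nil {
		return nil, fmt.Errorf("parse %s: %w", chain, err)
	}
	r.parses++
	r.parsed[chain] = p
	return p, nil
}

func main() {
	reg := newLazyRegistry(map[string][]byte{
		"mainnet": []byte(`"example-a.seg" = "abc123"`),
		"chiado":  []byte(`"example-a.seg" = "def456"`),
	})
	p, err := reg.Get("chiado")
	fmt.Println(p, err, reg.parses)
}
```

The mutex keeps concurrent first accesses from parsing the same chain twice; a per-chain `sync.Once` would be an alternative design.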
Extract inline URL construction from `fetchChainToml` into `ChainTomlR2URL` and `ChainTomlGitHubURL` in `cdn.go` (renamed from `cloudflare.go`, since it now covers more than just Cloudflare headers). Add table-driven unit tests for both URL builders.
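Pulling URL construction into small pure functions is what makes table-driven tests easy. The sketch below assumes a `base/branch/chain.toml` path layout; the function names are lowercase hypothetical stand-ins, the GitHub host is taken from the repository named in this thread, and the R2 host is a placeholder, so none of these strings should be read as the real values in `db/snapcfg/cdn.go`.

```go
package main

import "fmt"

// Assumed base URLs for illustration only; the path layout and the R2 host
// are placeholders, not the real values used by db/snapcfg/cdn.go.
const (
	gitHubRawBase = "https://raw.githubusercontent.com/erigontech/erigon-snapshot"
	r2Base        = "https://r2.example.invalid/erigon-snapshot"
)

// chainTomlGitHubURL is a hypothetical builder for the GitHub raw URL of a
// chain's TOML file on a given branch.
func chainTomlGitHubURL(branch, chain string) string {
	return fmt.Sprintf("%s/%s/%s.toml", gitHubRawBase, branch, chain)
}

// chainTomlR2URL is a hypothetical builder for the R2 mirror URL of the same file.
func chainTomlR2URL(branch, chain string) string {
	return fmt.Sprintf("%s/%s/%s.toml", r2Base, branch, chain)
}

func main() {
	// Table-driven checks; in a real _test.go file each row would typically
	// run under t.Run(tc.name, ...).
	cases := []struct {
		name, branch, chain, wantGitHub, wantR2 string
	}{
		{"mainnet on main", "main", "mainnet",
			"https://raw.githubusercontent.com/erigontech/erigon-snapshot/main/mainnet.toml",
			"https://r2.example.invalid/erigon-snapshot/main/mainnet.toml"},
		{"chiado on a hypothetical test branch", "test-branch", "chiado",
			"https://raw.githubusercontent.com/erigontech/erigon-snapshot/test-branch/chiado.toml",
			"https://r2.example.invalid/erigon-snapshot/test-branch/chiado.toml"},
	}
	failed := 0
	for _, tc := range cases {
		if got := chainTomlGitHubURL(tc.branch, tc.chain); got != tc.wantGitHub {
			fmt.Printf("%s: GitHub URL = %q, want %q\n", tc.name, got, tc.wantGitHub)
			failed++
		}
		if got := chainTomlR2URL(tc.branch, tc.chain); got != tc.wantR2 {
			fmt.Printf("%s: R2 URL = %q, want %q\n", tc.name, got, tc.wantR2)
			failed++
		}
	}
	fmt.Printf("%d case(s), %d failure(s)\n", len(cases), failed)
}
```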
Check the `io.ReadAll` error before the empty-response check, so that partial reads with errors are not silently returned. Rewrite `GetToml` to use the existing `snapshotHashPtrs` map instead of a duplicated switch.
Member
Author
cc: @anacrolix, hope you don't mind: I want to progressively remove the snapshot registry struct.
Contributor
@wmitsuda it's great. This has been on my to-do list for ages. I'll follow up by checking the issues I was tracking for this.
wmitsuda added a commit that referenced this pull request Mar 8, 2026

…9722)

Follow-up to #19641: step 2/N towards simplifying TOML reading at startup.

## Summary

- **Lazy-parse webseed TOML**: instead of parsing all 8 chains' webseed TOML at init time, store raw bytes in `EmbeddedWebseedsRaw` and parse on demand via `GetEmbeddedWebseeds(chain)`, so only the chain actually in use gets parsed.
- **Remove no-op re-assignment**: `LoadRemotePreverified` was redundantly rebuilding the same `KnownWebseeds` map; removed.
- **Inline `webseedsParse`**: folded into its sole caller, `GetEmbeddedWebseeds`.
- **Rename `KnownWebseeds` → `EmbeddedWebseeds`**: clearer naming, with `EmbeddedWebseedsRaw` for the raw bytes map and `GetEmbeddedWebseeds()` for parsed access.

---

**TODO**: cherry-pick to `release/3.4` after merge.
wmitsuda added a commit that referenced this pull request Mar 11, 2026

## Summary

Part 3 of optimizing remote preverified hash loading (after #19641, #19722).

- `LoadPreverified` now takes a `chainName` parameter and calls `LoadRemotePreverified` instead of the old bulk variant that fetched all 10 chains
- Refactored `webseeds.Verify` to load preverified hashes per chain inside the iteration loop instead of bulk-loading all chains upfront
- Removed unused functions: the old bulk `LoadRemotePreverified`, `registry.All`, `registry.ResetRaw`, `GetAllCurrentPreverified`
- Renamed `LoadRemotePreverifiedForChain` → `LoadRemotePreverified`, since it's now the only variant

## Test plan

- [x] Built `erigon` and `downloader` binaries
- [x] Tested `erigon seg reset --dry-run` with mainnet and hoodi ephemeral datadirs
- [x] Tested `downloader verify_webseeds --chain=chiado --preverified=embedded` to completion
- [x] Verified only the requested chain is fetched (confirmed via log output)

## Tasks

- [ ] Cherry-pick merge commit to `release/3.4`
This was referenced May 12, 2026
Sahil-4555 pushed a commit to Sahil-4555/erigon that referenced this pull request May 29, 2026

… dep (erigontech#21197)

Fixes erigontech#21154. Fixes erigontech#19732. Sub-task of erigontech#21047.

## Summary

- Drop the `github.com/erigontech/erigon-snapshot` Go-module import. The embedded TOMLs it ships were loaded at startup, immediately overwritten by a runtime fetch, and discarded; they have been unused on every daemon path since erigontech#12415 made remote-fetch failure fatal.
- Drop the `--preverified=embedded` flag value (a dev convenience from erigontech#18273); `remote` and `local` remain.
- Clean up the now-vestigial registry pieces: remove the unused `preverifiedRegistry.Reset` method (dead since erigontech#19641 switched to per-chain loading), promote the immutable supported-chain set to a package-level `knownChains` var, and inline its only membership-check consumer in `SetToml`.

Runtime fetch source (`raw.githubusercontent.com/erigontech/erigon-snapshot` + R2 mirror) and fail-fast behaviour are unchanged.

Binary size: −2,973,120 bytes uncompressed (−2.0%) / −1,015,102 bytes gzipped (−1.6%) on `darwin/arm64`, measured by building before/after a stubbed-empty `erigon-snapshot`.

## Test plan

- [x] `make lint && make erigon integration` clean
- [x] `go test ./db/snapcfg/... ./db/downloader/downloadercfg/...` pass
- [x] Manual: `--chain=hoodi` with both CDN hosts unreachable (`HTTPS_PROXY` pointed at a dead local port) exits non-zero with the same fail-fast `[CRIT] Snapshot hashes for supported networks was not loaded …` startup trace
- [x] Manual: `--chain=hoodi` with normal network logs `Loading remote snapshot hashes chain=hoodi`, no `Failed to load` warning, no `[CRIT]`, and progresses into the downloader (segments begin downloading)
- [x] Manual: `--chain=hoodi` against a fresh datadir prepopulated with a real `<datadir>/snapshots/preverified.toml` (fetched out-of-band) starts cleanly with `HTTPS_PROXY` pointed at a dead local port: no `Loading remote snapshot hashes` log line, no `[CRIT]`, and the downloader comes up, confirming that the local-file path bypasses the remote fetch
- [x] Manual: `erigon snapshots reset --datadir=<dd> --preverified=embedded --dry-run` exits 1 with `Error: invalid preverified flag value "embedded"`; the `--help` output shows `(remote, local)`; `--preverified=remote` and `--preverified=local` continue to work
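Promoting the supported-chain set to a package-level variable and inlining the membership check in the setter might look roughly like the sketch below. The chain list, `tomlStore`, and the `setToml` signature are hypothetical stand-ins for illustration; the real `SetToml` is not shown in this thread. Note that Go maps are not truly immutable, so "immutable" here means "never written after initialization" by convention.

```go
package main

import "fmt"

// knownChains sketches a package-level set of supported chain names.
// The chain list is illustrative only; it is never modified after init.
var knownChains = map[string]struct{}{
	"mainnet": {},
	"hoodi":   {},
	"chiado":  {},
}

// tomlStore is a hypothetical stand-in for the registry's per-chain storage.
var tomlStore = map[string][]byte{}

// setToml sketches a SetToml-style setter with the membership check inlined:
// unsupported chain names are rejected before anything is stored.
func setToml(chain string, data []byte) error {
	if _, ok := knownChains[chain]; !ok {
		return fmt.Errorf("unsupported chain %q", chain)
	}
	tomlStore[chain] = data
	return nil
}

func main() {
	fmt.Println(setToml("hoodi", []byte("example = 1")))
	fmt.Println(setToml("nosuchchain", nil))
}
```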
Summary
First PR in a series to remove the need to read all chains' TOML files at startup. Erigon only uses one chain per execution, but the current code parses all 8 chains' snapshot hashes eagerly at init and fetches all 8 from GitHub/R2 when loading remote preverified hashes.
This PR does not remove the registry, but effectively turns it into a single-element registry by:
- `LoadRemotePreverifiedForChain`: fetch snapshot hashes for a single chain from GitHub/R2 instead of all chains; use it in `downloadercfg.LoadSnapshotsHashes`
- `cdnHeaders`: move the canonical definition + `InsertCloudflareHeaders` to `db/snapcfg/cdn.go`; remove copies from `db/downloader`
- `fetchChainToml`: bring it from erigon-snapshot into this repo (`db/snapcfg/util.go`), with TODOs marking the copies to remove upstream
- `ChainTomlR2URL` and `ChainTomlGitHubURL`: extract the URL construction into these builders in `db/snapcfg/cdn.go`, with unit tests

Measured improvement (chiado, cold start)
Test plan
- `make lint` clean
- `make erigon integration` builds
- `--chain=chiado` starts and loads snapshot hashes correctly
- `go test ./db/snapcfg/...` passes with the new URL builder tests