snapcfg, downloader: lazy-parse registry, per-chain remote loading #19641

Merged
AskAlexSharov merged 3 commits into main from wmitsuda/do-not-read-all-chains-toml on Mar 5, 2026

Conversation

@wmitsuda wmitsuda commented Mar 4, 2026

Member

Summary

First PR in a series to remove the need to read all chains' TOML files at startup. Erigon only uses one chain per execution, but the current code parses all 8 chains' snapshot hashes eagerly at init and fetches all 8 from GitHub/R2 when loading remote preverified hashes.

This PR does not remove the registry, but makes it effectively a registry of 1 element by:

  • Lazy-parsing the preverified registry: store raw embedded TOML bytes and parse on demand instead of parsing all chains at init time
  • Adding LoadRemotePreverifiedForChain: fetch snapshot hashes for a single chain from GitHub/R2 instead of all chains; use it in downloadercfg.LoadSnapshotsHashes
  • Deduplicating cdnHeaders: move the canonical definition + InsertCloudflareHeaders to db/snapcfg/cdn.go; remove copies from db/downloader
  • Pulling fetchChainToml from erigon-snapshot into this repo (db/snapcfg/util.go) with TODOs marking the copies to remove upstream
  • Extracting URL builders: ChainTomlR2URL and ChainTomlGitHubURL in db/snapcfg/cdn.go with unit tests
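The lazy-parsing idea in the first bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: `lazyRegistry`, `newLazyRegistry`, and `parseFlatToml` are hypothetical names, and the parser is a stand-in that only handles flat `'name' = 'hash'` lines rather than full TOML.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// lazyRegistry (hypothetical) keeps raw TOML bytes per chain and parses a
// chain only when it is first requested.
type lazyRegistry struct {
	mu     sync.Mutex
	raw    map[string][]byte            // chain name -> unparsed TOML bytes
	parsed map[string]map[string]string // chain name -> file name -> hash
	parses int                          // number of parses performed, for illustration
}

func newLazyRegistry(raw map[string][]byte) *lazyRegistry {
	return &lazyRegistry{raw: raw, parsed: make(map[string]map[string]string)}
}

// Get returns the entries for one chain, parsing on first use and caching the result.
func (r *lazyRegistry) Get(chain string) (map[string]string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if p, ok := r.parsed[chain]; ok {
		return p, nil
	}
	data, ok := r.raw[chain]
	if !ok {
		return nil, fmt.Errorf("unknown chain %q", chain)
	}
	p, err := parseFlatToml(data)
	if err != nil {
		return nil, fmt.Errorf("parse %s: %w", chain, err)
	}
	r.parsed[chain] = p
	r.parses++
	return p, nil
}

// parseFlatToml is a stand-in parser (assumption): it accepts only flat
// `key = value` lines with quoted strings, plus blank lines and # comments.
func parseFlatToml(data []byte) (map[string]string, error) {
	out := make(map[string]string)
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		key := strings.Trim(strings.TrimSpace(k), `'"`)
		out[key] = strings.Trim(strings.TrimSpace(v), `'"`)
	}
	return out, sc.Err()
}

func main() {
	reg := newLazyRegistry(map[string][]byte{
		"chiado":  []byte("'v1-000000-000100-headers.seg' = 'aaaa'\n"),
		"mainnet": []byte("'v1-000000-000500-headers.seg' = 'bbbb'\n"),
	})
	entries, err := reg.Get("chiado") // only chiado is parsed here
	fmt.Println(len(entries), err, reg.parses)
}
```

Construction stores bytes only, so startup cost no longer scales with the number of chains; the mutex-guarded cache keeps repeated lookups for the same chain from re-parsing.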

Measured improvement (chiado, cold start)

| | main | this branch |
|---|---|---|
| Snapshot hash loading | ~2533ms | ~587ms |
| Speedup | | ~4.3x faster |

Test plan

  • make lint clean
  • make erigon integration builds
  • Ephemeral node with --chain=chiado starts and loads snapshot hashes correctly
  • go test ./db/snapcfg/... passes with new URL builder tests

wmitsuda added 2 commits March 4, 2026 19:32
…ry, per-chain remote loading

- Move cloudflareHeaders + InsertCloudflareHeaders to db/snapcfg/cloudflare.go
  as canonical location; remove duplicates from db/downloader.
- Make preverifiedRegistry lazy: store raw TOML bytes and parse on demand
  instead of parsing all chains at init.
- Add LoadRemotePreverifiedForChain to fetch snapshot hashes for a single
  chain instead of all chains.
- Use LoadRemotePreverifiedForChain in downloadercfg.LoadSnapshotsHashes.

Extract inline URL construction from fetchChainToml into
ChainTomlR2URL and ChainTomlGitHubURL in cdn.go (renamed from
cloudflare.go since it now covers more than just Cloudflare headers).
Add table-driven unit tests for both URL builders.

Check io.ReadAll error before the empty-response check so partial
reads with errors are not silently returned. Rewrite GetToml to
use the existing snapshotHashPtrs map instead of a duplicated switch.
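The ordering fix described in the last commit message (check the `io.ReadAll` error before testing for an empty body) can be illustrated with a small sketch. The names `readNonEmpty`, `scriptedReader`, `okBody`, `brokenBody`, and `errDropped` are hypothetical and exist only for this example; the real `fetchChainToml` is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errDropped is a hypothetical error standing in for a connection that fails mid-body.
var errDropped = errors.New("connection dropped")

// readNonEmpty reads the whole body and rejects read errors first, then empty bodies.
func readNonEmpty(r io.Reader) ([]byte, error) {
	body, err := io.ReadAll(r)
	// io.ReadAll can return the bytes read so far together with an error,
	// so the error must be checked before looking at len(body).
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	if len(body) == 0 {
		return nil, errors.New("empty response")
	}
	return body, nil
}

// scriptedReader yields data and then returns err; err == io.EOF models a clean end of stream.
type scriptedReader struct {
	data []byte
	err  error
}

func (s *scriptedReader) Read(p []byte) (int, error) {
	if len(s.data) > 0 {
		n := copy(p, s.data)
		s.data = s.data[n:]
		return n, nil
	}
	return 0, s.err
}

// okBody simulates a response body that ends cleanly.
func okBody(s string) io.Reader { return &scriptedReader{data: []byte(s), err: io.EOF} }

// brokenBody simulates a body that delivers some bytes and then fails.
func brokenBody(s string) io.Reader { return &scriptedReader{data: []byte(s), err: errDropped} }

func main() {
	_, err := readNonEmpty(brokenBody("'a.seg' = 'h"))
	fmt.Println("partial read:", err)

	_, err = readNonEmpty(okBody(""))
	fmt.Println("empty body:", err)

	body, err := readNonEmpty(okBody("'a.seg' = 'h1'\n"))
	fmt.Printf("complete body: %q, err: %v\n", body, err)
}
```

If the empty check came first and the error were only inspected for empty bodies, the truncated body in the first call would be non-empty and could be handed to the TOML parser as if it were complete.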

@AskAlexSharov AskAlexSharov left a comment

Collaborator

Oh, nice

@AskAlexSharov AskAlexSharov enabled auto-merge (squash) March 5, 2026 00:46
@AskAlexSharov AskAlexSharov merged commit f02c274 into main Mar 5, 2026
24 checks passed
@AskAlexSharov AskAlexSharov deleted the wmitsuda/do-not-read-all-chains-toml branch March 5, 2026 01:18
wmitsuda commented Mar 5, 2026

Member Author

cc: @anacrolix, I hope you don't mind: I want to progressively remove the snapshot registry struct.

@anacrolix

Contributor

@wmitsuda it's great. This has been on my to-do for ages. I'll follow up by checking the issues I was tracking for this.

wmitsuda added a commit that referenced this pull request Mar 8, 2026
…9722)

Follow-up to #19641 — step 2/N towards simplifying TOML reading at
startup.

## Summary

- **Lazy-parse webseed TOML**: instead of parsing all 8 chains' webseed
TOML at init time, store raw bytes in `EmbeddedWebseedsRaw` and parse on
demand via `GetEmbeddedWebseeds(chain)` — only the chain actually in use
gets parsed.
- **Remove no-op re-assignment**: `LoadRemotePreverified` was
redundantly re-building the same `KnownWebseeds` map; removed.
- **Inline `webseedsParse`**: folded into its sole caller
`GetEmbeddedWebseeds`.
- **Rename `KnownWebseeds` → `EmbeddedWebseeds`**: clearer naming —
`EmbeddedWebseedsRaw` for the raw bytes map, `GetEmbeddedWebseeds()` for
parsed access.

---

**TODO**: cherry-pick to `release/3.4` after merge.
wmitsuda added a commit that referenced this pull request Mar 11, 2026
## Summary
Part 3 of optimizing remote preverified hash loading (after #19641,
#19722).

- `LoadPreverified` now takes a `chainName` parameter and calls
`LoadRemotePreverified` instead of the old bulk variant that fetched all
10 chains
- Refactored `webseeds.Verify` to load preverified per-chain inside the
iteration loop instead of bulk-loading all chains upfront
- Removed unused functions: old bulk `LoadRemotePreverified`,
`registry.All`, `registry.ResetRaw`, `GetAllCurrentPreverified`
- Renamed `LoadRemotePreverifiedForChain` → `LoadRemotePreverified`
since it's now the only variant

## Test plan
- [x] Built `erigon` and `downloader` binaries
- [x] Tested `erigon seg reset --dry-run` with mainnet and hoodi
ephemeral datadirs
- [x] Tested `downloader verify_webseeds --chain=chiado
--preverified=embedded` to completion
- [x] Verified only the requested chain is fetched (confirmed via log
output)

## Tasks
- [ ] Cherry-pick merge commit to `release/3.4`
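The per-chain loading described in the summary above, where preverified data is loaded inside the iteration loop instead of in bulk upfront, might look roughly like this. It is a sketch under assumptions: `verifyChains`, `loadFunc`, and `countingLoader` are hypothetical names, and the in-memory loader replaces any real network fetch.

```go
package main

import "fmt"

// loadFunc fetches the preverified entries (file name -> hash) for one chain.
type loadFunc func(chain string) (map[string]string, error)

// verifyChains loads each chain's entries inside the loop, so chains that are
// not in the list are never loaded.
func verifyChains(chains []string, load loadFunc, check func(chain string, entries map[string]string) error) error {
	for _, chain := range chains {
		entries, err := load(chain)
		if err != nil {
			return fmt.Errorf("load %s: %w", chain, err)
		}
		if err := check(chain, entries); err != nil {
			return fmt.Errorf("verify %s: %w", chain, err)
		}
	}
	return nil
}

// countingLoader is a hypothetical in-memory loader that records which chains were requested.
type countingLoader struct {
	data   map[string]map[string]string
	loaded []string
}

func (c *countingLoader) Load(chain string) (map[string]string, error) {
	c.loaded = append(c.loaded, chain)
	entries, ok := c.data[chain]
	if !ok {
		return nil, fmt.Errorf("no preverified data for chain %q", chain)
	}
	return entries, nil
}

func main() {
	loader := &countingLoader{data: map[string]map[string]string{
		"chiado":  {"a.seg": "h1"},
		"hoodi":   {"b.seg": "h2"},
		"mainnet": {"c.seg": "h3"},
	}}
	err := verifyChains([]string{"chiado"}, loader.Load, func(chain string, entries map[string]string) error {
		if len(entries) == 0 {
			return fmt.Errorf("no entries for %s", chain)
		}
		return nil
	})
	// loader.loaded lists only the chains that were actually requested.
	fmt.Println(loader.loaded, err)
}
```

Recording the requested chains in `loaded` mirrors the test-plan step that confirms via log output that only the requested chain is fetched.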
Sahil-4555 pushed a commit to Sahil-4555/erigon that referenced this pull request May 29, 2026
… dep (erigontech#21197)

Fixes erigontech#21154. Fixes erigontech#19732. Sub-task of erigontech#21047.

## Summary
- Drop the `github.com/erigontech/erigon-snapshot` Go-module import. The
embedded TOMLs it ships were loaded at startup, immediately overwritten
by a runtime fetch, and discarded — they have been unused on every
daemon path since erigontech#12415 made remote-fetch failure fatal.
- Drop the `--preverified=embedded` flag value (a dev convenience from
erigontech#18273); `remote` and `local` remain.
- Clean up the now-vestigial registry pieces: remove the unused
`preverifiedRegistry.Reset` method (dead since erigontech#19641 switched to
per-chain loading), promote the immutable supported-chain set to a
package-level `knownChains` var, and inline its only membership-check
consumer in `SetToml`.

Runtime fetch source
(`raw.githubusercontent.com/erigontech/erigon-snapshot` + R2 mirror) and
fail-fast behaviour are unchanged. Binary size: −2,973,120 bytes
uncompressed (−2.0%) / −1,015,102 bytes gzipped (−1.6%) on
`darwin/arm64`, measured by building before/after a stubbed-empty
`erigon-snapshot`.

## Test plan
- [x] `make lint && make erigon integration` clean
- [x] `go test ./db/snapcfg/... ./db/downloader/downloadercfg/...` pass
- [x] Manual: `--chain=hoodi` with both CDN hosts unreachable
(`HTTPS_PROXY` pointed at a dead local port) exits non-zero with the
same fail-fast `[CRIT] Snapshot hashes for supported networks was not
loaded …` startup trace
- [x] Manual: `--chain=hoodi` with normal network logs `Loading remote
snapshot hashes chain=hoodi`, no `Failed to load` warning, no `[CRIT]`,
and progresses into the downloader (segments begin downloading)
- [x] Manual: `--chain=hoodi` against a fresh datadir prepopulated with
a real `<datadir>/snapshots/preverified.toml` (fetched out-of-band)
starts cleanly with HTTPS_PROXY pointed at a dead local port — no
`Loading remote snapshot hashes` log line, no `[CRIT]`, downloader
brings up — confirming the local-file path bypasses remote fetch
- [x] Manual: `erigon snapshots reset --datadir=<dd>
--preverified=embedded --dry-run` exits 1 with `Error: invalid
preverified flag value "embedded"`; the `--help` output shows `(remote,
local)`; `--preverified=remote` and `--preverified=local` continue to
work
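The flag and chain-set cleanup described above can be sketched as two small membership checks. This is an illustration only: `parsePreverified`, `isKnownChain`, and the contents of `knownChains` (three chain names taken from this conversation, not the full supported set) are assumptions, and the real flag handling in Erigon may be structured differently. The error text follows the message quoted in the test plan.

```go
package main

import (
	"fmt"
	"strings"
)

// preverifiedModes lists the accepted --preverified values after "embedded" was dropped.
var preverifiedModes = []string{"remote", "local"}

// knownChains is an illustrative subset of chains; the real set is not reproduced here.
var knownChains = map[string]struct{}{
	"mainnet": {},
	"chiado":  {},
	"hoodi":   {},
}

// parsePreverified validates a --preverified value against the accepted modes.
func parsePreverified(v string) (string, error) {
	for _, m := range preverifiedModes {
		if v == m {
			return m, nil
		}
	}
	return "", fmt.Errorf("invalid preverified flag value %q", v)
}

// isKnownChain reports whether the chain is in the immutable package-level set.
func isKnownChain(chain string) bool {
	_, ok := knownChains[chain]
	return ok
}

func main() {
	fmt.Printf("accepted values: (%s)\n", strings.Join(preverifiedModes, ", "))
	for _, v := range []string{"remote", "local", "embedded"} {
		mode, err := parsePreverified(v)
		fmt.Printf("%s -> mode=%q err=%v\n", v, mode, err)
	}
	fmt.Println("hoodi known:", isKnownChain("hoodi"))
}
```

Keeping the accepted values in one slice lets the help text and the validation share a single source, so a removed value such as `embedded` cannot linger in one place and not the other.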