Skip to content

perf(l1): download bytecodes concurrently with storage during snap sync#6205

Open
ilitteri wants to merge 2 commits into
mainfrom
snap-sync-concurrent-bytecodes
Open

perf(l1): download bytecodes concurrently with storage during snap sync#6205
ilitteri wants to merge 2 commits into
mainfrom
snap-sync-concurrent-bytecodes

Conversation

@ilitteri

Copy link
Copy Markdown
Collaborator

Summary

  • Download bytecodes concurrently with storage ranges/insertion instead of sequentially after healing, eliminating the bytecodes phase from the critical path
  • Skip already-stored bytecodes, start download earlier (during account trie building), fix metrics conflict during concurrent run, and add progress logging

Description

During snap sync, bytecodes are content-addressed (hash = key), so they're safe to download regardless of pivot staleness. Previously, bytecodes were downloaded sequentially after all healing completed (~6 min on mainnet). This PR spawns a background task that starts downloading bytecodes during account insertion and runs concurrently with storage download/insertion.

Changes:

  • snap_sync.rs: Spawn background bytecode download task using oneshot channel to receive code hashes from insert_accounts. Extract download_bytecodes() and read_code_hashes_from_dir() helpers. Handle healing-discovered bytecodes separately.
  • code_collector.rs: Add flush_all() (non-consuming flush for mid-insertion use) and dir() accessor.
  • client.rs: Add update_metrics parameter to request_bytecodes() to prevent overwriting current_step during concurrent download.

Benchmark (Ethereum mainnet, ethrex-mainnet-4):

Phase With PR Baseline Delta
Headers 15:30 15:50 -0:20
Account Insertion 13:20 13:40 -0:20
Storage (DL + Insert) 53:00 53:00 0:00
Healing 2:20 2:20 0:00
Bytecodes (sequential) 0:00 6:23 -6:23
Total 1:29:47 1:32:48 -3:01 (-3.3%)

Bytecodes (2.1M hashes, 8 min download) finished 46 minutes before storage insertion ended.

How to Test

  1. Run ethrex with snap sync on mainnet or Hoodi with a consensus client
  2. Verify "Background bytecode download starting for N unique code hashes" appears during account insertion
  3. Verify "Bytecode progress: X/Y downloaded" messages appear during storage phases
  4. Verify "Background bytecode download complete" appears after healing
  5. Verify no sequential bytecodes phase at the end

…after healing

Instead of downloading bytecodes as a sequential phase after healing (15 min
on mainnet), spawn the download right after account insertion. Bytecodes are
content-addressed (hash=key) so they're safe to download while storage
download, insertion, and healing proceed in parallel.

Any additional bytecodes discovered during healing are downloaded separately
after healing completes.

Both geth and nethermind download bytecodes concurrently with storage.
fix metrics conflict, and add progress logging

- Filter out already-stored bytecodes before downloading to avoid
  redundant work on retries or when healing already fetched codes.
- Start background bytecode download during account trie building
  (pass 2) instead of after all insertion completes, using a Notify
  to coordinate between the code hash collection pass and the trie
  building pass in insert_accounts.
- Add update_metrics parameter to request_bytecodes so the background
  download doesn't overwrite current_step during concurrent storage
  operations.
- Add cumulative progress logging to download_bytecodes showing
  downloaded/total with percentage after each chunk.
@ilitteri ilitteri requested a review from a team as a code owner February 13, 2026 20:53
Copilot AI review requested due to automatic review settings February 13, 2026 20:53
@greptile-apps

greptile-apps Bot commented Feb 13, 2026

Copy link
Copy Markdown

Greptile Overview

Greptile Summary

Moves bytecode download from sequential execution after healing to concurrent execution during account insertion, reducing snap sync time by ~3 minutes (-3.3%). The implementation spawns a background task that receives code hashes via oneshot channel after the account collection pass, then downloads bytecodes while trie building proceeds. This is safe because bytecodes are content-addressed.

Key changes:

  • Spawns background task in snap_sync() that downloads bytecodes concurrently with storage download/insertion
  • Adds flush_all() to CodeHashCollector for non-consuming flush between account collection and trie building passes
  • Extracts download_bytecodes() and read_code_hashes_from_dir() helper functions
  • Adds update_metrics parameter to request_bytecodes() to avoid overwriting metrics during concurrent run
  • Filters already-stored bytecodes to support retries after partial downloads

Architecture:
The implementation creates a new collector after the first bytecode batch to handle healing-discovered bytecodes separately, then awaits the background download task before completion.

Confidence Score: 4/5

  • This PR is safe to merge with low risk, implementing a performance optimization using well-understood concurrency patterns
  • The implementation is solid with proper error handling and synchronization. The concurrent bytecode download is safe because bytecodes are content-addressed. Minor deduction for the complexity of the two-collector pattern and potential edge cases around cleanup/restart scenarios, but these are well-handled in the code.
  • No files require special attention - all changes are straightforward

Important Files Changed

Filename Overview
crates/networking/p2p/snap/client.rs Added update_metrics parameter to prevent metric conflicts during concurrent bytecode downloads
crates/networking/p2p/sync/code_collector.rs Added flush_all() method and dir() accessor to support mid-insertion flushing for concurrent bytecode download
crates/networking/p2p/sync/snap_sync.rs Refactored to spawn background bytecode download task during account insertion, with new helper functions for reading and downloading bytecodes

Sequence Diagram

sequenceDiagram
    participant Main as Main Thread
    participant BG as Background Task
    participant Peers as Peer Network
    participant Store as Storage

    Main->>Main: Download account ranges
    Main->>Main: Start account insertion
    Main->>Main: Collect code hashes (pass 1)
    Main->>Main: flush_all() code hashes
    Main->>Main: read_code_hashes_from_dir()
    Main->>BG: Send code hashes via channel
    Note over BG: Background task starts
    par Concurrent Operations
        BG->>Peers: Request bytecodes (chunks)
        Peers-->>BG: Return bytecodes
        BG->>Store: Write bytecodes
        Note over BG: Continues downloading
    and
        Main->>Main: Build account trie (pass 2)
        Main->>Main: Download storage ranges
        Main->>Main: Insert storage tries
        Main->>Main: Heal state & storage tries
    end
    Main->>Main: Finish healing, new collector
    Main->>Main: Download healing bytecodes
    Main->>BG: Await background task
    BG-->>Main: Complete
    Main->>Main: Snap sync complete
Loading

Last reviewed commit: 6582a57

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes snap sync by downloading bytecodes concurrently with storage ranges instead of sequentially after healing, reducing total sync time by approximately 3 minutes (~3.3%) on Ethereum mainnet. Since bytecodes are content-addressed (hash=key), they can be safely downloaded regardless of pivot staleness, making concurrent download safe.

Changes:

  • Spawns a background task to download bytecodes during account insertion that runs concurrently with storage download/insertion
  • Adds ability to flush code hash collector without consuming it, enabling code hash transmission to background task mid-insertion
  • Adds update_metrics parameter to request_bytecodes() to prevent metric conflicts during concurrent execution

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
crates/networking/p2p/sync/snap_sync.rs Spawns background bytecode download task, adds helper functions for reading/downloading bytecodes, modifies insert_accounts to send code hashes via oneshot channel, handles healing-discovered bytecodes separately
crates/networking/p2p/sync/code_collector.rs Adds flush_all() for non-consuming flush and dir() accessor to support code hash transmission
crates/networking/p2p/snap/client.rs Adds update_metrics parameter to request_bytecodes() to conditionally update metrics, preventing concurrent overwrites of current_step

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +325 to +326
// Wait for code hashes to be sent from insert_accounts (between passes)
let code_hashes = code_hashes_rx.await.map_err(|_| SyncError::BytecodesNotFound)?;

Copilot AI Feb 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error mapping here is misleading. A oneshot receiver error (RecvError) means the sender was dropped without sending a value, which indicates a programming error or unexpected panic in insert_accounts, not that bytecodes weren't found from peers. Consider creating a more specific error variant like SyncError::CodeHashChannelClosed or SyncError::InternalChannelError to better represent this failure mode.

Suggested change
// Wait for code hashes to be sent from insert_accounts (between passes)
let code_hashes = code_hashes_rx.await.map_err(|_| SyncError::BytecodesNotFound)?;
// Wait for code hashes to be sent from insert_accounts (between passes).
// If the sender is dropped without sending, this indicates a logic error or panic
// in the producer, so we treat it as a hard failure rather than a "bytecodes
// not found from peers" error.
let code_hashes = code_hashes_rx
.await
.expect("code_hashes sender dropped before sending; this indicates a bug or panic in insert_accounts");

Copilot uses AI. Check for mistakes.
if let Some(tx) = code_hashes_tx {
code_hash_collector.flush_all().await?;
let code_hashes = read_code_hashes_from_dir(code_hash_collector.dir())?;
let _ = tx.send(code_hashes);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let _ = tx.send(code_hashes) silently drops a send failure. If the background task panicked or its receiver was dropped, the code hashes are lost — and since the snapshot dir is cleaned up and recreated immediately after insert_accounts returns (line 368), they can't be re-read. Bytecodes would silently never be downloaded for this sync attempt.

Suggestion: at minimum, log a warning if the send fails. Better yet, propagate the error:

tx.send(code_hashes).map_err(|_| SyncError::BytecodesNotFound)?;

(Same issue in the non-rocksdb path at line 860.)

// Filter out already-stored bytecodes to avoid re-downloading on retry
let code_hashes: Vec<H256> = code_hashes
.into_iter()
.filter(|hash| !store.get_account_code(*hash).is_ok_and(|code| code.is_some()))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter calls store.get_account_code(*hash) individually for each of the ~2.1M code hashes. Each call acquires a Mutex lock on account_code_cache and potentially does a RocksDB read — that's a lot of synchronous blocking I/O in an async context.

For the common case (fresh sync, no codes stored yet), every single check will miss the cache and hit the DB, only to return None. Consider short-circuiting: skip the filter entirely when is_fresh_sync or similar flag is set, and only enable it for retry scenarios where some codes may already exist.

let bytecode_store = store.clone();
tokio::spawn(async move {
// Wait for code hashes to be sent from insert_accounts (between passes)
let code_hashes = code_hashes_rx.await.map_err(|_| SyncError::BytecodesNotFound)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the oneshot sender is dropped (e.g., insert_accounts returns early with an error before reaching the tx.send), this converts the RecvError to SyncError::BytecodesNotFound, which is misleading — the bytecodes weren't "not found", the code hash channel was closed. This could confuse debugging since the real error would be the one from insert_accounts, but the background task's error might surface first (or be logged) as a confusing "BytecodesNotFound".

nit: Consider a more descriptive error variant, e.g. SyncError::BytecodeChannelClosed.

ElFantasma added a commit that referenced this pull request Apr 15, 2026
- Add §1.18 observability tooling (PR #6470)
- Add §1.19 pivot update reliability (PR #6475, issue #6474)
- Add §1.20 big-account within-trie parallelization (issue #6477)
- Add §1.21 small-account batching (issue #6476)
- Add §1.22 decoded TrieLayerCache (PR #6348)
- Add §1.23 bloom filter for non-existent storage (PR #6288)
- Add §1.24 adaptive request sizing + bisection (PR #6181)
- Add §1.25 concurrent bytecode + storage (PR #6205)
- Add §1.26 phase completion markers (PR #6189)
- Add §2.18 StorageTrieTracker refactor (PR #6171)
- Update current-state bottleneck table with small-account and pivot-update findings
- Reprioritize timeline: pivot-update crash fix is now priority 0
- Add two risks (pivot crash masks perf work, DB corruption on every crash)
- Bump doc version to 1.3
@ElFantasma ElFantasma mentioned this pull request Apr 15, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants