Snap Sync Refactor: download storage slots plan #6170

@fedacking

Plan: Refactor Storage Download to Use StorageTrieTracker

This work is being done in the branch perf/refactor-storage-download-snap.

Context

The current AccountStorageRoots struct in snap sync tracks storage downloads per-account, with complex index-based referencing into accounts_by_root_hash. This makes the download loop in request_storage_ranges hard to follow: tasks reference accounts by index, results carry index ranges, and big-account promotion involves mutating intervals in the tracking struct.

The new StorageTrieTracker (already defined in sync.rs) groups storage tries by root hash from the start, separating small (single-request) from big (multi-request) tries. The refactor simplifies data ownership by moving trie data into tasks and back in results, eliminating clones and the index-based referencing.

Files to Modify

  1. crates/networking/p2p/sync.rs — Add healed_accounts field and methods to StorageTrieTracker
  2. crates/networking/p2p/snap/client.rs — New StorageTask/StorageTaskResult enums, refactor request_storage_ranges + worker
  3. crates/networking/p2p/sync/snap_sync.rs — Replace AccountStorageRoots with StorageTrieTracker throughout, update insert_accounts
  4. crates/networking/p2p/sync/healing/state.rs — Update parameter types, use tracker.handle_healed_account()
  5. crates/networking/p2p/sync/healing/storage.rs — Update get_initial_downloads() to read from tracker.healed_accounts

Step 1: Extend StorageTrieTracker (sync.rs)

Add two fields to StorageTrieTracker:

  • healed_accounts: HashSet<H256> — used by storage healing (get_initial_downloads) to know which accounts need storage trie node repair.
  • account_to_root: HashMap<H256, H256> — reverse lookup from account hash to its current storage root. Maintained by all mutating methods. Enables O(1) lookup of old root during healing.

Add methods:

  • insert_account(&mut self, account_hash: H256, storage_root: H256) — Inserts into small_tries, grouping by root. If root already exists (in small or big), appends to its accounts vec. Also inserts into account_to_root.

  • promote_to_big(&mut self, root: H256, first_slots: Vec<Slot>, intervals: Vec<Interval>) — Removes from small_tries, inserts into big_tries with the same accounts + provided slots/intervals.

  • take_small_batch(&mut self, batch_size: usize) -> Vec<(H256, SmallTrie)> — Drains up to batch_size entries from small_tries, returning owned data.

  • return_small_tries(&mut self, tries: Vec<(H256, SmallTrie)>) — Re-inserts failed small tries back into small_tries.

  • handle_healed_account(&mut self, account_hash: H256, old_root: H256, new_root: H256) — Called by state healing when an account's storage root changes. Rules:

    1. If old_root == new_root → do nothing.
    2. If old_root was in small_tries (or not registered at all):
      • Remove the account from the old SmallTrie (if it exists; clean up entry if accounts list becomes empty).
      • If new_root is already registered (in small_tries or big_tries) → add account to that trie's accounts list.
      • If new_root is not registered → create a new SmallTrie with this account.
    3. If old_root was in big_tries:
      • If the account is the only account in that BigTrie → re-key the entry: remove from big_tries[old_root], insert at big_tries[new_root] (keeping slots and intervals).
      • If there are other accounts in the BigTrie → remove the account, create a new BigTrie at new_root with cloned slots and intervals.
    4. Always add account_hash to healed_accounts. Update account_to_root to point to new_root.

  • drain_all_to_healed(&mut self) — Moves all accounts from both maps into healed_accounts and clears the maps. Used by the 5-attempt fallback in snap_sync.rs.

  • remaining_count(&self) -> usize — Returns small_tries.len() + big_tries.len().
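The handle_healed_account rules above can be sketched roughly as follows. This is a minimal illustration, not the code in sync.rs: H256 and Slot are plain stand-in types, and the field layouts of SmallTrie and BigTrie are assumptions. The plan does not say what happens when a sole-account BigTrie is re-keyed onto a root that is already registered, so the sketch simply overwrites that entry.

```rust
use std::collections::{HashMap, HashSet};

// Stand-in types; the real definitions in sync.rs may differ (assumption).
type H256 = [u8; 32];
type Slot = (H256, Vec<u8>);

#[derive(Clone, Debug, PartialEq)]
struct Interval { start: H256, end: H256 }

#[derive(Default)]
struct SmallTrie { accounts: Vec<H256>, slots: Vec<Slot> }

#[derive(Default)]
struct BigTrie { accounts: Vec<H256>, slots: Vec<Slot>, intervals: Vec<Interval> }

#[derive(Default)]
struct StorageTrieTracker {
    small_tries: HashMap<H256, SmallTrie>,
    big_tries: HashMap<H256, BigTrie>,
    healed_accounts: HashSet<H256>,
    account_to_root: HashMap<H256, H256>,
}

impl StorageTrieTracker {
    /// Groups the account under its storage root, reusing an existing entry
    /// (big or small) when the root is already registered.
    fn insert_account(&mut self, account_hash: H256, storage_root: H256) {
        if let Some(big) = self.big_tries.get_mut(&storage_root) {
            big.accounts.push(account_hash);
        } else {
            self.small_tries.entry(storage_root).or_default().accounts.push(account_hash);
        }
        self.account_to_root.insert(account_hash, storage_root);
    }

    fn handle_healed_account(&mut self, account_hash: H256, old_root: H256, new_root: H256) {
        // Rule 1: nothing changed.
        if old_root == new_root {
            return;
        }
        if let Some(mut big) = self.big_tries.remove(&old_root) {
            // Rule 3: the old root was a big trie.
            if big.accounts.len() == 1 && big.accounts[0] == account_hash {
                // Sole account: re-key, keeping slots and intervals.
                self.big_tries.insert(new_root, big);
            } else {
                // Shared trie: detach the account and clone the progress so far.
                big.accounts.retain(|a| *a != account_hash);
                let moved = BigTrie {
                    accounts: vec![account_hash],
                    slots: big.slots.clone(),
                    intervals: big.intervals.clone(),
                };
                self.big_tries.insert(old_root, big);
                self.big_tries.insert(new_root, moved);
            }
        } else {
            // Rule 2: the old root was a small trie or was never registered.
            let emptied = match self.small_tries.get_mut(&old_root) {
                Some(small) => {
                    small.accounts.retain(|a| *a != account_hash);
                    small.accounts.is_empty()
                }
                None => false,
            };
            if emptied {
                self.small_tries.remove(&old_root);
            }
            // Joins an existing trie at new_root or creates a new SmallTrie.
            self.insert_account(account_hash, new_root);
        }
        // Rule 4: always mark as healed and refresh the reverse lookup.
        self.healed_accounts.insert(account_hash);
        self.account_to_root.insert(account_hash, new_root);
    }
}
```

Routing rule 2 through insert_account keeps the "already registered" and "not registered" cases in one place, which is a design choice of this sketch rather than something the plan prescribes.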

Step 2: BigTrie::compute_intervals helper (sync.rs)

Extract the chunking logic from client.rs:763-876 into a method:

impl BigTrie {
    pub fn compute_intervals(
        last_downloaded_hash: H256,
        slot_count: usize,
        slots_per_chunk: usize, // default 10_000
    ) -> Vec<Interval>
}

Computes storage density from last_downloaded_hash / slot_count, derives chunk size, and produces a Vec<Interval> covering the remaining range up to HASH_MAX.
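As a rough illustration of the density-based chunking described above, the sketch below uses u64 as a stand-in for the 256-bit hash space and takes hash_max as an extra parameter so it can be exercised on small numbers. Both of those, and the treatment of a zero slot_count, are assumptions made for the example; the real method works on H256 and HASH_MAX.

```rust
/// Inclusive hash range. u64 stands in for the 256-bit hash space (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Interval {
    start: u64,
    end: u64,
}

/// Splits the range after `last_downloaded` into chunks that are each
/// expected to hold about `slots_per_chunk` slots.
fn compute_intervals(
    last_downloaded: u64,
    slot_count: usize,
    slots_per_chunk: usize,
    hash_max: u64, // hypothetical parameter; the plan uses the HASH_MAX constant
) -> Vec<Interval> {
    if last_downloaded >= hash_max {
        return Vec::new(); // nothing left to download
    }
    let first = last_downloaded + 1;
    if slot_count == 0 {
        // No density information: cover the rest with a single interval.
        return vec![Interval { start: first, end: hash_max }];
    }
    // Hash units per slot is last_downloaded / slot_count, so a chunk of
    // slots_per_chunk slots spans roughly this many hash units.
    let width = (last_downloaded as u128 * slots_per_chunk as u128 / slot_count as u128).max(1);
    let width = u64::try_from(width).unwrap_or(u64::MAX);

    let mut intervals = Vec::new();
    let mut start = first;
    loop {
        let end = start.saturating_add(width - 1).min(hash_max);
        intervals.push(Interval { start, end });
        if end == hash_max {
            break;
        }
        start = end + 1;
    }
    intervals
}
```

For instance, 100 slots found in the first 100 hash units with slots_per_chunk = 50 gives a chunk width of 50, so a remaining range of 101..=300 is split into four intervals.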

Step 3: New StorageTask enum (client.rs)

Replace the current StorageTask struct:

enum StorageTask {
    SmallBatch {
        /// Owned small tries moved from the tracker. Vec of (root, SmallTrie).
        tries: Vec<(H256, SmallTrie)>,
    },
    BigInterval {
        root: H256,
        /// Cloned from BigTrie (cheap: just H256 hashes)
        accounts: Vec<H256>,
        /// Moved out of BigTrie.intervals
        interval: Interval,
    },
}

Worker extracts account_hashes / storage_roots / start_hash / limit_hash from the task variant.
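One way the worker could derive its request parameters from the task variant is sketched below. RangeRequest and request_params are hypothetical names introduced only for this example, H256 is a stand-in byte array, and SmallTrie is reduced to its accounts field.

```rust
// Stand-in types for illustration; not the definitions used in the crate.
type H256 = [u8; 32];
const HASH_MAX: H256 = [0xff; 32];

struct Interval { start: H256, end: H256 }
struct SmallTrie { accounts: Vec<H256> } // slots omitted for brevity

enum StorageTask {
    SmallBatch { tries: Vec<(H256, SmallTrie)> },
    BigInterval { root: H256, accounts: Vec<H256>, interval: Interval },
}

/// Hypothetical bundle of the four values the worker needs for a request.
struct RangeRequest {
    account_hashes: Vec<H256>,
    storage_roots: Vec<H256>,
    start_hash: H256,
    limit_hash: H256,
}

impl StorageTask {
    fn request_params(&self) -> RangeRequest {
        match self {
            StorageTask::SmallBatch { tries } => {
                // One representative account per trie, paired with its root so
                // the two vectors stay aligned even if a trie has no accounts.
                let (account_hashes, storage_roots): (Vec<H256>, Vec<H256>) = tries
                    .iter()
                    .filter_map(|(root, trie)| trie.accounts.first().map(|acc| (*acc, *root)))
                    .unzip();
                RangeRequest {
                    account_hashes,
                    storage_roots,
                    start_hash: [0u8; 32],
                    limit_hash: HASH_MAX,
                }
            }
            StorageTask::BigInterval { root, accounts, interval } => RangeRequest {
                // A big trie is fetched through a single account; `accounts`
                // is expected to be non-empty.
                account_hashes: accounts.first().copied().into_iter().collect(),
                storage_roots: vec![*root],
                start_hash: interval.start,
                limit_hash: interval.end,
            },
        }
    }
}
```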

Step 4: New StorageTaskResult enum (client.rs)

Replace the current StorageTaskResult struct (no more Clone derive):

enum StorageTaskResult {
    /// Some small tries downloaded, some may remain.
    SmallComplete {
        completed: Vec<(H256, SmallTrie)>,  // slots populated
        remaining: Vec<(H256, SmallTrie)>,  // not downloaded, re-queue
        peer_id: H256,
    },
    /// Entire small batch failed (network/validation error).
    SmallFailed {
        tries: Vec<(H256, SmallTrie)>,  // returned unmodified
        peer_id: H256,
    },
    /// A small trie was discovered to actually be a big trie during download.
    /// The first slots were downloaded but the trie wasn't fully fetched.
    SmallPromotedToBig {
        completed: Vec<(H256, SmallTrie)>,  // small tries that completed before the big one
        remaining: Vec<(H256, SmallTrie)>,  // small tries not attempted, re-queue
        big_root: H256,                     // storage root of the promoted trie
        big_trie: SmallTrie,                // the trie with initial slots populated
        peer_id: H256,
    },
    /// A big trie interval was (partially) downloaded.
    BigIntervalResult {
        root: H256,
        accounts: Vec<H256>,
        slots: Vec<Slot>,
        /// None = interval fully downloaded. Some = remaining sub-interval.
        remaining_interval: Option<Interval>,
        peer_id: H256,
    },
}

Step 5: Refactor request_storage_ranges_worker (client.rs)

Split into two inner handlers based on task variant:

handle_small_batch:

  • Derives account_hashes (first account per trie) and storage_roots from tries
  • Sends GetStorageRanges with start=H256::zero(), limit=HASH_MAX
  • Validates each storage range per trie
  • If last trie has should_continue: sends SmallPromotedToBig with completed tries before it, the promoted trie (with initial slots), and remaining tries
  • Otherwise: splits tries into completed (slots filled) and remaining (not reached by response), sends SmallComplete
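The split into completed, promoted and remaining tries might look like the sketch below. It leaves out networking and validation and assumes the response has already been decoded into one slot list per answered trie, in request order. SmallOutcome and classify_small_batch are hypothetical names, and peer_id is omitted.

```rust
// Stand-in types for illustration (assumptions, not the crate's definitions).
type H256 = [u8; 32];
type Slot = (H256, Vec<u8>);

struct SmallTrie { accounts: Vec<H256>, slots: Vec<Slot> }

/// Hypothetical, trimmed-down counterpart of StorageTaskResult.
enum SmallOutcome {
    Complete {
        completed: Vec<(H256, SmallTrie)>,
        remaining: Vec<(H256, SmallTrie)>,
    },
    PromotedToBig {
        completed: Vec<(H256, SmallTrie)>,
        remaining: Vec<(H256, SmallTrie)>,
        big_root: H256,
        big_trie: SmallTrie,
    },
}

/// `ranges[i]` holds the validated slots for `tries[i]`; tries beyond
/// `ranges.len()` were not reached by the response.
fn classify_small_batch(
    tries: Vec<(H256, SmallTrie)>,
    ranges: Vec<Vec<Slot>>,
    should_continue: bool,
) -> SmallOutcome {
    let answered = ranges.len().min(tries.len());
    let mut tries = tries.into_iter();

    // Move the downloaded slots into the tries that were answered.
    let mut completed: Vec<(H256, SmallTrie)> = tries
        .by_ref()
        .take(answered)
        .zip(ranges)
        .map(|((root, mut trie), slots)| {
            trie.slots = slots;
            (root, trie)
        })
        .collect();
    // Everything the response did not reach goes back to the queue.
    let remaining: Vec<(H256, SmallTrie)> = tries.collect();

    if should_continue {
        // The last answered trie was cut short, so it is actually a big trie.
        if let Some((big_root, big_trie)) = completed.pop() {
            return SmallOutcome::PromotedToBig { completed, remaining, big_root, big_trie };
        }
    }
    SmallOutcome::Complete { completed, remaining }
}
```

Because the tries are moved in and moved back out, no SmallTrie is cloned along the way, which is the ownership simplification the plan is after.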

handle_big_interval:

  • Sends GetStorageRanges with single account hash, start=interval.start, limit=interval.end
  • If should_continue: computes remaining sub-interval from last slot hash to interval.end
  • Sends BigIntervalResult
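The remaining sub-interval computation can be sketched as a small pure function. The name remaining_interval is hypothetical and H256 is a stand-in byte array. The sketch starts the remainder at the last slot hash itself, as the plan words it; whether the real code should start one hash later is left open here.

```rust
// Stand-in for the real hash type; byte arrays compare lexicographically,
// which matches big-endian numeric order.
type H256 = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Interval {
    start: H256,
    end: H256,
}

/// Returns the part of `interval` that still has to be requested, or None
/// when the interval was fully downloaded.
fn remaining_interval(
    interval: Interval,
    last_slot_hash: Option<H256>,
    should_continue: bool,
) -> Option<Interval> {
    if !should_continue {
        return None;
    }
    // If the response carried no slots, retry from the original start.
    let start = last_slot_hash.unwrap_or(interval.start);
    // Nothing remains once the last slot reached the end of the interval.
    (start < interval.end).then_some(Interval { start, end: interval.end })
}
```

The Option returned here maps directly onto the remaining_interval field of BigIntervalResult.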

Step 6: Refactor request_storage_ranges main loop (client.rs)

Signature change: account_storage_roots: &mut AccountStorageRoots becomes tracker: &mut StorageTrieTracker

Task creation: Replace the accounts_by_root_hash construction (lines 538-584) with:

  • Drain small tries via tracker.take_small_batch(STORAGE_BATCH_SIZE) into SmallBatch tasks
  • Create BigInterval tasks from tracker.big_tries (clone accounts, move intervals out)
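Task creation along these lines could look like the following sketch. The types are stand-ins, build_tasks is a hypothetical helper, and take_small_batch is written here only to make the example self-contained; the real tracker method may pick and order entries differently.

```rust
use std::collections::HashMap;

// Stand-in types for illustration (assumptions, not the crate's definitions).
type H256 = [u8; 32];
type Slot = (H256, Vec<u8>);

struct Interval { start: H256, end: H256 }
struct SmallTrie { accounts: Vec<H256>, slots: Vec<Slot> }
struct BigTrie { accounts: Vec<H256>, slots: Vec<Slot>, intervals: Vec<Interval> }

#[derive(Default)]
struct StorageTrieTracker {
    small_tries: HashMap<H256, SmallTrie>,
    big_tries: HashMap<H256, BigTrie>,
}

enum StorageTask {
    SmallBatch { tries: Vec<(H256, SmallTrie)> },
    BigInterval { root: H256, accounts: Vec<H256>, interval: Interval },
}

impl StorageTrieTracker {
    /// Removes up to `batch_size` small tries and hands them out by value.
    fn take_small_batch(&mut self, batch_size: usize) -> Vec<(H256, SmallTrie)> {
        let roots: Vec<H256> = self.small_tries.keys().take(batch_size).copied().collect();
        roots
            .into_iter()
            .filter_map(|root| self.small_tries.remove(&root).map(|trie| (root, trie)))
            .collect()
    }
}

/// Hypothetical helper: turns the tracker contents into the initial task list.
fn build_tasks(tracker: &mut StorageTrieTracker, batch_size: usize) -> Vec<StorageTask> {
    let mut tasks = Vec::new();
    // Small tries are moved into the tasks; the tracker no longer owns them.
    loop {
        let tries = tracker.take_small_batch(batch_size);
        if tries.is_empty() {
            break;
        }
        tasks.push(StorageTask::SmallBatch { tries });
    }
    // Big tries stay in the tracker; only their intervals move out, and the
    // account hashes are cloned into each task.
    for (root, big) in tracker.big_tries.iter_mut() {
        for interval in big.intervals.drain(..) {
            tasks.push(StorageTask::BigInterval {
                root: *root,
                accounts: big.accounts.clone(),
                interval,
            });
        }
    }
    tasks
}
```

With five small tries, a batch size of two and one big trie holding two intervals, this yields three SmallBatch tasks and two BigInterval tasks.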

Result processing: Match on result enum:

  • SmallComplete — write completed tries to current_account_storages for disk dump, re-queue remaining as new SmallBatch
  • SmallFailed — record peer failure, re-queue all tries
  • SmallPromotedToBig — write completed tries to disk, re-queue remaining as SmallBatch, call BigTrie::compute_intervals() on the promoted trie, call tracker.promote_to_big(), queue new BigInterval tasks for each interval, add promoted accounts to tracker.healed_accounts
  • BigIntervalResult — append slots to current_account_storages, re-queue remaining interval as new BigInterval if partial

Eliminated: accounts_by_root_hash, accounts_done HashMap, index-based referencing.

Step 7: Update insert_accounts (snap_sync.rs)

Non-rocksdb (lines 821-826): Replace storage_accounts.accounts_with_storage_root.extend(...) with:

for (hash, state) in &account_states_snapshot {
    if state.storage_root != *EMPTY_TRIE_HASH {
        tracker.insert_account(*hash, state.storage_root);
    }
}

Rocksdb (lines 979-983): Same pattern, call tracker.insert_account(...).

Step 8: Update snap_sync() flow (snap_sync.rs)

  • Line 289: let mut tracker = StorageTrieTracker::default();
  • Pass &mut tracker to insert_accounts, heal_state_trie_wrap, request_storage_ranges, heal_storage_trie
  • Lines 381-403 fallback: call tracker.drain_all_to_healed()
  • Progress logging: use tracker.remaining_count() and tracker.healed_accounts.len()

Step 9: Update state healing (healing/state.rs)

  • Change parameter from storage_accounts: &mut AccountStorageRoots to tracker: &mut StorageTrieTracker
  • Lines 172-178: When a leaf is healed, look up the old root via tracker.account_to_root.get(&account_hash) (O(1)), get the new root from the decoded AccountState.storage_root, and call tracker.handle_healed_account(account_hash, old_root, new_root). If the account is not in account_to_root, old_root is treated as "not registered" (rule 2 applies).

Step 10: Update storage healing (healing/storage.rs)

  • Change parameter from storage_accounts: &AccountStorageRoots to tracker: &StorageTrieTracker
  • get_initial_downloads(): read from tracker.healed_accounts instead of account_paths.healed_accounts (same logic, different field access)

Step 11: Remove AccountStorageRoots and clean up imports

  • Remove AccountStorageRoots struct from sync.rs
  • Update all imports to use StorageTrieTracker, SmallTrie, BigTrie, Slot, Interval

Verification

  1. cargo check -p ethrex-p2p — compiles cleanly
  2. cargo test -p ethrex-p2p — existing tests pass
  3. cargo clippy -p ethrex-p2p — no warnings
  4. Full snap sync test against a testnet peer (manual verification that accounts and storages download correctly)
