Plan: Refactor Storage Download to Use StorageTrieTracker
This work is being done on branch perf/refactor-storage-download-snap.
Context
The current AccountStorageRoots struct in snap sync tracks storage downloads per-account, with complex index-based referencing into accounts_by_root_hash. This makes the download loop in request_storage_ranges hard to follow: tasks reference accounts by index, results carry index ranges, and big-account promotion involves mutating intervals in the tracking struct.
The new StorageTrieTracker (already defined in sync.rs) groups storage tries by root hash from the start, separating small (single-request) from big (multi-request) tries. The refactor simplifies data ownership by moving trie data into tasks and back in results, eliminating clones and the index-based referencing.
Files to Modify
- crates/networking/p2p/sync.rs: Add the healed_accounts and account_to_root fields and new methods to StorageTrieTracker
- crates/networking/p2p/snap/client.rs: New StorageTask/StorageTaskResult enums, refactor request_storage_ranges and its worker
- crates/networking/p2p/sync/snap_sync.rs: Replace AccountStorageRoots with StorageTrieTracker throughout, update insert_accounts
- crates/networking/p2p/sync/healing/state.rs: Update parameter types, use tracker.handle_healed_account()
- crates/networking/p2p/sync/healing/storage.rs: Update get_initial_downloads() to read from tracker.healed_accounts
Step 1: Extend StorageTrieTracker (sync.rs)
Add two fields to StorageTrieTracker:
- healed_accounts: HashSet<H256>: used by storage healing (get_initial_downloads) to know which accounts need storage trie node repair.
- account_to_root: HashMap<H256, H256>: reverse lookup from account hash to its current storage root. Maintained by all mutating methods. Enables O(1) lookup of the old root during healing.
Add methods:
- insert_account(&mut self, account_hash: H256, storage_root: H256): Inserts into small_tries, grouping by root. If the root already exists (in small or big), appends to its accounts vec. Also inserts into account_to_root.
- promote_to_big(&mut self, root: H256, first_slots: Vec<Slot>, intervals: Vec<Interval>): Removes the entry from small_tries and inserts it into big_tries with the same accounts plus the provided slots and intervals.
- take_small_batch(&mut self, batch_size: usize) -> Vec<(H256, SmallTrie)>: Drains up to batch_size entries from small_tries, returning owned data.
- return_small_tries(&mut self, tries: Vec<(H256, SmallTrie)>): Re-inserts failed small tries back into small_tries.
- handle_healed_account(&mut self, account_hash: H256, old_root: H256, new_root: H256): Called by state healing when an account's storage root changes. Rules:
  1. If old_root == new_root → do nothing.
  2. If the old root was in small_tries (or not registered at all):
     - Remove the account from the old SmallTrie (if it exists; clean up the entry if its accounts list becomes empty).
     - If new_root is already registered (in small_tries or big_tries) → add the account to that trie's accounts list.
     - If new_root is not registered → create a new SmallTrie with this account.
  3. If the old root was in big_tries:
     - If the account is the only account in that BigTrie → re-key the entry: remove from big_tries[old_root], insert at big_tries[new_root] (keeping slots and intervals).
     - If there are other accounts in the BigTrie → remove the account, create a new BigTrie at new_root with cloned slots and intervals.
  4. Always add account_hash to healed_accounts. Update account_to_root to point to new_root.
- drain_all_to_healed(&mut self): Moves all accounts from both maps into healed_accounts and clears the maps. Used by the 5-attempt fallback in snap_sync.rs.
- remaining_count(&self) -> usize: Returns small_tries.len() + big_tries.len().
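The bookkeeping methods above (everything except handle_healed_account and drain_all_to_healed) could be sketched roughly as follows. This is an illustration only: H256, Slot and Interval are local stand-ins for the real types, the struct layouts are assumed from the descriptions in this plan, and the merge behaviour in return_small_tries is a guess at what "re-inserts" should do if a root was registered again in the meantime.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Stand-ins for the real types (assumptions made for this sketch).
type H256 = [u8; 32];
type Slot = (H256, Vec<u8>); // assumed shape: (hashed key, value)
type Interval = (H256, H256); // assumed shape: (start, end)

#[derive(Debug, Default)]
struct SmallTrie {
    accounts: Vec<H256>,
    slots: Vec<Slot>,
}

#[derive(Debug, Default)]
struct BigTrie {
    accounts: Vec<H256>,
    slots: Vec<Slot>,
    intervals: Vec<Interval>,
}

#[derive(Debug, Default)]
struct StorageTrieTracker {
    small_tries: HashMap<H256, SmallTrie>,
    big_tries: HashMap<H256, BigTrie>,
    account_to_root: HashMap<H256, H256>,
}

impl StorageTrieTracker {
    fn insert_account(&mut self, account_hash: H256, storage_root: H256) {
        // A root already known as big keeps collecting accounts there;
        // everything else is grouped under small_tries.
        if let Some(big) = self.big_tries.get_mut(&storage_root) {
            big.accounts.push(account_hash);
        } else {
            self.small_tries.entry(storage_root).or_default().accounts.push(account_hash);
        }
        self.account_to_root.insert(account_hash, storage_root);
    }

    fn promote_to_big(&mut self, root: H256, first_slots: Vec<Slot>, intervals: Vec<Interval>) {
        // Carry the accounts over from the small entry, if one is present.
        let accounts = self.small_tries.remove(&root).map(|t| t.accounts).unwrap_or_default();
        self.big_tries.insert(root, BigTrie { accounts, slots: first_slots, intervals });
    }

    fn take_small_batch(&mut self, batch_size: usize) -> Vec<(H256, SmallTrie)> {
        // Pick the keys first so the map is not borrowed while removing.
        let roots: Vec<H256> = self.small_tries.keys().take(batch_size).copied().collect();
        roots
            .into_iter()
            .filter_map(|root| self.small_tries.remove(&root).map(|trie| (root, trie)))
            .collect()
    }

    fn return_small_tries(&mut self, tries: Vec<(H256, SmallTrie)>) {
        for (root, trie) in tries {
            match self.small_tries.entry(root) {
                // Assumed behaviour: merge accounts if the root reappeared.
                Entry::Occupied(mut e) => e.get_mut().accounts.extend(trie.accounts),
                Entry::Vacant(e) => {
                    e.insert(trie);
                }
            }
        }
    }

    fn remaining_count(&self) -> usize {
        self.small_tries.len() + self.big_tries.len()
    }
}
```

Because take_small_batch hands out owned (root, SmallTrie) pairs, a worker can fill in slots without holding a reference into the tracker, which is the ownership simplification the Context section describes.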
Step 2: BigTrie::compute_intervals helper (sync.rs)
Extract the chunking logic from client.rs:763-876 into a method:
impl BigTrie {
    pub fn compute_intervals(
        last_downloaded_hash: H256,
        slot_count: usize,
        slots_per_chunk: usize, // default 10_000
    ) -> Vec<Interval> {
        // Chunking logic moved here from client.rs:763-876.
        todo!()
    }
}
Computes storage density from last_downloaded_hash / slot_count, derives chunk size, and produces a Vec<Interval> covering the remaining range up to HASH_MAX.
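To make the density idea concrete, here is a minimal sketch of the chunking. It is not the logic from client.rs: it assumes a u64 hash space in place of 256-bit hashes so that it stays dependency-free, uses a free function instead of a BigTrie method, treats interval ends as inclusive, and guesses that "density" means the average hash distance between the slots downloaded so far.

```rust
// u64 stands in for the 256-bit hash space (assumption for this sketch).
type Hash = u64;
const HASH_MAX: Hash = u64::MAX;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    start: Hash,
    end: Hash, // inclusive
}

fn compute_intervals(
    last_downloaded_hash: Hash,
    slot_count: usize,
    slots_per_chunk: usize,
) -> Vec<Interval> {
    // Nothing remains if the first response already reached the end of the range.
    if last_downloaded_hash == HASH_MAX {
        return Vec::new();
    }
    // Density: average hash distance between the slots seen so far (at least 1).
    let density = (last_downloaded_hash / (slot_count.max(1) as Hash)).max(1);
    // Hash width expected to hold roughly `slots_per_chunk` slots.
    let chunk_size = density.saturating_mul(slots_per_chunk.max(1) as Hash);

    let mut intervals = Vec::new();
    let mut start = last_downloaded_hash + 1;
    loop {
        // `chunk_size - 1` because `end` is inclusive; saturate at HASH_MAX.
        let end = start.saturating_add(chunk_size - 1);
        intervals.push(Interval { start, end });
        if end == HASH_MAX {
            break;
        }
        start = end + 1;
    }
    intervals
}
```

The last interval is simply whatever is left before HASH_MAX, so it may be much narrower than the others.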
Step 3: New StorageTask enum (client.rs)
Replace the current StorageTask struct:
enum StorageTask {
    SmallBatch {
        /// Owned small tries moved from the tracker. Vec of (root, SmallTrie).
        tries: Vec<(H256, SmallTrie)>,
    },
    BigInterval {
        root: H256,
        /// Cloned from BigTrie (cheap: just H256 hashes)
        accounts: Vec<H256>,
        /// Moved out of BigTrie.intervals
        interval: Interval,
    },
}
Worker extracts account_hashes / storage_roots / start_hash / limit_hash from the task variant.
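One way the worker might derive its request parameters from each variant is sketched below. RangeRequest and request_params are hypothetical names introduced for illustration, H256 is a local stand-in, and SmallTrie is trimmed to the one field this step needs.

```rust
// Local stand-ins (assumptions); the real types live in the p2p crate.
type H256 = [u8; 32];
const HASH_MAX: H256 = [0xff; 32];

struct SmallTrie {
    accounts: Vec<H256>,
}

struct Interval {
    start: H256,
    end: H256,
}

enum StorageTask {
    SmallBatch { tries: Vec<(H256, SmallTrie)> },
    BigInterval { root: H256, accounts: Vec<H256>, interval: Interval },
}

/// Hypothetical bundle of the four values the worker needs.
#[derive(Debug, PartialEq)]
struct RangeRequest {
    account_hashes: Vec<H256>,
    storage_roots: Vec<H256>,
    start_hash: H256,
    limit_hash: H256,
}

fn request_params(task: &StorageTask) -> Option<RangeRequest> {
    match task {
        StorageTask::SmallBatch { tries } => {
            // One representative account per trie, paired with its root.
            let (account_hashes, storage_roots): (Vec<H256>, Vec<H256>) = tries
                .iter()
                .filter_map(|(root, trie)| trie.accounts.first().map(|account| (*account, *root)))
                .unzip();
            if account_hashes.is_empty() {
                return None;
            }
            Some(RangeRequest {
                account_hashes,
                storage_roots,
                start_hash: [0u8; 32],
                limit_hash: HASH_MAX,
            })
        }
        StorageTask::BigInterval { root, accounts, interval } => {
            // A big trie is requested through a single account, bounded by the interval.
            let account = accounts.first()?;
            Some(RangeRequest {
                account_hashes: vec![*account],
                storage_roots: vec![*root],
                start_hash: interval.start,
                limit_hash: interval.end,
            })
        }
    }
}
```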
Step 4: New StorageTaskResult enum (client.rs)
Replace the current StorageTaskResult struct (no more Clone derive):
enum StorageTaskResult {
    /// Some small tries downloaded, some may remain.
    SmallComplete {
        completed: Vec<(H256, SmallTrie)>, // slots populated
        remaining: Vec<(H256, SmallTrie)>, // not downloaded, re-queue
        peer_id: H256,
    },
    /// Entire small batch failed (network/validation error).
    SmallFailed {
        tries: Vec<(H256, SmallTrie)>, // returned unmodified
        peer_id: H256,
    },
    /// A small trie was discovered to actually be a big trie during download.
    /// The first slots were downloaded but the trie wasn't fully fetched.
    SmallPromotedToBig {
        completed: Vec<(H256, SmallTrie)>, // small tries that completed before the big one
        remaining: Vec<(H256, SmallTrie)>, // small tries not attempted, re-queue
        big_root: H256,                    // storage root of the promoted trie
        big_trie: SmallTrie,               // the trie with initial slots populated
        peer_id: H256,
    },
    /// A big trie interval was (partially) downloaded.
    BigIntervalResult {
        root: H256,
        accounts: Vec<H256>,
        slots: Vec<Slot>,
        /// None = interval fully downloaded. Some = remaining sub-interval.
        remaining_interval: Option<Interval>,
        peer_id: H256,
    },
}
Step 5: Refactor request_storage_ranges_worker (client.rs)
Split into two inner handlers based on task variant:
handle_small_batch:
- Derives account_hashes (first account per trie) and storage_roots from tries
- Sends GetStorageRanges with start=H256::zero(), limit=HASH_MAX
- Validates each storage range per trie
- If the last trie has should_continue: sends SmallPromotedToBig with the completed tries before it, the promoted trie (with initial slots), and the remaining tries
- Otherwise: splits tries into completed (slots filled) and remaining (not reached by response), sends SmallComplete
handle_big_interval:
- Sends GetStorageRanges with a single account hash, start=interval.start, limit=interval.end
- If should_continue: computes the remaining sub-interval from the last slot hash to interval.end
- Sends BigIntervalResult
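The splitting done by the two handlers after a response has been validated could look roughly like this. Network I/O, validation and peer_id are left out; SmallOutcome, classify_small_batch and remaining_interval are hypothetical names, and the sketch assumes the response carries one slot range per answered trie, in request order.

```rust
type H256 = [u8; 32]; // stand-in for the real hash type
type Slot = (H256, Vec<u8>); // assumed shape: (hashed key, value)

#[derive(Debug, Clone, PartialEq)]
struct SmallTrie {
    accounts: Vec<H256>,
    slots: Vec<Slot>,
}

/// Hypothetical, trimmed-down counterpart of the small-batch result variants.
enum SmallOutcome {
    Complete {
        completed: Vec<(H256, SmallTrie)>,
        remaining: Vec<(H256, SmallTrie)>,
    },
    PromotedToBig {
        completed: Vec<(H256, SmallTrie)>,
        remaining: Vec<(H256, SmallTrie)>,
        big_root: H256,
        big_trie: SmallTrie,
    },
}

fn classify_small_batch(
    tries: Vec<(H256, SmallTrie)>,
    ranges: Vec<Vec<Slot>>, // validated ranges, one per answered trie
    should_continue: bool,  // true if the last answered trie was cut short
) -> SmallOutcome {
    let answered = ranges.len().min(tries.len());
    let mut tries = tries.into_iter();
    // Attach each returned range to the trie it answers.
    let mut completed: Vec<(H256, SmallTrie)> = tries
        .by_ref()
        .take(answered)
        .zip(ranges)
        .map(|((root, mut trie), slots)| {
            trie.slots = slots;
            (root, trie)
        })
        .collect();
    // Tries the response never reached go back to the queue untouched.
    let remaining: Vec<(H256, SmallTrie)> = tries.collect();

    if should_continue {
        // The last answered trie did not fit in one response: promote it.
        if let Some((big_root, big_trie)) = completed.pop() {
            return SmallOutcome::PromotedToBig { completed, remaining, big_root, big_trie };
        }
    }
    SmallOutcome::Complete { completed, remaining }
}

/// Big-interval case: what is left after a partial response, if anything.
fn remaining_interval(
    slots: &[Slot],
    interval_end: H256,
    should_continue: bool,
) -> Option<(H256, H256)> {
    if !should_continue {
        return None;
    }
    slots.last().map(|(last_hash, _)| (*last_hash, interval_end))
}
```

Whether the remaining sub-interval should start at the last slot hash or just after it is not stated here; the sketch follows the wording above and starts at the last slot hash.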
Step 6: Refactor request_storage_ranges main loop (client.rs)
Signature change: account_storage_roots: &mut AccountStorageRoots becomes tracker: &mut StorageTrieTracker
Task creation: Replace the accounts_by_root_hash construction (lines 538-584) with:
- Drain small tries via tracker.take_small_batch(STORAGE_BATCH_SIZE) into SmallBatch tasks
- Create BigInterval tasks from tracker.big_tries (clone accounts, move intervals out)
Result processing: Match on result enum:
- SmallComplete: write completed tries to current_account_storages for disk dump, re-queue remaining as a new SmallBatch
- SmallFailed: record peer failure, re-queue all tries
- SmallPromotedToBig: write completed tries to disk, re-queue remaining as a SmallBatch, call BigTrie::compute_intervals() on the promoted trie, call tracker.promote_to_big(), queue a new BigInterval task for each interval, add promoted accounts to tracker.healed_accounts
- BigIntervalResult: append slots to current_account_storages, re-queue the remaining interval as a new BigInterval if partial
Eliminated: accounts_by_root_hash, accounts_done HashMap, index-based referencing.
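A condensed sketch of the result-processing match is shown below. It leaves out peer scoring, the tracker and the SmallPromotedToBig arm to stay short, and it assumes current_account_storages can be modelled as a map from storage root to (accounts, slots); the real structure may differ. process_result and Storages are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

// Stand-ins for the real types (assumptions made for this sketch).
type H256 = [u8; 32];
type Slot = (H256, Vec<u8>);
type Interval = (H256, H256);
/// Assumed shape for current_account_storages: root -> (accounts, slots).
type Storages = HashMap<H256, (Vec<H256>, Vec<Slot>)>;

struct SmallTrie {
    accounts: Vec<H256>,
    slots: Vec<Slot>,
}

enum StorageTask {
    SmallBatch { tries: Vec<(H256, SmallTrie)> },
    BigInterval { root: H256, accounts: Vec<H256>, interval: Interval },
}

// Trimmed: no peer_id, no SmallPromotedToBig.
enum StorageTaskResult {
    SmallComplete { completed: Vec<(H256, SmallTrie)>, remaining: Vec<(H256, SmallTrie)> },
    SmallFailed { tries: Vec<(H256, SmallTrie)> },
    BigIntervalResult {
        root: H256,
        accounts: Vec<H256>,
        slots: Vec<Slot>,
        remaining_interval: Option<Interval>,
    },
}

fn process_result(
    result: StorageTaskResult,
    queue: &mut VecDeque<StorageTask>,
    storages: &mut Storages,
) {
    match result {
        StorageTaskResult::SmallComplete { completed, remaining } => {
            // Finished tries move into the buffer that is later dumped to disk.
            for (root, trie) in completed {
                storages.insert(root, (trie.accounts, trie.slots));
            }
            if !remaining.is_empty() {
                queue.push_back(StorageTask::SmallBatch { tries: remaining });
            }
        }
        StorageTaskResult::SmallFailed { tries } => {
            // The peer failure would be recorded here; the tries are retried as-is.
            queue.push_back(StorageTask::SmallBatch { tries });
        }
        StorageTaskResult::BigIntervalResult { root, accounts, slots, remaining_interval } => {
            let entry = storages.entry(root).or_insert_with(|| (accounts.clone(), Vec::new()));
            entry.1.extend(slots);
            // A partial answer re-queues only the part that is still missing.
            if let Some(interval) = remaining_interval {
                queue.push_back(StorageTask::BigInterval { root, accounts, interval });
            }
        }
    }
}
```

Every arm consumes the result by value, so trie data moves from the result into either the storage buffer or a new task without index bookkeeping.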
Step 7: Update insert_accounts (snap_sync.rs)
Non-rocksdb (lines 821-826): Replace storage_accounts.accounts_with_storage_root.extend(...) with:
for (hash, state) in &account_states_snapshot {
    if state.storage_root != *EMPTY_TRIE_HASH {
        tracker.insert_account(*hash, state.storage_root);
    }
}
Rocksdb (lines 979-983): Same pattern, call tracker.insert_account(...).
Step 8: Update snap_sync() flow (snap_sync.rs)
- Line 289: let mut tracker = StorageTrieTracker::default();
- Pass &mut tracker to insert_accounts, heal_state_trie_wrap, request_storage_ranges, heal_storage_trie
- Lines 381-403 fallback: call tracker.drain_all_to_healed()
- Progress logging: use tracker.remaining_count() and tracker.healed_accounts.len()
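The fallback and the two progress counters can be illustrated with a small sketch. The retry loop is an assumption about how the 5-attempt fallback is structured; download_with_fallback and MAX_ATTEMPTS are hypothetical names, and the tries are trimmed to their accounts field.

```rust
use std::collections::{HashMap, HashSet};

type H256 = [u8; 32]; // stand-in for the real hash type

#[derive(Default)]
struct SmallTrie {
    accounts: Vec<H256>,
}

#[derive(Default)]
struct BigTrie {
    accounts: Vec<H256>,
}

#[derive(Default)]
struct StorageTrieTracker {
    small_tries: HashMap<H256, SmallTrie>,
    big_tries: HashMap<H256, BigTrie>,
    healed_accounts: HashSet<H256>,
}

impl StorageTrieTracker {
    /// Hands every still-pending account over to storage healing.
    fn drain_all_to_healed(&mut self) {
        for (_, trie) in self.small_tries.drain() {
            self.healed_accounts.extend(trie.accounts);
        }
        for (_, trie) in self.big_tries.drain() {
            self.healed_accounts.extend(trie.accounts);
        }
    }

    fn remaining_count(&self) -> usize {
        self.small_tries.len() + self.big_tries.len()
    }
}

const MAX_ATTEMPTS: usize = 5; // assumed from "5-attempt fallback"

/// Runs download attempts until nothing is pending, or gives up after
/// MAX_ATTEMPTS and leaves the rest to healing. Returns the attempts made.
fn download_with_fallback(
    tracker: &mut StorageTrieTracker,
    mut attempt: impl FnMut(&mut StorageTrieTracker),
) -> usize {
    let mut attempts = 0;
    while tracker.remaining_count() > 0 {
        if attempts == MAX_ATTEMPTS {
            tracker.drain_all_to_healed();
            break;
        }
        attempt(&mut *tracker);
        attempts += 1;
        // Progress logging would read remaining_count() and healed_accounts.len() here.
    }
    attempts
}
```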
Step 9: Update state healing (healing/state.rs)
- Change parameter from storage_accounts: &mut AccountStorageRoots to tracker: &mut StorageTrieTracker
- Lines 172-178: When a leaf is healed, look up the old root via tracker.account_to_root.get(&account_hash) (O(1)), get the new root from the decoded AccountState.storage_root, and call tracker.handle_healed_account(account_hash, old_root, new_root). If the account is not in account_to_root, old_root is treated as "not registered" (rule 2, the small_tries / not-registered case, applies).
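The lookup-then-dispatch flow, together with the handle_healed_account rules from Step 1, might be sketched as follows. Several points are assumptions: the types are local stand-ins, on_healed_leaf is a hypothetical helper, an unregistered account is passed with an all-zero old_root as a placeholder, and the "always add to healed_accounts" rule is read as applying only when the root actually changed.

```rust
use std::collections::{HashMap, HashSet};

// Stand-ins for the real types (assumptions made for this sketch).
type H256 = [u8; 32];
type Slot = (H256, Vec<u8>);
type Interval = (H256, H256);

#[derive(Default)]
struct SmallTrie {
    accounts: Vec<H256>,
    slots: Vec<Slot>,
}

struct BigTrie {
    accounts: Vec<H256>,
    slots: Vec<Slot>,
    intervals: Vec<Interval>,
}

#[derive(Default)]
struct StorageTrieTracker {
    small_tries: HashMap<H256, SmallTrie>,
    big_tries: HashMap<H256, BigTrie>,
    healed_accounts: HashSet<H256>,
    account_to_root: HashMap<H256, H256>,
}

impl StorageTrieTracker {
    fn handle_healed_account(&mut self, account_hash: H256, old_root: H256, new_root: H256) {
        if old_root == new_root {
            return; // unchanged root: nothing to do
        }
        if let Some(mut old) = self.big_tries.remove(&old_root) {
            // Old root was a big trie.
            old.accounts.retain(|a| *a != account_hash);
            if old.accounts.is_empty() {
                // Sole account: re-key the entry, keeping slots and intervals.
                old.accounts.push(account_hash);
                self.big_tries.insert(new_root, old);
            } else {
                // Shared trie: split off a copy for the healed account.
                // (The plan does not say what happens if new_root is already
                // registered; this sketch overwrites it.)
                let split = BigTrie {
                    accounts: vec![account_hash],
                    slots: old.slots.clone(),
                    intervals: old.intervals.clone(),
                };
                self.big_tries.insert(old_root, old);
                self.big_tries.insert(new_root, split);
            }
        } else {
            // Old root was a small trie, or the account was not registered.
            if let Some(small) = self.small_tries.get_mut(&old_root) {
                small.accounts.retain(|a| *a != account_hash);
                if small.accounts.is_empty() {
                    self.small_tries.remove(&old_root);
                }
            }
            if let Some(big) = self.big_tries.get_mut(&new_root) {
                big.accounts.push(account_hash);
            } else {
                self.small_tries.entry(new_root).or_default().accounts.push(account_hash);
            }
        }
        self.healed_accounts.insert(account_hash);
        self.account_to_root.insert(account_hash, new_root);
    }
}

/// Hypothetical call site in state healing.
fn on_healed_leaf(tracker: &mut StorageTrieTracker, account_hash: H256, new_root: H256) {
    // Placeholder (assumption): an all-zero root stands for "not registered".
    let old_root = tracker.account_to_root.get(&account_hash).copied().unwrap_or([0u8; 32]);
    tracker.handle_healed_account(account_hash, old_root, new_root);
}
```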
Step 10: Update storage healing (healing/storage.rs)
- Change parameter from storage_accounts: &AccountStorageRoots to tracker: &StorageTrieTracker
- get_initial_downloads(): read from tracker.healed_accounts instead of account_paths.healed_accounts (same logic, different field access)
Step 11: Remove AccountStorageRoots and clean up imports
- Remove the AccountStorageRoots struct from sync.rs
- Update all imports to use StorageTrieTracker, SmallTrie, BigTrie, Slot, Interval
Verification
- cargo check -p ethrex-p2p: compiles cleanly
- cargo test -p ethrex-p2p: existing tests pass
- cargo clippy -p ethrex-p2p: no warnings
- Full snap sync test against a testnet peer (manual verification that accounts and storages download correctly)