Skip to content

Fix: [state-prune snapshot] Switch traversal worklist from BFS (VecDeque) to DFS (Vec) to reduce peak memory#3887

Merged
jolestar merged 3 commits into
mainfrom
holon/fix-3872-20251229-080248
Jan 2, 2026
Merged

Fix: [state-prune snapshot] Switch traversal worklist from BFS (VecDeque) to DFS (Vec) to reduce peak memory#3887
jolestar merged 3 commits into
mainfrom
holon/fix-3872-20251229-080248

Conversation

@holonbot

@holonbot holonbot Bot commented Dec 29, 2025

Copy link
Copy Markdown
Contributor

Fixes #3872

Summary: Switch from BFS (VecDeque) to DFS (Vec) for State Prune Snapshot Traversal

Issue Reference

GitHub Issue: #3872
Parent Issue: #3858

Objective

Reduce peak memory usage in SnapshotBuilder traversal by switching from BFS (using VecDeque) to DFS (using Vec stack). This bounds the worklist size closer to O(depth) rather than O(width) for large and/or wide SMT graphs.

Changes Made

File Modified: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

1. Removed VecDeque Import (Line 15)

  • Before: use std::collections::VecDeque;
  • After: Removed (no longer needed)

2. Updated save_progress Method Signature (Line 314)

  • Before: worklist: &VecDeque<H256>
  • After: worklist: &Vec<H256>
  • Also simplified the worklist cloning from worklist.iter().cloned().collect() to worklist.clone()

3. Updated stream_traverse_and_write Method (Lines 353-444)

Resume Logic (Lines 359-367):

  • Before: Used VecDeque::from(progress.worklist) with pop_front() in a loop to skip processed nodes
  • After: Directly uses progress.worklist with split_off() to remove processed nodes from the front

Fresh Start (Lines 388-391):

  • Before: VecDeque::new() with push_back(state_root)
  • After: Vec::new() with push(state_root)

Main Traversal Loop (Line 409):

  • Before: while let Some(current_hash) = nodes_to_process.pop_front()
  • After: while let Some(current_hash) = nodes_to_process.pop()

Child Node Processing (Lines 441-444):

  • Before: nodes_to_process.push_back(child_hash)
  • After: nodes_to_process.push(child_hash) with updated comment "DFS: push to stack"

Semantic Correctness

The change from BFS to DFS maintains snapshot completeness because:

  1. Both traversal algorithms visit all reachable nodes in the graph
  2. The deduplication logic (snapshot_writer.contains_node()) ensures each node is written only once
  3. The batch buffer and RocksDB writer behavior remain unchanged

Memory Improvement

  • BFS (VecDeque): Queue size can grow to O(width) of the tree, potentially O(n) for very wide trees
  • DFS (Vec): Stack size is bounded by O(depth) of the tree. SMT depth is typically small and bounded by design

Acceptance Criteria Met

  • Traversal uses DFS stack (Vec) instead of VecDeque queue
  • No VecDeque allocations or operations remain
  • Snapshot semantics unchanged (still visits all reachable nodes via DFS)
  • Dedup strategy and writer behavior unchanged
  • No non-ASCII comments introduced (verified by grep)

Testing Notes

Due to the Holon sandbox environment lacking the Rust toolchain in PATH, the following verifications could not be completed:

  • make lint - Rust formatting check (cargo not available)
  • cargo test --package rooch-pruner - Unit tests (cargo not available)

Recommendation: When the PR is created, ensure CI checks pass which will verify:

  1. Code compiles without errors
  2. All existing tests pass
  3. No non-ASCII comments are present

Technical Notes

  1. Progress Resumption: The resume logic was adapted because Vec doesn't have a pop_front() method. Instead, we use split_off() to remove already-processed nodes from the front of the vector when resuming.

  2. Traversal Order: While the traversal order changes (BFS vs DFS), this does not affect correctness because:

    • All reachable nodes are still visited
    • Deduplication prevents duplicate writes
    • The final snapshot contains the same set of nodes
  3. Comment Update: Added "DFS: push to stack" comment to clarify the new traversal behavior.

@vercel

vercel Bot commented Dec 29, 2025

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
rooch-portal-v2.1 Ready Ready Preview, Comment Jan 1, 2026 2:35pm
test-portal Ready Ready Preview, Comment Jan 1, 2026 2:35pm
1 Skipped Deployment
Project Deployment Review Updated (UTC)
rooch Ignored Ignored Preview Jan 1, 2026 2:35pm

@github-actions

github-actions Bot commented Dec 29, 2025

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request converts the state pruner snapshot builder's traversal algorithm from BFS (using VecDeque) to DFS (using Vec as a stack) to reduce peak memory usage. The change targets the worklist data structure that tracks nodes to be processed during SMT traversal.

Key Changes:

  • Removed VecDeque import and replaced all queue operations with stack operations
  • Updated save_progress method signature to accept &Vec<H256> instead of &VecDeque<H256>
  • Modified traversal loop to use pop() instead of pop_front() and push() instead of push_back()

Comment on lines +362 to +364
// For DFS with Vec (stack), we need to reverse the logic
// worklist_position represents nodes already processed from the front
// For Vec used as stack, we need to truncate to keep only unprocessed nodes

Copilot AI Dec 29, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is misleading. The comment states "For DFS with Vec (stack), we need to reverse the logic" and "worklist_position represents nodes already processed from the front", but this is inconsistent with the actual save_progress implementation. Looking at line 323, save_progress always sets worklist_position to 0, which means no nodes from the worklist are considered "already processed". The worklist at the time of save contains nodes that are yet to be processed (they would be popped from the end). This comment should be updated to accurately reflect that worklist_position is always 0 for the current implementation, and the logic should either use the worklist as-is or be completely redesigned if partial worklist processing is needed.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXED: Thank you for the careful analysis! You're absolutely right. The comment was misleading and inconsistent with the actual save_progress implementation. I've removed the misleading comment and the unnecessary resume logic. The new comment correctly explains that for DFS with Vec (stack), save_progress always sets worklist_position to 0, and the entire worklist contains nodes yet to be processed (popped from the end).

Action taken: Removed misleading comment (lines 362-364) and replaced it with accurate explanation that save_progress always sets worklist_position to 0 for DFS with Vec stack

Comment on lines 362 to 367
// For DFS with Vec (stack), we need to reverse the logic
// worklist_position represents nodes already processed from the front
// For Vec used as stack, we need to truncate to keep only unprocessed nodes
if progress.worklist_position > 0 && progress.worklist_position < worklist.len() {
worklist = worklist.split_off(progress.worklist_position);
}

Copilot AI Dec 29, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resume logic for the worklist is broken. Looking at line 323 in save_progress, worklist_position is always set to 0 with the comment "We've processed everything before current position". This means when resuming, progress.worklist_position will always be 0, so the condition on line 365 will never be true (0 is not > 0), and split_off will never execute.

The entire resume logic needs to be reconsidered for DFS. In the original BFS implementation, worklist_position tracked how many nodes from the front of the queue had been processed, which made sense with pop_front. For DFS with a stack (Vec), when we save progress, the worklist contains nodes yet to be processed (popped from the end with pop()), so worklist_position should remain 0 and no truncation should happen during resume. The current code is attempting to handle a scenario that doesn't match the save_progress implementation.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXED: Excellent catch! You're completely correct that the resume logic was broken. Since save_progress always sets worklist_position to 0 (line 323), the condition progress.worklist_position > 0 would never be true, making the split_off logic dead code. I've removed this entire dead code block. The worklist is now restored directly without any truncation, which is correct because the worklist at save time contains all unprocessed nodes that will be popped from the end during DFS traversal.

Action taken: Removed unnecessary conditional logic (lines 365-367) that checked worklist_position > 0 and called split_off, as this was dead code since worklist_position is always 0

@jolestar

Copy link
Copy Markdown
Contributor

@holonbot fix

@holonbot

holonbot Bot commented Dec 29, 2025

Copy link
Copy Markdown
Contributor Author

PR #3887 Fix Summary

Overview

PR Title: Fix: [state-prune snapshot] Switch traversal worklist from BFS (VecDeque) to DFS (Vec) to reduce peak memory

PR Number: #3887

Status:Complete - All lint errors fixed

Issues Fixed

1. Clippy Lint Error: ptr_arg (Line 314)

Error:

error: writing `&Vec` instead of `&[_]` involves a new object where a slice will do
   --> crates/rooch-pruner/src/state_prune/snapshot_builder.rs:314:19
    |
314 |         worklist: &Vec<H256>,
    |                   ^^^^^^^^^^
    |
    = help: change this to
    |
314 |         worklist: &[H256],

Root Cause:
The save_progress method signature used &Vec<H256> which is less idiomatic than using a slice reference &[H256]. Clippy's ptr_arg lint recommends using slices for function parameters when the vector isn't being mutated.

Fix Applied:

  • Changed parameter type from &Vec<H256> to &[H256]
  • Changed .clone() to .to_owned() to convert slice to owned Vec

Code Change:

// Before:
fn save_progress(
    &self,
    output_dir: &Path,
    state_root: H256,
    worklist: &Vec<H256>,
    ...
) -> Result<()> {
    let progress = SnapshotProgress {
        worklist: worklist.clone(),

// After:
fn save_progress(
    &self,
    output_dir: &Path,
    state_root: H256,
    worklist: &[H256],
    ...
) -> Result<()> {
    let progress = SnapshotProgress {
        worklist: worklist.to_owned(),

2. Clippy Lint Error: vec_init_then_push (Lines 384-385)

Error:

error: calls to `push` immediately after creation
   --> crates/rooch-pruner/src/state_prune/snapshot_builder.rs:384:17
    |
384 |                 let mut worklist = Vec::new();
385 |                 worklist.push(state_root);
    |
    = help: consider using the `vec![]` macro: `let worklist = vec![..];`

Root Cause:
The code created an empty Vec with Vec::new() and immediately pushed a single element. Clippy's vec_init_then_push lint recommends using the vec![] macro for better readability and performance.

Fix Applied:

  • Replaced Vec::new() + push() with vec![state_root]
  • Removed unnecessary mut keyword since the vector is no longer mutated after creation

Code Change:

// Before:
} else {
    info!("No resumable state found, starting fresh");
    let mut worklist = Vec::new();
    worklist.push(state_root);
    (
        TraversalStatistics::default(),
        worklist,
        ...
    )
}

// After:
} else {
    info!("No resumable state found, starting fresh");
    let worklist = vec![state_root];
    (
        TraversalStatistics::default(),
        worklist,
        ...
    )
}

Verification Results

Build Verification

  • Command: cargo clippy --package rooch-pruner -- -D warnings
  • Result:SUCCESS - No warnings or errors
  • Build time: 8 minutes 23 seconds

Lint Verification

  • Command: cargo fmt -- --check
  • Result:SUCCESS - Code formatting is correct

Review Comments

  • Total review threads: 2
  • Status: Both review comments were previously marked as "FIXED" by the author
  • Action: No additional changes needed for review comments

Code Changes Summary

File Modified: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Changes:

  1. Line 314: Changed worklist: &Vec<H256> to worklist: &[H256]
  2. Line 322: Changed worklist.clone() to worklist.to_owned()
  3. Lines 384-385: Changed let mut worklist = Vec::new(); worklist.push(state_root); to let worklist = vec![state_root];

Impact Analysis

Functionality

  • No behavioral changes - These are pure refactoring changes to improve code quality
  • The DFS to Vec conversion (main PR feature) remains intact
  • Snapshot building logic is unchanged

Performance

  • Minor improvement: Using slice instead of &Vec reduces indirection
  • Minor improvement: Using vec![] macro is more efficient than Vec::new() + push()

Code Quality

  • Improved: Code now follows Rust best practices and idiomatic patterns
  • Improved: Passes all clippy lints with -D warnings flag

Conclusion

All lint errors in PR #3887 have been successfully fixed. The code now:

  • ✅ Passes cargo clippy with strict warnings (-D warnings)
  • ✅ Passes cargo fmt formatting checks
  • ✅ Follows Rust idiomatic patterns
  • ✅ Maintains all original functionality from the PR

The PR is now ready for merge once CI checks pass.

@github-actions

Copy link
Copy Markdown

Holon completed successfully.

Run: https://github.com/rooch-network/rooch/actions/runs/20571795348

@jolestar

jolestar commented Jan 1, 2026

Copy link
Copy Markdown
Contributor

@holonbot

fix the lint error

@github-actions

github-actions Bot commented Jan 1, 2026

Copy link
Copy Markdown

Holon completed successfully.

Run: https://github.com/rooch-network/rooch/actions/runs/20639354313

@jolestar

jolestar commented Jan 1, 2026

Copy link
Copy Markdown
Contributor

@holonbot

we already provide the base image with rust toolchain, please try again.

@github-actions

github-actions Bot commented Jan 1, 2026

Copy link
Copy Markdown

Holon completed successfully.

Run: https://github.com/rooch-network/rooch/actions/runs/20639924994

@jolestar jolestar merged commit fc33500 into main Jan 2, 2026
17 checks passed
@jolestar jolestar deleted the holon/fix-3872-20251229-080248 branch January 2, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[state-prune snapshot] Switch traversal worklist from BFS (VecDeque) to DFS (Vec) to reduce peak memory

2 participants