Fix: Unresolved Review Comments from PR #3863: Streaming Snapshot Prune Implementation#3896
Conversation
…ne Implementation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Pull request overview
This PR addresses review feedback from PR #3863 by fixing logic issues in the streaming snapshot builder for state pruning. The changes focus on improving cycle detection, fixing counter accuracy, removing dead code, and enhancing documentation for the snapshot metadata fields.
Key changes:
- Replaced flawed empty-batch tracking with HashSet-based cycle detection for infinite loop prevention
- Repositioned nodes_visited counter increment to track all processed nodes
- Added comprehensive documentation for the global_size field and set it to 0 with explanation
| let mut last_memory_check = Instant::now(); | ||
| let mut last_progress_save = Instant::now(); | ||
| const MAX_EMPTY_BATCHES: u32 = 100; // Safety limit to prevent infinite loops | ||
| let mut visited_nodes = std::collections::HashSet::new(); |
There was a problem hiding this comment.
The visited_nodes HashSet is not persisted when saving progress (line 498-515), which means cycle detection state will be lost when resuming from a saved checkpoint. This could allow infinite loops to occur after resume if there are circular references in the state tree that span across the resume boundary.
Consider either:
- Adding visited_nodes to the SnapshotProgress struct and persisting it during save_progress calls, or
- Using the existing snapshot_writer.contains_node check (line 421) as the sole deduplication mechanism and removing the visited_nodes tracking entirely, since nodes already written to the snapshot DB won't be reprocessed anyway.
There was a problem hiding this comment.
✅ FIXED: Excellent analysis! You're absolutely right that the visited_nodes HashSet wasn't being persisted, which would break cycle detection on resume. I've taken your suggestion #2 and removed the visited_nodes HashSet entirely, relying instead on the existing snapshot_writer.contains_node check for deduplication. This approach is better because:
- The RocksDB-based contains_node check provides O(1) space complexity (no HashSet memory overhead)
- It's automatically persisted with the snapshot data
- It handles both deduplication AND cycle detection in a single check
- No changes needed to the SnapshotProgress persistence logic
The comment now explicitly documents that the contains_node check serves as cycle detection since nodes already processed won't be revisited.
Action taken: Removed visited_nodes HashSet initialization and the cycle detection block that used it. Moved the contains_node check to happen first (before nodes_visited increment) and updated the comment to clarify it serves as both deduplication and cycle detection.
| } | ||
|
|
||
| // Increment nodes_visited counter immediately when we start processing | ||
| statistics.nodes_visited += 1; |
There was a problem hiding this comment.
The nodes_visited counter is now incremented for every node popped from the queue (line 417), but this happens before checking if the node already exists in the snapshot (line 421). If a node already exists and is skipped via the contains_node check, it still gets counted as "visited". This creates inconsistency where nodes_visited includes nodes that weren't actually processed.
The previous implementation had this right - nodes_visited should only be incremented when nodes are actually processed (added to batch_buffer or determined to not exist). Consider moving the increment to after the contains_node check, or incrementing only when a node is added to batch_buffer.
There was a problem hiding this comment.
✅ FIXED: Spot on! The counter was indeed being incremented before checking if the node already existed, which would artificially inflate the nodes_visited count with duplicates. I've moved the nodes_visited increment to happen AFTER the contains_node check (line 416), so now it only counts nodes that are actually being processed for the first time.
This ensures that:
- nodes_visited accurately reflects only unique nodes processed (not duplicates)
- The counter is consistent with the actual work done
- Statistics are meaningful for progress tracking and performance analysis
Action taken: Moved the 'statistics.nodes_visited += 1' statement from before the visited_nodes check (line 417 in old code) to after the contains_node check (line 416 in new code). This ensures only unique, non-duplicate nodes are counted.
Summary of Fixes for PR #3896OverviewSuccessfully addressed both unresolved review comments from PR #3896 regarding the streaming snapshot prune implementation in Issues Fixed1. Removed In-Memory visited_nodes HashSet (Comment #2660936474)Problem: The Solution: Removed the
Code Changes:
2. Fixed nodes_visited Counter Timing (Comment #2660936485)Problem: The Solution: Moved the
Code Changes:
Verification ResultsBuild Status
Code Quality
Technical DetailsModified File
Key Improvements
Impact AssessmentPositive Impact:
No Breaking Changes:
Testing RecommendationsWhile the build succeeds, comprehensive testing would verify:
These tests would require a properly configured MoveOSStore with realistic state tree data. Review Comment ResponsesBoth review comments were from Copilot providing automated code analysis. The feedback was excellent and led to meaningful improvements:
Both issues have been fully addressed with the fixes implemented above. ConclusionAll unresolved review comments have been successfully addressed. The code now:
|
Fixes #3866
Summary: Issue #3866 - Streaming Snapshot Prune Implementation Fixes
Overview
Successfully implemented fixes for unresolved review comments from PR #3863 (Streaming Snapshot with RocksDB backend). The changes address logic issues, code quality problems, and documentation clarity in the state pruning snapshot builder.
Changes Made
1. Fixed Infinite Loop Prevention Logic (Issue 2.1)
File:
crates/rooch-pruner/src/state_prune/snapshot_builder.rsProblem: The previous infinite loop prevention logic was flawed - it triggered when the queue was empty after popping a node, which is normal during DFS tree traversal.
Solution: Replaced the flawed empty-batch tracking with proper cycle detection using a
HashSetto track visited nodes:visited_nodes: HashSet<H256>to track nodes already processedconsecutive_empty_batchesto checking if a node was already visited usingvisited_nodes.insert()2. Fixed nodes_visited Counter Consistency (Issue 2.2)
File:
crates/rooch-pruner/src/state_prune/snapshot_builder.rsProblem: The
nodes_visitedcounter was incremented inconsistently:Solution: Increment the counter immediately when each node is popped from the queue for processing:
statistics.nodes_visited += 1to the start of the loop iteration3. Documented global_size Field Usage (Issue 4.2)
File:
crates/rooch-pruner/src/state_prune/snapshot_builder.rsProblem:
global_sizewas incorrectly set toactive_nodes_count(SMT node count), but it should represent the number of global objects.Solution: Added comprehensive documentation and set
global_sizeto 0 with explanation:global_sizeto 0 inSnapshotMeta::new()callglobal_sizerepresents the number of global objectsStateChangeSetwhen building a snapshot for a specific tx_order4. Removed Unused batch_size Field (Issue 3.1)
File:
crates/rooch-pruner/src/state_prune/snapshot_builder.rsProblem: The
batch_sizefield inSnapshotNodeWriterwas stored but never used.Solution: Removed the unused field:
#[allow(dead_code)] batch_size: usizefromSnapshotNodeWriterstructbatch_size: config.batch_sizeinitialization innew()method5. Removed Unused output_dir Parameter (Issue 3.2)
File:
crates/rooch-pruner/src/state_prune/snapshot_builder.rsProblem: The
finalizemethod had an unusedoutput_dirparameter.Solution: Removed the parameter:
finalize(self, _output_dir: &Path, metadata: &mut StatePruneMetadata)tofinalize(self, metadata: &mut StatePruneMetadata)&output_dirargument6. Fixed Compiler Warnings
File:
crates/rooch-pruner/src/state_prune/snapshot_builder.rsAdditional fixes:
let MEMORY_CHECK_INTERVAL_SECStoconst MEMORY_CHECK_INTERVAL_SECSfor proper naming conventionbatch_sizevariable that was only used for determining batch buffer lengthconfigparameter with underscore:_configVerification Results
Build Status
/holon/workspace/target/release/librooch_pruner.rlib(4.15 MB)Code Quality
Known Environmental Issues
The following environmental issues were encountered but are not related to the code changes:
rustfmtnot installed (non-fatal warning)cargo-clippyandcargo-machetenot pre-installed in environmentFiles Modified
/holon/workspace/crates/rooch-pruner/src/state_prune/snapshot_builder.rsTesting Recommendations
While the build succeeds, full testing would require:
crates/rooch-prunerpackageThese tests would require a properly configured MoveOSStore and test data, which may not be available in the current environment.
Impact Assessment
Positive Impact
No Breaking Changes
finalizemethod signature which was unused externally)Future Work
As noted in the issue comments, some items were deferred for future consideration: