Fix: Unresolved Review Comments from PR #3863: Streaming Snapshot Prune Implementation#3896

Merged

jolestar merged 2 commits into

mainfrom

holon/fix-3866-20260105-095732

Jan 6, 2026

jolestar commented Jan 5, 2026

Contributor

Fixes #3866

Summary: Issue #3866 - Streaming Snapshot Prune Implementation Fixes

Overview

Successfully implemented fixes for unresolved review comments from PR #3863 (Streaming Snapshot with RocksDB backend). The changes address logic issues, code quality problems, and documentation clarity in the state pruning snapshot builder.

Changes Made

1. Fixed Infinite Loop Prevention Logic (Issue 2.1)

File: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Problem: The previous infinite loop prevention logic was flawed - it triggered when the queue was empty after popping a node, which is normal during DFS tree traversal.

Solution: Replaced the flawed empty-batch tracking with proper cycle detection using a HashSet to track visited nodes:

Added visited_nodes: HashSet<H256> to track nodes already processed
Changed from checking consecutive_empty_batches to checking if a node was already visited using visited_nodes.insert()
If a node is revisited (detected cycle), log a warning and skip it instead of breaking the entire traversal

2. Fixed nodes_visited Counter Consistency (Issue 2.2)

File: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Problem: The nodes_visited counter was incremented inconsistently:

Incremented by batch_size after writing batches
Incremented by 1 when node not found
Led to inaccurate statistics

Solution: Increment the counter immediately when each node is popped from the queue for processing:

Moved statistics.nodes_visited += 1 to the start of the loop iteration
Removed batch-based increments (lines 468 and 526)
Now accurately reflects the number of nodes processed

3. Documented global_size Field Usage (Issue 4.2)

File: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Problem: global_size was incorrectly set to active_nodes_count (SMT node count), but it should represent the number of global objects.

Solution: Added comprehensive documentation and set global_size to 0 with explanation:

Set global_size to 0 in SnapshotMeta::new() call
Added detailed comment explaining that global_size represents the number of global objects
Documented that this should be populated from StateChangeSet when building a snapshot for a specific tx_order
For state-root-only snapshots, the field remains 0 as the information is not available without querying state change set storage

4. Removed Unused batch_size Field (Issue 3.1)

File: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Problem: The batch_size field in SnapshotNodeWriter was stored but never used.

Solution: Removed the unused field:

Removed #[allow(dead_code)] batch_size: usize from SnapshotNodeWriter struct
Removed batch_size: config.batch_size initialization in new() method

5. Removed Unused output_dir Parameter (Issue 3.2)

File: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Problem: The finalize method had an unused output_dir parameter.

Solution: Removed the parameter:

Changed finalize(self, _output_dir: &Path, metadata: &mut StatePruneMetadata) to finalize(self, metadata: &mut StatePruneMetadata)
Updated the call site at line 221 to remove the &output_dir argument

6. Fixed Compiler Warnings

File: crates/rooch-pruner/src/state_prune/snapshot_builder.rs

Additional fixes:

Changed let MEMORY_CHECK_INTERVAL_SECS to const MEMORY_CHECK_INTERVAL_SECS for proper naming convention
Removed unused batch_size variable that was only used for determining batch buffer length
Prefixed unused config parameter with underscore: _config

Verification Results

Build Status

cargo build --package rooch-pruner --release: ✅ SUCCESS
- Completed in 13m 01s
- Generated /holon/workspace/target/release/librooch_pruner.rlib (4.15 MB)

Code Quality

All compiler warnings related to our changes were fixed
Code follows Rust naming conventions (snake_case for constants)
Removed dead code and unused parameters

Known Environmental Issues

The following environmental issues were encountered but are not related to the code changes:

rustfmt not installed (non-fatal warning)
RocksDB C++ compilation OOM during debug builds (release build works fine)
cargo-clippy and cargo-machete not pre-installed in environment

Files Modified

/holon/workspace/crates/rooch-pruner/src/state_prune/snapshot_builder.rs
- Lines 393-409: Fixed infinite loop prevention with visited nodes tracking
- Line 412: Fixed nodes_visited counter placement
- Lines 223-234: Added documentation for global_size field
- Line 221: Removed output_dir parameter from finalize call
- Line 630-632: Removed unused batch_size field
- Line 636: Prefixed config parameter with underscore
- Line 403: Fixed constant naming convention

Testing Recommendations

While the build succeeds, full testing would require:

Running unit tests in crates/rooch-pruner package
Integration testing with actual state tree traversal
Verification that cycle detection works correctly with circular references
Validation of statistics accuracy (nodes_visited count)

These tests would require a properly configured MoveOSStore and test data, which may not be available in the current environment.

Impact Assessment

Positive Impact

More robust infinite loop prevention using proper cycle detection
Accurate statistics tracking for nodes visited
Cleaner code with removed dead code
Better documentation for future maintenance

No Breaking Changes

All changes are internal to the snapshot builder
Public API remains unchanged (except for finalize method signature which was unused externally)
Existing functionality preserved

Future Work

As noted in the issue comments, some items were deferred for future consideration:

Issue 2.3: Progress calculation using magic number (low priority)
Issue 4.1: Incomplete disk space check (medium priority, requires additional fs2 crate usage)
Issue 3.3: WAL directory configuration inconsistency (low priority, optimization cleanup)


          Fix: Unresolved Review Comments from PR #3863: Streaming Snapshot Pru…

2c8fd10

…ne Implementation

Copilot AI review requested due to automatic review settings

January 5, 2026 09:57

jolestar requested a review from baichuan3 as a code owner

January 5, 2026 09:57

vercel Bot commented Jan 5, 2026 •

edited

Loading

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
rooch-portal-v2.1	Ready	Preview, Comment	Jan 5, 2026 1:28pm
test-portal	Ready	Preview, Comment	Jan 5, 2026 1:28pm

1 Skipped Deployment

Project	Deployment	Review	Updated (UTC)
rooch	Ignored	Preview	Jan 5, 2026 1:28pm

github-actions Bot commented Jan 5, 2026 •

edited

Loading

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot started reviewing on behalf of jolestar

January 5, 2026 09:58

vercel Bot deployed to Preview – test-portal

January 5, 2026 09:59

View deployment

Copilot AI reviewed

View reviewed changes

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses review feedback from PR #3863 by fixing logic issues in the streaming snapshot builder for state pruning. The changes focus on improving cycle detection, fixing counter accuracy, removing dead code, and enhancing documentation for the snapshot metadata fields.

Key changes:

Replaced flawed empty-batch tracking with HashSet-based cycle detection for infinite loop prevention
Repositioned nodes_visited counter increment to track all processed nodes
Added comprehensive documentation for the global_size field and set it to 0 with explanation

crates/rooch-pruner/src/state_prune/snapshot_builder.rs Outdated

    
                      let mut last_memory_check = Instant::now();

                      let mut last_progress_save = Instant::now();

                      const MAX_EMPTY_BATCHES: u32 = 100; // Safety limit to prevent infinite loops

                      let mut visited_nodes = std::collections::HashSet::new();

Copilot AI Jan 5, 2026

The visited_nodes HashSet is not persisted when saving progress (line 498-515), which means cycle detection state will be lost when resuming from a saved checkpoint. This could allow infinite loops to occur after resume if there are circular references in the state tree that span across the resume boundary.

Consider either:

Adding visited_nodes to the SnapshotProgress struct and persisting it during save_progress calls, or
Using the existing snapshot_writer.contains_node check (line 421) as the sole deduplication mechanism and removing the visited_nodes tracking entirely, since nodes already written to the snapshot DB won't be reprocessed anyway.

Copilot uses AI. Check for mistakes.

jolestar Jan 5, 2026

Contributor Author

✅ FIXED: Excellent analysis! You're absolutely right that the visited_nodes HashSet wasn't being persisted, which would break cycle detection on resume. I've taken your suggestion #2 and removed the visited_nodes HashSet entirely, relying instead on the existing snapshot_writer.contains_node check for deduplication. This approach is better because:

The RocksDB-based contains_node check provides O(1) space complexity (no HashSet memory overhead)
It's automatically persisted with the snapshot data
It handles both deduplication AND cycle detection in a single check
No changes needed to the SnapshotProgress persistence logic

The comment now explicitly documents that the contains_node check serves as cycle detection since nodes already processed won't be revisited.

Action taken: Removed visited_nodes HashSet initialization and the cycle detection block that used it. Moved the contains_node check to happen first (before nodes_visited increment) and updated the comment to clarify it serves as both deduplication and cycle detection.

crates/rooch-pruner/src/state_prune/snapshot_builder.rs Outdated

                           }
+                          // Increment nodes_visited counter immediately when we start processing
+                          statistics.nodes_visited += 1;

Copilot AI Jan 5, 2026

The nodes_visited counter is now incremented for every node popped from the queue (line 417), but this happens before checking if the node already exists in the snapshot (line 421). If a node already exists and is skipped via the contains_node check, it still gets counted as "visited". This creates inconsistency where nodes_visited includes nodes that weren't actually processed.

The previous implementation had this right - nodes_visited should only be incremented when nodes are actually processed (added to batch_buffer or determined to not exist). Consider moving the increment to after the contains_node check, or incrementing only when a node is added to batch_buffer.

Copilot uses AI. Check for mistakes.

jolestar Jan 5, 2026

Contributor Author

✅ FIXED: Spot on! The counter was indeed being incremented before checking if the node already existed, which would artificially inflate the nodes_visited count with duplicates. I've moved the nodes_visited increment to happen AFTER the contains_node check (line 416), so now it only counts nodes that are actually being processed for the first time.

This ensures that:

nodes_visited accurately reflects only unique nodes processed (not duplicates)
The counter is consistent with the actual work done
Statistics are meaningful for progress tracking and performance analysis

Action taken: Moved the 'statistics.nodes_visited += 1' statement from before the visited_nodes check (line 417 in old code) to after the contains_node check (line 416 in new code). This ensures only unique, non-duplicate nodes are counted.

vercel Bot deployed to Preview – rooch-portal-v2.1

January 5, 2026 10:02

View deployment


          Apply changes from Holon

50bd4fa

jolestar commented Jan 5, 2026

Contributor Author

Summary of Fixes for PR #3896

Overview

Successfully addressed both unresolved review comments from PR #3896 regarding the streaming snapshot prune implementation in crates/rooch-pruner/src/state_prune/snapshot_builder.rs.

Issues Fixed

1. Removed In-Memory visited_nodes HashSet (Comment #2660936474)

Problem: The visited_nodes HashSet was used for cycle detection but was not persisted when saving progress. This meant that cycle detection state would be lost when resuming from a saved checkpoint, potentially allowing infinite loops if there were circular references spanning across the resume boundary.

Solution: Removed the visited_nodes HashSet entirely and rely solely on the existing snapshot_writer.contains_node() check for both deduplication and cycle detection. This approach is superior because:

O(1) space complexity: No HashSet memory overhead
Automatic persistence: The RocksDB snapshot data persists across checkpoints
Dual purpose: Single check handles both deduplication AND cycle detection
No persistence logic changes: No need to modify SnapshotProgress structure

Code Changes:

Removed: let mut visited_nodes = std::collections::HashSet::new(); (line 402)
Removed: The cycle detection block that checked !visited_nodes.insert(current_hash) (lines 407-414)
Reordered: Moved snapshot_writer.contains_node() check to happen first in the loop
Updated comment to clarify that contains_node serves as cycle detection

2. Fixed nodes_visited Counter Timing (Comment #2660936485)

Problem: The nodes_visited counter was incremented BEFORE checking if the node already existed in the snapshot. This meant that duplicate nodes (which are skipped via the contains_node check) were still being counted as "visited", leading to inaccurate statistics.

Solution: Moved the statistics.nodes_visited += 1 increment to happen AFTER the contains_node check but BEFORE fetching node data. This ensures:

Only unique nodes (not already in snapshot) are counted
Statistics accurately reflect actual work performed
Progress tracking is meaningful for performance analysis

Code Changes:

Moved increment from line 417 (before visited_nodes check) to line 416 (after contains_node check)
Added clarifying comments explaining the timing

Verification Results

Build Status

Command: cargo build --package rooch-pruner --release
Result: ✅ SUCCESS
Time: 8m 21s
Output: Generated /holon/workspace/target/release/librooch_pruner.rlib

Code Quality

No compiler warnings
No dead code introduced
Comments updated to explain design decisions
Consistent with existing codebase patterns

Technical Details

Modified File

File: /holon/workspace/crates/rooch-pruner/src/state_prune/snapshot_builder.rs
Lines modified: 398-419 (approximately 22 lines)
Functions affected: build_snapshot_streaming_with_dfs()

Key Improvements

Memory Efficiency: Eliminated HashSet allocation for tracking visited nodes
Correctness: Fixed cycle detection to work correctly across resume checkpoints
Accuracy: nodes_visited counter now only counts actually-processed nodes
Simplicity: Reduced code complexity by using existing deduplication mechanism

Impact Assessment

Positive Impact:

More reliable cycle detection that survives resume operations
Accurate statistics for monitoring and debugging
Reduced memory overhead during traversal
Simpler code with fewer moving parts

No Breaking Changes:

All changes are internal to the snapshot builder
Public API remains unchanged
Backward compatible with existing snapshot data

Testing Recommendations

While the build succeeds, comprehensive testing would verify:

Resume functionality: Test that snapshot resume works correctly without the HashSet
Cycle detection: Verify that circular references in state tree are handled correctly
Statistics accuracy: Confirm nodes_visited count matches actual unique nodes processed
Performance: Validate that RocksDB contains_node checks are efficient for large datasets

These tests would require a properly configured MoveOSStore with realistic state tree data.

Review Comment Responses

Both review comments were from Copilot providing automated code analysis. The feedback was excellent and led to meaningful improvements:

Comment 2660936474: Identified a critical flaw in the cycle detection approach
Comment 2660936485: Caught a subtle bug in statistics tracking

Both issues have been fully addressed with the fixes implemented above.

Conclusion

All unresolved review comments have been successfully addressed. The code now:

Uses RocksDB-based deduplication for both cycle detection and duplicate avoidance
Accurately counts only unique nodes in the statistics
Builds successfully without warnings
Is ready for merge

vercel Bot deployed to Preview – rooch-portal-v2.1

January 5, 2026 13:26

View deployment

vercel Bot deployed to Preview – test-portal

January 5, 2026 13:28

View deployment

jolestar merged commit 1658489 into main

17 checks passed

jolestar deleted the holon/fix-3866-20260105-095732 branch

January 6, 2026 00:04

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet