fix(pruner): implement comprehensive follow-up fixes for issue #3868#3871
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Pull request overview
This PR implements comprehensive follow-up fixes for issue #3868, addressing correctness, performance, and test quality improvements in the rooch-pruner module. The changes focus on deterministic behavior through integer arithmetic, early validation of unimplemented features, memory efficiency optimizations, and enhanced test coverage for memory pressure scenarios.
Key changes:
- Replaced floating-point calculations with integer arithmetic (per-thousand precision) for deterministic memory pressure detection
- Refactored
filter_new_nodesto take ownership instead of borrowing, eliminating unnecessary cloning - Enhanced validation to fail early for unimplemented deduplication strategies with clear error messages
- Fixed progress tracking calculation bug and improved test assertions for memory pressure scenarios
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
crates/rooch-pruner/src/state_prune/config.rs |
Enhanced validation with inclusive range check for memory_pressure_threshold (0.0..=1.0), early failure for unimplemented strategies, and consistent tracing imports |
crates/rooch-pruner/src/state_prune/snapshot_builder.rs |
Replaced floating-point with integer arithmetic for memory calculations, changed filter_new_nodes to take ownership, used reserve_exact for tighter allocation, improved fallback estimate to 75% |
crates/rooch-pruner/src/tests/scalable_dedup_test.rs |
Fixed progress tracking accumulator bug, enhanced adaptive batch sizing test with forced memory pressure scenarios, added stronger assertions for batch size reduction |
| // Increase by up to 20%: * (1000 + increase) / 1000 | ||
| let new_batch_size = (current_batch_size * (1000 + increase as usize) / 1000) |
There was a problem hiding this comment.
The multiplication current_batch_size * (1000 + increase as usize) can overflow when current_batch_size is very large. While unlikely in practice, consider using checked_mul() or restructuring the calculation to prevent potential overflow.
| // Increase by up to 20%: * (1000 + increase) / 1000 | |
| let new_batch_size = (current_batch_size * (1000 + increase as usize) / 1000) | |
| // Increase by up to 20%: * (1000 + increase) / 1000, using checked_mul to prevent overflow | |
| let new_batch_size = current_batch_size | |
| .checked_mul(1000 + increase as usize) | |
| .map(|v| v / 1000) | |
| .unwrap_or(self.config.batch_size) |
| } | ||
| }) | ||
| .collect(); | ||
| let mut new_nodes = Vec::with_capacity(nodes.len()); |
There was a problem hiding this comment.
The Vec::with_capacity(nodes.len()) pre-allocates space for all input nodes, but in scenarios with high duplication rates, most nodes will be filtered out. Consider using a smaller initial capacity or no pre-allocation to save memory. For example: let mut new_nodes = Vec::new(); or let mut new_nodes = Vec::with_capacity(nodes.len() / 2);
| let mut new_nodes = Vec::with_capacity(nodes.len()); | |
| let mut new_nodes = Vec::new(); |
| // The test passes whether we get None or Some increase - both are valid | ||
| if let Some(increased_size) = no_adjustment { | ||
| assert!( | ||
| adjusted_size < initial_batch_size, | ||
| "Batch size should be reduced under pressure" | ||
| increased_size >= small_batch_size, | ||
| "Increased batch size should be >= original batch size" | ||
| ); | ||
| assert!( | ||
| adjusted_size >= 100, | ||
| "Batch size should not be reduced below minimum" | ||
| ); | ||
| } else { | ||
| println!( | ||
| "No memory pressure detected, batch size remains {}", | ||
| initial_batch_size | ||
| increased_size <= low_pressure_builder.config().batch_size, | ||
| "Increased batch size should not exceed configured maximum" | ||
| ); | ||
| } |
There was a problem hiding this comment.
The test case for low memory pressure scenario doesn't make strong assertions about the expected behavior. With 75% fallback estimate and 90% threshold, the memory usage (75%) is above 60% of threshold (54%), so no batch size adjustment should occur. The test should either assert that no_adjustment is None, or adjust the test parameters to create a scenario that reliably triggers batch size increase. The comment "both are valid" makes this test non-deterministic and less valuable.
| let memory_ratio = current_memory as f64 / self.config.memory_limit as f64; | ||
| let threshold = self.config.memory_pressure_threshold; | ||
| // Use integer arithmetic: multiply by 1000 for 3 decimal places precision | ||
| let memory_per_thousand = (current_memory * 1000) / self.config.memory_limit; |
There was a problem hiding this comment.
The multiplication current_memory * 1000 can overflow when current_memory is large (e.g., near u64::MAX or when memory usage is in the exabyte range). Consider using checked_mul() or saturating_mul() to handle potential overflow safely. For example: let memory_per_thousand = current_memory.saturating_mul(1000) / self.config.memory_limit;
| let memory_per_thousand = (current_memory * 1000) / self.config.memory_limit; | |
| let memory_per_thousand = current_memory.saturating_mul(1000) / self.config.memory_limit; |
| let reduction = std::cmp::min(excess, max_reduction); | ||
|
|
||
| // Reduce by up to 30% using integer arithmetic: * (1000 - reduction) / 1000 | ||
| let new_batch_size = (current_batch_size * (1000 - reduction as usize) / 1000).max(100); |
There was a problem hiding this comment.
The multiplication current_batch_size * (1000 - reduction as usize) can overflow when current_batch_size is very large. While unlikely in practice, consider using checked_mul() or restructuring the calculation to prevent potential overflow. For example: let new_batch_size = (current_batch_size / 1000 * (1000 - reduction as usize) + (current_batch_size % 1000 * (1000 - reduction as usize) / 1000)).max(100);
| let new_batch_size = (current_batch_size * (1000 - reduction as usize) / 1000).max(100); | |
| let new_batch_size = ( | |
| current_batch_size / 1000 * (1000 - reduction as usize) | |
| + (current_batch_size % 1000 * (1000 - reduction as usize) / 1000) | |
| ).max(100); |
5457845 to
a93aa44
Compare
Addresses all review comments and follow-ups from PR #3867: **Correctness & Maintainability:** - Fail validation early for unimplemented deduplication strategies (BloomFilter, Hybrid) - Fix memory_pressure_threshold validation to use inclusive range (0.0..=1.0) - Add consistent tracing imports and use warn/info appropriately - Improve doc comments with clearer threshold unit descriptions **Performance & Scalability:** - Avoid cloning in filter_new_nodes by taking ownership of input batch - Use integer arithmetic for deterministic batch size adjustment (per-thousand precision) - Replace reserve() with reserve_exact() for tighter memory allocation - Improve fallback memory estimate with conservative 75% assumption and warn logging - Enhance adaptive batching with stronger test assertions **Test Quality & Hygiene:** - Strengthen test_adaptive_batch_sizing with forced memory pressure scenarios - Fix progress calculation bug in test logging with proper accumulator tracking - Remove unused HashMap imports and dead code annotations - Ensure comprehensive test coverage for all memory pressure branches - Fix clippy identity_op warnings and code formatting - Make adaptive batch sizing test deterministic by using edge cases **Lint & Code Quality:** - All make lint checks now pass (cargo fmt, cargo clippy, cargo-machete) - Added #[allow(dead_code)] for intentional unused struct field - Fixed code formatting to comply with project standards - Eliminated non-deterministic test behavior All 120 tests passing with improved performance characteristics and better error handling for unsupported configurations. Fixes #3868 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
a93aa44 to
e254257
Compare
Summary
This PR addresses all review comments and follow-up items from PR #3867 as tracked in issue #3868. It implements comprehensive improvements to correctness, performance, and test quality.
Key Changes
Correctness & Maintainability
memory_pressure_thresholdvalidation to use0.0..=1.0instead of exclusive rangetracingimports and usewarn!/info!consistentlyPerformance & Scalability
filter_new_nodesto take ownership and avoid cloningreserve_exact()instead ofreserve()for better memory efficiencyTest Quality & Hygiene
test_adaptive_batch_sizingwith forced memory pressure scenariosTest Results
Files Modified
crates/rooch-pruner/src/state_prune/config.rs- Validation and tracing improvementscrates/rooch-pruner/src/state_prune/snapshot_builder.rs- Performance optimizations and integer arithmeticcrates/rooch-pruner/src/tests/scalable_dedup_test.rs- Enhanced test coverage and fixesThis PR ensures production-ready implementation with robust error handling, deterministic performance characteristics, and comprehensive test validation.
Fixes #3868
🤖 Generated with Claude Code