-
Notifications
You must be signed in to change notification settings - Fork 182
fix: OOM in forest-tool archive diff
#6051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughMoved pretty-print computation (calc_pp) inside conditional branches for changed/added actors, introduced a COMMA constant for splitting, and retained existing diff/print behavior. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant StateDiff as try_print_actor_states
participant Actors
Caller->>StateDiff: invoke with actor states
loop for each actor
StateDiff->>Actors: determine status (unchanged/changed/added)
alt changed or added
note over StateDiff: Compute calc_pp lazily
StateDiff->>StateDiff: Split expected_pp and calc_pp by COMMA
StateDiff->>StateDiff: Build diffs
StateDiff->>Caller: Print changes
else unchanged
StateDiff->>Caller: Skip calc_pp and continue
end
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/statediff/mod.rs (3)
94-105: Prefer a char separator and a single, reusable constant.Minor cleanups to reduce tiny overhead and avoid block-local duplication:
- Use a char separator (faster split path).
- Hoist the separator to module scope so it’s defined once.
Apply:
- const COMMA: &str = ","; - let calc_pp = pp_actor_state(bs, actor, depth)?; + let calc_pp = pp_actor_state(bs, actor, depth)?; let expected_pp = pp_actor_state(bs, &other, depth)?; - let expected = expected_pp - .split(COMMA) + let expected = expected_pp + .split(SEP) .map(|s| s.trim_start_matches('\n')) .collect::<Vec<&str>>(); - let calculated = calc_pp - .split(COMMA) + let calculated = calc_pp + .split(SEP) .map(|s| s.trim_start_matches('\n')) .collect::<Vec<&str>>();Add at module scope (e.g., after use statements):
const SEP: char = ',';
105-110: Optional: fall back to line-based diff for huge tokens to further curb memory spikes.When comma-splitting yields very large token vectors, a coarse line-diff avoids building massive [&str] slices and can trim peak RAM.
Apply:
- let diffs = TextDiff::from_slices(&expected, &calculated); + let diffs = if expected.len() + calculated.len() > 20_000 { + // Coarser but safer on memory + TextDiff::from_lines(&expected_pp, &calc_pp) + } else { + TextDiff::from_slices(&expected, &calculated) + };
112-115: Unify output through the same locked handle (consistency and slight perf).You lock stdout for the “changed” branch but use println! for “added”. Writing both via the same handle avoids interleaving and extra locks.
Apply (within this branch):
- // Added actor, print out the json format actor state. - println!("{}", format!("+ Address {addr}:\n{calc_pp}").green()); + // Added actor, print out the json format actor state. + let stdout = stdout(); + let mut handle = stdout.lock(); + writeln!(handle, "{}", format!("+ Address {addr}:\n{calc_pp}").green())?;Optionally, create one stdout lock before for_each and pass a mutable handle into the closure to avoid repeated locks.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/statediff/mod.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
🔇 Additional comments (1)
src/statediff/mod.rs (1)
91-110: Good: defer pretty-printing to only changed actors.Computing calc_pp inside the changed-actor branch eliminates unnecessary large String allocations for unchanged actors and directly addresses the OOM risk. Nice, targeted fix.
LesnyRumcajs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand. Could you please explain what was the issue and how this change fixes it?
@LesnyRumcajs Calling |
But why is that? Is it due to the number of calls to |
@LesnyRumcajs I don't see memory leak in it, my best guess is, resolving some large IPLD on a single actor causes OOM (which is confirmed by monitoring |
I see. So this will surface again if that actor has a diff. Something to keep in mind. |
Summary of changes
This PR fixes OOM issue in running
forest-tool archive diff --epoch 2819795 forest_snapshot_mainnet_2023-05-01_height_2820000.forest.car.zstOutput:
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit