Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Sep 8, 2025

Summary of changes

This PR fixes OOM issue in running forest-tool archive diff --epoch 2819795 forest_snapshot_mainnet_2023-05-01_height_2820000.forest.car.zst

Output:

forest-tool archive diff --epoch 2819795 forest_snapshot_mainnet_2023-05-01_height_2820000.forest.car.zst

- Expected state hash: bafy2bzacedflaenehbgpwlzgeimnamtuuu45vy5zk3u6vp345h7zs4ybweerq
+ Computed state hash: bafy2bzacecyw2yay6jkkmyob53u4hobgsi57xiqdfla6uu437w2vriarb5nds
Address f04 changed: 
 ActorState(ActorState { code: Cid(bafk2bzaceaxgloxuzg35vu7l7tohdgaq2frsfp4ejmuo7tkoxjp5zqrze6sf4)
- state: Cid(bafy2bzacedabdkat4ikokbpmzkbmwjql63zqmj2mlqdbbuzscmo6jz7vhwpxi)
+ state: Cid(bafy2bzaceanjejlvqc5gwilc62xpmdvzkgfzcqgp4igcymywto2vm25wexnni)
  sequence: 0
  balance: TokenAmount(0.011)
  delegated_address: None })
V11(State { total_raw_byte_power: 14030677426019762176
  total_bytes_committed: 14030973057208680448
  total_quality_adj_power: 22832346608471441408
  total_qa_bytes_committed: 22833184005643042816
- total_pledge_collateral: TokenAmount(143269926.2652187083170961)
+ total_pledge_collateral: TokenAmount(143270003.413371965258963041)
  this_epoch_raw_byte_power: 14030677426019762176
  this_epoch_quality_adj_power: 22832346608471441408
- this_epoch_pledge_collateral: TokenAmount(143269926.2652187083170961)
+ this_epoch_pledge_collateral: TokenAmount(143270003.413371965258963041)
  this_epoch_qa_power_smoothed: FilterEstimate { position: 7788953586247045787797498918428003402915103487709779837094
  velocity: 30553465660375297517809812792028847344200527615729490 }
  miner_count: 597663
  miner_above_min_power_count: 3468
  cron_event_queue: Cid(bafy2bzaceblqrq7j5jz75xqtwkgm7h67chxyp443zy7agjzkf6dvcpthxbhmw)
  first_cron_epoch: 2819796
  claims: Cid(bafy2bzacecl7s52etztcf2aaj5ccilgrzovhr4zeitxcl4n35ti7eucxbfwdg)
  proof_validation_batch: None })
Address f099 changed: 
 ActorState(ActorState { code: Cid(bafk2bzacealnlr7st6lkwoh6wxpf2hnrlex5sknaopgmkr2tuhg7vmbfy45so)
  state: Cid(bafy2bzacedpuk5ggwoq3s2wixsyjjnexpsjstdlyntio76vs2lt2jvy3o6mau)
  sequence: 0
- balance: TokenAmount(36187969.358883641800996567)
+ balance: TokenAmount(36187896.068627543923907091)
  delegated_address: None })
V11(State { address: Address { payload: ID(99) } })
Address f094625 changed: 
 ActorState(ActorState { code: Cid(bafk2bzacec24okjqrp7c7rj3hbrs5ez5apvwah2ruka6haesgfngf37mhk6us)
- state: Cid(bafy2bzacebr6if6imx2olrhrb3gquzgfwd4zkn36cxinnikiz4lyngys3eeyk)
+ state: Cid(bafy2bzacedjrgrvkj4wopdyecsqviyzride2ae4itin3uzabxg5kqh4jrr5l6)
  sequence: 0
- balance: TokenAmount(49655.00241996624371334)
- delegated_address: None })
V11(State { info: Cid(bafy2bzacealxryywh3wxrpq2ydnv5p53mp64x5eniswhj5tdnd67lhee6tjua)
+ balance: TokenAmount(49732.150573223185580281)
+ delegated_address: None })
V11(State { info: Cid(bafy2bzaced5vr2epcy7ao63r6sbda6jberludtn77u2dg6xufeydtyazo3b4m)
  pre_commit_deposits: TokenAmount(0.0)
- locked_funds: TokenAmount(5036.123441842349323139)
- vesting_funds: Cid(bafy2bzacedo75tr7blox5i374yioa2fk3xpzyqes7tb4tmiuu7beuwounk4oa)
+ locked_funds: TokenAmount(5113.27159509929119008)
+ vesting_funds: Cid(bafy2bzaceayfqbja334jl3cjggp5jhdrnwb6k3l2ey7mz5qro56mptkwvprae)
  fee_debt: TokenAmount(0.0)
  initial_pledge: TokenAmount(35640.487189858427627691)
  pre_committed_sectors: Cid(bafy2bzaceamp42wmmgr2g2ymg46euououzfyck7szknvfacqscohrvaikwfay)
  pre_committed_sectors_cleanup: Cid(bafy2bzaceaa2jny7gkgdwnid4kuldau6bnvgyss5bszo4uy6uikrncvdu5mc2)
  allocated_sectors: Cid(bafy2bzaceagnnbswwvovpa4knbmqco7iweh62ffsm4jelgp5d2mmmopb4ix2e)
  sectors: Cid(bafy2bzaced4paoaja7y7inff55ua2jqanjsvjpyci2zpg4khy5zhhrdvyitzk)
  proving_period_start: 2817774
  current_deadline: 33
  deadlines: Cid(bafy2bzacebgwqhqu6momrtokexrgqgoyp7baswlne5jaqvlcxm6owygcbfdm2)
  early_terminations: BitField { ranges: []
  set: {}
  unset: {} }
  deadline_cron_active: true })
Address f01984106 changed: 
 ActorState(ActorState { code: Cid(bafk2bzacealnlr7st6lkwoh6wxpf2hnrlex5sknaopgmkr2tuhg7vmbfy45so)
  state: Cid(bafy2bzacebfeoet4rnxbdcsmnrmttjtylyf64zmt27tjoud52ettkvgscr3xo)
  sequence: 111
- balance: TokenAmount(51.096479311654039993)
+ balance: TokenAmount(47.238582152589262528)
  delegated_address: None })
V11(State { address: Address { payload: Secp256k1([78
  66
  50
  65
  69
  41
  115
  113
  158
  124
  128
  139
  119
  241
  247
  133
  197
  177
  154
  216]) } })

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features
    • No user-facing feature changes in this release.
  • Performance
    • Reduced upfront processing when displaying actor state differences; computes details only for changed or added items, improving responsiveness with large diffs.
  • Refactor
    • Streamlined preparation of state diff data and clarified internal constants. No changes to output format or public APIs. Behavior and results remain the same; only efficiency and maintainability improved.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 8, 2025

Walkthrough

Moved pretty-print computation (calc_pp) inside conditional branches for changed/added actors, introduced a COMMA constant for splitting, and retained existing diff/print behavior. No public API changes.

Changes

Cohort / File(s) Summary
State diff logic refactor
src/statediff/mod.rs
Deferred calc_pp computation to only changed/added actors; introduced local COMMA constant for splitting expected/calculated pretty-printed strings; preserved existing diff construction and printing; no exported signature changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant StateDiff as try_print_actor_states
  participant Actors

  Caller->>StateDiff: invoke with actor states
  loop for each actor
    StateDiff->>Actors: determine status (unchanged/changed/added)
    alt changed or added
      note over StateDiff: Compute calc_pp lazily
      StateDiff->>StateDiff: Split expected_pp and calc_pp by COMMA
      StateDiff->>StateDiff: Build diffs
      StateDiff->>Caller: Print changes
    else unchanged
      StateDiff->>Caller: Skip calc_pp and continue
    end
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • LesnyRumcajs
  • elmattic
  • sudo-shashank
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/fix-state-diff-oom

Comment @coderabbitai help to get the list of available commands and usage tips.

@hanabi1224 hanabi1224 marked this pull request as ready for review September 8, 2025 13:03
@hanabi1224 hanabi1224 requested a review from a team as a code owner September 8, 2025 13:03
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and akaladarshi and removed request for a team September 8, 2025 13:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/statediff/mod.rs (3)

94-105: Prefer a char separator and a single, reusable constant.

Minor cleanups to reduce tiny overhead and avoid block-local duplication:

  • Use a char separator (faster split path).
  • Hoist the separator to module scope so it’s defined once.

Apply:

-                const COMMA: &str = ",";
-                let calc_pp = pp_actor_state(bs, actor, depth)?;
+                let calc_pp = pp_actor_state(bs, actor, depth)?;
                 let expected_pp = pp_actor_state(bs, &other, depth)?;
-                let expected = expected_pp
-                    .split(COMMA)
+                let expected = expected_pp
+                    .split(SEP)
                     .map(|s| s.trim_start_matches('\n'))
                     .collect::<Vec<&str>>();
-                let calculated = calc_pp
-                    .split(COMMA)
+                let calculated = calc_pp
+                    .split(SEP)
                     .map(|s| s.trim_start_matches('\n'))
                     .collect::<Vec<&str>>();

Add at module scope (e.g., after use statements):

const SEP: char = ',';

105-110: Optional: fall back to line-based diff for huge tokens to further curb memory spikes.

When comma-splitting yields very large token vectors, a coarse line-diff avoids building massive [&str] slices and can trim peak RAM.

Apply:

-                let diffs = TextDiff::from_slices(&expected, &calculated);
+                let diffs = if expected.len() + calculated.len() > 20_000 {
+                    // Coarser but safer on memory
+                    TextDiff::from_lines(&expected_pp, &calc_pp)
+                } else {
+                    TextDiff::from_slices(&expected, &calculated)
+                };

112-115: Unify output through the same locked handle (consistency and slight perf).

You lock stdout for the “changed” branch but use println! for “added”. Writing both via the same handle avoids interleaving and extra locks.

Apply (within this branch):

-            // Added actor, print out the json format actor state.
-            println!("{}", format!("+ Address {addr}:\n{calc_pp}").green());
+            // Added actor, print out the json format actor state.
+            let stdout = stdout();
+            let mut handle = stdout.lock();
+            writeln!(handle, "{}", format!("+ Address {addr}:\n{calc_pp}").green())?;

Optionally, create one stdout lock before for_each and pass a mutable handle into the closure to avoid repeated locks.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f635581 and a1f9adf.

📒 Files selected for processing (1)
  • src/statediff/mod.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
🔇 Additional comments (1)
src/statediff/mod.rs (1)

91-110: Good: defer pretty-printing to only changed actors.

Computing calc_pp inside the changed-actor branch eliminates unnecessary large String allocations for unchanged actors and directly addresses the OOM risk. Nice, targeted fix.

Copy link
Member

@LesnyRumcajs LesnyRumcajs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand. Could you please explain what was the issue and how this change fixes it?

@hanabi1224 hanabi1224 enabled auto-merge September 8, 2025 14:13
@hanabi1224
Copy link
Contributor Author

I don't quite understand. Could you please explain what was the issue and how this change fixes it?

@LesnyRumcajs Calling pp_actor_state for every actor state causes OOM on fuzzy(with 64GiB RAM), even with cacheless hamt iteration. Calling pp_actor_state on demand fixes the issue.

@LesnyRumcajs
Copy link
Member

I don't quite understand. Could you please explain what was the issue and how this change fixes it?

@LesnyRumcajs Calling pp_actor_state for every actor state causes OOM on fuzzy(with 64GiB RAM), even with cacheless hamt iteration. Calling pp_actor_state on demand fixes the issue.

But why is that? Is it due to the number of calls to pp_actor_state or is there a particular state that causes it? I don't see how the memory grows here, even if called on all actors; pp_actor_state should claim memory and then release it, no?

@hanabi1224
Copy link
Contributor Author

hanabi1224 commented Sep 9, 2025

even if called on all actors; pp_actor_state should claim memory and then release it, no?

@LesnyRumcajs I don't see memory leak in it, my best guess is, resolving some large IPLD on a single actor causes OOM (which is confirmed by monitoring htop on Fuzzy)

@LesnyRumcajs
Copy link
Member

even if called on all actors; pp_actor_state should claim memory and then release it, no?

@LesnyRumcajs I don't see memory leak in it, my best guess is, resolving some large IPLD on a single actor causes OOM (which is confirmed by monitoring htop on Fuzzy)

I see. So this will surface again if that actor has a diff. Something to keep in mind.

@hanabi1224 hanabi1224 added this pull request to the merge queue Sep 9, 2025
Merged via the queue into main with commit ca01b7f Sep 9, 2025
44 checks passed
@hanabi1224 hanabi1224 deleted the hm/fix-state-diff-oom branch September 9, 2025 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants