
Conversation

@hanabi1224 hanabi1224 commented Oct 17, 2025

Summary of changes

This PR introduces a tool to take a DB snapshot for validating a tipset. The snapshot can be used for benchmarks (which may come in a subsequent PR) to justify critical dependency updates (e.g. wasmtime in the FVM).

Changes introduced in this pull request:

  • add forest-tool state compute
  • add forest-tool state replay-compute

Example usage

➜  state_compute forest-tool state compute --chain mainnet --epoch 5412980 --export-db-to mainnet_5412980.forest.car.zst
2025-10-17T11:53:23.299386Z  INFO forest::daemon::db_util: Loaded 1 CARs
2025-10-17T11:53:23.303018Z  INFO forest::genesis: Initialized genesis: bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2
epoch: 5412980, state_root: bafy2bzaceayrf3zvlnwxoe2gp5inorvczqgh6vcngiv2gdhocixpvoebbaj5m, receipt_root: bafy2bzacebc3cz43bb76ehtpfuv4brc4echaaxmn2ecih5cx2o5zptym6zzq4, db_snapshot_size: 16.2 MiB
➜  state_compute forest-tool state compute --chain calibnet --epoch 3111900 --export-db-to calibnet_3111900.forest.car.zst
2025-10-17T11:53:33.962248Z  INFO forest::daemon::db_util: Loaded 1 CARs
2025-10-17T11:53:33.966142Z  INFO forest::genesis: Initialized genesis: bafy2bzacecyaggy24wol5ruvs6qm73gjibs2l2iyhcqmvi7r7a4ph7zx3yqd4
epoch: 3111900, state_root: bafy2bzacearxaxwxpn4szgyb3ekhr5ou6kgclhjwexdbx4grdrli45wi2ksuw, receipt_root: bafy2bzacebx5mqrqqgh7s2unqnscqfcili6mumpybdfdkbgimo2zof3mkh6qe, db_snapshot_size: 3 MiB
➜  state_compute forest-tool state replay-compute --chain mainnet -n 5 mainnet_5412980.forest.car.zst
2025-10-17T11:54:31.540237Z  INFO forest::genesis: Initialized genesis: bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2
epoch: 5412980, state_root: bafy2bzaceayrf3zvlnwxoe2gp5inorvczqgh6vcngiv2gdhocixpvoebbaj5m, receipt_root: bafy2bzacebc3cz43bb76ehtpfuv4brc4echaaxmn2ecih5cx2o5zptym6zzq4, took 5s 485ms 678us 801ns.
epoch: 5412980, state_root: bafy2bzaceayrf3zvlnwxoe2gp5inorvczqgh6vcngiv2gdhocixpvoebbaj5m, receipt_root: bafy2bzacebc3cz43bb76ehtpfuv4brc4echaaxmn2ecih5cx2o5zptym6zzq4, took 453ms 433us 769ns.
epoch: 5412980, state_root: bafy2bzaceayrf3zvlnwxoe2gp5inorvczqgh6vcngiv2gdhocixpvoebbaj5m, receipt_root: bafy2bzacebc3cz43bb76ehtpfuv4brc4echaaxmn2ecih5cx2o5zptym6zzq4, took 451ms 79us 754ns.
epoch: 5412980, state_root: bafy2bzaceayrf3zvlnwxoe2gp5inorvczqgh6vcngiv2gdhocixpvoebbaj5m, receipt_root: bafy2bzacebc3cz43bb76ehtpfuv4brc4echaaxmn2ecih5cx2o5zptym6zzq4, took 450ms 547us 244ns.
epoch: 5412980, state_root: bafy2bzaceayrf3zvlnwxoe2gp5inorvczqgh6vcngiv2gdhocixpvoebbaj5m, receipt_root: bafy2bzacebc3cz43bb76ehtpfuv4brc4echaaxmn2ecih5cx2o5zptym6zzq4, took 455ms 183us 853ns.
➜  state_compute forest-tool state replay-compute --chain calibnet -n 5 calibnet_3111900.forest.car.zst
2025-10-17T11:55:11.527308Z  INFO forest::genesis: Initialized genesis: bafy2bzacecyaggy24wol5ruvs6qm73gjibs2l2iyhcqmvi7r7a4ph7zx3yqd4
epoch: 3111900, state_root: bafy2bzacearxaxwxpn4szgyb3ekhr5ou6kgclhjwexdbx4grdrli45wi2ksuw, receipt_root: bafy2bzacebx5mqrqqgh7s2unqnscqfcili6mumpybdfdkbgimo2zof3mkh6qe, took 3s 694ms 652us 947ns.
epoch: 3111900, state_root: bafy2bzacearxaxwxpn4szgyb3ekhr5ou6kgclhjwexdbx4grdrli45wi2ksuw, receipt_root: bafy2bzacebx5mqrqqgh7s2unqnscqfcili6mumpybdfdkbgimo2zof3mkh6qe, took 48ms 269us 352ns.
epoch: 3111900, state_root: bafy2bzacearxaxwxpn4szgyb3ekhr5ou6kgclhjwexdbx4grdrli45wi2ksuw, receipt_root: bafy2bzacebx5mqrqqgh7s2unqnscqfcili6mumpybdfdkbgimo2zof3mkh6qe, took 50ms 52us 706ns.
epoch: 3111900, state_root: bafy2bzacearxaxwxpn4szgyb3ekhr5ou6kgclhjwexdbx4grdrli45wi2ksuw, receipt_root: bafy2bzacebx5mqrqqgh7s2unqnscqfcili6mumpybdfdkbgimo2zof3mkh6qe, took 47ms 367us 543ns.
epoch: 3111900, state_root: bafy2bzacearxaxwxpn4szgyb3ekhr5ou6kgclhjwexdbx4grdrli45wi2ksuw, receipt_root: bafy2bzacebx5mqrqqgh7s2unqnscqfcili6mumpybdfdkbgimo2zof3mkh6qe, took 49ms 510us 978ns.

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features

    • Added forest-tool state compute to generate DB snapshots for tipset validation, with optional export and diagnostics.
    • Added forest-tool state replay-compute to replay tipset validation from a snapshot, supporting repeated runs and timing output.
  • Documentation

    • CLI reference updated with new state-related commands and reordering of state/backup/completion sections.
  • Changelog

    • Added entries for the new state compute and replay-compute commands.


coderabbitai bot commented Oct 17, 2025

Walkthrough

Adds forest-tool state subcommands (compute, replay-compute), implements a new state_compute_cmd module with compute/replay logic, wires State into the CLI runner, adjusts several module/function visibilities, extends CAR path conversion to accept PathBuf, and updates docs/changelog.

Changes

  • Changelog & CLI docs (CHANGELOG.md, docs/docs/users/reference/cli.sh): Added unreleased changelog entries; expanded the CLI reference with state, state compute, and state replay-compute sections; added state-migration sections and reordered the completion section.
  • New state compute subcommand (src/tool/subcommands/state_compute_cmd.rs): New module adding a StateCommand enum with Compute and ReplayCompute variants and their run() implementations. ComputeCommand computes tipset state from a DB and can export a CAR snapshot; ReplayComputeCommand loads a CAR snapshot into an in-memory DB and repeats the state computation for benchmarking.
  • CLI wiring (src/tool/main.rs, src/tool/subcommands/mod.rs): Added a State variant to the Subcommand enum, introduced the state_compute_cmd module, and dispatched Subcommand::State asynchronously in the CLI runner.
  • API/test snapshot visibility changes (src/tool/subcommands/api_cmd.rs, src/tool/subcommands/api_cmd/api_compare_tests.rs, src/tool/subcommands/api_cmd/generate_test_snapshot.rs): Adjusted visibilities: the generate_test_snapshot module to pub(super), the TestDump struct to pub, and the load_db function to pub; removed a genesis_header.clone() by moving ownership.
  • CAR PathBuf conversion (src/db/car/any.rs): Added a TryFrom<&PathBuf> impl for AnyCar<EitherMmapOrRandomAccessFile> that delegates to the existing TryFrom<&Path>, and updated imports to include PathBuf.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as forest-tool CLI
    participant StateCmd as StateCommand::Compute
    participant DB as Read-only DB (ManyCar/ParityDb)
    participant ChainStore as ChainStore
    participant SM as StateManager
    participant CAR as CAR Export

    User->>CLI: forest-tool state compute --epoch N --chain <network> [--db <path>] [--export_db_to <path>]
    CLI->>StateCmd: run()
    StateCmd->>DB: load read-only snapshot (load_db)
    StateCmd->>ChainStore: build ChainStore (with genesis header)
    StateCmd->>ChainStore: resolve tipset at epoch N
    StateCmd->>SM: create StateManager
    StateCmd->>SM: compute tipset state
    SM-->>StateCmd: state_root, receipt_root
    StateCmd->>CAR: optionally export CAR snapshot
    StateCmd->>User: print epoch, state_root, receipt_root, snapshot size
sequenceDiagram
    actor User
    participant CLI as forest-tool CLI
    participant ReplayCmd as StateCommand::ReplayCompute
    participant AnyCar as AnyCar (load snapshot)
    participant MemDB as In-memory DB (ManyCar<MemoryDB>)
    participant ChainStore as ChainStore
    participant SM as StateManager

    User->>CLI: forest-tool state replay-compute --snapshot <path> --chain <network> -n <count>
    CLI->>ReplayCmd: run()
    ReplayCmd->>AnyCar: load CAR snapshot
    AnyCar-->>ReplayCmd: heaviest tipset & epoch
    ReplayCmd->>MemDB: build in-memory DB from snapshot
    ReplayCmd->>ChainStore: build ChainStore (with genesis header)
    loop n times
        ReplayCmd->>SM: compute tipset state
        SM-->>ReplayCmd: state_root, receipt_root, duration
        ReplayCmd->>User: print epoch, state_root, receipt_root, duration
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • elmattic
  • akaladarshi
  • LesnyRumcajs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the PR title "feat: forest-tool state compute" is clearly related to the changeset and accurately describes a major component of the changes. However, the PR introduces two complementary features, forest-tool state compute and forest-tool state replay-compute, and the title captures only one of them, so it reflects part rather than the complete scope of the changeset.

@hanabi1224 hanabi1224 marked this pull request as ready for review October 17, 2025 12:25
@hanabi1224 hanabi1224 requested a review from a team as a code owner October 17, 2025 12:25
@hanabi1224 hanabi1224 requested review from akaladarshi and elmattic and removed request for a team October 17, 2025 12:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
src/db/car/any.rs (1)

141-146: LGTM with an optional idiom note.

The implementation correctly delegates to the existing TryFrom<&Path> via as_path(). This provides convenient conversion for callers holding &PathBuf.

Note: Rust API guidelines typically favor accepting &Path over &PathBuf since PathBuf derefs to Path, making &Path more flexible. However, as a trait implementation (rather than a function parameter), this may simplify specific call sites. If this isn't used frequently, consider whether callers could use .as_ref() or .as_path() with the existing TryFrom<&Path> instead.

src/tool/subcommands/state_compute_cmd.rs (3)

81-81: Avoid unwraps on fallible constructors; add context

Replace unwrap() with ? plus anyhow::Context for clearer errors.

+use anyhow::Context as _;

 ...
-            )
-            .unwrap(),
+            )
+            .context("creating ChainStore")?,
 ...
-        let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());
+        let state_manager = Arc::new(
+            StateManager::new(chain_store.clone()).context("creating StateManager")?
+        );
 ...
-            )
-            .unwrap(),
+            )
+            .context("creating ChainStore (replay)")?,
 ...
-        let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());
+        let state_manager = Arc::new(
+            StateManager::new(chain_store.clone()).context("creating StateManager (replay)")?
+        );

Also applies to: 90-90, 145-145


100-107: Stream snapshot to file when exporting to avoid double buffering

Currently you always build a full in-memory Vec<u8> then optionally write it to disk. When --export-db-to is set, write directly to the file; fall back to in-memory only when not exporting.

-        let mut db_snapshot = vec![];
-        db.export_forest_car(&mut db_snapshot).await?;
-        println!(
-            "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, db_snapshot_size: {}",
-            human_bytes::human_bytes(db_snapshot.len() as f64)
-        );
-        if let Some(export_db_to) = export_db_to {
-            std::fs::write(export_db_to, db_snapshot)?;
-        }
+        if let Some(export_db_to) = export_db_to {
+            let mut file = std::fs::File::create(&export_db_to)?;
+            db.export_forest_car(&mut file).await?;
+            let size = file.metadata()?.len();
+            println!(
+                "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, db_snapshot_size: {}",
+                human_bytes::human_bytes(size as u128)
+            );
+        } else {
+            let mut db_snapshot = Vec::new();
+            db.export_forest_car(&mut db_snapshot).await?;
+            println!(
+                "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, db_snapshot_size: {}",
+                human_bytes::human_bytes(db_snapshot.len() as u128)
+            );
+        }

153-164: Minor: consistent byte/time formatting across commands

You’re using humantime and human_bytes—good. Consider adding the same byte-size line to replay runs (e.g., snapshot size once before the loop) for parity with compute. Optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15ea901 and e207be1.

📒 Files selected for processing (9)
  • CHANGELOG.md (1 hunks)
  • docs/docs/users/reference/cli.sh (1 hunks)
  • src/db/car/any.rs (2 hunks)
  • src/tool/main.rs (1 hunks)
  • src/tool/subcommands/api_cmd.rs (1 hunks)
  • src/tool/subcommands/api_cmd/api_compare_tests.rs (1 hunks)
  • src/tool/subcommands/api_cmd/generate_test_snapshot.rs (2 hunks)
  • src/tool/subcommands/mod.rs (2 hunks)
  • src/tool/subcommands/state_compute_cmd.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/tool/subcommands/state_compute_cmd.rs (6)
src/cli_shared/mod.rs (2)
  • chain_path (20-22)
  • read_config (24-41)
src/db/mod.rs (1)
  • db_root (320-322)
src/genesis/mod.rs (1)
  • read_genesis_header (21-44)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (3)
  • load_db (73-79)
  • new (190-195)
  • setting_keys (250-252)
src/db/car/any.rs (4)
  • new (34-53)
  • try_from (129-131)
  • try_from (136-138)
  • try_from (143-145)
src/networks/mod.rs (1)
  • from_chain (404-414)
src/db/car/any.rs (2)
src/db/car/forest.rs (2)
  • std (109-109)
  • try_from (241-243)
src/db/car/many.rs (2)
  • try_from (157-159)
  • try_from (164-166)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (1)
src/tool/subcommands/shed_cmd/migration.rs (1)
  • load_db (101-107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Check
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: Deploy to Cloudflare Pages
  • GitHub Check: All lint checks
🔇 Additional comments (2)
src/db/car/any.rs (1)

21-21: LGTM!

The import update correctly adds PathBuf to support the new TryFrom implementation below.

src/tool/subcommands/state_compute_cmd.rs (1)

101-104: Let me verify the actual state by checking the codebase more carefully:

Based on the verification, I found the human_bytes function in version 0.2.1 accepts both u32 directly and u64 as f64, suggesting it uses generic type handling. However, I cannot definitively verify the exact API signature for version 0.4 from web sources. The inconsistency in the codebase reveals that:

  • archive_cmd.rs:336 passes index_size_bytes without an f64 cast
  • state_compute_cmd.rs:103 casts with as f64
  • snapshot.rs:357 also casts with as f64

If the code currently compiles with these different patterns, the function likely accepts multiple numeric types. The review comment's rationale about "precision loss" for large snapshots is reasonable (f64 loses precision at very large byte counts), but the suggestion to use u128 specifically requires verification of the 0.4 API.

Use integer type for human_bytes to avoid precision loss

The current code casts to f64, which loses precision for large byte counts. However, before applying the suggested change to u128, verify that version 0.4 of human_bytes accepts u128. The codebase already uses this function inconsistently: archive_cmd.rs:336 passes an integer directly without casting, while state_compute_cmd.rs:103 and snapshot.rs:357 both use f64 casts. Align all usages after confirming the supported input types for human_bytes 0.4.

mod net_cmd;
mod shed_cmd;
mod snapshot_cmd;
mod state_compute_cmd;

⚠️ Potential issue | 🔴 Critical

Fix “private type in public interface” for Subcommand::State

Subcommand is pub, but it exposes state_compute_cmd::StateCommand from a private module, which breaks visibility (E0446). Re-export the type and reference the re-export, or make the module public. Minimal fix:

 mod state_compute_cmd;
+pub use self::state_compute_cmd::StateCommand;

 ...
-    #[command(subcommand)]
-    State(state_compute_cmd::StateCommand),
+    #[command(subcommand)]
+    State(StateCommand),

Also applies to: 85-87

🤖 Prompt for AI Agents
In src/tool/subcommands/mod.rs around lines 15 and also affecting lines 85-87,
Subcommand::State exposes StateCommand from a private module causing a "private
type in public interface" error; fix by either making the module public (change
the module declaration to pub mod state_compute_cmd) or re-exporting the type
(add pub use state_compute_cmd::StateCommand) and update Subcommand::State to
reference the re-exported StateCommand (e.g., use
crate::tool::subcommands::StateCommand) so the public enum only exposes public
types.

Comment on lines 89 to 91
SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());


@coderabbitai coderabbitai bot Oct 17, 2025


⚠️ Potential issue | 🔴 Critical

Wrong store used when writing HEAD; likely compile error and unintended DB mutation

SettingsStoreExt::write_obj(&db.tracker, ...) targets the read-ops tracker, not the settings store. This won’t compile and, if pointed to the real DB, could also mutate a live DB’s HEAD. Use the DB wrapper itself, or drop the HEAD write entirely (not needed when passing ts to compute_tipset_state).

-        SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
+        // Optional: this write is not strictly necessary for compute; remove if you want no side-effects.
+        SettingsStoreExt::write_obj(&*db, crate::db::setting_keys::HEAD_KEY, &ts.key())?;

If you prefer method syntax:

-        SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
+        db.write_obj(crate::db::setting_keys::HEAD_KEY, &ts.key())?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());
// Optional: this write is not strictly necessary for compute; remove if you want no side-effects.
SettingsStoreExt::write_obj(&*db, crate::db::setting_keys::HEAD_KEY, &ts.key())?;
let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());
🤖 Prompt for AI Agents
In src/tool/subcommands/state_compute_cmd.rs around lines 89 to 91, the call
uses SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY,
ts.key()) which targets the tracker store (causing a compile error and risking
mutating the wrong DB); change the call to use the DB wrapper (e.g.
SettingsStoreExt::write_obj(&db, crate::db::setting_keys::HEAD_KEY, ts.key()))
or simply remove the HEAD write entirely since ts is passed into
compute_tipset_state; ensure you use the correct settings store reference or
drop the write to avoid unintended DB mutation.


@hanabi1224 hanabi1224 Oct 17, 2025


This is intended: setting the head is required for exporting the snapshot from the tracker.


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the ReadOpsTrackingStore definition and understand its structure
ast-grep --pattern 'struct ReadOpsTrackingStore'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for how the snapshot/export mechanism works
rg -nP -C5 'export.*snapshot|snapshot.*export' --type=rust

Length of output: 19887


🏁 Script executed:

#!/bin/bash
# Look for the tracker field usage in the codebase
rg -nP '\btracker\b' --type=rust -C3 -g '!target/**'

Length of output: 17067


🏁 Script executed:

#!/bin/bash
# Find where export_snapshot_to_forest_car is implemented/called
rg -nP 'export_snapshot_to_forest_car|export.*forest.*car' --type=rust -C5

Length of output: 7646


@hanabi1224, thank you for the clarification! You're absolutely right. I now understand the architecture:

The ReadOpsTrackingStore uses its tracker (a MemoryDB) to accumulate all data accessed during the state computation. When export_forest_car is called, it exports from this tracker, creating a minimal snapshot containing only the blocks that were actually read.

Writing HEAD_KEY to db.tracker is indeed necessary because MemoryDB::export_forest_car reads the HEAD_KEY from the store it's exporting. Without this write, the export would fail.

I apologize for the incorrect initial assessment. The code is working as designed.


✏️ Learnings added
Learnt from: hanabi1224
PR: ChainSafe/forest#6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.006Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.


chain_config,
genesis_header,
)
.unwrap(),

why unwrap? Is it infallible?


Fixed.

)
.unwrap(),
);
let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());

why unwrap? Is it infallible?


Fixed.

)?;
let epoch = ts.epoch();
SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());

why unwrap? Is it infallible?


Fixed.

chain_config,
genesis_header,
)
.unwrap(),

why unwrap? Is it infallible?


Fixed.

Co-authored-by: Hubert <hubert@chainsafe.io>

@LesnyRumcajs LesnyRumcajs left a comment


I understand those are short-lived commands, but panics are really a bad user experience. Let's propagate errors with helpful messages, by the book.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
src/tool/subcommands/state_compute_cmd.rs (4)

61-66: Add error context when resolving DB path/config.

Wrap fallible ops with anyhow::Context for actionable CLI errors.

Apply within the selected lines:

-            let (_, config) = read_config(None, Some(chain.clone()))?;
-            db_root(&chain_path(&config))?
+            let (_, config) = read_config(None, Some(chain.clone()))
+                .context("reading forest config")?;
+            db_root(&chain_path(&config))
+                .context("resolving database root path")?

Outside the selected lines, add the import near the other uses:

+use anyhow::Context;

72-82: Avoid unwraps; propagate with context.

Prefer bubbling errors with context over panicking in a CLI tool.

-        let chain_store = Arc::new(
-            ChainStore::new(
+        let chain_store = Arc::new(
+            ChainStore::new(
                 db.clone(),
                 db.clone(),
                 db.clone(),
                 db.clone(),
                 chain_config,
                 genesis_header,
-            )
-            .unwrap(),
+            )
+            .context("initializing ChainStore")?,
         );
-        let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());
+        let state_manager = Arc::new(
+            StateManager::new(chain_store.clone()).context("initializing StateManager")?,
+        );
-        let chain_store = Arc::new(
-            ChainStore::new(
+        let chain_store = Arc::new(
+            ChainStore::new(
                 db.clone(),
                 db.clone(),
                 db.clone(),
                 db.clone(),
                 chain_config,
                 genesis_header,
-            )
-            .unwrap(),
+            )
+            .context("initializing ChainStore")?,
         );
-        let state_manager = Arc::new(StateManager::new(chain_store.clone()).unwrap());
+        let state_manager = Arc::new(
+            StateManager::new(chain_store.clone()).context("initializing StateManager")?,
+        );

Also applies to: 90-90, 146-146, 148-148


89-89: Keep writing HEAD to the tracker; add a brief comment to document intent.

This is correct for exporting minimal snapshots from ReadOpsTrackingStore. Add a note to prevent future “wrong store” regressions.

-        SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
+        // Write HEAD to the tracker MemoryDB so export_forest_car() uses the accessed subset.
+        // This intentionally does NOT mutate the underlying on-disk DB.
+        SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;

Based on learnings


149-166: Print iteration index to ease benchmarking output parsing.

Small UX win when running -n > 1.

-        for _ in 0..n.get() {
+        for i in 1..=n.get() {
             let start = Instant::now();
@@
-            println!(
-                "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, took {}.",
-                humantime::format_duration(start.elapsed())
-            );
+            println!(
+                "run {}/{}: epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, took {}.",
+                i,
+                n.get(),
+                humantime::format_duration(start.elapsed())
+            );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 544c005 and fff6e7b.

📒 Files selected for processing (1)
  • src/tool/subcommands/state_compute_cmd.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.046Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.
📚 Learning: 2025-10-17T14:24:47.046Z
Learnt from: hanabi1224
PR: ChainSafe/forest#6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.046Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.

Applied to files:

  • src/tool/subcommands/state_compute_cmd.rs
🧬 Code graph analysis (1)
src/tool/subcommands/state_compute_cmd.rs (5)
src/cli_shared/mod.rs (2)
  • chain_path (20-22)
  • read_config (24-41)
src/db/mod.rs (1)
  • db_root (320-322)
src/genesis/mod.rs (1)
  • read_genesis_header (21-44)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (2)
  • load_db (73-79)
  • new (190-195)
src/db/car/any.rs (4)
  • new (34-53)
  • try_from (129-131)
  • try_from (136-138)
  • try_from (143-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: All lint checks
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Check
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (3)
src/tool/subcommands/state_compute_cmd.rs (3)

83-88: LGTM: robust tipset resolution across null rounds.

Using ResolveNullTipset::TakeOlder with heaviest tipset is appropriate here.


122-123: Infallible unwrap is fine here.

NonZeroUsize::new(1).unwrap() is safe by construction; no change needed.


99-107: Review comment is incorrect and will not compile.

The suggested diff uses std::fs::File and std::io::BufWriter, which implement the synchronous std::io::Write trait. However, export_forest_car requires W: tokio::io::AsyncWrite + Unpin—these are incompatible types. The code will fail to compile with the proposed changes.

The current implementation already works correctly: Vec<u8> properly implements tokio::io::AsyncWrite, so there is no trait compatibility issue. Additionally, the code is already correctly writing HEAD_KEY to db.tracker (line 89), as intended for ReadOpsTrackingStore snapshots.

If streaming is actually desired for performance reasons, use tokio::fs::File and tokio::io::BufWriter instead—but this refactoring is not necessary for correctness.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/tool/subcommands/state_compute_cmd.rs (1)

119-120: Default for n is fine

NonZeroUsize::new(1).unwrap() is infallible here; keeping it is acceptable.

🧹 Nitpick comments (4)
src/tool/subcommands/state_compute_cmd.rs (4)

86-86: Writing HEAD to tracker is correct for minimal CAR export; add a clarifying comment and avoid an extra alloc

This HEAD write must target the tracker MemoryDB so export_forest_car roots the snapshot correctly. Add a short comment to prevent regressions, and pass a reference to avoid an extra allocation.

Based on learnings

-        SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, ts.key())?;
+        // Export reads from the tracker MemoryDB; seed HEAD so the CAR roots correctly.
+        SettingsStoreExt::write_obj(&db.tracker, crate::db::setting_keys::HEAD_KEY, &ts.key())?;

96-104: Avoid buffering entire snapshot in memory when writing to a file

Exporting to Vec allocates the full CAR in memory even when --export-db-to is provided. Stream directly to the file and derive size from metadata; fall back to Vec only when no path is given.

-        let mut db_snapshot = vec![];
-        db.export_forest_car(&mut db_snapshot).await?;
-        println!(
-            "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, db_snapshot_size: {}",
-            human_bytes::human_bytes(db_snapshot.len() as f64)
-        );
-        if let Some(export_db_to) = export_db_to {
-            std::fs::write(export_db_to, db_snapshot)?;
-        }
+        if let Some(path) = export_db_to {
+            use std::io::Write;
+            let file = std::fs::File::create(&path)?;
+            let mut writer = std::io::BufWriter::new(file);
+            db.export_forest_car(&mut writer).await?;
+            writer.flush()?;
+            let size = std::fs::metadata(&path)?.len();
+            println!(
+                "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, db_snapshot_size: {}",
+                human_bytes::human_bytes(size as f64)
+            );
+        } else {
+            let mut buf = Vec::new();
+            db.export_forest_car(&mut buf).await?;
+            println!(
+                "epoch: {epoch}, state_root: {state_root}, receipt_root: {receipt_root}, db_snapshot_size: {}",
+                human_bytes::human_bytes(buf.len() as f64)
+            );
+        }

45-51: CLI UX: add ValueHint for path args

Helps shell completions and tooling.

-    #[arg(long)]
+    #[arg(long, value_hint = clap::ValueHint::DirPath)]
     db: Option<PathBuf>,
-    #[arg(long)]
+    #[arg(long, value_hint = clap::ValueHint::FilePath)]
     export_db_to: Option<PathBuf>,

112-121: Snapshot path: add ValueHint

Small UX win for positional arg.

 pub struct ReplayComputeCommand {
     /// Path to the database snapshot `CAR` file generated by `forest-tool state compute`
-    snapshot: PathBuf,
+    #[arg(value_hint = clap::ValueHint::FilePath)]
+    snapshot: PathBuf,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fff6e7b and d0c08c6.

📒 Files selected for processing (1)
  • src/tool/subcommands/state_compute_cmd.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.046Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.
📚 Learning: 2025-10-17T14:24:47.046Z
Learnt from: hanabi1224
PR: ChainSafe/forest#6167
File: src/tool/subcommands/state_compute_cmd.rs:89-91
Timestamp: 2025-10-17T14:24:47.046Z
Learning: In `src/tool/subcommands/state_compute_cmd.rs`, when using `ReadOpsTrackingStore` to generate minimal snapshots, `HEAD_KEY` should be written to `db.tracker` (not `db` itself) before calling `export_forest_car()`, because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation.

Applied to files:

  • src/tool/subcommands/state_compute_cmd.rs
🧬 Code graph analysis (1)
src/tool/subcommands/state_compute_cmd.rs (6)
src/cli_shared/mod.rs (2)
  • chain_path (20-22)
  • read_config (24-41)
src/db/mod.rs (1)
  • db_root (320-322)
src/genesis/mod.rs (1)
  • read_genesis_header (21-44)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (2)
  • load_db (73-79)
  • new (190-195)
src/db/car/any.rs (4)
  • new (34-53)
  • try_from (129-131)
  • try_from (136-138)
  • try_from (143-145)
src/networks/mod.rs (1)
  • from_chain (404-414)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: Deploy to Cloudflare Pages
🔇 Additional comments (3)
src/tool/subcommands/state_compute_cmd.rs (3)

61-66: DB root resolution looks correct

Falls back to config-driven chain path when --db is omitted. Clean and predictable.


126-133: PathBuf TryFrom and genesis init LGTM

Using AnyCar::try_from(&snapshot) and loading genesis via chain config over the in‑memory store is sound.


102-104: The review comment is incorrect—export_forest_car() already produces zstd-compressed .forest.car.zst format

The code at lines 102-104 is correct. The export_forest_car() method (src/db/memory.rs lines 28-56) explicitly:

  1. Calls forest::Encoder::compress_stream_default() to compress blocks into zstd frames
  2. Calls forest::Encoder::write() to write the compressed frames in .forest.car.zst format

The data written to disk is already compressed; no additional compression is needed. The PR summary correctly describes the output as .forest.car.zst.

Likely an incorrect or invalid review comment.

@hanabi1224 hanabi1224 enabled auto-merge October 24, 2025 10:47
@hanabi1224 hanabi1224 added this pull request to the merge queue Oct 28, 2025
Merged via the queue into main with commit 3619c44 Oct 28, 2025
51 checks passed
@hanabi1224 hanabi1224 deleted the hm/tool-state-compute branch October 28, 2025 12:00