feat: add `--unordered` to `forest-cli snapshot export` #5867

hanabi1224 · 2025-07-28T11:35:47Z

Summary of changes

(Will be ready for review after #5835 is merged)

Changes introduced in this pull request:

add --unordered to forest-cli snapshot export to make use of dead code fn unordered_chain_export which exports CAR blocks in non-deterministic order for better performance with more parallelization
show average export speed in forest-cli snapshot export (splitted into feat: print average speed in forest-cli snapshot export #5869)
reduce db read ops in chain_export and unordered_chain_export (made a separate PR fix: reduce db read ops in chain export #5868 for this)
reduce unneccesary Vec::reserve ops in chain_export

Reference issue to close (if applicable)

Closes

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

New Features
- Added an --unordered flag to snapshot export for non-deterministic CAR block ordering to improve export performance.
- RPC export gained an unordered parameter to match CLI behavior.
Bug Fixes
- Dry-run snapshot exports no longer write temporary files or checksums.
Documentation
- Changelog updated to document unordered export.
Tests
- Added automated calibnet check for unordered snapshot export/import.
Chores
- CI now runs unordered snapshot export checks and reports results in integration status.

…ordered

coderabbitai · 2025-07-28T11:35:53Z

Walkthrough

Adds an --unordered export mode across CLI, RPC, and chain export codepaths via a new public ExportOptions struct; refactors unordered chain streaming to an async fused stream with lifetime and optional sender; updates callers/tests; and adds CI job + script to validate unordered snapshot export/import on calibnet.

Changes

Cohort / File(s)	Summary
Changelog `CHANGELOG.md`	Adds entry documenting new `--unordered` flag for `forest-cli snapshot export`.
Chain export API & callers `src/chain/mod.rs`, `src/chain/tests.rs`, `src/db/gc/snapshot.rs`, `src/tool/subcommands/archive_cmd.rs`	Introduces public `ExportOptions { skip_checksum, unordered, seen }`; changes export entry signatures to accept `Option<ExportOptions>` (export_from_head, export, export_v2, export_to_forest_car); selects `unordered_stream_chain` vs `stream_chain` based on option; updates callers and tests to pass `None` or `Some(ExportOptions{...})`.
CLI snapshot command `src/cli/subcommands/snapshot_cmd.rs`	Adds `unordered` boolean flag to `SnapshotCommands::Export` and includes it in `ForestChainExportParams`; skips temp file persistence/checksum when `dry_run` is set.
RPC export params & handlers `src/rpc/methods/chain.rs`	Adds `unordered: bool` to `ForestChainExportParams`; builds and passes `ExportOptions` to chain export functions; `ChainExport` delegates to `ForestChainExport` with `unordered: false`.
Unordered streaming internals `src/ipld/util.rs`	Refactors `UnorderedChainStream` to `<'a>` with `flume::async::RecvStream<'a>` fused stream, makes `extract_sender` optional, adds `with_seen`, clamps worker count, uses async sends, rewrites `poll_next` to poll fused stream and manage internal queue/error handling.
CI and test script `.github/workflows/forest.yml`, `scripts/tests/calibnet_export_unordered_check.sh`	Adds `calibnet-unordered-export-check` GitHub Actions job and new test script to run unordered export, import snapshot, and validate node startup/sync/health.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CLI as forest-cli
  participant RPC as Forest RPC
  participant Chain as chain::export
  participant IPLD as {stream_chain|unordered_stream_chain}
  participant Writer as AsyncWriterWithChecksum

  User->>CLI: snapshot export --unordered
  CLI->>RPC: ForestChainExport(unordered=true, params)
  RPC->>Chain: export(..., ExportOptions{unordered, skip_checksum, seen})
  alt unordered == true
    Chain->>IPLD: unordered_stream_chain(...).with_seen(seen)
  else
    Chain->>IPLD: stream_chain(...).with_seen(seen)
  end
  IPLD-->>Chain: CAR block stream
  Chain->>Writer: write frames (checksum based on skip_checksum)
  Writer-->>RPC: optional checksum
  RPC-->>CLI: result
  CLI-->>User: export completed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: add --format flag to forest-cli snapshot export #5835 — Modifies chain export functions and RPC plumbing; directly overlaps signature and export changes.
fix: reduce db read ops in chain export #5868 — Adjusts UnorderedChainStream task/queue handling in src/ipld/util.rs, overlapping with the unordered stream refactor.
chore: refactor and simplify stream_graph and unordered_stream_graph #5893 — Further changes to unordered stream construction (unordered_stream_chain/unordered_stream_graph) related to this PR.

Suggested labels

RPC

Suggested reviewers

akaladarshi
LesnyRumcajs
sudo-shashank

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 07f0a58 and d9f927c.

📒 Files selected for processing (1)

src/tool/subcommands/archive_cmd.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/tool/subcommands/archive_cmd.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Build MacOS
GitHub Check: cargo-publish-dry-run
GitHub Check: Build Ubuntu
GitHub Check: All lint checks
GitHub Check: Analyze (go)
GitHub Check: Analyze (rust)

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch hm/snapshot-export-unordered

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

CHANGELOG.md (1)
30-33: Align verb tense with existing changelog style

Most entries within the “Added” section use the imperative “Add …” rather than the past-tense “Added …”. Tweaking the verb keeps the style consistent.
- - [#5867](https://github.com/ChainSafe/forest/pull/5867) Added `--unordered` to `forest-cli snapshot export` for exporting `CAR` blocks in non-deterministic order for better performance with more parallelization.
+ - [#5867](https://github.com/ChainSafe/forest/pull/5867) Add `--unordered` to `forest-cli snapshot export` for exporting `CAR` blocks in a non-deterministic order to improve parallelisation performance.
src/chain/mod.rs (1)
66-84: Optimize duplicate Arc::clone in unordered branch.

The db is cloned twice in the unordered branch but only once in the ordered branch. Consider storing the cloned Arc to avoid redundant cloning.
        if unordered {
+           let db_clone = Arc::clone(db);
            futures::future::Either::Left(
                unordered_stream_chain(
-                   Arc::clone(db),
-                   tipset.clone().chain_owned(Arc::clone(db)),
+                   db_clone.clone(),
+                   tipset.clone().chain_owned(db_clone),
                    stateroot_lookup_limit,
                )
                .with_seen(seen),
            )
        } else {
src/ipld/util.rs (1)

413-413: Consider making worker count configurable.

The worker count is hard-coded to clamp between 1 and 8. Consider making this configurable through ExportOptions for better flexibility in different environments.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19c11c8 and bc0e6ff.

📒 Files selected for processing (10)

CHANGELOG.md (1 hunks)
src/blocks/header.rs (2 hunks)
src/chain/mod.rs (4 hunks)
src/cli/subcommands/snapshot_cmd.rs (6 hunks)
src/db/gc/snapshot.rs (2 hunks)
src/ipld/util.rs (15 hunks)
src/rpc/methods/chain.rs (6 hunks)
src/rpc/mod.rs (1 hunks)
src/tool/subcommands/api_cmd/test_snapshots_ignored.txt (1 hunks)
src/tool/subcommands/archive_cmd.rs (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Build Ubuntu
GitHub Check: Build MacOS
GitHub Check: cargo-publish-dry-run
GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: All lint checks

🔇 Additional comments (23)

src/tool/subcommands/api_cmd/test_snapshots_ignored.txt (1)

81-81: Addition keeps the ignore list consistent – LGTM

Including Forest.ChainExport prevents the new RPC from polluting the snapshot baseline and the placement preserves the alphabetical order of the Forest.* block. No further action required.

src/db/gc/snapshot.rs (2)

46-46: LGTM: Clean import update for refactored export API.

The import change from CidHashSet to ExportOptions correctly aligns with the export API refactoring.

228-231: LGTM: Proper adaptation to new ExportOptions struct.

The change correctly uses the new ExportOptions struct while maintaining the same behavior:

skip_checksum: true preserves the previous behavior

..Default::default() ensures unordered: false (deterministic order for GC operations)

The API change is handled cleanly without affecting functionality

src/rpc/mod.rs (1)

81-81: LGTM: Proper registration of new ForestChainExport RPC method.

The addition correctly integrates the new ForestChainExport method into the RPC framework, ensuring it's properly registered and available for use.

src/tool/subcommands/archive_cmd.rs (2)

30-30: LGTM: Correct import for refactored export API.

The import of ExportOptions aligns with the export API refactoring across the codebase.

514-525: LGTM: Proper adaptation to ExportOptions struct.

The export call correctly uses the new ExportOptions struct:

Maintains skip_checksum: true behavior

Preserves the seen parameter for diff export functionality

Uses appropriate defaults for other options (e.g., unordered: false for deterministic archive export)

The function signature and behavior remain unchanged from the caller's perspective

src/blocks/header.rs (2)

17-18: LGTM: Updated imports for improved CID computation.

The import changes support the new car_block method implementation with proper multihash handling.

Also applies to: 22-22

100-110: LGTM: Efficient car_block method with CID caching.

The new car_block method and updated cid method provide an efficient way to get both CID and serialized data:

Avoids duplicate serialization when both CID and data are needed (common in export scenarios)

Uses proper CID v1 with DAG_CBOR codec and Blake2b-256 multihash

The cid method maintains backward compatibility by delegating to car_block

Error handling is appropriate with expect("Infallible") for the well-tested serialization path

src/cli/subcommands/snapshot_cmd.rs (6)

9-9: LGTM: Updated imports for new export functionality.

The imports correctly support the new ForestChainExportParams, timing functionality, and numeric operations for the enhanced export command.

Also applies to: 14-16

38-40: LGTM: Well-documented unordered flag.

The new unordered flag is properly documented, explaining its purpose for enabling non-deterministic traversal to improve performance through better parallelization.

53-53: LGTM: Proper parameter struct usage.

The code correctly uses the new ForestChainExportParams struct and includes the unordered flag. The parameter mapping maintains all existing functionality while adding the new capability.

Also applies to: 89-94

100-100: LGTM: Enhanced progress reporting with export speed.

The progress reporting improvements are well-implemented:

Records start time before spawning the progress task

Calculates export speed as bytes per second

Handles division by zero correctly

Uses appropriate interval (0.5 seconds) for progress updates

Provides useful feedback to users during long-running exports

Also applies to: 105-105, 117-127

136-136: LGTM: Updated to use new ForestChainExport RPC method.

The RPC call correctly uses the new ForestChainExport method with proper timeout handling for long-running snapshot exports.

142-147: LGTM: Proper dry run handling.

The dry run logic correctly prevents file operations when dry_run is true:

Skips checksum file creation

Skips persisting the temporary file

Maintains the export process for testing without side effects

src/chain/mod.rs (2)

20-25: LGTM!

The ExportOptions struct is well-designed with appropriate derives and sensible defaults.

27-38: LGTM!

The function correctly forwards the ExportOptions to the export function.

src/rpc/methods/chain.rs (3)

225-298: LGTM!

The ForestChainExport RPC method is well-implemented with proper concurrency control, input validation, and error handling.

300-337: LGTM!

The wrapper correctly maintains backward compatibility by delegating to ForestChainExport with unordered: false.

886-898: LGTM!

The ForestChainExportParams struct is properly defined with appropriate serde and schema annotations.

src/ipld/util.rs (4)

103-108: LGTM!

The enhancement to carry optional block data in the Emit variant is a good optimization to avoid redundant database fetches.

122-130: LGTM!

The builder-style methods follow Rust conventions and provide a clean API.

220-226: Good optimization for vector reservation.

The check for non-empty vector before reserving capacity is a good performance optimization.

290-307: LGTM!

The Fuse wrapper and PinnedDrop implementation ensure proper cleanup and stream termination.

src/ipld/util.rs

…ordered

akaladarshi

Some comments otherwise LGTM. Very nice work 👌🏽

akaladarshi · 2025-07-30T09:33:05Z

src/ipld/util.rs

-
-                                        for v in new_values.into_iter().rev() {
-                                            cid_vec.push_front(v)
+                                        if !new_values.is_empty() {


nit:

if new_values.is_empty() { continue }

That changes the logic, it always returns sth here

akaladarshi · 2025-07-30T10:16:56Z

src/ipld/util.rs

+    fail_on_dead_links: bool,
+) -> UnorderedChainStream<'a, DB, ITER> {
    let (sender, receiver) = flume::bounded(BLOCK_CHANNEL_LIMIT);
    let (extract_sender, extract_receiver) = flume::unbounded();


Can we also use the flume::bounded for extract sender and receiver ?

Let's tackle this in another PR

akaladarshi · 2025-07-30T10:18:36Z

src/ipld/util.rs

            let mut handles = JoinSet::new();
-
-            for _ in 0..num_cpus::get() {
+            for _ in 0..num_cpus::get().clamp(1, 8) {


Do you think it makes sense to accept this as params as well?

Basically allow user to pass the worker count otherwise use current values as default?

I think it's our responsibility to tune this value. It might turn out this unordered_stream_chain is not fast and we end up removing it.

akaladarshi · 2025-07-30T10:21:14Z

src/ipld/util.rs

                                        let mut new_values = extract_cids(&data)?;
-                                        cid_vec.append(&mut new_values);
+                                        if !new_values.is_empty() {
+                                            cid_vec.append(&mut new_values);


Refactor it to:

let new_values = extract_cids(&data)?; if !new_values.is_empty() { cid_vec.reserve(new_values.len()); cid_vec.extend(new_values); }

This has been done internally in Vec::append

https://doc.rust-lang.org/stable/src/alloc/vec/mod.rs.html#2579

pub fn append(&mut self, other: &mut Self) { unsafe { self.append_elements(other.as_slice() as _); other.set_len(0); } } /// Appends elements to `self` from other buffer. #[cfg(not(no_global_oom_handling))] #[inline] #[track_caller] unsafe fn append_elements(&mut self, other: *const [T]) { let count = other.len(); self.reserve(count); let len = self.len(); unsafe { ptr::copy_nonoverlapping(other as *const T, self.as_mut_ptr().add(len), count) }; self.len += count; }

akaladarshi · 2025-07-30T10:39:27Z

src/ipld/util.rs

                                        let mut new_values = extract_cids(&data)?;
-                                        cid_vec.append(&mut new_values);
+                                        if !new_values.is_empty() {
+                                            cid_vec.append(&mut new_values);


Do you think cid_vec could grow very large since we are also sending the state_root ?

Should we have batch processing here?

Something like this:

const MAX_CID_QUEUE_SIZE: usize = 1000; if cid_vec.len() > MAX_CID_QUEUE_SIZE { for cid in cid_vec.drain(MAX_CID_QUEUE_SIZE..) { let _ = extract_sender.send(cid); } }

extract_sender is bounded, and this will likely get stuck. Actually the original code gets stuck on CI and I have to change it to send_async

akaladarshi · 2025-07-30T10:48:03Z

src/ipld/util.rs

-                            )
-                        }
+                    } else if this.extract_sender.is_empty() {
+                        this.worker_handle.abort();


Right now we are aborting without checking if the workers are still processing the existing load.

Do you think we should make sure that all the workers have finished before aborting it here, we can send signal from the worker side (once all tipset tasks are done) and then abort.

Great point. The old logic did not handle the shutdown gracefully indeed. Fixed by closing the extract_sender instead of forcing worker_handle.abort

akaladarshi · 2025-07-30T10:57:58Z

Also do you thinks it's possible to divide the UnorderedChainStream poll_next function into small chunks, since there are lot of if else blocks there.

Just a suggestion, we can use introduce couple of different structs to handle different tasks:

struct WorkerPool {
    handle: JoinHandle<anyhow::Result<()>>,
    work_sender: flume::Sender<WorkItem>,
    shutdown_sender: Option<oneshot::Sender<()>>,
}

enum WorkItem {
    Process(Cid),
    Shutdown,
}

hanabi1224 · 2025-07-30T12:26:54Z

Hey @akaladarshi Thanks for your review! Regarding stream_chain and unordered_stream_chain refactoring, I've extracted the changes into #5868 please take a look. Your ideas on furture refactoring unordered_stream_chain sound great but it's out of scope of this PR. This PR tries to revive the dead unordered_stream_chain function so that we can test it via forest-cli. I'd suggest we make decisions to either further refactor it or remove it after proving its correctness and usefulness. What do you think?

akaladarshi · 2025-07-30T12:31:13Z

@hanabi1224 I didn't realise this just for reviving the dead code. We can hold on to the changes for now, we can make them once we are sure this what we are going with.

I will approve the PR, we can revisit the comments later.

…ordered

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

scripts/tests/calibnet_eth_mapping_check.sh (1)
25-26: Off-by-one: loop iterates 201 tipsets instead of the requested 200
for ((i=0; i<=NUM_TIPSETS; i++)); do includes both 0 and 200.
If you really want exactly $NUM_TIPSETS tipsets, use a strict < upper bound.
-for ((i=0; i<=NUM_TIPSETS; i++)); do
+for ((i=0; i<NUM_TIPSETS; i++)); do

🧹 Nitpick comments (4)

scripts/tests/calibnet_eth_mapping_check.sh (1)
35-41: Redundant jq check adds needless subprocess work
jq -e already exits with status 0 when the value is neither null nor false.
The string comparison is superfluous—just test the exit code.
-  if [[ $(echo "$JSON" | jq -e '.result.transactions') != "null" ]]; then
+  if echo "$JSON" | jq -e '.result.transactions' >/dev/null; then
.github/workflows/forest.yml (1)

309-330: Consider deduplicating the export-check jobs with a matrix
All three snapshot jobs differ only by one script path. A small matrix would
shrink ~70 duplicated lines and ease future maintenance.
scripts/tests/calibnet_export_unordered_check.sh (2)
15-16: Typo in log message
unordred → unordered.
-echo "Exporting zstd compressed snapshot with unordred graph traversal"
+echo "Exporting zstd compressed snapshot with unordered graph traversal"
30-32: Hard-coded filename couples steps unnecessarily
Import path is fixed to unordered.forest.car.zst; using $(ls *.car.zst)
would make the step resilient to future name changes.
-$FOREST_PATH ... --import-snapshot unordered.forest.car.zst
+$FOREST_PATH ... --import-snapshot "$(ls *.car.zst)"

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 494fb1c and 2e73539.

📒 Files selected for processing (11)

.github/workflows/forest.yml (1 hunks)
CHANGELOG.md (1 hunks)
scripts/tests/calibnet_eth_mapping_check.sh (1 hunks)
scripts/tests/calibnet_export_unordered_check.sh (1 hunks)
scripts/tests/harness.sh (1 hunks)
src/chain/mod.rs (7 hunks)
src/chain/tests.rs (1 hunks)
src/cli/subcommands/snapshot_cmd.rs (4 hunks)
src/ipld/util.rs (9 hunks)
src/rpc/methods/chain.rs (7 hunks)
src/tool/subcommands/archive_cmd.rs (2 hunks)

✅ Files skipped from review due to trivial changes (1)

scripts/tests/harness.sh

🚧 Files skipped from review as they are similar to previous changes (6)

src/cli/subcommands/snapshot_cmd.rs
src/tool/subcommands/archive_cmd.rs
src/rpc/methods/chain.rs
CHANGELOG.md
src/ipld/util.rs
src/chain/mod.rs

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.

📚 Learning: in forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic rpc service...

Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.

Applied to files:

scripts/tests/calibnet_eth_mapping_check.sh
scripts/tests/calibnet_export_unordered_check.sh

📚 Learning: in the forest codebase, `vec` can be used as an `asyncwrite` implementation in test contexts. th...

Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.

Applied to files:

src/chain/tests.rs

🧬 Code Graph Analysis (1)

scripts/tests/calibnet_eth_mapping_check.sh (1)

scripts/tests/harness.sh (1)

forest_wait_for_healthcheck_ready (100-103)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: tests
GitHub Check: tests-release
GitHub Check: Build forest binaries on Linux AMD64

🔇 Additional comments (3)

src/chain/tests.rs (1)

62-62: LGTM! Function signature updates align with the API refactor.

The changes correctly update the test function calls to use the new unified ExportOptions parameter. Passing None maintains the default behavior (no checksum skipping, ordered export, empty seen set), ensuring the tests continue to validate the same functionality while adapting to the refactored interface.

Also applies to: 65-65

scripts/tests/calibnet_eth_mapping_check.sh (1)

49-49: Good call: wait for full back-fill before mapping checks
Replacing the ad-hoc health-check command with the helper ensures DB
back-fill is complete, preventing flaky failures in the mapping tests.
scripts/tests/calibnet_export_unordered_check.sh (1)

8-14: Safety: wildcard delete may remove unintended files
rm --force --verbose ./*.{car,car.zst,sha256sum} expands to the literal
patterns when no matches exist, causing an error with set -eu.
Guard with shopt -s nullglob or quote the glob.
+shopt -s nullglob
 rm --force --verbose ./*.{car,car.zst,sha256sum}
+shopt -u nullglob

.github/workflows/forest.yml

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/ipld/util.rs (1)

293-297: Graceful worker shutdown recommended.

PinnedDrop simply calls abort() on the worker task; this can leak in-flight
work and leave the DB in inconsistent state.
Consider:

Dropping extract_sender first so workers exit naturally.

worker_handle.abort() followed by await ing the join handle (with a short
timeout) to ensure cleanup completes.

This will reduce resource leaks and make panics inside workers visible to the
caller.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ccb41d and 10241f8.

📒 Files selected for processing (3)

src/db/gc/snapshot.rs (2 hunks)
src/ipld/util.rs (9 hunks)
src/rpc/methods/chain.rs (7 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/db/gc/snapshot.rs
src/rpc/methods/chain.rs

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.

📚 Learning: 2025-08-06T15:42:43.520Z

Learnt from: LesnyRumcajs
PR: ChainSafe/forest#5907
File: Cargo.toml:25-36
Timestamp: 2025-08-06T15:42:43.520Z
Learning: The fvm_ipld_amt crate version 0.7.5 is required for the Forest project, but it's not yet published to crates.io, which is why the Git patch pointing to the ref-fvm repository is still necessary.

Applied to files:

src/ipld/util.rs

📚 Learning: 2025-08-06T15:42:43.520Z

Learnt from: LesnyRumcajs
PR: ChainSafe/forest#5907
File: Cargo.toml:25-36
Timestamp: 2025-08-06T15:42:43.520Z
Learning: Forest project requires fvm_ipld_amt version 0.7.5 functionality which is available in the Git revision a633547ae414a333b2d076beef87d4d30cdb7fb4 of the ref-fvm repository, but version 0.7.5 is not yet published to crates.io (only 0.7.4 is available). The Git patch must remain until 0.7.5 is published.

Applied to files:

src/ipld/util.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Build MacOS
GitHub Check: tests-release
GitHub Check: rubocop
GitHub Check: Check
GitHub Check: Analyze (rust)
GitHub Check: Analyze (go)

src/ipld/util.rs

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/cli/subcommands/snapshot_cmd.rs (2)

39-41: Clarify nondeterminism and checksum expectations for --unordered.

Unordered export produces a non-reproducible CAR (block order varies), so file checksums will differ across runs. Make this explicit in the flag help to avoid surprises.

Consider:

-        /// Traverse chain in non-deterministic order for better performance with more parallelization.
+        /// Traverse chain in non-deterministic order for better performance with more parallelization.
+        /// Note: output CAR order (and thus file checksum) is non-deterministic and will vary across runs.

136-141: Dry-run: print explicit outcome and defensively gate checksum save.

Minor UX tweak: explicitly state that dry-run writes nothing. Also guard saving checksum by the local skip_checksum flag to avoid accidental writes if the server returns a hash unexpectedly.

-                if !dry_run {
-                    if let Some(hash) = hash_result {
-                        save_checksum(&output_path, hash).await?;
-                    }
-                    temp_path.persist(output_path)?;
-                }
+                if !dry_run {
+                    if !skip_checksum {
+                        if let Some(hash) = hash_result {
+                            save_checksum(&output_path, hash).await?;
+                        }
+                    }
+                    temp_path.persist(output_path)?;
+                } else {
+                    println!("Dry run: no files were written and no checksum saved.");
+                }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2158980 and 6072720.

📒 Files selected for processing (1)

src/cli/subcommands/snapshot_cmd.rs (4 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/cli/subcommands/snapshot_cmd.rs (3)

src/blocks/tipset.rs (1)

hash (470-472)

src/blocks/block.rs (1)

hash (25-27)

src/shim/sector.rs (1)

hash (344-351)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: tests
GitHub Check: Check
GitHub Check: tests-release
GitHub Check: All lint checks
GitHub Check: cargo-publish-dry-run
GitHub Check: Build MacOS
GitHub Check: Build Ubuntu
GitHub Check: Analyze (rust)
GitHub Check: Analyze (go)

🔇 Additional comments (2)

src/cli/subcommands/snapshot_cmd.rs (2)

57-57: LGTM: flag plumbed through CLI.

The new unordered flag is correctly threaded through the export args.

100-103: End-to-End unordered Support Verified

All paths correctly propagate and honor the unordered flag:

CLI (snapshot_cmd.rs) exposes --unordered and sends it in ForestChainExportParams.

RPC (ForestChainExportParams) includes pub unordered: bool and passes it into ExportOptions unchanged.

Both V1 (export) and V2 (export_v2) calls receive ExportOptions { unordered, … } and select between unordered_stream_chain and the default ordered stream.

Utility layer (ipld::unordered_stream_chain/_graph) and existing tests/scripts (e.g. calibnet_export_unordered_check.sh, benchmark_unordered_graph_traversal) exercise unordered exports.

No further changes required.

…ordered

hanabi1224 added 4 commits July 24, 2025 22:00

feat: add --unordered to forest-cli snapshot export

ef0ada2

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

53cf68d

…ordered

less db read ops

a45e66e

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

5ae2f56

…ordered

changelog

bc0e6ff

hanabi1224 marked this pull request as ready for review July 28, 2025 11:46

hanabi1224 requested a review from a team as a code owner July 28, 2025 11:46

hanabi1224 requested review from akaladarshi and sudo-shashank and removed request for a team July 28, 2025 11:46

coderabbitai bot reviewed Jul 28, 2025

View reviewed changes

src/ipld/util.rs Outdated Show resolved Hide resolved

hanabi1224 mentioned this pull request Jul 28, 2025

fix: reduce db read ops in chain export #5868

Merged

4 tasks

fix

5023a7e

hanabi1224 mentioned this pull request Jul 28, 2025

feat: print average speed in forest-cli snapshot export #5869

Merged

4 tasks

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

494fb1c

…ordered

sudo-shashank previously approved these changes Jul 30, 2025

View reviewed changes

akaladarshi requested changes Jul 30, 2025

View reviewed changes

akaladarshi previously approved these changes Jul 30, 2025

View reviewed changes

Merge branch 'main' into hm/snapshot-export-unordered

10d696e

hanabi1224 marked this pull request as draft July 30, 2025 13:00

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

066fb5f

…ordered

hanabi1224 dismissed stale reviews from akaladarshi and sudo-shashank via 066fb5f August 4, 2025 11:45

hanabi1224 mentioned this pull request Aug 4, 2025

chore: refactor and simplify stream_graph and unordered_stream_graph #5893

Merged

4 tasks

hanabi1224 added 2 commits August 4, 2025 21:22

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

90d06e9

…ordered

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

fe42754

…ordered

fix

f3897d3

hanabi1224 force-pushed the hm/snapshot-export-unordered branch from 9a8ab04 to f3897d3 Compare August 7, 2025 14:39

Merge branch 'main' into hm/snapshot-export-unordered

2e73539

hanabi1224 marked this pull request as ready for review August 7, 2025 15:39

coderabbitai bot reviewed Aug 7, 2025

View reviewed changes

.github/workflows/forest.yml Show resolved Hide resolved

hanabi1224 added 3 commits August 8, 2025 00:00

resolve AI comment

0ccb41d

Merge branch 'main' into hm/snapshot-export-unordered

18a82fa

fix lint errors

10241f8

coderabbitai bot reviewed Aug 8, 2025

View reviewed changes

src/ipld/util.rs Outdated Show resolved Hide resolved

src/ipld/util.rs Show resolved Hide resolved

hanabi1224 added 2 commits August 8, 2025 20:57

resolve AI comments

2158980

Merge branch 'main' into hm/snapshot-export-unordered

6072720

coderabbitai bot reviewed Aug 11, 2025

View reviewed changes

hanabi1224 added 4 commits August 12, 2025 21:32

Merge branch 'main' into hm/snapshot-export-unordered

08052ae

Merge branch 'main' into hm/snapshot-export-unordered

0265659

Merge remote-tracking branch 'origin/main' into hm/snapshot-export-un…

07f0a58

…ordered

Merge branch 'main' into hm/snapshot-export-unordered

50c6105

hanabi1224 requested review from akaladarshi and sudo-shashank August 14, 2025 14:22

Merge branch 'main' into hm/snapshot-export-unordered

d9f927c

sudo-shashank approved these changes Aug 18, 2025

View reviewed changes

hanabi1224 added the RPC requires calibnet RPC checks to run on CI label Aug 18, 2025

akaladarshi approved these changes Aug 18, 2025

View reviewed changes

hanabi1224 added this pull request to the merge queue Aug 18, 2025

Merged via the queue into main with commit 442e8e1 Aug 18, 2025
77 checks passed

hanabi1224 deleted the hm/snapshot-export-unordered branch August 18, 2025 14:30

coderabbitai bot mentioned this pull request Aug 28, 2025

chore: remove unordered chain traversal #6014

Merged

4 tasks

coderabbitai bot mentioned this pull request Sep 19, 2025

feat: allow export spine-only snapshots with forest-cli snapshot export -d 0 #6097

Merged

4 tasks

This was referenced Oct 8, 2025

Add new subcommands for forest-cli snasphot #6128

Merged

feat: forest-tool db import #6161

Merged

feat: add --unordered to forest-cli snapshot export #5867

feat: add --unordered to forest-cli snapshot export #5867

Uh oh!

Conversation

hanabi1224 commented Jul 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of changes

Reference issue to close (if applicable)

Other information and links

Change checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

akaladarshi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

akaladarshi commented Jul 30, 2025

Uh oh!

hanabi1224 commented Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

akaladarshi commented Jul 30, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

feat: add `--unordered` to `forest-cli snapshot export` #5867

feat: add `--unordered` to `forest-cli snapshot export` #5867

hanabi1224 commented Jul 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 28, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

hanabi1224 commented Jul 30, 2025 •

edited

Loading