Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Jul 28, 2025

Summary of changes

(Will be ready for review after #5835 is merged)

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features

    • Added an --unordered flag to snapshot export for non-deterministic CAR block ordering to improve export performance.
    • RPC export gained an unordered parameter to match CLI behavior.
  • Bug Fixes

    • Dry-run snapshot exports no longer write temporary files or checksums.
  • Documentation

    • Changelog updated to document unordered export.
  • Tests

    • Added automated calibnet check for unordered snapshot export/import.
  • Chores

    • CI now runs unordered snapshot export checks and reports results in integration status.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 28, 2025

Walkthrough

Adds an --unordered export mode across CLI, RPC, and chain export codepaths via a new public ExportOptions struct; refactors unordered chain streaming to an async fused stream with lifetime and optional sender; updates callers/tests; and adds CI job + script to validate unordered snapshot export/import on calibnet.

Changes

Cohort / File(s) Summary
Changelog
CHANGELOG.md
Adds entry documenting new --unordered flag for forest-cli snapshot export.
Chain export API & callers
src/chain/mod.rs, src/chain/tests.rs, src/db/gc/snapshot.rs, src/tool/subcommands/archive_cmd.rs
Introduces public ExportOptions { skip_checksum, unordered, seen }; changes export entry signatures to accept Option<ExportOptions> (export_from_head, export, export_v2, export_to_forest_car); selects unordered_stream_chain vs stream_chain based on option; updates callers and tests to pass None or Some(ExportOptions{...}).
CLI snapshot command
src/cli/subcommands/snapshot_cmd.rs
Adds unordered boolean flag to SnapshotCommands::Export and includes it in ForestChainExportParams; skips temp file persistence/checksum when dry_run is set.
RPC export params & handlers
src/rpc/methods/chain.rs
Adds unordered: bool to ForestChainExportParams; builds and passes ExportOptions to chain export functions; ChainExport delegates to ForestChainExport with unordered: false.
Unordered streaming internals
src/ipld/util.rs
Refactors UnorderedChainStream to <'a> with flume::async::RecvStream<'a> fused stream, makes extract_sender optional, adds with_seen, clamps worker count, uses async sends, rewrites poll_next to poll fused stream and manage internal queue/error handling.
CI and test script
.github/workflows/forest.yml, scripts/tests/calibnet_export_unordered_check.sh
Adds calibnet-unordered-export-check GitHub Actions job and new test script to run unordered export, import snapshot, and validate node startup/sync/health.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CLI as forest-cli
  participant RPC as Forest RPC
  participant Chain as chain::export
  participant IPLD as {stream_chain|unordered_stream_chain}
  participant Writer as AsyncWriterWithChecksum

  User->>CLI: snapshot export --unordered
  CLI->>RPC: ForestChainExport(unordered=true, params)
  RPC->>Chain: export(..., ExportOptions{unordered, skip_checksum, seen})
  alt unordered == true
    Chain->>IPLD: unordered_stream_chain(...).with_seen(seen)
  else
    Chain->>IPLD: stream_chain(...).with_seen(seen)
  end
  IPLD-->>Chain: CAR block stream
  Chain->>Writer: write frames (checksum based on skip_checksum)
  Writer-->>RPC: optional checksum
  RPC-->>CLI: result
  CLI-->>User: export completed
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

RPC

Suggested reviewers

  • akaladarshi
  • LesnyRumcajs
  • sudo-shashank

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 07f0a58 and d9f927c.

📒 Files selected for processing (1)
  • src/tool/subcommands/archive_cmd.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/tool/subcommands/archive_cmd.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: All lint checks
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/snapshot-export-unordered

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hanabi1224 hanabi1224 marked this pull request as ready for review July 28, 2025 11:46
@hanabi1224 hanabi1224 requested a review from a team as a code owner July 28, 2025 11:46
@hanabi1224 hanabi1224 requested review from akaladarshi and sudo-shashank and removed request for a team July 28, 2025 11:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
CHANGELOG.md (1)

30-33: Align verb tense with existing changelog style

Most entries within the “Added” section use the imperative “Add …” rather than the past-tense “Added …”. Tweaking the verb keeps the style consistent.

- - [#5867](https://github.com/ChainSafe/forest/pull/5867) Added `--unordered` to `forest-cli snapshot export` for exporting `CAR` blocks in non-deterministic order for better performance with more parallelization.
+ - [#5867](https://github.com/ChainSafe/forest/pull/5867) Add `--unordered` to `forest-cli snapshot export` for exporting `CAR` blocks in a non-deterministic order to improve parallelisation performance.
src/chain/mod.rs (1)

66-84: Optimize duplicate Arc::clone in unordered branch.

The db is cloned twice in the unordered branch but only once in the ordered branch. Consider storing the cloned Arc to avoid redundant cloning.

        if unordered {
+           let db_clone = Arc::clone(db);
            futures::future::Either::Left(
                unordered_stream_chain(
-                   Arc::clone(db),
-                   tipset.clone().chain_owned(Arc::clone(db)),
+                   db_clone.clone(),
+                   tipset.clone().chain_owned(db_clone),
                    stateroot_lookup_limit,
                )
                .with_seen(seen),
            )
        } else {
src/ipld/util.rs (1)

413-413: Consider making worker count configurable.

The worker count is hard-coded to clamp between 1 and 8. Consider making this configurable through ExportOptions for better flexibility in different environments.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19c11c8 and bc0e6ff.

📒 Files selected for processing (10)
  • CHANGELOG.md (1 hunks)
  • src/blocks/header.rs (2 hunks)
  • src/chain/mod.rs (4 hunks)
  • src/cli/subcommands/snapshot_cmd.rs (6 hunks)
  • src/db/gc/snapshot.rs (2 hunks)
  • src/ipld/util.rs (15 hunks)
  • src/rpc/methods/chain.rs (6 hunks)
  • src/rpc/mod.rs (1 hunks)
  • src/tool/subcommands/api_cmd/test_snapshots_ignored.txt (1 hunks)
  • src/tool/subcommands/archive_cmd.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
🔇 Additional comments (23)
src/tool/subcommands/api_cmd/test_snapshots_ignored.txt (1)

81-81: Addition keeps the ignore list consistent – LGTM

Including Forest.ChainExport prevents the new RPC from polluting the snapshot baseline and the placement preserves the alphabetical order of the Forest.* block. No further action required.

src/db/gc/snapshot.rs (2)

46-46: LGTM: Clean import update for refactored export API.

The import change from CidHashSet to ExportOptions correctly aligns with the export API refactoring.


228-231: LGTM: Proper adaptation to new ExportOptions struct.

The change correctly uses the new ExportOptions struct while maintaining the same behavior:

  • skip_checksum: true preserves the previous behavior
  • ..Default::default() ensures unordered: false (deterministic order for GC operations)
  • The API change is handled cleanly without affecting functionality
src/rpc/mod.rs (1)

81-81: LGTM: Proper registration of new ForestChainExport RPC method.

The addition correctly integrates the new ForestChainExport method into the RPC framework, ensuring it's properly registered and available for use.

src/tool/subcommands/archive_cmd.rs (2)

30-30: LGTM: Correct import for refactored export API.

The import of ExportOptions aligns with the export API refactoring across the codebase.


514-525: LGTM: Proper adaptation to ExportOptions struct.

The export call correctly uses the new ExportOptions struct:

  • Maintains skip_checksum: true behavior
  • Preserves the seen parameter for diff export functionality
  • Uses appropriate defaults for other options (e.g., unordered: false for deterministic archive export)
  • The function signature and behavior remain unchanged from the caller's perspective
src/blocks/header.rs (2)

17-18: LGTM: Updated imports for improved CID computation.

The import changes support the new car_block method implementation with proper multihash handling.

Also applies to: 22-22


100-110: LGTM: Efficient car_block method with CID caching.

The new car_block method and updated cid method provide an efficient way to get both CID and serialized data:

  • Avoids duplicate serialization when both CID and data are needed (common in export scenarios)
  • Uses proper CID v1 with DAG_CBOR codec and Blake2b-256 multihash
  • The cid method maintains backward compatibility by delegating to car_block
  • Error handling is appropriate with expect("Infallible") for the well-tested serialization path
src/cli/subcommands/snapshot_cmd.rs (6)

9-9: LGTM: Updated imports for new export functionality.

The imports correctly support the new ForestChainExportParams, timing functionality, and numeric operations for the enhanced export command.

Also applies to: 14-16


38-40: LGTM: Well-documented unordered flag.

The new unordered flag is properly documented, explaining its purpose for enabling non-deterministic traversal to improve performance through better parallelization.


53-53: LGTM: Proper parameter struct usage.

The code correctly uses the new ForestChainExportParams struct and includes the unordered flag. The parameter mapping maintains all existing functionality while adding the new capability.

Also applies to: 89-94


100-100: LGTM: Enhanced progress reporting with export speed.

The progress reporting improvements are well-implemented:

  • Records start time before spawning the progress task
  • Calculates export speed as bytes per second
  • Handles division by zero correctly
  • Uses appropriate interval (0.5 seconds) for progress updates
  • Provides useful feedback to users during long-running exports

Also applies to: 105-105, 117-127


136-136: LGTM: Updated to use new ForestChainExport RPC method.

The RPC call correctly uses the new ForestChainExport method with proper timeout handling for long-running snapshot exports.


142-147: LGTM: Proper dry run handling.

The dry run logic correctly prevents file operations when dry_run is true:

  • Skips checksum file creation
  • Skips persisting the temporary file
  • Maintains the export process for testing without side effects
src/chain/mod.rs (2)

20-25: LGTM!

The ExportOptions struct is well-designed with appropriate derives and sensible defaults.


27-38: LGTM!

The function correctly forwards the ExportOptions to the export function.

src/rpc/methods/chain.rs (3)

225-298: LGTM!

The ForestChainExport RPC method is well-implemented with proper concurrency control, input validation, and error handling.


300-337: LGTM!

The wrapper correctly maintains backward compatibility by delegating to ForestChainExport with unordered: false.


886-898: LGTM!

The ForestChainExportParams struct is properly defined with appropriate serde and schema annotations.

src/ipld/util.rs (4)

103-108: LGTM!

The enhancement to carry optional block data in the Emit variant is a good optimization to avoid redundant database fetches.


122-130: LGTM!

The builder-style methods follow Rust conventions and provide a clean API.


220-226: Good optimization for vector reservation.

The check for non-empty vector before reserving capacity is a good performance optimization.


290-307: LGTM!

The Fuse wrapper and PinnedDrop implementation ensure proper cleanup and stream termination.

sudo-shashank
sudo-shashank previously approved these changes Jul 30, 2025
Copy link
Collaborator

@akaladarshi akaladarshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments otherwise LGTM. Very nice work 👌🏽


for v in new_values.into_iter().rev() {
cid_vec.push_front(v)
if !new_values.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

if new_values.is_empty() { continue }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That changes the logic, it always returns sth here

fail_on_dead_links: bool,
) -> UnorderedChainStream<'a, DB, ITER> {
let (sender, receiver) = flume::bounded(BLOCK_CHANNEL_LIMIT);
let (extract_sender, extract_receiver) = flume::unbounded();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also use the flume::bounded for extract sender and receiver ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's tackle this in another PR

let mut handles = JoinSet::new();

for _ in 0..num_cpus::get() {
for _ in 0..num_cpus::get().clamp(1, 8) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to accept this as params as well?

Basically allow user to pass the worker count otherwise use current values as default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's our responsibility to tune this value. It might turn out this unordered_stream_chain is not fast and we end up removing it.

src/ipld/util.rs Outdated
let mut new_values = extract_cids(&data)?;
cid_vec.append(&mut new_values);
if !new_values.is_empty() {
cid_vec.append(&mut new_values);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor it to:

let new_values = extract_cids(&data)?;
if !new_values.is_empty() {
    cid_vec.reserve(new_values.len());
    cid_vec.extend(new_values);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been done internally in Vec::append

https://doc.rust-lang.org/stable/src/alloc/vec/mod.rs.html#2579

    pub fn append(&mut self, other: &mut Self) {
        unsafe {
            self.append_elements(other.as_slice() as _);
            other.set_len(0);
        }
    }

    /// Appends elements to `self` from other buffer.
    #[cfg(not(no_global_oom_handling))]
    #[inline]
    #[track_caller]
    unsafe fn append_elements(&mut self, other: *const [T]) {
        let count = other.len();
        self.reserve(count);
        let len = self.len();
        unsafe { ptr::copy_nonoverlapping(other as *const T, self.as_mut_ptr().add(len), count) };
        self.len += count;
    }

src/ipld/util.rs Outdated
let mut new_values = extract_cids(&data)?;
cid_vec.append(&mut new_values);
if !new_values.is_empty() {
cid_vec.append(&mut new_values);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think cid_vec could grow very large since we are also sending the state_root ?

Should we have batch processing here?

Something like this:

const MAX_CID_QUEUE_SIZE: usize = 1000;

if cid_vec.len() > MAX_CID_QUEUE_SIZE {
    for cid in cid_vec.drain(MAX_CID_QUEUE_SIZE..) {
        let _ = extract_sender.send(cid);
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_sender is bounded, and this will likely get stuck. Actually the original code gets stuck on CI and I have to change it to send_async

src/ipld/util.rs Outdated
)
}
} else if this.extract_sender.is_empty() {
this.worker_handle.abort();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we are aborting without checking if the workers are still processing the existing load.

Do you think we should make sure that all the workers have finished before aborting it here, we can send signal from the worker side (once all tipset tasks are done) and then abort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. The old logic did not handle the shutdown gracefully indeed. Fixed by closing the extract_sender instead of forcing worker_handle.abort

@akaladarshi
Copy link
Collaborator

Also do you thinks it's possible to divide the UnorderedChainStream poll_next function into small chunks, since there are lot of if else blocks there.

Just a suggestion, we can use introduce couple of different structs to handle different tasks:

struct WorkerPool {
    handle: JoinHandle<anyhow::Result<()>>,
    work_sender: flume::Sender<WorkItem>,
    shutdown_sender: Option<oneshot::Sender<()>>,
}

enum WorkItem {
    Process(Cid),
    Shutdown,
}

@hanabi1224
Copy link
Contributor Author

hanabi1224 commented Jul 30, 2025

Hey @akaladarshi Thanks for your review! Regarding stream_chain and unordered_stream_chain refactoring, I've extracted the changes into #5868 please take a look. Your ideas on furture refactoring unordered_stream_chain sound great but it's out of scope of this PR. This PR tries to revive the dead unordered_stream_chain function so that we can test it via forest-cli. I'd suggest we make decisions to either further refactor it or remove it after proving its correctness and usefulness. What do you think?

@akaladarshi
Copy link
Collaborator

@hanabi1224 I didn't realise this just for reviving the dead code. We can hold on to the changes for now, we can make them once we are sure this what we are going with.

I will approve the PR, we can revisit the comments later.

akaladarshi
akaladarshi previously approved these changes Jul 30, 2025
@hanabi1224 hanabi1224 marked this pull request as draft July 30, 2025 13:00
@hanabi1224 hanabi1224 force-pushed the hm/snapshot-export-unordered branch from 9a8ab04 to f3897d3 Compare August 7, 2025 14:39
@hanabi1224 hanabi1224 marked this pull request as ready for review August 7, 2025 15:39
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
scripts/tests/calibnet_eth_mapping_check.sh (1)

25-26: Off-by-one: loop iterates 201 tipsets instead of the requested 200
for ((i=0; i<=NUM_TIPSETS; i++)); do includes both 0 and 200.
If you really want exactly $NUM_TIPSETS tipsets, use a strict < upper bound.

-for ((i=0; i<=NUM_TIPSETS; i++)); do
+for ((i=0; i<NUM_TIPSETS; i++)); do
🧹 Nitpick comments (4)
scripts/tests/calibnet_eth_mapping_check.sh (1)

35-41: Redundant jq check adds needless subprocess work
jq -e already exits with status 0 when the value is neither null nor false.
The string comparison is superfluous—just test the exit code.

-  if [[ $(echo "$JSON" | jq -e '.result.transactions') != "null" ]]; then
+  if echo "$JSON" | jq -e '.result.transactions' >/dev/null; then
.github/workflows/forest.yml (1)

309-330: Consider deduplicating the export-check jobs with a matrix
All three snapshot jobs differ only by one script path. A small matrix would
shrink ~70 duplicated lines and ease future maintenance.

scripts/tests/calibnet_export_unordered_check.sh (2)

15-16: Typo in log message
unordredunordered.

-echo "Exporting zstd compressed snapshot with unordred graph traversal"
+echo "Exporting zstd compressed snapshot with unordered graph traversal"

30-32: Hard-coded filename couples steps unnecessarily
Import path is fixed to unordered.forest.car.zst; using $(ls *.car.zst)
would make the step resilient to future name changes.

-$FOREST_PATH ... --import-snapshot unordered.forest.car.zst
+$FOREST_PATH ... --import-snapshot "$(ls *.car.zst)"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 494fb1c and 2e73539.

📒 Files selected for processing (11)
  • .github/workflows/forest.yml (1 hunks)
  • CHANGELOG.md (1 hunks)
  • scripts/tests/calibnet_eth_mapping_check.sh (1 hunks)
  • scripts/tests/calibnet_export_unordered_check.sh (1 hunks)
  • scripts/tests/harness.sh (1 hunks)
  • src/chain/mod.rs (7 hunks)
  • src/chain/tests.rs (1 hunks)
  • src/cli/subcommands/snapshot_cmd.rs (4 hunks)
  • src/ipld/util.rs (9 hunks)
  • src/rpc/methods/chain.rs (7 hunks)
  • src/tool/subcommands/archive_cmd.rs (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • scripts/tests/harness.sh
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/cli/subcommands/snapshot_cmd.rs
  • src/tool/subcommands/archive_cmd.rs
  • src/rpc/methods/chain.rs
  • CHANGELOG.md
  • src/ipld/util.rs
  • src/chain/mod.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.
📚 Learning: in forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic rpc service...
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.

Applied to files:

  • scripts/tests/calibnet_eth_mapping_check.sh
  • scripts/tests/calibnet_export_unordered_check.sh
📚 Learning: in the forest codebase, `vec` can be used as an `asyncwrite` implementation in test contexts. th...
Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.

Applied to files:

  • src/chain/tests.rs
🧬 Code Graph Analysis (1)
scripts/tests/calibnet_eth_mapping_check.sh (1)
scripts/tests/harness.sh (1)
  • forest_wait_for_healthcheck_ready (100-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
🔇 Additional comments (3)
src/chain/tests.rs (1)

62-62: LGTM! Function signature updates align with the API refactor.

The changes correctly update the test function calls to use the new unified ExportOptions parameter. Passing None maintains the default behavior (no checksum skipping, ordered export, empty seen set), ensuring the tests continue to validate the same functionality while adapting to the refactored interface.

Also applies to: 65-65

scripts/tests/calibnet_eth_mapping_check.sh (1)

49-49: Good call: wait for full back-fill before mapping checks
Replacing the ad-hoc health-check command with the helper ensures DB
back-fill is complete, preventing flaky failures in the mapping tests.

scripts/tests/calibnet_export_unordered_check.sh (1)

8-14: Safety: wildcard delete may remove unintended files
rm --force --verbose ./*.{car,car.zst,sha256sum} expands to the literal
patterns when no matches exist, causing an error with set -eu.
Guard with shopt -s nullglob or quote the glob.

+shopt -s nullglob
 rm --force --verbose ./*.{car,car.zst,sha256sum}
+shopt -u nullglob

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/ipld/util.rs (1)

293-297: Graceful worker shutdown recommended.

PinnedDrop simply calls abort() on the worker task; this can leak in-flight
work and leave the DB in inconsistent state.
Consider:

  1. Dropping extract_sender first so workers exit naturally.
  2. worker_handle.abort() followed by await ing the join handle (with a short
    timeout) to ensure cleanup completes.

This will reduce resource leaks and make panics inside workers visible to the
caller.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ccb41d and 10241f8.

📒 Files selected for processing (3)
  • src/db/gc/snapshot.rs (2 hunks)
  • src/ipld/util.rs (9 hunks)
  • src/rpc/methods/chain.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/db/gc/snapshot.rs
  • src/rpc/methods/chain.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5835
File: src/chain/tests.rs:58-76
Timestamp: 2025-08-04T13:36:22.993Z
Learning: In the Forest codebase, `Vec<u8>` can be used as an `AsyncWrite` implementation in test contexts. The user confirmed that tests using `&mut Vec<u8>` with `export` and `export_v2` functions compile and pass both locally and on CI.
📚 Learning: 2025-08-06T15:42:43.520Z
Learnt from: LesnyRumcajs
PR: ChainSafe/forest#5907
File: Cargo.toml:25-36
Timestamp: 2025-08-06T15:42:43.520Z
Learning: The fvm_ipld_amt crate version 0.7.5 is required for the Forest project, but it's not yet published to crates.io, which is why the Git patch pointing to the ref-fvm repository is still necessary.

Applied to files:

  • src/ipld/util.rs
📚 Learning: 2025-08-06T15:42:43.520Z
Learnt from: LesnyRumcajs
PR: ChainSafe/forest#5907
File: Cargo.toml:25-36
Timestamp: 2025-08-06T15:42:43.520Z
Learning: Forest project requires fvm_ipld_amt version 0.7.5 functionality which is available in the Git revision a633547ae414a333b2d076beef87d4d30cdb7fb4 of the ref-fvm repository, but version 0.7.5 is not yet published to crates.io (only 0.7.4 is available). The Git patch must remain until 0.7.5 is published.

Applied to files:

  • src/ipld/util.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build MacOS
  • GitHub Check: tests-release
  • GitHub Check: rubocop
  • GitHub Check: Check
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/cli/subcommands/snapshot_cmd.rs (2)

39-41: Clarify nondeterminism and checksum expectations for --unordered.

Unordered export produces a non-reproducible CAR (block order varies), so file checksums will differ across runs. Make this explicit in the flag help to avoid surprises.

Consider:

-        /// Traverse chain in non-deterministic order for better performance with more parallelization.
+        /// Traverse chain in non-deterministic order for better performance with more parallelization.
+        /// Note: output CAR order (and thus file checksum) is non-deterministic and will vary across runs.

136-141: Dry-run: print explicit outcome and defensively gate checksum save.

Minor UX tweak: explicitly state that dry-run writes nothing. Also guard saving checksum by the local skip_checksum flag to avoid accidental writes if the server returns a hash unexpectedly.

-                if !dry_run {
-                    if let Some(hash) = hash_result {
-                        save_checksum(&output_path, hash).await?;
-                    }
-                    temp_path.persist(output_path)?;
-                }
+                if !dry_run {
+                    if !skip_checksum {
+                        if let Some(hash) = hash_result {
+                            save_checksum(&output_path, hash).await?;
+                        }
+                    }
+                    temp_path.persist(output_path)?;
+                } else {
+                    println!("Dry run: no files were written and no checksum saved.");
+                }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2158980 and 6072720.

📒 Files selected for processing (1)
  • src/cli/subcommands/snapshot_cmd.rs (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/cli/subcommands/snapshot_cmd.rs (3)
src/blocks/tipset.rs (1)
  • hash (470-472)
src/blocks/block.rs (1)
  • hash (25-27)
src/shim/sector.rs (1)
  • hash (344-351)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests
  • GitHub Check: Check
  • GitHub Check: tests-release
  • GitHub Check: All lint checks
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
src/cli/subcommands/snapshot_cmd.rs (2)

57-57: LGTM: flag plumbed through CLI.

The new unordered flag is correctly threaded through the export args.


100-103: End-to-End unordered Support Verified

All paths correctly propagate and honor the unordered flag:

  • CLI (snapshot_cmd.rs) exposes --unordered and sends it in ForestChainExportParams.
  • RPC (ForestChainExportParams) includes pub unordered: bool and passes it into ExportOptions unchanged.
  • Both V1 (export) and V2 (export_v2) calls receive ExportOptions { unordered, … } and select between unordered_stream_chain and the default ordered stream.
  • Utility layer (ipld::unordered_stream_chain/_graph) and existing tests/scripts (e.g. calibnet_export_unordered_check.sh, benchmark_unordered_graph_traversal) exercise unordered exports.

No further changes required.

@hanabi1224 hanabi1224 added the RPC requires calibnet RPC checks to run on CI label Aug 18, 2025
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 18, 2025
Merged via the queue into main with commit 442e8e1 Aug 18, 2025
77 checks passed
@hanabi1224 hanabi1224 deleted the hm/snapshot-export-unordered branch August 18, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RPC requires calibnet RPC checks to run on CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants