Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Aug 28, 2025

Summary of changes

This PR removes unordered chain traversal for its performance gain not meeting expectation.

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes #6013

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Refactor
    • Removed unordered snapshot export option; export and benchmarking now use a deterministic DFS/ordered stream.
    • Deleted unordered graph traversal benchmark variant.
  • Tests
    • Removed the calibnet unordered snapshot end-to-end test.
  • Chores
    • Removed CI job and its integration-status dependency for unordered export checks.
  • Documentation
    • Removed related CLI benchmark command from docs and changelog entries updated.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 28, 2025

Walkthrough

Removes unordered export/traversal across the codebase: deletes CI job and calibnet unordered export test script; drops unordered flags and parameters from CLI and RPC; removes unordered stream types/functions; refactors IPLD streaming to a deterministic DFS-based ChainStream; updates tool commands and benchmarks to use stream_graph and remove unordered variants.

Changes

Cohort / File(s) Summary
CI & test script
.github/workflows/forest.yml, scripts/tests/calibnet_export_unordered_check.sh
Removed CI job and shell script that performed calibnet unordered snapshot export checks.
Chain export API
src/chain/mod.rs
Dropped unordered option and import; export always uses stream_chain(...).with_seen(...); ExportOptions no longer contains unordered.
CLI & RPC
src/cli/subcommands/snapshot_cmd.rs, src/rpc/methods/chain.rs
Removed --unordered/unordered from Snapshot export CLI and ForestChainExportParams; RPC no longer accepts or forwards unordered.
IPLD traversal refactor
src/ipld/util.rs
Replaced unordered multithreaded stream with deterministic DFS ChainStream; removed UnorderedChainStream, unordered_stream_chain, unordered_stream_graph; added DFS task queue, simplified concurrency and APIs.
Tooling & benchmarks
src/tool/subcommands/archive_cmd.rs, src/tool/subcommands/benchmark_cmd.rs, docs/docs/users/reference/cli.sh, CHANGELOG.md
Switched to stream_graph for export diffs; removed unordered benchmark variant and related docs/changelog entries.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant CLI as forest / forest-tool
  participant RPC as RPC Server
  participant Chain as Chain Export
  participant IPLD as ipld::stream_chain
  participant Writer as CAR Writer

  User->>CLI: export snapshot (no unordered)
  CLI->>RPC: ForestChainExport(params)
  RPC->>Chain: build ExportOptions(skip_checksum, seen)
  Chain->>IPLD: stream_chain(tipsets).with_seen(seen)
  IPLD-->>Chain: DFS stream of blocks
  loop For each block
    Chain->>Writer: write block
  end
  Writer-->>CLI: export complete
  CLI-->>User: done
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues

Objective Addressed Explanation
Remove fn unordered_stream_chain and fn unordered_stream_graph (#6013)
Deprecate unordered traversal across interfaces (CLI/RPC/benchmarks) (#6013)

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Remove calibnet unordered export CI job (.github/workflows/forest.yml) CI job removal is operational/CI maintenance and not explicitly required by #6013.
Delete calibnet unordered export test script (scripts/tests/calibnet_export_unordered_check.sh) Removing the specific test script is not requested by #6013, which targets code-level deprecation.

Possibly related PRs

Suggested labels

RPC

Suggested reviewers

  • LesnyRumcajs
  • elmattic
  • akaladarshi

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between abccbf2 and e3ebbf8.

📒 Files selected for processing (2)
  • CHANGELOG.md (1 hunks)
  • docs/docs/users/reference/cli.sh (0 hunks)
💤 Files with no reviewable changes (1)
  • docs/docs/users/reference/cli.sh
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Ubuntu
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: All lint checks
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/remove-unordered-chain-traversal

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbit in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbit in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbit gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbit read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbit help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbit ignore or @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbit summary or @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbit or @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hanabi1224
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/chain/mod.rs (1)

34-38: Remove outdated --unordered references in docs and CHANGELOG
Delete or update all --unordered and unordered-graph-traversal sections in docs/docs/users/reference/cli.md and docs/docs/users/reference/cli.sh, and add a CHANGELOG entry noting the removal of the --unordered flag from forest-cli snapshot export.

🧹 Nitpick comments (1)
src/ipld/util.rs (1)

146-169: Constructor semantics clear; suggest clarifying docs on dead-link policy

stream_chain (errors on dead links) vs stream_graph (tolerates dead links) is a key semantic split. Consider a short rustdoc note on each to prevent misuse.

 pub fn stream_chain<DB: Blockstore, T: Borrow<Tipset>, ITER: Iterator<Item = T> + Unpin>(
@@
 ) -> ChainStream<DB, ITER> {
-    ChainStream {
+    // Errors on missing links (strict). Use `stream_graph` to ignore them.
+    ChainStream {
@@
 pub fn stream_graph<DB: Blockstore, T: Borrow<Tipset>, ITER: Iterator<Item = T> + Unpin>(
@@
 ) -> ChainStream<DB, ITER> {
-    stream_chain(db, tipset_iter, stateroot_limit).fail_on_dead_links(false)
+    // Tolerant traversal for inspection/diff pre-scan; missing links are ignored.
+    stream_chain(db, tipset_iter, stateroot_limit).fail_on_dead_links(false)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 15cdbb3 and abccbf2.

📒 Files selected for processing (8)
  • .github/workflows/forest.yml (0 hunks)
  • scripts/tests/calibnet_export_unordered_check.sh (0 hunks)
  • src/chain/mod.rs (2 hunks)
  • src/cli/subcommands/snapshot_cmd.rs (0 hunks)
  • src/ipld/util.rs (1 hunks)
  • src/rpc/methods/chain.rs (0 hunks)
  • src/tool/subcommands/archive_cmd.rs (2 hunks)
  • src/tool/subcommands/benchmark_cmd.rs (1 hunks)
💤 Files with no reviewable changes (4)
  • scripts/tests/calibnet_export_unordered_check.sh
  • .github/workflows/forest.yml
  • src/cli/subcommands/snapshot_cmd.rs
  • src/rpc/methods/chain.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.

Applied to files:

  • src/chain/mod.rs
  • src/ipld/util.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: All lint checks
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (9)
src/tool/subcommands/archive_cmd.rs (2)

39-39: Replace unordered traversal import with deterministic DFS — LGTM

Import switch to stream_graph aligns with the PR objective and new API.


551-558: Diff pre-scan now ignores dead links; verify downstream export won’t fail on missing keys

stream_graph(..) defaults to fail_on_dead_links(false). That’s desirable for building the seen set, but if any CIDs are missing in the diff base they won’t enter seen and could later cause export(..) (which uses stream_chain with failures enabled) to error. Please sanity-check with a diff where the base lacks some DAG nodes to ensure export still succeeds or fails intentionally.

Would you like me to add a focused test that simulates a missing message/state-root in the diff base and asserts export behavior?

src/tool/subcommands/benchmark_cmd.rs (1)

10-10: Import update to new streaming APIs — LGTM

Both stream_chain and stream_graph are used below; import set is correct.

src/chain/mod.rs (2)

16-16: Switch to stream_chain import — LGTM

Matches the unified deterministic traversal approach.


159-165: Always-deterministic chain streaming with seen-filter — LGTM

stream_chain(...).with_seen(seen) simplifies control flow and preserves strict missing-link errors for exports.

src/ipld/util.rs (4)

58-72: DFS iterator is simple and correct

Reversing pushes preserves intended DFS order while allowing extension.


171-189: Synchronous Blockstore::get inside poll_next may block the reactor

Blockstore::get is sync; on some stores (e.g., CAR-on-disk) this can perform IO in the poll path. If this stream is polled on a core Tokio runtime thread, it may reduce fairness. If practical, consider using a buffered prefetcher or documenting the expectation that callers wrap with par_buffer/compression pipeline (as you do) to keep throughput acceptable.

Do you want a micro-benchmark script comparing old unordered vs new DFS on large .forest.car.zst inputs to quantify any regression across store types?


206-217: CID expansion preserves DFS and avoids revisits — LGTM

extract_cids on DAG-CBOR plus seen.insert ensures deterministic, duplicate-free emission.


248-279: Genesis parent emission and depth gating mirror Lotus behavior

Special-casing epoch 0 and limiting deep walks by stateroot_limit is correct and matches expectations.

@hanabi1224 hanabi1224 marked this pull request as ready for review August 28, 2025 15:22
@hanabi1224 hanabi1224 requested a review from a team as a code owner August 28, 2025 15:22
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and akaladarshi and removed request for a team August 28, 2025 15:22
@hanabi1224 hanabi1224 added this pull request to the merge queue Sep 2, 2025
Merged via the queue into main with commit f1c5bd7 Sep 2, 2025
60 checks passed
@hanabi1224 hanabi1224 deleted the hm/remove-unordered-chain-traversal branch September 2, 2025 08:07
@coderabbitai coderabbitai bot mentioned this pull request Oct 16, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deprecate unordered graph traversal as it shows no perf gain

4 participants