chore: remove unordered chain traversal #6014
Walkthrough
Removes unordered export/traversal across the codebase: deletes CI job and calibnet unordered export test script; drops unordered flags and parameters from CLI and RPC; removes unordered stream types/functions; refactors IPLD streaming to a deterministic DFS-based `ChainStream`; updates tool commands and benchmarks to use `stream_graph` and remove unordered variants.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant User
participant CLI as forest / forest-tool
participant RPC as RPC Server
participant Chain as Chain Export
participant IPLD as ipld::stream_chain
participant Writer as CAR Writer
User->>CLI: export snapshot (no unordered)
CLI->>RPC: ForestChainExport(params)
RPC->>Chain: build ExportOptions(skip_checksum, seen)
Chain->>IPLD: stream_chain(tipsets).with_seen(seen)
IPLD-->>Chain: DFS stream of blocks
loop For each block
Chain->>Writer: write block
end
Writer-->>CLI: export complete
CLI-->>User: done
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/chain/mod.rs (1)
34-38: Remove outdated `--unordered` references in docs and CHANGELOG
Delete or update all `--unordered` and `unordered-graph-traversal` sections in `docs/docs/users/reference/cli.md` and `docs/docs/users/reference/cli.sh`, and add a CHANGELOG entry noting the removal of the `--unordered` flag from `forest-cli snapshot export`.
🧹 Nitpick comments (1)
src/ipld/util.rs (1)
146-169: Constructor semantics clear; suggest clarifying docs on dead-link policy
`stream_chain` (errors on dead links) vs `stream_graph` (tolerates dead links) is a key semantic split. Consider a short rustdoc note on each to prevent misuse.

    pub fn stream_chain<DB: Blockstore, T: Borrow<Tipset>, ITER: Iterator<Item = T> + Unpin>(
    @@
    ) -> ChainStream<DB, ITER> {
    -    ChainStream {
    +    // Errors on missing links (strict). Use `stream_graph` to ignore them.
    +    ChainStream {
    @@
    pub fn stream_graph<DB: Blockstore, T: Borrow<Tipset>, ITER: Iterator<Item = T> + Unpin>(
    @@
    ) -> ChainStream<DB, ITER> {
    -    stream_chain(db, tipset_iter, stateroot_limit).fail_on_dead_links(false)
    +    // Tolerant traversal for inspection/diff pre-scan; missing links are ignored.
    +    stream_chain(db, tipset_iter, stateroot_limit).fail_on_dead_links(false)
    }
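To make the strict/tolerant split concrete, here is a minimal toy model. It is a sketch, not Forest's code: `ToyStore`, `ToyWalk`, and `tolerant` are hypothetical names, and node ids stand in for CIDs. Only the builder-style `fail_on_dead_links(bool)` toggle is taken from the diff above.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a blockstore: maps a node id to the ids it links to.
pub type ToyStore = HashMap<u32, Vec<u32>>;

/// Hypothetical DFS walker (not Forest's `ChainStream`).
pub struct ToyWalk<'a> {
    store: &'a ToyStore,
    stack: Vec<u32>,
    seen: HashSet<u32>,
    fail_on_dead_links: bool,
}

impl<'a> ToyWalk<'a> {
    /// Strict by default: a link to a missing node is an error.
    pub fn strict(store: &'a ToyStore, root: u32) -> Self {
        ToyWalk { store, stack: vec![root], seen: HashSet::new(), fail_on_dead_links: true }
    }

    /// Builder-style toggle, modeled on the call shown in the diff.
    pub fn fail_on_dead_links(mut self, fail: bool) -> Self {
        self.fail_on_dead_links = fail;
        self
    }
}

/// Tolerant variant: the strict walker with dead-link errors switched off.
pub fn tolerant(store: &ToyStore, root: u32) -> ToyWalk<'_> {
    ToyWalk::strict(store, root).fail_on_dead_links(false)
}

impl<'a> Iterator for ToyWalk<'a> {
    type Item = Result<u32, String>;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(id) = self.stack.pop() {
            if !self.seen.insert(id) {
                continue; // already visited
            }
            match self.store.get(&id) {
                Some(links) => {
                    // Push in reverse so the first link is popped first.
                    self.stack.extend(links.iter().rev().copied());
                    return Some(Ok(id));
                }
                None if self.fail_on_dead_links => {
                    return Some(Err(format!("missing node {}", id)));
                }
                None => continue, // tolerant mode: skip the dead link
            }
        }
        None
    }
}
```

Collecting the strict walker into a `Result<Vec<_>, _>` stops at the first dead link, while the tolerant walker simply yields every node it can reach.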
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- .github/workflows/forest.yml (0 hunks)
- scripts/tests/calibnet_export_unordered_check.sh (0 hunks)
- src/chain/mod.rs (2 hunks)
- src/cli/subcommands/snapshot_cmd.rs (0 hunks)
- src/ipld/util.rs (1 hunks)
- src/rpc/methods/chain.rs (0 hunks)
- src/tool/subcommands/archive_cmd.rs (2 hunks)
- src/tool/subcommands/benchmark_cmd.rs (1 hunks)
💤 Files with no reviewable changes (4)
- scripts/tests/calibnet_export_unordered_check.sh
- .github/workflows/forest.yml
- src/cli/subcommands/snapshot_cmd.rs
- src/rpc/methods/chain.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-08T12:11:55.266Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5867
File: src/ipld/util.rs:461-487
Timestamp: 2025-08-08T12:11:55.266Z
Learning: Forest (src/ipld/util.rs, Rust): In UnorderedChainStream::poll_next, dropping `extract_sender` (when no more tipsets and the extract queue is empty) is the intended shutdown signal for workers. Any subsequent attempt to enqueue work after this drop is a logic error and should be treated as an error; do not change `send()` to ignore a missing sender.
Applied to files:
- src/chain/mod.rs
- src/ipld/util.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: All lint checks
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (9)
src/tool/subcommands/archive_cmd.rs (2)
39-39: Replace unordered traversal import with deterministic DFS (LGTM)
Import switch to `stream_graph` aligns with the PR objective and new API.
551-558: Diff pre-scan now ignores dead links; verify downstream export won’t fail on missing keys
`stream_graph(..)` defaults to `fail_on_dead_links(false)`. That’s desirable for building the `seen` set, but if any CIDs are missing in the diff base they won’t enter `seen` and could later cause `export(..)` (which uses `stream_chain` with failures enabled) to error. Please sanity-check with a diff where the base lacks some DAG nodes to ensure export still succeeds or fails intentionally.
Would you like me to add a focused test that simulates a missing message/state-root in the diff base and asserts export behavior?
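A rough model of the interaction described above may help when writing such a test. This is a sketch under stated assumptions, not the Forest implementation: `ToyStore`, `prescan`, and `export` are hypothetical stand-ins, node ids replace CIDs, and a seen node is assumed to prune its whole subtree from the export.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a blockstore: maps a node id to the ids it links to.
pub type ToyStore = HashMap<u32, Vec<u32>>;

/// Tolerant pre-scan over the diff base: records every reachable node that
/// is actually present and silently skips dead links.
pub fn prescan(base: &ToyStore, root: u32) -> HashSet<u32> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        if let Some(links) = base.get(&id) {
            if seen.insert(id) {
                stack.extend(links.iter().rev().copied());
            }
        }
    }
    seen
}

/// Strict export: skips anything already in `seen` (and its subtree), and
/// fails on the first node that is linked but missing from `store`.
pub fn export(
    store: &ToyStore,
    root: u32,
    mut seen: HashSet<u32>,
) -> Result<Vec<u32>, String> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue; // covered by the diff base, or already exported
        }
        let links = store
            .get(&id)
            .ok_or_else(|| format!("missing node {}", id))?;
        out.push(id);
        stack.extend(links.iter().rev().copied());
    }
    Ok(out)
}
```

In this model a node missing from the diff base never enters `seen`, so the export only fails if the store being exported also lacks that node and it is reachable outside any pruned subtree. Whether the real export behaves the same way is exactly what the proposed test would need to confirm.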
src/tool/subcommands/benchmark_cmd.rs (1)
10-10: Import update to new streaming APIs (LGTM)
Both `stream_chain` and `stream_graph` are used below; import set is correct.
src/chain/mod.rs (2)
16-16: Switch to `stream_chain` import (LGTM)
Matches the unified deterministic traversal approach.
159-165: Always-deterministic chain streaming with seen-filter (LGTM)
`stream_chain(...).with_seen(seen)` simplifies control flow and preserves strict missing-link errors for exports.
src/ipld/util.rs (4)
58-72: DFS iterator is simple and correct
Reversing pushes preserves intended DFS order while allowing extension.
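The reversed-push idea can be illustrated with a small self-contained iterator. This is a sketch: `Node` is a hypothetical stand-in for an IPLD value, and the `DfsIter` below is written for illustration rather than copied from `src/ipld/util.rs`.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an IPLD value: a leaf or a list of children.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Leaf(u32),
    List(Vec<Node>),
}

/// Depth-first iterator that yields leaves in left-to-right order.
pub struct DfsIter {
    dfs: VecDeque<Node>,
}

impl DfsIter {
    pub fn new(root: Node) -> Self {
        let mut dfs = VecDeque::new();
        dfs.push_back(root);
        DfsIter { dfs }
    }
}

impl Iterator for DfsIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        while let Some(node) = self.dfs.pop_front() {
            match node {
                Node::Leaf(value) => return Some(value),
                Node::List(children) => {
                    // Push children to the front in reverse, so the first
                    // child ends up at the head of the queue and is
                    // visited next.
                    for child in children.into_iter().rev() {
                        self.dfs.push_front(child);
                    }
                }
            }
        }
        None
    }
}
```

Without the `.rev()`, the last child of each list would be visited first. Using a double-ended queue also leaves the back free for appending further roots while the front drives the depth-first walk.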
171-189: Synchronous `Blockstore::get` inside `poll_next` may block the reactor
`Blockstore::get` is sync; on some stores (e.g., CAR-on-disk) this can perform IO in the poll path. If this stream is polled on a core Tokio runtime thread, it may reduce fairness. If practical, consider using a buffered prefetcher or documenting the expectation that callers wrap with `par_buffer`/compression pipeline (as you do) to keep throughput acceptable.
Do you want a micro-benchmark script comparing old unordered vs new DFS on large `.forest.car.zst` inputs to quantify any regression across store types?
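One possible shape for a buffered prefetcher, shown with the standard library only so it stays self-contained. This is an assumption-laden sketch: `prefetch` is a hypothetical helper, and in an async codebase the analogous move would be to run the blocking reads through something like Tokio's `spawn_blocking` rather than a raw thread.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper: drives a blocking iterator on a worker thread and
/// hands results back through a bounded channel holding up to `capacity`
/// items, so the consumer never performs the blocking work itself.
pub fn prefetch<I>(iter: I, capacity: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in iter {
            // `send` blocks while the buffer is full (backpressure) and
            // fails once the consumer has dropped the receiver.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}
```

The bounded channel keeps memory use predictable: the worker runs at most `capacity` items ahead of the consumer, and the receiver's iterator ends once the worker finishes and drops the sender.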
206-217: CID expansion preserves DFS and avoids revisits (LGTM)
`extract_cids` on DAG-CBOR plus `seen.insert` ensures deterministic, duplicate-free emission.
248-279: Genesis parent emission and depth gating mirror Lotus behavior
Special-casing epoch 0 and limiting deep walks by `stateroot_limit` is correct and matches expectations.
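As a reading aid, the gating described in this comment might be summarized as a small decision function. This is a guess at the shape, not the code under review: `Task` and `plan_block` are hypothetical, and the exact conditions (in particular, walking the state root at epoch 0) are assumptions.

```rust
/// Hypothetical work items a traversal could schedule for one block header.
#[derive(Debug, PartialEq, Eq)]
pub enum Task {
    EmitHeader,
    EmitParents,
    WalkMessages,
    WalkStateRoot,
}

/// Sketch of depth gating: every header is emitted, but messages and state
/// are only walked for epochs newer than `stateroot_limit`. Epoch 0 is
/// special-cased (assumed here: its parents are emitted and its state root
/// is still walked).
pub fn plan_block(epoch: i64, stateroot_limit: i64) -> Vec<Task> {
    let mut tasks = vec![Task::EmitHeader];
    if epoch == 0 {
        tasks.push(Task::EmitParents);
    }
    let deep = epoch > stateroot_limit;
    if deep {
        tasks.push(Task::WalkMessages);
    }
    if deep || epoch == 0 {
        tasks.push(Task::WalkStateRoot);
    }
    tasks
}
```

Under these assumptions, a block at epoch 100 with a limit of 50 gets the full deep walk, a block at epoch 10 contributes only its header, and the genesis block contributes its header, parents, and state root.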
Summary of changes
This PR removes unordered chain traversal because its performance gain did not meet expectations.
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes #6013