-
Notifications
You must be signed in to change notification settings - Fork 182
fix: improve forest-tool snapshot validate error message
#5969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughConstrain noisy crates to warn in the default logger, include source text in tipset-state error messages, adjust snapshot epoch window by +1 and remove an explicit progress-bar drop, and enrich IPLD DFS traversal with an IterateType enum plus richer task context and error messages. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as SnapshotCmd
participant SM as StateManager
participant Log as Logger
User->>CLI: forest snapshot validate ...
CLI->>Log: create progress bar
CLI->>SM: validate_tipsets(ts, last_epoch = ts.epoch() - epochs + 1)
alt success
SM-->>CLI: Ok(())
CLI->>Log: pb.finish_with_message("✅ verified!")
CLI-->>User: Ok
else error
SM-->>CLI: Err(e)
CLI->>Log: pb.abandon_with_message("❌ Transaction result differs from Lotus: {e}")
CLI-->>User: bail(e)
end
sequenceDiagram
autonumber
participant DFS as IPLD_DFS
participant Store as Blockstore
Note over DFS,Store #E8F0FE: IterateType + richer task context (epoch, block CID)
DFS->>Store: fetch(root_cid)
alt block with messages
DFS->>DFS: enqueue Iterate(epoch, block_cid, IterateType.Message(messages_cid), msg_queue)
else block with state root
DFS->>DFS: enqueue Iterate(epoch, block_cid, IterateType.StateRoot(state_root_cid), state_queue)
end
loop process queue
DFS->>Store: fetch(next_cid)
alt missing key
Store-->>DFS: Err
DFS-->>User: "[Iterate] missing key: {cid} from {type} in block {block_cid} at epoch {epoch}"
else found
Store-->>DFS: Block
DFS->>DFS: enqueue Iterate(..., propagated epoch/block/type, child_queue)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
9e06205 to
86834c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/cli_shared/logger/mod.rs (1)
166-170: Good call reducing noisy crates to warn in tool modeThese directives will materially quieten
forest-toolwithout hiding errors. One minor maintainability thought: the same four directives now live in bothdefault_env_filter()anddefault_tool_filter(). Consider extracting them into a sharedconst COMMON_WARN_DIRECTIVES: &[&str]and building each list by chaining, to avoid future drift between the two.src/state_manager/mod.rs (1)
1730-1731: Preserve error chains while improving the top-level messageSwitching to
map_err(|e| anyhow!("...: {e}"))makes the display nicer but discards the original error as the source, weakening backtraces/causes. Prefer attaching context while keeping the chain, and print the chain at the call site.Suggested change here:
- ) - .map_err(|e| anyhow::anyhow!("couldn't compute tipset state: {e}"))?; + ) + .context("couldn't compute tipset state")?;And in the CLI (see comment in snapshot_cmd.rs), print
{e:#}(pretty Display) to surface the source error text and context chain.src/tool/subcommands/snapshot_cmd.rs (2)
434-450: Surface full error context to usersNow that
validate_tipsetsattaches context, you can print the full cause chain in the spinner message so users see the underlying error too. This keeps rich diagnostics while preserving the progress UI.- Err(e) => { - pb.abandon_with_message(format!("❌ Transaction result differs from Lotus: {e}")); - anyhow::bail!(e); - } + Err(e) => { + // `{e:#}` prints the context chain; typically single-line unless sources include newlines. + pb.abandon_with_message(format!("❌ Transaction result differs from Lotus: {e:#}")); + anyhow::bail!(e); + }If multi-line messages prove noisy in practice, join the chain manually:
let msg = e.chain().map(ToString::to_string).collect::<Vec<_>>().join(": ");and printmsg.
408-429: Clarify semantics of--check-staterootshelp textThe off-by-one change (
ts.epoch() – epochs as i64 + 1) is correct and ensures exactlyepochstipsets are validated (yieldingepochs – 1transitions). However, the CLI doc comment for--check-staterootsstill reads “Number of recent epochs to scan for bad messages/transactions,” which is misleading:• It actually controls state‐root validation (not message/transaction scans).
• The number of transitions checked is one less than the provided value.Please update both occurrences in
src/tool/subcommands/snapshot_cmd.rs(around lines 57–59 and 74–76) to clarify:- /// Number of recent epochs to scan for bad messages/transactions + /// Number of recent state roots to validate (transitions checked = this value – 1) #[arg(long, default_value_t = 60)] check_stateroots: u32,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
src/cli_shared/logger/mod.rs(1 hunks)src/state_manager/mod.rs(1 hunks)src/tool/subcommands/snapshot_cmd.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/tool/subcommands/snapshot_cmd.rs (1)
src/state_manager/mod.rs (3)
validate_tipsets(1506-1519)validate_tipsets(1698-1748)chain_config(244-246)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: All lint checks
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (1)
src/tool/subcommands/snapshot_cmd.rs (1)
406-407: Spinner lifecycle is now explicit — niceMoving away from
.with_finish(...)here is cleaner since you’re explicitly finishing/abandoning in the success/error paths below. No action needed.
86834c6 to
30cabb3
Compare
LesnyRumcajs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a changelog entry to the tool.
src/ipld/util.rs
Outdated
| } | ||
| } | ||
| Iterate(cid_vec) => { | ||
| Iterate(epoch, block_cid, ty, cid_vec) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's ty? let's use a more meaningful name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's type but reserved as keyword. How about _type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/ipld/util.rs (4)
104-107: Make IterateType self-describing (derive Debug) and centralize formatting via Display.Deriving Debug helps troubleshooting. Implementing Display avoids ad-hoc string construction at call sites.
Apply:
-enum IterateType { +#[derive(Debug)] +enum IterateType { Message(Cid), StateRoot(Cid), } + +impl std::fmt::Display for IterateType { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match self { + IterateType::Message(c) => write!(f, "message {c}"), + IterateType::StateRoot(c) => write!(f, "state root {c}"), + } + } +}
208-239: Avoid variable shadowing and reuse IterateType’s Display for formatting.Using a local string named ty shadows the enum binding and adds noise. Prefer a distinct binding and delegate formatting to Display.
Apply:
- Iterate(epoch, block_cid, ty, cid_vec) => { + Iterate(epoch, block_cid, src, cid_vec) => { ... - } else if fail_on_dead_links { - let ty = match ty { - IterateType::Message(c) => { - format!("message {c}") - } - IterateType::StateRoot(c) => { - format!("state root {c}") - } - }; - return Poll::Ready(Some(Err(anyhow::anyhow!( - "[Iterate] missing key: {cid} from {ty} in block {block_cid} at epoch {epoch}" - )))); + } else if fail_on_dead_links { + return Poll::Ready(Some(Err(anyhow::anyhow!( + "[Iterate] missing key: {cid} from {src} in block {block_cid} at epoch {epoch}" + )))); }
451-452: Align unordered worker error text with the new taxonomy.Use the “[Iterate]” prefix here for consistency. Consider propagating block/epoch/source in the unordered path in a follow-up (larger refactor).
- .send_async(Err(anyhow::anyhow!("missing key: {cid}"))) + .send_async(Err(anyhow::anyhow!("[Iterate] missing key: {cid}")))
520-522: Match unordered Emit error with ordered path.Prefix the unordered-queue miss with “[Emit]” for parity.
- return Poll::Ready(Some(Err(anyhow::anyhow!("missing key: {cid}")))); + return Poll::Ready(Some(Err(anyhow::anyhow!("[Emit] missing key: {cid}"))));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/ipld/util.rs(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/ipld/util.rs (1)
src/blocks/tipset.rs (2)
epoch(306-308)epoch(543-545)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (rust)
🔇 Additional comments (5)
src/ipld/util.rs (5)
113-114: Good call: enriching Iterate with epoch, block CID, and source improves diagnostics.The additional context should materially reduce triage time when a dead link is hit.
203-204: Clearer error taxonomy for Emit path.Prefixing with “[Emit]” makes it obvious which stage failed.
266-273: Capturing message-traversal context is spot on.Carrying epoch, block CID, and the specific message root will make downstream errors actionable.
282-288: State-root traversal context looks good.Consistent with the message path and respects the stateroot_limit logic and the epoch==0 special case.
200-239: No lingering error message references foundSearch across the codebase for the old “[Emit] missing key:” and “[Iterate] missing key:” patterns returned only the two occurrences in src/ipld/util.rs (lines 203 and 237), which you’ve already updated. There are no matches in tests, documentation, or other modules—everything is safe to merge.
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ipld/util.rs (1)
188-201: Ensure consistent “missing key” prefixes across all error messagesI ran the grep check and found two remaining unprefixed
missing key: {cid}diagnostics insrc/ipld/util.rs:
- Line 451: within the
block_sender.send_async(Err(anyhow::anyhow!("missing key: {cid}")))call- Line 521: inside the
Poll::Ready(Some(Err(anyhow::anyhow!("missing key: {cid}"))))branchTo align with the existing taxonomy (
"[Emit]","[Iterate]", etc.), please update these to include an appropriate context prefix. For example:- .send_async(Err(anyhow::anyhow!("missing key: {cid}"))) + .send_async(Err(anyhow::anyhow!("[Send] missing key: {cid}")))and
- Err(anyhow::anyhow!("missing key: {cid}")) + Err(anyhow::anyhow!("[Emit] missing key: {cid}"))(or another label that best reflects the operation.)
—
🧹 Nitpick comments (5)
src/ipld/util.rs (5)
104-108: Derive common traits on IterateType for easier debugging and ergonomics.Add Copy, Clone, and Debug—Cid is Copy in this codebase, so deriving these is cheap and improves logs/tests.
+#[derive(Copy, Clone, Debug)] enum IterateType { Message(Cid), StateRoot(Cid), }
208-239: Great contextualization of dead-link errors in Iterate path.Including epoch, block CID, and whether the origin was messages vs state root will materially speed up operator troubleshooting.
Two very minor nits you may consider:
- Since the variable is used, naming it origin (or kind) instead of _type avoids the underscore convention for “intentionally unused” bindings.
- Consider normalizing the wording to “state-root” (hyphenated) for consistency with other docs, e.g., “from state-root {c}”.
448-453: Unify error taxonomy in the unordered stream with the ordered stream.Ordered stream uses “[Emit] …” and “[Iterate] …” prefixes; unordered still reports a bare “missing key: …”. Prefixing unordered emit-path errors with “[Emit]” aligns tooling and tests that parse logs.
Apply both diffs:
--- a/src/ipld/util.rs +++ b/src/ipld/util.rs @@ - let _ = block_sender - .send_async(Err(anyhow::anyhow!("missing key: {cid}"))) + let _ = block_sender + .send_async(Err(anyhow::anyhow!("[Emit] missing key: {cid}"))) .await;--- a/src/ipld/util.rs +++ b/src/ipld/util.rs @@ - } else if fail_on_dead_links { - return Poll::Ready(Some(Err(anyhow::anyhow!("missing key: {cid}")))); + } else if fail_on_dead_links { + return Poll::Ready(Some(Err(anyhow::anyhow!("[Emit] missing key: {cid}")))); }Also applies to: 520-522
481-509: Minor: surface send() errors with more context.The error includes the inner value; if you decide to keep enriching diagnostics, consider adding a short prefix like “extract queue:” to disambiguate from ordered-stream errors in downstream logs.
180-201: Note for future: consider adding a Display impl for IterateType.Would let you avoid building a temporary String for type_display and keep formatting terse: “[Iterate] missing key: {cid} from {origin} in block …”.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
CHANGELOG.md(1 hunks)src/ipld/util.rs(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/ipld/util.rs (1)
src/blocks/tipset.rs (2)
epoch(306-308)epoch(543-545)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: Link Check
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
🔇 Additional comments (7)
src/ipld/util.rs (7)
203-205: LGTM: clearer “[Emit] missing key” diagnostic.The prefix makes it obvious the failure occurred in the emit path, not while expanding links.
266-274: LGTM: carrying origin context into queued Iterate tasks.Passing epoch, block CID, and IterateType when enqueueing messages and state-root walks is a clean way to preserve provenance without adding global state.
Also applies to: 281-289
335-368: Concurrency worker pool bounds look sane.Clamping to 1..=8 workers keeps CPU and memory pressure reasonable while extracting CIDs—good trade-off for I/O-bound workloads.
511-585: Graceful shutdown path is robust.
- The PinnedDrop aborts the worker task.
- Extract sender is dropped when the queue drains, allowing workers to exit naturally.
- Remaining queue is flushed before polling the channel again.
No action needed.
144-168: Public API remains backwards compatible at construction sites.stream_chain/stream_graph builders keep signatures; internal Task variant change is encapsulated within ChainStream.
583-594: LGTM: ipld_to_cid helper.Simple and correct; avoids noise where Links are the only values of interest.
257-263: Genesis parent emission preserved.Keeps prior behavior to surface the dummy parent—important for validating genesis snapshots consistently with Lotus.
8010cc9 to
f51fd1e
Compare
akaladarshi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment, otherwise LGTM
a37ec63
Summary of changes
Changes introduced in this pull request:
forest-tool snapshot validateerror message--check-staterootsso 900 works for a snapshot that contains 900 recent state rootsforest-toolReference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
Bug Fixes
Refactor
Chores
Documentation