-
Notifications
You must be signed in to change notification settings - Fork 182
feat: add forest-cli f3 ready to check if F3 is in sync
#5963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a new forest-cli subcommand Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant CLI as forest-cli (f3 ready)
participant F3 as F3 Service
participant Node as Chain API
U->>CLI: f3 ready [--wait, --threshold=N]
CLI->>F3: Check F3IsRunning
alt F3 not running
CLI-->>U: Error: F3 is not running
else F3 running
loop Every 1s
CLI->>Node: get ChainHead
CLI->>F3: get LatestCertificate
CLI->>CLI: Compare epochs (cert_head + N >= chain_head)
alt In sync
CLI-->>U: Ready (print heads) and exit 0
else Not in sync
alt --wait=false
CLI-->>U: Not ready (status) and exit 1
else --wait=true
CLI-->>U: Status update (in-place) and continue
end
end
end
end
sequenceDiagram
autonumber
participant S as calibnet_export_f3_check.sh
participant CLI as forest-cli (f3 ready)
participant F3 as F3 Service
participant Snap as Snapshot Export
S->>S: Cleanup
S->>CLI: f3 ready --wait (with 10m timeout)
CLI->>F3: Readiness loop (as above)
alt Ready within timeout
S->>Snap: Export zstd snapshot (v2)
else Timeout / not ready
S-->>S: Exit with error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/cli/subcommands/f3_cmd.rs (3)
7-7: Remove unused importWrite.The
Writetrait import appears to be unused. Thewrite!macro works without explicitly importing the trait.-use std::{borrow::Cow, io::Write as _, sync::LazyLock, time::Duration}; +use std::{borrow::Cow, sync::LazyLock, time::Duration};
138-144: Consider better error handling for RPC failures.The
get_headshelper function could fail due to network issues or RPC errors. Currently, these errors are silently ignored in the main loop (line 150).Consider logging the error or implementing a retry mechanism with backoff:
async fn get_heads( client: &rpc::Client, ) -> anyhow::Result<(Tipset, FinalityCertificate)> { let cert_head = client.call(F3GetLatestCertificate::request(())?).await?; let chain_head = client.call(ChainHead::request(())?).await?; Ok((chain_head, cert_head)) } + +// In the main loop, add error handling: +for attempt in ticker { + match get_heads(&client).await { + Ok((chain_head, cert_head)) => { + // existing logic + } + Err(e) if wait => { + // Log the error but continue trying + eprintln!("Failed to fetch heads: {e}. Retrying..."); + continue; + } + Err(e) => { + // Exit on error if not waiting + anyhow::bail!("Failed to fetch heads: {e}"); + } + } +}
146-181: Consider improving the terminal output handling.The current implementation has a few areas that could be improved:
- The ticker starts from 0 which causes an immediate check without any delay
- The terminal manipulation could be more robust
- No handling for terminal I/O errors
-let ticker = Ticker::new(0.., Duration::from_secs(1)); +// Start from 1 to have initial delay, or check immediately then wait +let ticker = Ticker::new(1.., Duration::from_secs(1)); let mut stdout = std::io::stdout(); let mut text = String::new(); + +// Check immediately before entering the loop +if let Ok((chain_head, cert_head)) = get_heads(&client).await { + if cert_head.chain_head().epoch + threshold as i64 >= chain_head.epoch() { + println!( + "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}", + chain_head.epoch(), + cert_head.chain_head().epoch + ); + return Ok(()); + } else if !wait { + anyhow::bail!( + "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}", + chain_head.epoch(), + cert_head.chain_head().epoch + ); + } +} + for _ in ticker { if let Ok((chain_head, cert_head)) = get_heads(&client).await { if !text.is_empty() { - write!( + let _ = write!( stdout, "\r{}{}", anes::MoveCursorUp(1), anes::ClearLine::All, - )?; + ); + let _ = stdout.flush(); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
CHANGELOG.md(1 hunks)docs/docs/users/reference/cli.sh(1 hunks)scripts/tests/calibnet_export_f3_check.sh(1 hunks)src/cli/subcommands/f3_cmd.rs(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (1)
src/blocks/tipset.rs (3)
blocks(517-519)epoch(306-308)epoch(543-545)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Forest CLI checks
- GitHub Check: Snapshot unordered export checks
- GitHub Check: Snapshot export checks v2 with F3 data
- GitHub Check: Snapshot export checks
- GitHub Check: Calibnet check
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
🔇 Additional comments (4)
docs/docs/users/reference/cli.sh (1)
91-91: LGTM! Documentation update properly placed.The new line correctly adds documentation generation for the
forest-cli f3 readycommand, following the established pattern and maintaining alphabetical ordering within the F3 subcommands section.scripts/tests/calibnet_export_f3_check.sh (1)
15-16: scripts/tests/calibnet_export_f3_check.sh – Enhance F3 readiness checkIt looks like this is the only place in our test suite using a
timeout … f3 readyinvocation. Before merging, please consider the following:
- Add explicit error handling around the readiness check.
• Capture the exit code oftimeout 10m $FOREST_CLI_PATH f3 ready --waitand, on failure, either retry a few times or emit a clear warning so downstream steps aren’t left hanging silently.- Confirm that a 10-minute timeout is adequate.
• Reach out to the SRE/ops team or review recent sync-duration metrics for F3 to ensure this window covers peak load scenarios. If 10 minutes is too tight, adjust accordingly.CHANGELOG.md (1)
40-40: LGTM! Changelog entry is properly formatted.The entry follows the established format with PR link and concise description.
src/cli/subcommands/f3_cmd.rs (1)
95-103: LGTM! Well-designed command structure with sensible defaults.The
Readycommand is properly documented with clear parameter descriptions. The default threshold of 20 epochs aligns with the referenced internal monitoring threshold mentioned in the PR description.
src/cli/subcommands/f3_cmd.rs
Outdated
| Err(e) if !wait => { | ||
| anyhow::bail!("{e}"); | ||
| } | ||
| _ => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed? Isn't it already exhaustive match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, there's an if wait case. Refactored to putting if inside Err block
src/cli/subcommands/f3_cmd.rs
Outdated
| cert_head.chain_head().epoch | ||
| ); | ||
| if !wait { | ||
| anyhow::bail!("{text}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think using bail is correct option, since since it represents errors and would cause confusion. Otherwise, F3 not being in sync is really an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to use non-zero exit code to represent failure so that a bash script does not have to check output text of this command. How about changing bail to process::exit?
src/cli/subcommands/f3_cmd.rs
Outdated
| anyhow::bail!("Failed to check F3 sync status: {e}"); | ||
| } | ||
| // When waiting, log the error but continue | ||
| eprintln!("Warning: Failed to fetch heads: {e}. Retrying..."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanabi1224 We really should error out here after small number of retries otherwise we will get:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it exits after 3 consecutive retries
| break; | ||
| } else { | ||
| text = format!( | ||
| "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of saying not in sync should we say F3 sync is in progress retry later ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not enough info about whether F3 sync is in progress or gets stuck.
src/cli/subcommands/f3_cmd.rs
Outdated
| for _ in ticker { | ||
| match get_heads(&client).await { | ||
| Ok((chain_head, cert_head)) => { | ||
| if !text.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use ProgressBar::new_spinner() here when wait flag is set for more intuitive/clear response in the terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, switched to ProgressBar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
src/cli/subcommands/f3_cmd.rs (5)
147-151: Spinner ergonomics: hide when not waiting; clear on exit; don’t garble stderr.
- Only show a spinner when
--waitis set; otherwise use a hidden PB to avoid flicker for single-shot checks.- Use
finish_and_clear()so we don’t leave spinner artifacts in the terminal.- When printing warnings while the spinner is active, use
pb.suspend(...)to avoid jumbled output.- let pb = ProgressBar::new_spinner().with_style( - ProgressStyle::with_template("{spinner} {msg}") - .expect("indicatif template must be valid"), - ); - pb.enable_steady_tick(std::time::Duration::from_millis(100)); + let pb = if wait { + ProgressBar::new_spinner().with_style( + ProgressStyle::with_template("{spinner} {msg}") + .expect("indicatif template must be valid"), + ) + } else { + // Hidden PB renders nothing; safe to call set_message on it. + ProgressBar::hidden() + }; + if wait { + pb.enable_steady_tick(Duration::from_millis(100)); + } @@ - pb.finish(); + pb.finish_and_clear(); break; @@ - pb.finish(); + pb.finish_and_clear(); std::process::exit(1); } @@ - if !wait { - anyhow::bail!("Failed to check F3 sync status: {e}"); - } + if !wait { + pb.finish_and_clear(); + anyhow::bail!("Failed to check F3 sync status: {e}"); + } @@ - if num_consecutive_fetch_failtures >= 3 { - eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); + if num_consecutive_fetch_failures >= 3 { + pb.suspend(|| eprintln!("Warning: Failed to fetch heads: {e}. Exiting...")); std::process::exit(2); } else { - eprintln!("Warning: Failed to fetch heads: {e}. Retrying..."); + pb.suspend(|| eprintln!("Warning: Failed to fetch heads: {e}. Retrying...")); }Also applies to: 175-179, 181-193
96-104: Use a non-negative threshold type and a safe epoch diff (no overflow).
- Threshold is conceptually a non-negative epoch gap. Prefer
u64overusizeto avoid platform-dependent width and to align withabs_diffresult.- Avoid
cert_epoch + threshold as i64 >= chain_epochwhich can overflow and reads less clearly than comparing a gap.@@ pub enum F3Commands { - Ready { + Ready { @@ - #[arg(long, default_value_t = 20)] - threshold: usize, + #[arg(long, default_value_t = 20)] + threshold: u64, }, @@ - if cert_head.chain_head().epoch + threshold as i64 >= chain_head.epoch() - { - let text = format!( - "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}", - chain_head.epoch(), - cert_head.chain_head().epoch - ); + let chain_epoch = chain_head.epoch(); + let cert_epoch = cert_head.chain_head().epoch; + let gap = chain_epoch.abs_diff(cert_epoch); + if gap <= threshold { + let text = format!( + "[+] F3 is in sync (gap: {gap} <= threshold: {threshold}). Chain head epoch: {chain_epoch}, F3 head epoch: {cert_epoch}" + ); pb.set_message(text); - pb.finish(); + pb.finish_and_clear(); break; } else { - let text = format!( - "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}", - chain_head.epoch(), - cert_head.chain_head().epoch - ); + let text = format!( + "[-] F3 is not in sync (gap: {gap} > threshold: {threshold}). Chain head epoch: {chain_epoch}, F3 head epoch: {cert_epoch}" + ); pb.set_message(text); if !wait { - pb.finish(); + pb.finish_and_clear(); std::process::exit(1); } }Also applies to: 158-175
152-152: Typo: renamenum_consecutive_fetch_failtures→num_consecutive_fetch_failures.Minor but improves readability and greppability.
- let mut num_consecutive_fetch_failtures = 0; + let mut num_consecutive_fetch_failures = 0; @@ - num_consecutive_fetch_failtures = 0; + num_consecutive_fetch_failures = 0; @@ - num_consecutive_fetch_failtures += 1; - if num_consecutive_fetch_failtures >= 3 { + num_consecutive_fetch_failures += 1; + if num_consecutive_fetch_failures >= 3 {Also applies to: 157-157, 186-187
139-146: Fetch chain and F3 heads concurrently to reduce per-tick latency.Both RPCs are independent; running them in parallel trims a round-trip per tick.
- async fn get_heads( - client: &rpc::Client, - ) -> anyhow::Result<(Tipset, FinalityCertificate)> { - let cert_head = client.call(F3GetLatestCertificate::request(())?).await?; - let chain_head = client.call(ChainHead::request(())?).await?; - Ok((chain_head, cert_head)) - } + async fn get_heads( + client: &rpc::Client, + ) -> anyhow::Result<(Tipset, FinalityCertificate)> { + let (cert_head, chain_head) = tokio::try_join!( + client.call(F3GetLatestCertificate::request(())?), + client.call(ChainHead::request(())?), + )?; + Ok((chain_head, cert_head)) + }
175-179: Consider avoidingprocess::exitinside command handlers.Forest CLI generally returns
anyhow::Result<()>from subcommands and lets the top-level decide exit codes and formatting. Callingprocess::exithere sidesteps unified error handling and cleanup. If distinct exit codes are desired (e.g., 1 = not in sync, 2 = RPC failures), consider returning a custom error type carrying a code and mapping it in the CLI entrypoint.Would you like me to draft a tiny
CliExiterror type and a mapping in the CLI main so subcommands can signal exit codes without hard-exiting?Also applies to: 188-190
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/cli/subcommands/f3_cmd.rs(4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Applied to files:
src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (2)
src/blocks/tipset.rs (3)
blocks(517-519)epoch(306-308)epoch(543-545)src/rpc/methods/f3/types.rs (1)
chain_head(541-544)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
- GitHub Check: All lint checks
🔇 Additional comments (1)
src/cli/subcommands/f3_cmd.rs (1)
134-138: Fail fast on F3 not running — good guard.Early
F3IsRunningcheck with a clear error is the right UX and prevents wasted polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/cli/subcommands/f3_cmd.rs (1)
152-155: Nice switch totokio::time::interval.This avoids blocking the async runtime. Matches prior feedback.
🧹 Nitpick comments (5)
src/cli/subcommands/f3_cmd.rs (5)
138-145: Fetch heads concurrently to reduce per-iteration latency.Both RPCs are independent; running them in parallel trims one network RTT per tick and keeps the UI snappier.
- async fn get_heads( - client: &rpc::Client, - ) -> anyhow::Result<(Tipset, FinalityCertificate)> { - let cert_head = client.call(F3GetLatestCertificate::request(())?).await?; - let chain_head = client.call(ChainHead::request(())?).await?; - Ok((chain_head, cert_head)) - } + async fn get_heads( + client: &rpc::Client, + ) -> anyhow::Result<(Tipset, FinalityCertificate)> { + let (cert_head, chain_head) = tokio::try_join!( + async { + // F3 latest certificate + client.call(F3GetLatestCertificate::request(())?).await + }, + async { + // Chain head + client.call(ChainHead::request(())?).await + } + )?; + Ok((chain_head, cert_head)) + }
146-151: Spinner setup LGTM; minor polish available.Consider
finish_and_clear()on exit paths to restore a clean prompt (see suggestions below), but the current setup is fine.
151-151: Fix variable name typo (failtures→failures).Minor readability/maintainability nit.
- let mut num_consecutive_fetch_failtures = 0; + let mut num_consecutive_fetch_failures = 0; @@ - num_consecutive_fetch_failtures = 0; + num_consecutive_fetch_failures = 0; @@ - num_consecutive_fetch_failtures += 1; - if num_consecutive_fetch_failtures >= 3 { + num_consecutive_fetch_failures += 1; + if num_consecutive_fetch_failures >= 3 {Also applies to: 157-157, 186-187
181-193: End the spinner cleanly before exiting, and keep stdout/stderr separation tidy.Call
finish_and_clear()on error exits to avoid leaving spinner artifacts in the terminal and to make stderr warnings more readable.Err(e) => { if !wait { - anyhow::bail!("Failed to check F3 sync status: {e}"); + pb.finish_and_clear(); + anyhow::bail!("Failed to check F3 sync status: {e}"); } - num_consecutive_fetch_failtures += 1; - if num_consecutive_fetch_failtures >= 3 { - eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); - std::process::exit(2); + num_consecutive_fetch_failures += 1; + if num_consecutive_fetch_failures >= 3 { + pb.finish_and_clear(); + eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); + std::process::exit(2); } else { eprintln!("Warning: Failed to fetch heads: {e}. Retrying..."); } }
176-179: Prefer returning errors overprocess::exitfrom deep in command logic.Using
process::exithere bypasses normal drop/cleanup paths and mixes exit-code handling with business logic. Consider returning ananyhow::Error(or a custom error that carries an exit code) and letting the top-level CLI main map it to exit codes, to keep behavior consistent across commands.If you want, I can scan the repo to see how other subcommands handle exit codes and propose a consistent pattern.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/cli/subcommands/f3_cmd.rs(4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Applied to files:
src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (2)
src/blocks/tipset.rs (3)
blocks(517-519)epoch(306-308)epoch(543-545)src/rpc/methods/f3/types.rs (1)
chain_head(541-544)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
src/cli/subcommands/f3_cmd.rs (3)
10-11: Import looks correct.Importing Tipset alongside TipsetKey is appropriate, as ChainHead returns a Tipset. No issues.
26-26: Good choice using indicatif for terminal UX.The spinner makes the wait mode clearer to users. No issues.
133-136: Early check for F3 process is good.Fail-fast when F3 isn’t running avoids confusing “not in sync” messaging. Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/rpc/methods/f3/types.rs (3)
21-21: Good call enforcing non-emptiness via a dedicated type alias.Adopting nunny::Vec as NonEmpty aligns with the project’s preference for fail-fast invariants and removes a whole class of empty-vector checks.
If we intend to use NonEmpty across multiple modules, consider re-exporting it once (e.g., in a prelude or a common types module) to avoid scattering
use nunny::Vec as NonEmpty;throughout the codebase.
541-543: Same here:chain_headis now infallible under the NonEmpty invariant.Matches how the new CLI readiness check uses the head safely.
Optionally add
#[inline]to both tiny getters for ergonomics; not critical.
498-501: ECChain non-empty invariant confirmed – update schema for clarity
- Type safety: Enforcing
FinalityCertificate.ec_chainasNonEmpty<ECTipSet>is correct—callers can now use.first()/.last()without checking emptiness.- Serde support:
crate::lotus_jsonhandlesNonEmpty<T>generically (viaimpl<T> HasLotusJson for nunny::Vec<T>in src/lotus_json/nonempty.rs), so no runtime or compile-time errors will arise when serializing/deserializingNonEmpty`.- Call-site safety: No occurrences of
ec_chain.is_empty(),.remove(), or.truncate()were found in the codebase, so switching toNonEmptydoes not break existing logic.- Schema clarity: The generated OpenRPC schema still uses a nullable array (
Nullable_Array_of_ECTipSet) withoutminItems. To reflect the new invariant without altering the JSON shape, update theschemarsattribute to addmin_items = 1on a plainVec<ECTipSet>.Suggested diff:
#[serde(rename = "ECChain", with = "crate::lotus_json")] -#[schemars(with = "LotusJson<NonEmpty<ECTipSet>>")] +#[schemars(min_items = 1, with = "LotusJson<Vec<ECTipSet>>")] pub ec_chain: NonEmpty<ECTipSet>,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/cli/subcommands/f3_cmd.rs(5 hunks)src/rpc/methods/f3/types.rs(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cli/subcommands/f3_cmd.rs
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: docker-lint
- GitHub Check: python-lint
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Deploy to Cloudflare Pages
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
src/rpc/methods/f3/types.rs (1)
537-539: Getter now leverages the non-empty guarantee—nice and safe.Using
self.ec_chain.first()(NonEmpty) removes the need forOption/unwraps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/cli/subcommands/f3_cmd.rs (4)
94-102: Clarify threshold semantics and expose exit-code behavior in help.Minor UX polish: consider making threshold units explicit in help (epochs) and documenting the exit codes this subcommand may return (e.g., 0 in-sync, 1 not in sync, 2 fetch failures). This saves users from having to infer behavior from logs.
Apply:
- /// Checks if F3 is in sync. + /// Checks if F3 is in sync. Ready { - /// Wait until F3 is in sync. + /// Wait until F3 is in sync. #[arg(long)] wait: bool, - /// The threshold of the epoch gap between chain head and F3 head within which F3 is considered in sync. - #[arg(long, default_value_t = 20)] + /// The maximum allowed gap (in epochs) between the chain head and the F3 head to be considered in sync. + /// Exit codes: 0=in sync, 1=not in sync, 2=RPC fetch failures. + #[arg(long, default_value_t = 20, value_name = "EPOCHS")] threshold: usize, },
131-200: Fix typo in counter name and include gap in messages for better observability.
- Rename
num_consecutive_fetch_failtures→num_consecutive_fetch_failures.- Compute and display the epoch gap. This makes CI logs and user output more actionable without redoing mental math.
Apply:
- pb.enable_steady_tick(std::time::Duration::from_millis(100)); - let mut num_consecutive_fetch_failtures = 0; + pb.enable_steady_tick(std::time::Duration::from_millis(100)); + let mut num_consecutive_fetch_failures = 0; @@ - Ok((chain_head, cert_head)) => { - num_consecutive_fetch_failtures = 0; - if cert_head - .chain_head() - .epoch - .saturating_add(threshold.try_into()?) - >= chain_head.epoch() - { - let text = format!( - "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}", - chain_head.epoch(), - cert_head.chain_head().epoch - ); - pb.set_message(text); - pb.finish(); - break; - } else { - let text = format!( - "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}", - chain_head.epoch(), - cert_head.chain_head().epoch - ); - pb.set_message(text); - if !wait { - pb.finish(); - std::process::exit(1); - } - } - } + Ok((chain_head, cert_head)) => { + num_consecutive_fetch_failures = 0; + let chain_epoch = chain_head.epoch(); + let cert_epoch = cert_head.chain_head().epoch; + let gap = chain_epoch.saturating_sub(cert_epoch); + if cert_epoch.saturating_add(threshold.try_into()?) >= chain_epoch { + let text = format!( + "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}, gap: {}", + chain_epoch, + cert_epoch, + gap + ); + pb.set_message(text); + pb.finish(); + break; + } else { + let text = format!( + "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}, gap: {}", + chain_epoch, + cert_epoch, + gap + ); + pb.set_message(text); + if !wait { + pb.finish(); + std::process::exit(1); + } + } + } @@ - num_consecutive_fetch_failtures += 1; - if num_consecutive_fetch_failtures >= 3 { + num_consecutive_fetch_failures += 1; + if num_consecutive_fetch_failures >= 3 { eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); std::process::exit(2); } else { eprintln!("Warning: Failed to fetch heads: {e}. Retrying..."); }Optional: move
interval.tick().await;to the end of the loop to avoid a 1-second initial delay before the first check.
145-149: Minor consistency: prefer the importedDurationalias.You already
use std::time::Duration;Duration::from_millis(100)reads cleaner thanstd::time::Duration::from_millis(100).Apply:
- pb.enable_steady_tick(std::time::Duration::from_millis(100)); + pb.enable_steady_tick(Duration::from_millis(100));
185-187: Exit-code consistency for non-wait RPC failures (optional).Today, non-wait mode bails here (likely mapped to exit 1 by the caller), while other failure paths use explicit
process::exit(2). If you want strict, scripted semantics, consider exiting with a well-defined code here too (e.g., 2) for parity.Apply:
- if !wait { - anyhow::bail!("Failed to check F3 sync status: {e}"); - } + if !wait { + eprintln!("Failed to check F3 sync status: {e}"); + std::process::exit(2); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/cli/subcommands/f3_cmd.rs(5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.
Applied to files:
src/cli/subcommands/f3_cmd.rs
📚 Learning: 2025-08-13T09:43:20.301Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
Applied to files:
src/cli/subcommands/f3_cmd.rs
📚 Learning: 2025-08-18T03:09:47.932Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.
Applied to files:
src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (2)
src/blocks/tipset.rs (3)
blocks(517-519)epoch(306-308)epoch(543-545)src/rpc/methods/f3/types.rs (1)
chain_head(541-543)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: docker-lint
- GitHub Check: Check
- GitHub Check: Deploy to Cloudflare Pages
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (rust)
- GitHub Check: Analyze (actions)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
src/cli/subcommands/f3_cmd.rs (3)
10-10: Importing Tipset is correct and necessary here.Using
Tipsetaligns withChainHead's return type and enables.epoch()usage below. Looks good.
25-25: Good call switching to indicatif for UX.The spinner improves the CLI feedback loop and pairs well with the async
tokio::time::intervalused later.
450-451: Non-empty EC chain: direct.last()usage is correct now.Given
ec_chainis enforced non-empty, avoidingOptionhandling here is appropriate and removes a potential panic site. Good cleanup.
Summary of changes
This PR adds a command for checking F3 readiness before exporting a v2 F3-aware snapshot. The default threshold 20 aligns with the internal monitoring threshold.
Changes introduced in this pull request:
forest-cli f3 readyto check if F3 is in synchttps://github.com/ChainSafe/forest/actions/runs/17123908476/job/48571511268?pr=5963#step:7:941
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
New Features
Documentation
Chores
Breaking Changes