Skip to content

Conversation

@hanabi1224
Copy link
Contributor

@hanabi1224 hanabi1224 commented Aug 21, 2025

Summary of changes

This PR adds a command for checking F3 readiness before exporting a v2 F3-aware snapshot. The default threshold 20 aligns with the internal monitoring threshold.

Changes introduced in this pull request:

  • add forest-cli f3 ready to check if F3 is in sync

https://github.com/ChainSafe/forest/actions/runs/17123908476/job/48571511268?pr=5963#step:7:941

[-] F3 is not in sync. Chain head epoch: 2948186, F3 head epoch: 2947570
[-] F3 is not in sync. Chain head epoch: 2948186, F3 head epoch: 2947825
[-] F3 is not in sync. Chain head epoch: 2948186, F3 head epoch: 2948092
[+] F3 is in sync. Chain head epoch: 2948186, F3 head epoch: 2948182

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • New Features

    • Added a forest-cli "f3 ready" command to check F3 synchronization with a --wait option, configurable threshold, and live spinner status.
  • Documentation

    • CLI reference and changelog updated to document the new "f3 ready" command.
  • Chores

    • Export process now waits for F3 readiness before creating snapshots.
  • Breaking Changes

    • Finality certificate chains are now required to be non-empty; integrations assuming empty chains may need updates.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 21, 2025

Walkthrough

Adds a new forest-cli subcommand f3 ready (flags: --wait, --threshold) that polls F3 and node heads to determine sync readiness; updates CLI docs and CHANGELOG; invokes the readiness check in the calibnet export script before snapshot export; and tightens FinalityCertificate.ec_chain to be non-empty.

Changes

Cohort / File(s) Summary
CLI: f3 ready implementation
src/cli/subcommands/f3_cmd.rs
Adds F3Commands::Ready { wait: bool, threshold: usize }; implements async polling (1s ticker) to fetch ChainHead and LatestCertificate, compares epochs (cert_head + threshold >= chain_head), prints in-place status via an indicatif spinner, supports --wait vs immediate exit, retries transient fetch errors, and adjusts imports (indicatif, Tipset).
RPC types: FinalityCertificate ec_chain non-empty
src/rpc/methods/f3/types.rs
Changes FinalityCertificate.ec_chain from Vec<ECTipSet> to NonEmpty<ECTipSet> (alias via nunny::Vec as NonEmpty), updates schemars annotation to LotusJson<NonEmpty<ECTipSet>>, and simplifies chain_base/chain_head to rely on non-empty guarantees.
Docs & changelog
CHANGELOG.md, docs/docs/users/reference/cli.sh
Inserts a changelog entry for "forest-cli f3 ready" and extends CLI docs generation to include the f3 ready command.
Export script integration
scripts/tests/calibnet_export_f3_check.sh
Adds a pre-export readiness call to forest-cli f3 ready (10-minute timeout) with wait messaging before exporting the zstd snapshot; other script logic unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant CLI as forest-cli (f3 ready)
  participant F3 as F3 Service
  participant Node as Chain API

  U->>CLI: f3 ready [--wait, --threshold=N]
  CLI->>F3: Check F3IsRunning
  alt F3 not running
    CLI-->>U: Error: F3 is not running
  else F3 running
    loop Every 1s
      CLI->>Node: get ChainHead
      CLI->>F3: get LatestCertificate
      CLI->>CLI: Compare epochs (cert_head + N >= chain_head)
      alt In sync
        CLI-->>U: Ready (print heads) and exit 0
      else Not in sync
        alt --wait=false
          CLI-->>U: Not ready (status) and exit 1
        else --wait=true
          CLI-->>U: Status update (in-place) and continue
        end
      end
    end
  end
Loading
sequenceDiagram
  autonumber
  participant S as calibnet_export_f3_check.sh
  participant CLI as forest-cli (f3 ready)
  participant F3 as F3 Service
  participant Snap as Snapshot Export

  S->>S: Cleanup
  S->>CLI: f3 ready --wait (with 10m timeout)
  CLI->>F3: Readiness loop (as above)
  alt Ready within timeout
    S->>Snap: Export zstd snapshot (v2)
  else Timeout / not ready
    S-->>S: Exit with error
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

RPC

Suggested reviewers

  • akaladarshi
  • LesnyRumcajs
  • elmattic
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hm/cli-f3-ready

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hanabi1224 hanabi1224 marked this pull request as ready for review August 21, 2025 10:26
@hanabi1224 hanabi1224 requested a review from a team as a code owner August 21, 2025 10:26
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and akaladarshi and removed request for a team August 21, 2025 10:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/cli/subcommands/f3_cmd.rs (3)

7-7: Remove unused import Write.

The Write trait import appears to be unused. The write! macro works without explicitly importing the trait.

-use std::{borrow::Cow, io::Write as _, sync::LazyLock, time::Duration};
+use std::{borrow::Cow, sync::LazyLock, time::Duration};

138-144: Consider better error handling for RPC failures.

The get_heads helper function could fail due to network issues or RPC errors. Currently, these errors are silently ignored in the main loop (line 150).

Consider logging the error or implementing a retry mechanism with backoff:

 async fn get_heads(
     client: &rpc::Client,
 ) -> anyhow::Result<(Tipset, FinalityCertificate)> {
     let cert_head = client.call(F3GetLatestCertificate::request(())?).await?;
     let chain_head = client.call(ChainHead::request(())?).await?;
     Ok((chain_head, cert_head))
 }
+
+// In the main loop, add error handling:
+for attempt in ticker {
+    match get_heads(&client).await {
+        Ok((chain_head, cert_head)) => {
+            // existing logic
+        }
+        Err(e) if wait => {
+            // Log the error but continue trying
+            eprintln!("Failed to fetch heads: {e}. Retrying...");
+            continue;
+        }
+        Err(e) => {
+            // Exit on error if not waiting
+            anyhow::bail!("Failed to fetch heads: {e}");
+        }
+    }
+}

146-181: Consider improving the terminal output handling.

The current implementation has a few areas that could be improved:

  1. The ticker starts from 0 which causes an immediate check without any delay
  2. The terminal manipulation could be more robust
  3. No handling for terminal I/O errors
-let ticker = Ticker::new(0.., Duration::from_secs(1));
+// Start from 1 to have initial delay, or check immediately then wait
+let ticker = Ticker::new(1.., Duration::from_secs(1));
 let mut stdout = std::io::stdout();
 let mut text = String::new();
+
+// Check immediately before entering the loop
+if let Ok((chain_head, cert_head)) = get_heads(&client).await {
+    if cert_head.chain_head().epoch + threshold as i64 >= chain_head.epoch() {
+        println!(
+            "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}",
+            chain_head.epoch(),
+            cert_head.chain_head().epoch
+        );
+        return Ok(());
+    } else if !wait {
+        anyhow::bail!(
+            "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}",
+            chain_head.epoch(),
+            cert_head.chain_head().epoch
+        );
+    }
+}
+
 for _ in ticker {
     if let Ok((chain_head, cert_head)) = get_heads(&client).await {
         if !text.is_empty() {
-            write!(
+            let _ = write!(
                 stdout,
                 "\r{}{}",
                 anes::MoveCursorUp(1),
                 anes::ClearLine::All,
-            )?;
+            );
+            let _ = stdout.flush();
         }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9752098 and e825b35.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • docs/docs/users/reference/cli.sh (1 hunks)
  • scripts/tests/calibnet_export_f3_check.sh (1 hunks)
  • src/cli/subcommands/f3_cmd.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (1)
src/blocks/tipset.rs (3)
  • blocks (517-519)
  • epoch (306-308)
  • epoch (543-545)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: Forest CLI checks
  • GitHub Check: Snapshot unordered export checks
  • GitHub Check: Snapshot export checks v2 with F3 data
  • GitHub Check: Snapshot export checks
  • GitHub Check: Calibnet check
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
docs/docs/users/reference/cli.sh (1)

91-91: LGTM! Documentation update properly placed.

The new line correctly adds documentation generation for the forest-cli f3 ready command, following the established pattern and maintaining alphabetical ordering within the F3 subcommands section.

scripts/tests/calibnet_export_f3_check.sh (1)

15-16: scripts/tests/calibnet_export_f3_check.sh – Enhance F3 readiness check

It looks like this is the only place in our test suite using a timeout … f3 ready invocation. Before merging, please consider the following:

  • Add explicit error handling around the readiness check.
    • Capture the exit code of timeout 10m $FOREST_CLI_PATH f3 ready --wait and, on failure, either retry a few times or emit a clear warning so downstream steps aren’t left hanging silently.
  • Confirm that a 10-minute timeout is adequate.
    • Reach out to the SRE/ops team or review recent sync-duration metrics for F3 to ensure this window covers peak load scenarios. If 10 minutes is too tight, adjust accordingly.
CHANGELOG.md (1)

40-40: LGTM! Changelog entry is properly formatted.

The entry follows the established format with PR link and concise description.

src/cli/subcommands/f3_cmd.rs (1)

95-103: LGTM! Well-designed command structure with sensible defaults.

The Ready command is properly documented with clear parameter descriptions. The default threshold of 20 epochs aligns with the referenced internal monitoring threshold mentioned in the PR description.

Err(e) if !wait => {
anyhow::bail!("{e}");
}
_ => {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed? Isn't it already exhaustive match?

Copy link
Contributor Author

@hanabi1224 hanabi1224 Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, there's an if wait case. Refactored to putting if inside Err block

LesnyRumcajs
LesnyRumcajs previously approved these changes Aug 21, 2025
cert_head.chain_head().epoch
);
if !wait {
anyhow::bail!("{text}");
Copy link
Collaborator

@akaladarshi akaladarshi Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think using bail is correct option, since since it represents errors and would cause confusion. Otherwise, F3 not being in sync is really an error.

Copy link
Contributor Author

@hanabi1224 hanabi1224 Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to use non-zero exit code to represent failure so that a bash script does not have to check output text of this command. How about changing bail to process::exit?

anyhow::bail!("Failed to check F3 sync status: {e}");
}
// When waiting, log the error but continue
eprintln!("Warning: Failed to fetch heads: {e}. Retrying...");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanabi1224 We really should error out here after small number of retries otherwise we will get:

Screenshot 2025-08-22 at 7 41 07 PM

Copy link
Contributor Author

@hanabi1224 hanabi1224 Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it exits after 3 consecutive retries

break;
} else {
text = format!(
"[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saying not in sync should we say F3 sync is in progress retry later ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not enough info about whether F3 sync is in progress or gets stuck.

for _ in ticker {
match get_heads(&client).await {
Ok((chain_head, cert_head)) => {
if !text.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use ProgressBar::new_spinner() here when wait flag is set for more intuitive/clear response in the terminal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, switched to ProgressBar

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
src/cli/subcommands/f3_cmd.rs (5)

147-151: Spinner ergonomics: hide when not waiting; clear on exit; don’t garble stderr.

  • Only show a spinner when --wait is set; otherwise use a hidden PB to avoid flicker for single-shot checks.
  • Use finish_and_clear() so we don’t leave spinner artifacts in the terminal.
  • When printing warnings while the spinner is active, use pb.suspend(...) to avoid jumbled output.
-                let pb = ProgressBar::new_spinner().with_style(
-                    ProgressStyle::with_template("{spinner} {msg}")
-                        .expect("indicatif template must be valid"),
-                );
-                pb.enable_steady_tick(std::time::Duration::from_millis(100));
+                let pb = if wait {
+                    ProgressBar::new_spinner().with_style(
+                        ProgressStyle::with_template("{spinner} {msg}")
+                            .expect("indicatif template must be valid"),
+                    )
+                } else {
+                    // Hidden PB renders nothing; safe to call set_message on it.
+                    ProgressBar::hidden()
+                };
+                if wait {
+                    pb.enable_steady_tick(Duration::from_millis(100));
+                }
@@
-                                pb.finish();
+                                pb.finish_and_clear();
                                 break;
@@
-                                    pb.finish();
+                                    pb.finish_and_clear();
                                     std::process::exit(1);
                                 }
@@
-                            if !wait {
-                                anyhow::bail!("Failed to check F3 sync status: {e}");
-                            }
+                            if !wait {
+                                pb.finish_and_clear();
+                                anyhow::bail!("Failed to check F3 sync status: {e}");
+                            }
@@
-                            if num_consecutive_fetch_failtures >= 3 {
-                                eprintln!("Warning: Failed to fetch heads: {e}. Exiting...");
+                            if num_consecutive_fetch_failures >= 3 {
+                                pb.suspend(|| eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."));
                                 std::process::exit(2);
                             } else {
-                                eprintln!("Warning: Failed to fetch heads: {e}. Retrying...");
+                                pb.suspend(|| eprintln!("Warning: Failed to fetch heads: {e}. Retrying..."));
                             }

Also applies to: 175-179, 181-193


96-104: Use a non-negative threshold type and a safe epoch diff (no overflow).

  • Threshold is conceptually a non-negative epoch gap. Prefer u64 over usize to avoid platform-dependent width and to align with abs_diff result.
  • Avoid cert_epoch + threshold as i64 >= chain_epoch which can overflow and reads less clearly than comparing a gap.
@@ pub enum F3Commands {
-    Ready {
+    Ready {
@@
-        #[arg(long, default_value_t = 20)]
-        threshold: usize,
+        #[arg(long, default_value_t = 20)]
+        threshold: u64,
     },
@@
-                            if cert_head.chain_head().epoch + threshold as i64 >= chain_head.epoch()
-                            {
-                                let text = format!(
-                                    "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}",
-                                    chain_head.epoch(),
-                                    cert_head.chain_head().epoch
-                                );
+                            let chain_epoch = chain_head.epoch();
+                            let cert_epoch = cert_head.chain_head().epoch;
+                            let gap = chain_epoch.abs_diff(cert_epoch);
+                            if gap <= threshold {
+                                let text = format!(
+                                    "[+] F3 is in sync (gap: {gap} <= threshold: {threshold}). Chain head epoch: {chain_epoch}, F3 head epoch: {cert_epoch}"
+                                );
                                 pb.set_message(text);
-                                pb.finish();
+                                pb.finish_and_clear();
                                 break;
                             } else {
-                                let text = format!(
-                                    "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}",
-                                    chain_head.epoch(),
-                                    cert_head.chain_head().epoch
-                                );
+                                let text = format!(
+                                    "[-] F3 is not in sync (gap: {gap} > threshold: {threshold}). Chain head epoch: {chain_epoch}, F3 head epoch: {cert_epoch}"
+                                );
                                 pb.set_message(text);
                                 if !wait {
-                                    pb.finish();
+                                    pb.finish_and_clear();
                                     std::process::exit(1);
                                 }
                             }

Also applies to: 158-175


152-152: Typo: rename num_consecutive_fetch_failturesnum_consecutive_fetch_failures.

Minor but improves readability and greppability.

-                let mut num_consecutive_fetch_failtures = 0;
+                let mut num_consecutive_fetch_failures = 0;
@@
-                            num_consecutive_fetch_failtures = 0;
+                            num_consecutive_fetch_failures = 0;
@@
-                            num_consecutive_fetch_failtures += 1;
-                            if num_consecutive_fetch_failtures >= 3 {
+                            num_consecutive_fetch_failures += 1;
+                            if num_consecutive_fetch_failures >= 3 {

Also applies to: 157-157, 186-187


139-146: Fetch chain and F3 heads concurrently to reduce per-tick latency.

Both RPCs are independent; running them in parallel trims a round-trip per tick.

-                async fn get_heads(
-                    client: &rpc::Client,
-                ) -> anyhow::Result<(Tipset, FinalityCertificate)> {
-                    let cert_head = client.call(F3GetLatestCertificate::request(())?).await?;
-                    let chain_head = client.call(ChainHead::request(())?).await?;
-                    Ok((chain_head, cert_head))
-                }
+                async fn get_heads(
+                    client: &rpc::Client,
+                ) -> anyhow::Result<(Tipset, FinalityCertificate)> {
+                    let (cert_head, chain_head) = tokio::try_join!(
+                        client.call(F3GetLatestCertificate::request(())?),
+                        client.call(ChainHead::request(())?),
+                    )?;
+                    Ok((chain_head, cert_head))
+                }

175-179: Consider avoiding process::exit inside command handlers.

Forest CLI generally returns anyhow::Result<()> from subcommands and lets the top-level decide exit codes and formatting. Calling process::exit here sidesteps unified error handling and cleanup. If distinct exit codes are desired (e.g., 1 = not in sync, 2 = RPC failures), consider returning a custom error type carrying a code and mapping it in the CLI entrypoint.

Would you like me to draft a tiny CliExit error type and a mapping in the CLI main so subcommands can signal exit codes without hard-exiting?

Also applies to: 188-190

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9d9b1b7 and e8e8783.

📒 Files selected for processing (1)
  • src/cli/subcommands/f3_cmd.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.

Applied to files:

  • src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (2)
src/blocks/tipset.rs (3)
  • blocks (517-519)
  • epoch (306-308)
  • epoch (543-545)
src/rpc/methods/f3/types.rs (1)
  • chain_head (541-544)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
  • GitHub Check: All lint checks
🔇 Additional comments (1)
src/cli/subcommands/f3_cmd.rs (1)

134-138: Fail fast on F3 not running — good guard.

Early F3IsRunning check with a clear error is the right UX and prevents wasted polling.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/cli/subcommands/f3_cmd.rs (1)

152-155: Nice switch to tokio::time::interval.

This avoids blocking the async runtime. Matches prior feedback.

🧹 Nitpick comments (5)
src/cli/subcommands/f3_cmd.rs (5)

138-145: Fetch heads concurrently to reduce per-iteration latency.

Both RPCs are independent; running them in parallel trims one network RTT per tick and keeps the UI snappier.

-                async fn get_heads(
-                    client: &rpc::Client,
-                ) -> anyhow::Result<(Tipset, FinalityCertificate)> {
-                    let cert_head = client.call(F3GetLatestCertificate::request(())?).await?;
-                    let chain_head = client.call(ChainHead::request(())?).await?;
-                    Ok((chain_head, cert_head))
-                }
+                async fn get_heads(
+                    client: &rpc::Client,
+                ) -> anyhow::Result<(Tipset, FinalityCertificate)> {
+                    let (cert_head, chain_head) = tokio::try_join!(
+                        async {
+                            // F3 latest certificate
+                            client.call(F3GetLatestCertificate::request(())?).await
+                        },
+                        async {
+                            // Chain head
+                            client.call(ChainHead::request(())?).await
+                        }
+                    )?;
+                    Ok((chain_head, cert_head))
+                }

146-151: Spinner setup LGTM; minor polish available.

Consider finish_and_clear() on exit paths to restore a clean prompt (see suggestions below), but the current setup is fine.


151-151: Fix variable name typo (failturesfailures).

Minor readability/maintainability nit.

-                let mut num_consecutive_fetch_failtures = 0;
+                let mut num_consecutive_fetch_failures = 0;
@@
-                            num_consecutive_fetch_failtures = 0;
+                            num_consecutive_fetch_failures = 0;
@@
-                            num_consecutive_fetch_failtures += 1;
-                            if num_consecutive_fetch_failtures >= 3 {
+                            num_consecutive_fetch_failures += 1;
+                            if num_consecutive_fetch_failures >= 3 {

Also applies to: 157-157, 186-187


181-193: End the spinner cleanly before exiting, and keep stdout/stderr separation tidy.

Call finish_and_clear() on error exits to avoid leaving spinner artifacts in the terminal and to make stderr warnings more readable.

                         Err(e) => {
                             if !wait {
-                                anyhow::bail!("Failed to check F3 sync status: {e}");
+                                pb.finish_and_clear();
+                                anyhow::bail!("Failed to check F3 sync status: {e}");
                             }
 
-                            num_consecutive_fetch_failtures += 1;
-                            if num_consecutive_fetch_failtures >= 3 {
-                                eprintln!("Warning: Failed to fetch heads: {e}. Exiting...");
-                                std::process::exit(2);
+                            num_consecutive_fetch_failures += 1;
+                            if num_consecutive_fetch_failures >= 3 {
+                                pb.finish_and_clear();
+                                eprintln!("Warning: Failed to fetch heads: {e}. Exiting...");
+                                std::process::exit(2);
                             } else {
                                 eprintln!("Warning: Failed to fetch heads: {e}. Retrying...");
                             }
                         }

176-179: Prefer returning errors over process::exit from deep in command logic.

Using process::exit here bypasses normal drop/cleanup paths and mixes exit-code handling with business logic. Consider returning an anyhow::Error (or a custom error that carries an exit code) and letting the top-level CLI main map it to exit codes, to keep behavior consistent across commands.

If you want, I can scan the repo to see how other subcommands handle exit codes and propose a consistent pattern.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e8e8783 and 9dcf6d7.

📒 Files selected for processing (1)
  • src/cli/subcommands/f3_cmd.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.

Applied to files:

  • src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (2)
src/blocks/tipset.rs (3)
  • blocks (517-519)
  • epoch (306-308)
  • epoch (543-545)
src/rpc/methods/f3/types.rs (1)
  • chain_head (541-544)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: tests-release
  • GitHub Check: tests
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: Build MacOS
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
src/cli/subcommands/f3_cmd.rs (3)

10-11: Import looks correct.

Importing Tipset alongside TipsetKey is appropriate, as ChainHead returns a Tipset. No issues.


26-26: Good choice using indicatif for terminal UX.

The spinner makes the wait mode clearer to users. No issues.


133-136: Early check for F3 process is good.

Fail-fast when F3 isn’t running avoids confusing “not in sync” messaging. Looks good.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/rpc/methods/f3/types.rs (3)

21-21: Good call enforcing non-emptiness via a dedicated type alias.

Adopting nunny::Vec as NonEmpty aligns with the project’s preference for fail-fast invariants and removes a whole class of empty-vector checks.

If we intend to use NonEmpty across multiple modules, consider re-exporting it once (e.g., in a prelude or a common types module) to avoid scattering use nunny::Vec as NonEmpty; throughout the codebase.


541-543: Same here: chain_head is now infallible under the NonEmpty invariant.

Matches how the new CLI readiness check uses the head safely.

Optionally add #[inline] to both tiny getters for ergonomics; not critical.


498-501: ECChain non-empty invariant confirmed – update schema for clarity

  • Type safety: Enforcing FinalityCertificate.ec_chain as NonEmpty<ECTipSet> is correct—callers can now use .first()/.last() without checking emptiness.
  • Serde support: crate::lotus_json handles NonEmpty<T> generically (via impl<T> HasLotusJson for nunny::Vec<T> in src/lotus_json/nonempty.rs), so no runtime or compile-time errors will arise when serializing/deserializing NonEmpty`.
  • Call-site safety: No occurrences of ec_chain.is_empty(), .remove(), or .truncate() were found in the codebase, so switching to NonEmpty does not break existing logic.
  • Schema clarity: The generated OpenRPC schema still uses a nullable array (Nullable_Array_of_ECTipSet) without minItems. To reflect the new invariant without altering the JSON shape, update the schemars attribute to add min_items = 1 on a plain Vec<ECTipSet>.

Suggested diff:

 #[serde(rename = "ECChain", with = "crate::lotus_json")]
-#[schemars(with = "LotusJson<NonEmpty<ECTipSet>>")]
+#[schemars(min_items = 1, with = "LotusJson<Vec<ECTipSet>>")]
 pub ec_chain: NonEmpty<ECTipSet>,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9dcf6d7 and 8fa0e8c.

📒 Files selected for processing (2)
  • src/cli/subcommands/f3_cmd.rs (5 hunks)
  • src/rpc/methods/f3/types.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli/subcommands/f3_cmd.rs
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: docker-lint
  • GitHub Check: python-lint
  • GitHub Check: tests
  • GitHub Check: tests-release
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: Deploy to Cloudflare Pages
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
src/rpc/methods/f3/types.rs (1)

537-539: Getter now leverages the non-empty guarantee—nice and safe.

Using self.ec_chain.first() (NonEmpty) removes the need for Option/unwraps.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/cli/subcommands/f3_cmd.rs (4)

94-102: Clarify threshold semantics and expose exit-code behavior in help.

Minor UX polish: consider making threshold units explicit in help (epochs) and documenting the exit codes this subcommand may return (e.g., 0 in-sync, 1 not in sync, 2 fetch failures). This saves users from having to infer behavior from logs.

Apply:

-    /// Checks if F3 is in sync.
+    /// Checks if F3 is in sync.
     Ready {
-        /// Wait until F3 is in sync.
+        /// Wait until F3 is in sync.
         #[arg(long)]
         wait: bool,
-        /// The threshold of the epoch gap between chain head and F3 head within which F3 is considered in sync.
-        #[arg(long, default_value_t = 20)]
+        /// The maximum allowed gap (in epochs) between the chain head and the F3 head to be considered in sync.
+        /// Exit codes: 0=in sync, 1=not in sync, 2=RPC fetch failures.
+        #[arg(long, default_value_t = 20, value_name = "EPOCHS")]
         threshold: usize,
     },

131-200: Fix typo in counter name and include gap in messages for better observability.

  • Rename num_consecutive_fetch_failturesnum_consecutive_fetch_failures.
  • Compute and display the epoch gap. This makes CI logs and user output more actionable without redoing mental math.

Apply:

-                pb.enable_steady_tick(std::time::Duration::from_millis(100));
-                let mut num_consecutive_fetch_failtures = 0;
+                pb.enable_steady_tick(std::time::Duration::from_millis(100));
+                let mut num_consecutive_fetch_failures = 0;
@@
-                        Ok((chain_head, cert_head)) => {
-                            num_consecutive_fetch_failtures = 0;
-                            if cert_head
-                                .chain_head()
-                                .epoch
-                                .saturating_add(threshold.try_into()?)
-                                >= chain_head.epoch()
-                            {
-                                let text = format!(
-                                    "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}",
-                                    chain_head.epoch(),
-                                    cert_head.chain_head().epoch
-                                );
-                                pb.set_message(text);
-                                pb.finish();
-                                break;
-                            } else {
-                                let text = format!(
-                                    "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}",
-                                    chain_head.epoch(),
-                                    cert_head.chain_head().epoch
-                                );
-                                pb.set_message(text);
-                                if !wait {
-                                    pb.finish();
-                                    std::process::exit(1);
-                                }
-                            }
-                        }
+                        Ok((chain_head, cert_head)) => {
+                            num_consecutive_fetch_failures = 0;
+                            let chain_epoch = chain_head.epoch();
+                            let cert_epoch = cert_head.chain_head().epoch;
+                            let gap = chain_epoch.saturating_sub(cert_epoch);
+                            if cert_epoch.saturating_add(threshold.try_into()?) >= chain_epoch {
+                                let text = format!(
+                                    "[+] F3 is in sync. Chain head epoch: {}, F3 head epoch: {}, gap: {}",
+                                    chain_epoch,
+                                    cert_epoch,
+                                    gap
+                                );
+                                pb.set_message(text);
+                                pb.finish();
+                                break;
+                            } else {
+                                let text = format!(
+                                    "[-] F3 is not in sync. Chain head epoch: {}, F3 head epoch: {}, gap: {}",
+                                    chain_epoch,
+                                    cert_epoch,
+                                    gap
+                                );
+                                pb.set_message(text);
+                                if !wait {
+                                    pb.finish();
+                                    std::process::exit(1);
+                                }
+                            }
+                        }
@@
-                            num_consecutive_fetch_failtures += 1;
-                            if num_consecutive_fetch_failtures >= 3 {
+                            num_consecutive_fetch_failures += 1;
+                            if num_consecutive_fetch_failures >= 3 {
                                 eprintln!("Warning: Failed to fetch heads: {e}. Exiting...");
                                 std::process::exit(2);
                             } else {
                                 eprintln!("Warning: Failed to fetch heads: {e}. Retrying...");
                             }

Optional: move interval.tick().await; to the end of the loop to avoid a 1-second initial delay before the first check.


145-149: Minor consistency: prefer the imported Duration alias.

You already use std::time::Duration; Duration::from_millis(100) reads cleaner than std::time::Duration::from_millis(100).

Apply:

-                pb.enable_steady_tick(std::time::Duration::from_millis(100));
+                pb.enable_steady_tick(Duration::from_millis(100));

185-187: Exit-code consistency for non-wait RPC failures (optional).

Today, non-wait mode bails here (likely mapped to exit 1 by the caller), while other failure paths use explicit process::exit(2). If you want strict, scripted semantics, consider exiting with a well-defined code here too (e.g., 2) for parity.

Apply:

-                            if !wait {
-                                anyhow::bail!("Failed to check F3 sync status: {e}");
-                            }
+                            if !wait {
+                                eprintln!("Failed to check F3 sync status: {e}");
+                                std::process::exit(2);
+                            }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8fa0e8c and 40efb12.

📒 Files selected for processing (1)
  • src/cli/subcommands/f3_cmd.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: scripts/tests/calibnet_eth_mapping_check.sh:49-49
Timestamp: 2025-08-07T09:37:03.079Z
Learning: In Forest project scripts, `forest_wait_api` (called within `forest_init`) ensures basic RPC service readiness, while `forest_wait_for_healthcheck_ready` performs additional comprehensive checks including DB backfill completion. When using `--backfill-db` flag, basic RPC operations can proceed after `forest_init`, but operations requiring complete DB backfill should wait for `forest_wait_for_healthcheck_ready`.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-08-11T14:00:47.060Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5886
File: src/daemon/db_util.rs:236-259
Timestamp: 2025-08-11T14:00:47.060Z
Learning: In Forest's snapshot import (`src/daemon/db_util.rs`), when F3 CID is present in snapshot metadata but the actual F3 data is missing, this should cause a hard error as it indicates snapshot corruption. Only errors during the F3 import process itself (after data is successfully retrieved) should be non-fatal and logged.

Applied to files:

  • src/cli/subcommands/f3_cmd.rs
📚 Learning: 2025-08-13T09:43:20.301Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

Applied to files:

  • src/cli/subcommands/f3_cmd.rs
📚 Learning: 2025-08-18T03:09:47.932Z
Learnt from: hanabi1224
PR: ChainSafe/forest#5944
File: src/chain/store/index.rs:0-0
Timestamp: 2025-08-18T03:09:47.932Z
Learning: In Forest's tipset_by_height caching implementation, hanabi1224 prefers performance-conscious solutions that leverage finality guarantees rather than expensive chain walking for fork detection. The approach of constraining cache lookups to finalized epochs (using CHECKPOINT_INTERVAL >= CHAIN_FINALITY) provides fork safety without the performance cost of ancestry verification.

Applied to files:

  • src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (2)
src/blocks/tipset.rs (3)
  • blocks (517-519)
  • epoch (306-308)
  • epoch (543-545)
src/rpc/methods/f3/types.rs (1)
  • chain_head (541-543)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Build forest binaries on Linux AMD64
  • GitHub Check: All lint checks
  • GitHub Check: Build Ubuntu
  • GitHub Check: Build MacOS
  • GitHub Check: cargo-publish-dry-run
  • GitHub Check: docker-lint
  • GitHub Check: Check
  • GitHub Check: Deploy to Cloudflare Pages
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (rust)
  • GitHub Check: Analyze (actions)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
src/cli/subcommands/f3_cmd.rs (3)

10-10: Importing Tipset is correct and necessary here.

Using Tipset aligns with ChainHead's return type and enables .epoch() usage below. Looks good.


25-25: Good call switching to indicatif for UX.

The spinner improves the CLI feedback loop and pairs well with the async tokio::time::interval used later.


450-451: Non-empty EC chain: direct .last() usage is correct now.

Given ec_chain is enforced non-empty, avoiding Option handling here is appropriate and removes a potential panic site. Good cleanup.

@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 26, 2025
Merged via the queue into main with commit 75262fb Aug 26, 2025
77 of 78 checks passed
@hanabi1224 hanabi1224 deleted the hm/cli-f3-ready branch August 26, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants