-
Notifications
You must be signed in to change notification settings - Fork 182
feat: add --no-progress-timeout to forest-cli f3 ready
#6057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a new Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant CLI as forest-cli (f3 ready)
participant F3 as F3 Service
participant N as Node/Chain
Note over CLI: init wait, threshold, no_progress_timeout\nlast_f3_head_epoch=0, last_progress=now
U->>CLI: f3 ready --no-progress-timeout T
loop poll interval
CLI->>F3: fetch cert_head
F3-->>CLI: cert_head (epoch_e)
CLI->>N: fetch chain_head
N-->>CLI: chain_head (epoch_c)
alt F3 head advanced (epoch_e > last_f3_head_epoch)
CLI->>CLI: update last_f3_head_epoch, last_progress=now
end
alt In-sync (epoch_e within threshold of epoch_c)
CLI-->>U: exit 0
else No progress for T
CLI-->>U: warn and exit 3
else Continue waiting
CLI->>CLI: sleep and retry
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing Touches
🧪 Generate unit tests
Comment |
5a2e68e to
40e99a2
Compare
40e99a2 to
14aac3e
Compare
src/cli/subcommands/f3_cmd.rs
Outdated
| /// The threshold of the epoch gap between chain head and F3 head within which F3 is considered in sync. | ||
| #[arg(long, default_value_t = 20)] | ||
| threshold: usize, | ||
| /// Exit after F3 making no progress for this duration. It has no effect when `--wait` is not used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but we could ensure that --wait is used without resorting to docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Running `target/debug/forest-cli f3 ready --no-progress-timeout 5m`
error: the following required arguments were not provided:
--wait
Usage: forest-cli f3 ready --wait --no-progress-timeout <NO_PROGRESS_TIMEOUT>
For more information, try '--help'.
src/cli/subcommands/f3_cmd.rs
Outdated
| eprintln!( | ||
| "Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..." | ||
| ); | ||
| std::process::exit(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exit code 1 and 2 have been used above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use some constants to easily indicate the exit code cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/cli/subcommands/f3_cmd.rs (4)
161-161: Fix typo “failtures” → “failures” (user-facing logs, variable name)Minor polish; keeps logs professional and avoids confusion.
Apply this diff:
- let mut num_consecutive_fetch_failtures = 0; + let mut num_consecutive_fetch_failures = 0; @@ - num_consecutive_fetch_failtures = 0; + num_consecutive_fetch_failures = 0; @@ - num_consecutive_fetch_failtures += 1; - if num_consecutive_fetch_failtures >= 3 { + num_consecutive_fetch_failures += 1; + if num_consecutive_fetch_failures >= 3 { eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); std::process::exit(2); } else { eprintln!("Warning: Failed to fetch heads: {e}. Retrying..."); }Also applies to: 170-171, 205-211
162-166: Avoid per-iteration TryInto; precompute threshold onceSaves work in the hot loop and avoids an error path inside it.
Apply this diff:
let no_progress_timeout_duration: Duration = no_progress_timeout.into(); let mut interval = tokio::time::interval(Duration::from_secs(1)); + let threshold_i64: i64 = threshold.try_into()?; let mut last_f3_head_epoch = 0; let mut last_progress = Instant::now(); @@ - if f3_head_epoch.saturating_add(threshold.try_into()?) + if f3_head_epoch.saturating_add(threshold_i64) >= chain_head.epoch()Also applies to: 176-177
171-175: Only treat monotonic increases as “progress”Using != will reset the timer on reorgs or oscillations. ‘>’ better matches “made progress”.
Apply this diff:
- if f3_head_epoch != last_f3_head_epoch { + if f3_head_epoch > last_f3_head_epoch { last_f3_head_epoch = f3_head_epoch; last_progress = Instant::now(); }
215-220: Gate the no-progress check by --wait and finish spinner before exitMakes the documented behavior explicit and leaves a clean terminal. Also uses elapsed() idiom.
Apply this diff:
- if last_progress + no_progress_timeout_duration < Instant::now() { - eprintln!( - "Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..." - ); - std::process::exit(3); - } + if wait && last_progress.elapsed() >= no_progress_timeout_duration { + pb.finish_and_clear(); + eprintln!("Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..."); + std::process::exit(3); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.md(1 hunks)src/cli/subcommands/f3_cmd.rs(5 hunks)
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (1)
src/rpc/methods/f3/types.rs (1)
chain_head(541-543)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
🔇 Additional comments (1)
src/cli/subcommands/f3_cmd.rs (1)
106-109: New --no-progress-timeout flag looks goodThe CLI shape and default are reasonable, and the help text clarifies it only applies with --wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/cli/subcommands/f3_cmd.rs (2)
99-99: Prefer arg-level requires over ArgGroup here.Clap groups often need an explicit id and tend to be heavier than necessary. Making
--no-progress-timeoutdirectly require--waitis simpler and avoids potential macro pitfalls.- #[group(args = ["no_progress_timeout"], requires = "wait")] Ready { /// Wait until F3 is in sync. #[arg(long)] wait: bool, @@ - /// Exit after F3 making no progress for this duration. - #[arg(long, default_value = "10m")] + /// Exit after F3 making no progress for this duration. + #[arg(long, default_value = "10m", requires = "wait")] no_progress_timeout: humantime::Duration, },Also applies to: 107-110
216-221: Finish the spinner before exiting on no-progress timeout.Minor UX polish: finalize the progress bar to avoid a lingering spinner on stderr.
- if last_progress + no_progress_timeout_duration < Instant::now() { - eprintln!( + if last_progress + no_progress_timeout_duration < Instant::now() { + pb.finish(); + eprintln!( "Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..." ); std::process::exit(3); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/subcommands/f3_cmd.rs(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (1)
src/rpc/methods/f3/types.rs (1)
chain_head(541-543)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: All lint checks
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
🔇 Additional comments (4)
src/cli/subcommands/f3_cmd.rs (4)
7-11: LGTM: added Instant import is appropriate for timeout tracking.
139-143: LGTM: Ready variant destructuring updated correctly.
163-167: LGTM: progress tracking state initialization is sound.
171-177: LGTM: head-epoch change detection correctly updates last_progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/cli/subcommands/f3_cmd.rs (1)
144-147: Good: exit codes are centralized as constantsThis aligns with prior review feedback and improves clarity for callers and tests.
🧹 Nitpick comments (1)
src/cli/subcommands/f3_cmd.rs (1)
213-214: Finish spinner before exiting on repeated fetch failuresMinor UX nit: finish the spinner before
exit(2)to avoid a dangling spinner on stderr.Apply this diff:
- eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); - std::process::exit(EXIT_CODE_F3_FAIL_TO_FETCH_HEAD); + eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); + pb.finish(); + std::process::exit(EXIT_CODE_F3_FAIL_TO_FETCH_HEAD);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/subcommands/f3_cmd.rs(6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (1)
src/rpc/methods/f3/types.rs (1)
chain_head(541-543)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: Build MacOS
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: All lint checks
🔇 Additional comments (2)
src/cli/subcommands/f3_cmd.rs (2)
176-181: Progress tracking logic looks goodTracking
f3_head_epochchanges to bumplast_progressis sound and cheap. Nice.
7-11: Import ofInstantis appropriateUsing
std::time::Instanthere is fine alongsidetokio::time::interval.
src/cli/subcommands/f3_cmd.rs
Outdated
| #[group(args = ["no_progress_timeout"], requires = "wait")] | ||
| Ready { | ||
| /// Wait until F3 is in sync. | ||
| #[arg(long)] | ||
| wait: bool, | ||
| /// The threshold of the epoch gap between chain head and F3 head within which F3 is considered in sync. | ||
| #[arg(long, default_value_t = 20)] | ||
| threshold: usize, | ||
| /// Exit after F3 making no progress for this duration. | ||
| #[arg(long, default_value = "10m")] | ||
| no_progress_timeout: humantime::Duration, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
❓ Verification inconclusive
Bug: --no-progress-timeout applies even when the flag isn’t passed (due to default_value)
With default_value = "10m", the timeout is always set and the no-progress exit will trigger after 10 minutes even if the user did not pass --no-progress-timeout. The PR objective states the flag should cause exit when provided. Fix by making the arg optional, only enabling the behavior when present, and using a “default when present without value”.
Apply these diffs:
- Make the arg optional and bind
requiresdirectly on the arg (drop the group):
- #[group(args = ["no_progress_timeout"], requires = "wait")]
+ // requires handled at arg level
Ready {
@@
- /// Exit after F3 making no progress for this duration.
- #[arg(long, default_value = "10m")]
- no_progress_timeout: humantime::Duration,
+ /// Exit after F3 making no progress for this duration (only when the flag is provided).
+ #[arg(long, requires = "wait", num_args(0..=1), default_missing_value = "10m")]
+ no_progress_timeout: Option<humantime::Duration>,- Store an optional duration:
- let no_progress_timeout_duration: Duration = no_progress_timeout.into();
+ let no_progress_timeout_duration: Option<Duration> =
+ no_progress_timeout.map(Into::into);- Guard the timeout check and use
elapsed()(more idiomatic, avoids time arithmetic):
- if last_progress + no_progress_timeout_duration < Instant::now() {
- eprintln!(
- "Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..."
- );
- std::process::exit(EXIT_CODE_F3_NO_PROGRESS_TIMEOUT);
- }
+ if let Some(timeout) = no_progress_timeout_duration {
+ if last_progress.elapsed() >= timeout {
+ pb.finish();
+ eprintln!(
+ "Warning: F3 made no progress in the past {}. Exiting...",
+ humantime::format_duration(timeout)
+ );
+ std::process::exit(EXIT_CODE_F3_NO_PROGRESS_TIMEOUT);
+ }
+ }To confirm behavior, after the change please verify:
forest-cli f3 ready --waitwaits indefinitely without exiting on stall.forest-cli f3 ready --wait --no-progress-timeoutuses a default 10m and exits on stall.forest-cli f3 ready --wait --no-progress-timeout 30sexits after ~30s stall.
Also applies to: 167-171, 220-225
Make --no-progress-timeout truly optional and only active when passed
default_value = "10m" means the 10-minute timeout always applies. Refactor in src/cli/subcommands/f3_cmd.rs (lines 99–110, 167–171, 220–225):
- Drop the
#[group(...)]; add on the flag itself - Change
#[arg(long, default_value = "10m")] no_progress_timeout: humantime::Duration,
to
#[arg(long, requires = "wait", num_args(0..=1), default_missing_value = "10m")] no_progress_timeout: Option<humantime::Duration>, - Map to
Option<Duration>and wrap your timeout check inif let Some(timeout) = no_progress_timeout.map(Into::into) { if last_progress.elapsed() >= timeout { /* exit */ } }
This ensures --no-progress-timeout only takes effect when provided.
🤖 Prompt for AI Agents
In src/cli/subcommands/f3_cmd.rs around lines 99–110 (and update uses at 167–171
and 220–225), the current #[group(...)] and default_value = "10m" make the
10-minute no-progress timeout always active; instead remove the #[group(...)]
and apply the attributes on the flag itself, change the field to an optional
humantime::Duration (use #[arg(long, requires = "wait", num_args(0..=1),
default_missing_value = "10m")] no_progress_timeout:
Option<humantime::Duration>), update all code that consumes it to map
Option<humantime::Duration> into Option<std::time::Duration> (e.g.,
no_progress_timeout.map(Into::into)) and wrap the timeout check in if let
Some(timeout) = ... { if last_progress.elapsed() >= timeout { /* exit */ } } so
the timeout only takes effect when the flag is provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified the group usage. Always having a 10m timeout is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hanabi1224 Understood! Having a default 10-minute timeout makes sense from a fail-fast perspective - it prevents the command from hanging indefinitely even when users don't explicitly specify the timeout flag. This aligns well with preventing operational issues in automated scenarios.
✏️ Learnings added
Learnt from: hanabi1224
PR: ChainSafe/forest#6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.905Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/cli/subcommands/f3_cmd.rs (6)
106-109: Make default behavior explicit in help and allow flag without valueGiven the default 10m behavior is desired even when the flag isn’t passed, please (a) state this in the help text, and (b) allow users to specify the flag with no value to use the default.
Apply:
- /// Exit after F3 making no progress for this duration. - #[arg(long, default_value = "10m", requires = "wait")] + /// Exit after F3 makes no progress for this duration while waiting (default: 10m). + #[arg( + long, + requires = "wait", + default_value = "10m", + num_args(0..=1), + default_missing_value = "10m" + )] no_progress_timeout: humantime::Duration,Please confirm the changelog and CLI help text clearly convey that:
forest-cli f3 ready --waituses a default 10m no-progress timeout (implicit), and--no-progress-timeoutcan be provided without a value to use 10m.
138-146: Good: dedicated exit codes; consider centralizingNice improvement using named constants. Consider centralizing CLI exit codes (e.g., a small enum or module-level consts shared across subcommands) for consistency and discoverability.
180-181: Tiny nit: precompute threshold conversion onceMinor alloc/parse avoidable in the loop; precompute once above the loop and reuse.
Example (outside this hunk):
let threshold_i64: i64 = threshold.try_into()?; ... if f3_head_epoch.saturating_add(threshold_i64) >= chain_head.epoch() { ... }
200-201: Finish and clear the spinner before exiting (cleaner TTY)Use finish_and_clear to avoid leaving spinner artifacts on the terminal.
- pb.finish(); + pb.finish_and_clear(); std::process::exit(EXIT_CODE_F3_NOT_IN_SYNC);
212-213: Also clear the spinner on fetch-failure exitConsistent UX on all early exits.
- eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); - std::process::exit(EXIT_CODE_F3_FAIL_TO_FETCH_HEAD); + pb.finish_and_clear(); + eprintln!("Warning: Failed to fetch heads: {e}. Exiting..."); + std::process::exit(EXIT_CODE_F3_FAIL_TO_FETCH_HEAD);
219-224: Gate timeout under --wait and use elapsed() (more idiomatic); clear spinner before exitThe check only matters while waiting. elapsed() reads clearer than Instant arithmetic and avoids edge cases.
- if last_progress + no_progress_timeout_duration < Instant::now() { - eprintln!( - "Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..." - ); - std::process::exit(EXIT_CODE_F3_NO_PROGRESS_TIMEOUT); - } + if wait && last_progress.elapsed() >= no_progress_timeout_duration { + pb.finish_and_clear(); + eprintln!( + "Warning: F3 made no progress in the past {}. Exiting...", + humantime::format_duration(no_progress_timeout_duration) + ); + std::process::exit(EXIT_CODE_F3_NO_PROGRESS_TIMEOUT); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cli/subcommands/f3_cmd.rs(6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.905Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
📚 Learning: 2025-09-09T10:37:17.905Z
Learnt from: hanabi1224
PR: ChainSafe/forest#6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.905Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.
Applied to files:
src/cli/subcommands/f3_cmd.rs
🧬 Code graph analysis (1)
src/cli/subcommands/f3_cmd.rs (1)
src/rpc/methods/f3/types.rs (1)
chain_head(541-543)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build MacOS
- GitHub Check: Build Ubuntu
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: All lint checks
🔇 Additional comments (3)
src/cli/subcommands/f3_cmd.rs (3)
7-11: LGTM: importsBringing Instant and LazyLock in is appropriate for the new timeout/progress logic.
166-170: Progress tracking setup looks soundStoring the converted Duration once and initializing last_progress is fine. No functional concerns here.
175-179: Solid: progress tick tied to F3 head epoch changesUpdating last_progress only when F3 head advances captures true “work made” and avoids false positives from transient RPC successes.
Summary of changes
Changes introduced in this pull request:
--no-progress-timeouttoforest-cli f3 readyto exit when F3 is stuck for the given timeout.Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit
New Features
Documentation
Tests