feat(narrate): retry visibility — structured events + live CLI narration (#91)#113
Conversation
…ion (#91) Replaces every silent `thread::sleep(delay)` at a retry-backoff site with a structured event + human-readable warn line. Closes the 60-minute silent- window problem from v0.3.0-rc.1 where operators had to run an external sparse-index monitor just to know the train was alive. ## Type addition (shipper-types) New `EventType::RetryBackoffStarted` variant carries the full retry context: - `attempt` — the just-failed attempt number (1-indexed) - `max_attempts` — total budget - `delay_ms` — how long we'll sleep - `next_attempt_at: DateTime<Utc>` — computed wake time (useful for dashboards and ETAs) - `reason: ErrorClass` — what class of failure triggered the retry - `message: String` — one-line human-facing description Additive. No breaking changes. ## Wire-in Two helpers, one per retry loop shape: - `engine::parallel::publish::emit_retry_backoff` — takes Arc<Mutex<_>> wrappers; used at the 3 retry-backoff sites in the parallel publish loop (cargo-failure retry, readiness-not-visible retry, readiness-errored retry) - `engine::mod::emit_retry_backoff_event` — takes raw &mut references; used at the 2 retry-backoff sites in the sequential publish loop Each helper: 1. Records a `RetryBackoffStarted` event and flushes to events.jsonl 2. Warns via Reporter with the rich format: `name@version: <message> (<ErrorClass>); next attempt in <duration> (attempt N/M)` 3. Sleeps for the configured delay ## Behavior delta **Before:** silent `thread::sleep(delay)` on every retry; operators stared at a quiet CI log for 9–12 minutes between each rate-limit window. **After:** every retry emits one event + one warn line carrying reason, duration, next-attempt time, and progress (N of M). Three readiness-retry sites that were FULLY silent before now also narrate. ## Test plan - [x] `cargo check` + `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] shipper-types tests: 268 pass - [x] `engine::parallel` + `engine::tests` + `state::consistency`: 199 pass - [x] Updated one existing assertion that matched the old "retrying in" text to the new "next attempt in" text ## Scope Closes #91's PR 1. Follow-up PR 2 (CLI-specific rendering polish like `shipper watch` subcommand, ETA/progress line aggregation) is tracked under the same issue but out of scope for this PR — the structured event is here for any consumer to render however they like. ## Related - Parent issue: #91 (umbrella: #103 Narrate — Partial) - Master roadmap: #109 (Now block item 3)
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 0 minutes and 9 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0ac2d4ce6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| attempt.saturating_add(1), | ||
| max_attempts, |
There was a problem hiding this comment.
Avoid advertising a retry after the final allowed attempt
emit_retry_backoff_event always reports the next attempt as attempt + 1, but the loop still invokes this helper when attempt == max_attempts; after that sleep, the loop exits and no further publish attempt happens. This yields impossible progress like attempt 4/3 and emits RetryBackoffStarted for a retry that cannot occur, which can mislead operators or any tooling that reads retry telemetry (the parallel helper repeats the same pattern in engine/parallel/publish.rs).
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
… follow-on) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109
… follow-on) (#115) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109
Implements PR 1 from the scout's plan on issue #103. Wires the existing RetryBackoffStarted event (landed in PR #113) to a live CLI countdown in the publish progress reporter, lifting the previously dead `ProgressReporter::set_status` function and replacing the silent `thread::sleep(delay)` window with per-second narration in TTY mode. ## What changed - `engine::Reporter` (and the parallel `Reporter` mirror) gain a new `retry_wait(name, version, attempt, max_attempts, delay, reason, msg)` default method. The default impl preserves the pre-#103 behavior (single `warn()` line + `thread::sleep(delay)`), so every existing reporter implementation continues to behave exactly as before. - The two retry-backoff helpers (`engine::emit_retry_backoff_event` and `engine::parallel::publish::emit_retry_backoff`) now record the `RetryBackoffStarted` event and then delegate the human-facing message + the backoff sleep to `Reporter::retry_wait`. The events.jsonl schema is unchanged. - `ProgressReporter::set_status` is lifted off `#[allow(dead_code)]`. - New `ProgressReporter::retry_countdown(...)` renders: - **TTY:** `[1/N] retrying name@ver in 45s... (attempt 2/5, reason: Retryable) - <msg>` refreshed once per second via `set_status` until the wait expires. - **Non-TTY (CI):** a single `[retry] name@ver: msg (Retryable); next attempt in 45s (attempt 2/5)` line via `eprintln!`, then `thread::sleep(delay)` — no per-second spam in pipeline logs. - **Quiet:** silent, but still blocks for the full delay. - `CliReporter` grows an optional `progress: Option<ProgressReporter>` field. The `publish` and `resume` subcommands install the existing per-registry `ProgressReporter` on the reporter before the engine call (and `take_progress().finish()` after), so the engine's retry-narration path drives the live countdown. - `CliReporter::retry_wait` overrides the trait default to route through the installed progress, with a graceful fallback to the warn-line behavior for callers without a progress bar (e.g. tests). ## Acceptance criteria from the scout's plan - [x] When `RetryBackoffStarted` is emitted, operator sees: - TTY: countdown line ticking down once per second - Non-TTY: one-shot `[retry] ...` line then sleep - [x] Event is still emitted to events.jsonl; no event-schema changes - [x] `ProgressReporter::set_status` is no longer dead code - [x] Existing reporter implementations (TestReporter, CollectingReporter, SendReporter) continue to compile and behave the same via the default trait impl ## Tests - `progress::tests`: 4 new cases — silent-mode blocks for delay, zero delay returns immediately, non-TTY one-shot line drives the !is_tty branch, exotic attempt/max boundaries don't panic. - `lib::tests`: 3 new cases — `CliReporter::retry_wait` without progress blocks for delay, with progress routes through countdown, default trait impl preserves the canonical warn-line format. - All 113 `shipper-cli` lib tests pass; all 22 `cli_e2e` integration tests pass; all 1890 `shipper-core` tests pass; all 39 `shipper` facade tests pass.
* feat(narrate): live CLI countdown during retry/backoff waits (#103 PR 1) Implements PR 1 from the scout's plan on issue #103. Wires the existing RetryBackoffStarted event (landed in PR #113) to a live CLI countdown in the publish progress reporter, lifting the previously dead `ProgressReporter::set_status` function and replacing the silent `thread::sleep(delay)` window with per-second narration in TTY mode. ## What changed - `engine::Reporter` (and the parallel `Reporter` mirror) gain a new `retry_wait(name, version, attempt, max_attempts, delay, reason, msg)` default method. The default impl preserves the pre-#103 behavior (single `warn()` line + `thread::sleep(delay)`), so every existing reporter implementation continues to behave exactly as before. - The two retry-backoff helpers (`engine::emit_retry_backoff_event` and `engine::parallel::publish::emit_retry_backoff`) now record the `RetryBackoffStarted` event and then delegate the human-facing message + the backoff sleep to `Reporter::retry_wait`. The events.jsonl schema is unchanged. - `ProgressReporter::set_status` is lifted off `#[allow(dead_code)]`. - New `ProgressReporter::retry_countdown(...)` renders: - **TTY:** `[1/N] retrying name@ver in 45s... (attempt 2/5, reason: Retryable) - <msg>` refreshed once per second via `set_status` until the wait expires. - **Non-TTY (CI):** a single `[retry] name@ver: msg (Retryable); next attempt in 45s (attempt 2/5)` line via `eprintln!`, then `thread::sleep(delay)` — no per-second spam in pipeline logs. - **Quiet:** silent, but still blocks for the full delay. - `CliReporter` grows an optional `progress: Option<ProgressReporter>` field. The `publish` and `resume` subcommands install the existing per-registry `ProgressReporter` on the reporter before the engine call (and `take_progress().finish()` after), so the engine's retry-narration path drives the live countdown. - `CliReporter::retry_wait` overrides the trait default to route through the installed progress, with a graceful fallback to the warn-line behavior for callers without a progress bar (e.g. tests). ## Acceptance criteria from the scout's plan - [x] When `RetryBackoffStarted` is emitted, operator sees: - TTY: countdown line ticking down once per second - Non-TTY: one-shot `[retry] ...` line then sleep - [x] Event is still emitted to events.jsonl; no event-schema changes - [x] `ProgressReporter::set_status` is no longer dead code - [x] Existing reporter implementations (TestReporter, CollectingReporter, SendReporter) continue to compile and behave the same via the default trait impl ## Tests - `progress::tests`: 4 new cases — silent-mode blocks for delay, zero delay returns immediately, non-TTY one-shot line drives the !is_tty branch, exotic attempt/max boundaries don't panic. - `lib::tests`: 3 new cases — `CliReporter::retry_wait` without progress blocks for delay, with progress routes through countdown, default trait impl preserves the canonical warn-line format. - All 113 `shipper-cli` lib tests pass; all 22 `cli_e2e` integration tests pass; all 1890 `shipper-core` tests pass; all 39 `shipper` facade tests pass. * fix(parallel): release reporter lock before retry sleep * fix(narrate): restore live retry feedback in parallel mode
Closes #91 PR 1 — the highest-leverage operator-UX fix from ROADMAP's Now block.
Problem
During v0.3.0-rc.1's first real publish, Shipper absorbed 41 retries across a 69-minute run and emitted zero live signal during the 9–12 minute rate-limit windows. Operators had to run an external sparse-index polling monitor to know the train was alive. Three `thread::sleep(delay)` sites were fully silent.
Solution
New `EventType::RetryBackoffStarted` variant carries the full retry context:
`attempt`, `max_attempts`, `delay_ms`, `next_attempt_at: DateTime`, `reason: ErrorClass`, `message: String`.
Two helpers, one per retry loop shape:
Each emits the structured event, warns via Reporter with rich format, then sleeps.
Behaviour delta
Before: silent `thread::sleep(delay)` on every retry.
After: every retry emits one event + one warn line:
```
shipper-sparse-index@0.3.0-rc.1: HTTP 429 Too Many Requests (Retryable); next attempt in 10m 30s (attempt 3/12)
```
Three previously-silent readiness-retry sites now also narrate.
Files
Net: 3 files, +199 / -24.
Test plan
Out of scope
Related