feat(dispatch): crates.io-aware backoff for new-crate rate limits (#94) by EffortlessSteven · Pull Request #114 · EffortlessMetrics/shipper

EffortlessSteven · 2026-04-17T04:58:08Z

Closes #94 PR 1 — the retry efficiency gap from v0.3.0-rc.1.

Problem

Generic exponential backoff from `opts.base_delay` wastes ~5 attempts when a new-crate publish hits crates.io's documented 10-minute new-crate rate-limit window. The v0.3.0-rc.1 retrospective estimated 35 of 41 retries were preventable for exactly this reason.

Solution

Add a layered `registry_aware_backoff`: if a retry's error message looks like a rate-limit AND the crate is brand-new, floor the delay at the documented 10-minute window. Everything else keeps generic exponential.

New helpers (`runtime/execution`)

`CRATES_IO_NEW_CRATE_WINDOW = 10 * 60s` — documented constant from crates.io rate-limit docs
`looks_like_rate_limit(msg)` — detects "429", "rate limit", "too many requests" phrasings in cargo stderr
`registry_aware_backoff(..., is_new_crate, error_message)` — wraps `backoff_delay`, floors at the window only when both conditions match

Wire-in (both paths, lazy detection)

Both parallel and sequential retry loops replace `backoff_delay` with `registry_aware_backoff`, but:

`crate_exists` / `check_new_crate` is only queried when the error message matches a rate-limit phrasing → zero extra network calls on the happy path or on generic network-blip retries
The answer is cached per-package via `Option` → subsequent retries don't re-query

Test plan

`cargo clippy --all-targets -- -D warnings` clean
`cargo fmt --check` clean
5 new unit tests:
- `looks_like_rate_limit` positive + negative phrasings
- extends for new-crate + rate-limit (floored at 10 min)
- UNCHANGED for existing-crate + rate-limit (higher per-minute budget)
- UNCHANGED for new-crate + non-rate-limit (don't wait 10 min for a network blip)
- respects longer generic delay when it exceeds the window
192 engine tests pass (no regressions)
shipper-types (268), state::consistency (9), reconcile (2) all green

An initial eager version of this change added an extra registry request at publish-start that broke 32 existing tests whose mock servers weren't provisioned for it. The lazy-detection refactor fixed that cleanly.

Expected impact

On the next first-publish wave, rate-limited retries of brand-new crates skip the wasted short backoffs and go directly to ~10-minute waits. Projected ~35-retry savings per the v0.3.0-rc.1 pattern. Narration from #91 makes the wait visible, so the operator sees one big wait instead of six small misleading ones.

Out of scope

`Retry-After` HTTP header parsing (requires cargo stderr → structured response plumbing) — follow-up Registry-aware backoff: layer documented constraints > Retry-After > exponential #94 PR 2
Full `RegistryProfile` trait — follow-up Registry-aware backoff: layer documented constraints > Retry-After > exponential #94 PR 3
`PublishRegime` enum threaded through `PlannedPackage` — not needed for this win; lazy detection at retry time is sufficient

Parent: Registry-aware backoff: layer documented constraints > Retry-After > exponential #94
Umbrella: Competency: Profile / Adapt — gap audit and roadmap #106 (Profile/Adapt — was Partial, moves toward Done)
Master roadmap: Roadmap: post-v0.3.0-rc.1 product scorecard across nine competencies #109 (Now block item 4)

Before: generic exponential backoff on retry, starting at opts.base_delay and doubling. For a new-crate publish hitting crates.io's documented 10-minute rate-limit window, this burned ~5 guaranteed-to-fail attempts (10s, 20s, 40s, 80s, 160s) before the window could plausibly have cleared. The v0.3.0-rc.1 retrospective estimated 35 of 41 retries were preventable for exactly this reason. After: when a retry's error message looks like a rate-limit signal AND the crate is brand-new, floor the delay at 10 minutes. Everything else keeps the generic exponential behavior. ## New helpers (runtime/execution) - `CRATES_IO_NEW_CRATE_WINDOW = 10 * 60s` — documented crates.io constant - `looks_like_rate_limit(msg)` — detects "429" / "rate limit" / "too many requests" phrasings in cargo stderr - `registry_aware_backoff(base, max, attempt, strategy, jitter, is_new_crate, error_message)` — wraps `backoff_delay`, floors at CRATES_IO_NEW_CRATE_WINDOW only when both `is_new_crate` and `looks_like_rate_limit(msg)` are true ## Wire-in (lazy, no happy-path regressions) Both the parallel and sequential retry loops now: 1. Use `registry_aware_backoff` instead of `backoff_delay` on the Retryable/Ambiguous branch 2. Only query `crate_exists` / `check_new_crate` when the error message matches a rate-limit phrasing — zero extra network calls on the happy path or on generic network-blip retries 3. Cache the is_new_crate answer per-package via `Option<bool>`, so subsequent retries of the same crate don't re-query ## Test plan - [x] `cargo check` + `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 5 new unit tests in runtime::execution covering: - `looks_like_rate_limit` positive and negative phrasings - `registry_aware_backoff` extends for new-crate + rate-limit - `registry_aware_backoff` UNCHANGED for existing-crate + rate-limit - `registry_aware_backoff` UNCHANGED for new-crate + non-rate-limit - `registry_aware_backoff` respects longer generic delay when it exceeds window - [x] 192 engine tests pass (no regressions after lazy-detection refactor; initial eager version added an extra registry request that test mocks weren't provisioned for — lazy path fixed that) - [x] shipper-types (268), state::consistency (9), reconcile (2) all green ## Expected real-world impact On the next first-publish wave (hypothetical v0.3.0-rc.2 or v0.3.1 first-publish), rate-limited retries of brand-new crates skip the wasted short backoffs and wait ~10 minutes directly. Projected ~35-retry savings per the v0.3.0-rc.1 pattern. ## Out of scope - `Retry-After` HTTP header parsing (requires plumbing cargo stderr into a structured response) — follow-up #94 PR 2 - Full `RegistryProfile` abstraction — follow-up #94 PR 3 - Propagating a `PublishRegime` enum through PlannedPackage — not strictly needed for this win; lazy detection at retry time is sufficient ## Related - Parent issue: #94 (Dispatch) - Competency umbrella: #106 (Profile/Adapt — was Partial, now Partial+) - Master roadmap: #109 (Now block item 4)

gemini-code-assist · 2026-04-17T04:58:13Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-04-17T04:58:16Z

Warning

Rate limit exceeded

@EffortlessSteven has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 35 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 19 minutes and 35 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8e34bded-b1d3-4fc4-99f2-f7630dedce15

📥 Commits

Reviewing files that changed from the base of the PR and between dbfb76c and 8bd2a23.

📒 Files selected for processing (3)

crates/shipper/src/engine/mod.rs
crates/shipper/src/engine/parallel/publish.rs
crates/shipper/src/runtime/execution/mod.rs

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/94-registry-aware-backoff

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8bd2a23545

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T05:01:38Z

                        ErrorClass::Retryable | ErrorClass::Ambiguous => {
-                            let delay = backoff_delay(
+                            let is_new_crate =
+                                if crate::runtime::execution::looks_like_rate_limit(&msg) {


Detect rate limits from cargo stderr, not classifier summary

looks_like_rate_limit(&msg) is checking the normalized classifier message (msg from classify_cargo_failure) rather than the raw cargo output, so this condition never sees strings like 429/too many requests and the crates.io-aware 10-minute floor is effectively disabled. In practice, retryable failures currently produce generic summaries (e.g. transient failure text), so is_new_crate stays false and this optimization never activates; the same pattern is also present in the parallel publish path.

Useful? React with 👍 / 👎.

codecov · 2026-04-17T05:31:04Z

Codecov Report

❌ Patch coverage is 94.56522% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/shipper/src/engine/mod.rs	70.00%	3 Missing ⚠️
crates/shipper/src/engine/parallel/publish.rs	75.00%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

… follow-on) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109

… follow-on) (#115) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109

chatgpt-codex-connector Bot reviewed Apr 17, 2026

View reviewed changes

EffortlessSteven merged commit 4ee237f into main Apr 17, 2026
20 of 21 checks passed

EffortlessSteven deleted the feat/94-registry-aware-backoff branch April 17, 2026 05:31

EffortlessSteven mentioned this pull request Apr 17, 2026

feat(reconcile): resume-path reconciliation for Ambiguous packages (#99 follow-on) #115

Merged

3 tasks

This was referenced Apr 17, 2026

Roadmap: post-v0.3.0-rc.1 product scorecard across nine competencies #109

Open

Competency: Survive — gap audit and roadmap #101

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(dispatch): crates.io-aware backoff for new-crate rate limits (#94)#114

feat(dispatch): crates.io-aware backoff for new-crate rate limits (#94)#114
EffortlessSteven merged 1 commit into
mainfrom
feat/94-registry-aware-backoff

EffortlessSteven commented Apr 17, 2026

Uh oh!

gemini-code-assist Bot commented Apr 17, 2026

Uh oh!

coderabbitai Bot commented Apr 17, 2026

Rate limit exceeded

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Uh oh!

codecov Bot commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

EffortlessSteven commented Apr 17, 2026

Problem

Solution

New helpers (`runtime/execution`)

Wire-in (both paths, lazy detection)

Test plan

Expected impact

Out of scope

Related

Uh oh!

gemini-code-assist Bot commented Apr 17, 2026

Uh oh!

coderabbitai Bot commented Apr 17, 2026

Rate limit exceeded

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 17, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant