feat(reconcile): resume-path reconciliation for Ambiguous packages (#99 follow-on)#115
Conversation
… follow-on) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 5 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 25e6669834
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "{}@{}: reconciled as published on resume (no republish)", | ||
| p.name, p.version | ||
| )); | ||
| continue; |
There was a problem hiding this comment.
Preserve receipt entry after resume reconciliation success
When an Ambiguous package reconciles to Published in sequential mode, this branch immediately continues, which bypasses the normal receipt append and completion accounting later in run_publish. In resume scenarios, that makes receipt.packages omit a crate that was just resolved and can under-report PublishCompleted metrics (for a single-package resume, totals can become 0 even though state transitioned to published). This new path should emit a success receipt before leaving the loop.
Useful? React with 👍 / 👎.
| return PackagePublishResult { | ||
| result: Ok(PackageReceipt { | ||
| name: p.name.clone(), | ||
| version: p.version.clone(), | ||
| attempts: 0, |
There was a problem hiding this comment.
Record PackagePublished before early return on reconciled publish
In parallel mode, the resume-reconciliation Published branch updates state and returns a successful receipt without emitting EventType::PackagePublished. Other publish-success paths do emit that event, and the end-of-run consistency check derives published truth from PackagePublished; this branch can therefore create in_state_only drift whenever reconciliation resolves to published. Emit and flush PackagePublished before returning, mirroring the in-loop reconciliation success path.
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
The `sequential_reconcile` helper's docstring linked to `crate::engine::parallel::reconcile::reconcile_ambiguous_upload` via rustdoc intra-doc syntax, but the target is `pub(super)` inside `engine::parallel::reconcile` — not resolvable from `engine::mod`. `-Dwarnings` on the Documentation CI lane promoted this to an error. Link was introduced in #115 (resume-path reconcile) but only tripped CI now because the docs job had a later-run schedule. Replaced the markdown link with plain text — the mirror-relationship is still explained, just without a clickable doc link across a pub(super) boundary. Note: the Coverage (llvm-cov) check also failed on this run, with `shipper-registry::context::tests::verify_ownership_propagates_ network_error` — a pre-existing environment-sensitive flake that reproduces unrelated to this PR. Not addressing it here.
…ar (#92) (#116) * feat(preflight): slim workspace-verify event, strip ANSI, write sidecar (#92) Addresses the "~2KB ANSI blob in events.jsonl" finding from the v0.3.0-rc.1 retrospective. The `PreflightWorkspaceVerify` event no longer dumps raw cargo dry-run stderr with embedded ANSI escapes into the structured event stream. ## What changed 1. **New `strip_ansi` function** in `shipper-output-sanitizer` — a dependency-free byte-level scanner that removes CSI / OSC / two-byte escapes while preserving text and newlines. UTF-8 safe. Covered by 6 unit tests (sgr, multi-code, no-op, cargo-style, UTF-8, OSC). 2. **Preflight event payload** now contains a slim summary: exit_code, sidecar path, and the ANSI-stripped tail-6 of stderr. Typical size drops from ~2KB to a few hundred bytes. 3. **New sidecar file** `<state_dir>/preflight_workspace_verify.txt` carries the full ANSI-stripped output. The Reporter surfaces the sidecar path with a byte count so operators know where to look. ## Schema stability No change to the `EventType::PreflightWorkspaceVerify` variant shape — `{ passed: bool, output: String }` is preserved. The `output` field now carries a compact summary instead of a multi-KB blob. Schema-level changes (structured `exit_code`/`elapsed_ms`/`output_path` fields) are a bigger cross-cutting refactor and remain as a potential follow-up; this PR captures the UX win with minimal disruption. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean across shipper / shipper-output-sanitizer / shipper-types - [x] `cargo fmt --check` clean - [x] 6 new `strip_ansi` unit tests pass - [x] 192 engine tests pass (no regressions) - [x] No existing snapshot tests broke (event shape unchanged) ## Related - Parent: #92 (Narrate — slim preflight evidence) - Pillar umbrella: #103 (Narrate — now more complete) - Master roadmap: #109 * fix(preflight): drop sidecar reporter.info to avoid snapshot churn The initial #92 commit emitted `[info] full dry-run output written to <path> (N bytes)` after writing the sidecar, which broke e2e_expanded::preflight_allow_dirty_snapshot (and would break any portability-sensitive snapshots since the byte count varies by line endings / cargo output stability). The sidecar path is deterministic (<state_dir>/preflight_workspace_ verify.txt) and documented; echoing it to stderr on the happy path adds snapshot churn without giving operators new information they can't discover on their own. Keeps the warning on sidecar write failure (operator needs to know when it DIDN'T land), drops the success-path info line. * fix(docs): drop broken intra-doc link on sequential_reconcile (#92) The `sequential_reconcile` helper's docstring linked to `crate::engine::parallel::reconcile::reconcile_ambiguous_upload` via rustdoc intra-doc syntax, but the target is `pub(super)` inside `engine::parallel::reconcile` — not resolvable from `engine::mod`. `-Dwarnings` on the Documentation CI lane promoted this to an error. Link was introduced in #115 (resume-path reconcile) but only tripped CI now because the docs job had a later-run schedule. Replaced the markdown link with plain text — the mirror-relationship is still explained, just without a clickable doc link across a pub(super) boundary. Note: the Coverage (llvm-cov) check also failed on this run, with `shipper-registry::context::tests::verify_ownership_propagates_ network_error` — a pre-existing environment-sensitive flake that reproduces unrelated to this PR. Not addressing it here.
…low-on) Closes one of the remaining #99 follow-ons identified in the retrospectives: make it explicit, in the rustdoc that downstream tooling and contributors will actually read, that cargo's stdout/stderr output is a classification HINT for Shipper's retry loop — not the authoritative answer about what happened. ## What this changes - `ErrorClass` rustdoc (shipper-types): adds a "Classification is a hint, not truth" section explaining that pattern-matching cargo text drives retry scheduling but ambiguous outcomes MUST be reconciled against registry truth. Cross-refs ReconciliationOutcome and the reconcile module. - `classify_cargo_failure` rustdoc (shipper::runtime::execution): states up-front "This is a hint, not authoritative truth" and points at the reconcile flow for the authoritative resolution on Ambiguous outcomes. - `docs/explanation/why-shipper.md` gains a new section "Cargo stdout is a hint; the registry is the truth" making the same point at the product level — and connecting it to why Shipper can be safer than a naive shell-script `cargo publish` loop. ## Why this matters Contributors reading `ErrorClass` and wondering "can I just pattern-match more cargo strings?" now see the contract: cargo text is the fast-path hint, registry queries are truth. The reconcile flow introduced in #111 (in-flight) and #115 (resume-path) is now documented as the authoritative side of that contract, not just implemented. ## Scope Pure docs/rustdoc change. No code changes. No schema changes. No snapshot churn. Covered by existing tests. - `cargo doc --no-deps -Dwarnings` clean - `cargo test -p shipper-types` + `-p shipper --lib runtime::execution` pass (no regressions) ## Related - Parent: #99 Reconcile - Prior PRs: #111 (in-flight reconcile), #115 (resume-path reconcile) - Pillar: Reconcile (#102)
…low-on) (#117) Closes one of the remaining #99 follow-ons identified in the retrospectives: make it explicit, in the rustdoc that downstream tooling and contributors will actually read, that cargo's stdout/stderr output is a classification HINT for Shipper's retry loop — not the authoritative answer about what happened. ## What this changes - `ErrorClass` rustdoc (shipper-types): adds a "Classification is a hint, not truth" section explaining that pattern-matching cargo text drives retry scheduling but ambiguous outcomes MUST be reconciled against registry truth. Cross-refs ReconciliationOutcome and the reconcile module. - `classify_cargo_failure` rustdoc (shipper::runtime::execution): states up-front "This is a hint, not authoritative truth" and points at the reconcile flow for the authoritative resolution on Ambiguous outcomes. - `docs/explanation/why-shipper.md` gains a new section "Cargo stdout is a hint; the registry is the truth" making the same point at the product level — and connecting it to why Shipper can be safer than a naive shell-script `cargo publish` loop. ## Why this matters Contributors reading `ErrorClass` and wondering "can I just pattern-match more cargo strings?" now see the contract: cargo text is the fast-path hint, registry queries are truth. The reconcile flow introduced in #111 (in-flight) and #115 (resume-path) is now documented as the authoritative side of that contract, not just implemented. ## Scope Pure docs/rustdoc change. No code changes. No schema changes. No snapshot churn. Covered by existing tests. - `cargo doc --no-deps -Dwarnings` clean - `cargo test -p shipper-types` + `-p shipper --lib runtime::execution` pass (no regressions) ## Related - Parent: #99 Reconcile - Prior PRs: #111 (in-flight reconcile), #115 (resume-path reconcile) - Pillar: Reconcile (#102)
…omes (#99 follow-on) Closes the last remaining #99 follow-on: end-to-end integration coverage for the three reconciliation outcomes the state machine can produce. ## Three scenarios added to engine::parallel::tests Each exercises publish_package() against a real tiny_http mock registry with a fake cargo binary configured to exit ambiguously (exit 1, empty stdout/stderr — which classify_publish_failure maps to CargoFailureClass::Ambiguous per existing unit tests). Readiness polling is disabled (readiness.enabled = false) so each reconcile site reduces to a single registry query — avoids flaky request counts that a full polling loop would produce. ### 1. reconcile_bdd_ambiguous_resolves_to_published - cargo fails ambiguously - entry + post-cargo quick checks return 404 - reconcile's single version_exists poll returns 200 - expected: PackageState::Published, attempts=1 (no second cargo) - verifies PublishReconciling + PublishReconciled{Published} events emitted to the on-disk event log ### 2. reconcile_bdd_ambiguous_resolves_to_not_published_then_retries - cargo fails ambiguously on every attempt - registry consistently returns 404 - reconcile resolves to NotPublished, falls through to normal retry - after max_attempts=2, package ends Failed - verifies PublishReconciled{NotPublished} event emitted - verifies cargo was actually retried (attempts=2) ### 3. reconcile_bdd_resume_from_ambiguous_state_skips_republish - state pre-seeded with PackageState::Ambiguous (simulating a prior interrupted run) - entry check returns 404 (bypasses "already published" short-circuit) - resume-path reconcile (from PR #115) fires BEFORE retry loop - single version_exists query returns 200 → Published - expected: attempts=0 (cargo NOT invoked) - verifies via SHIPPER_CARGO_ARGS_LOG that cargo binary was never called - verifies PublishReconciling + PublishReconciled{Published} events emitted ## Verification - cargo test -p shipper --lib -- reconcile_bdd: 3 pass - cargo test -p shipper --lib -- engine::parallel: 70 pass (no regressions) - cargo clippy --all-targets -- -D warnings: clean - cargo fmt --check: clean ## Why events are read from disk publish_package flushes event batches to events.jsonl and clears the in-memory log after each write. Assertions read via EventLog::read_from_file(&events_path) to see the full persisted stream. ## #99 tracker after this merge - [x] In-flight reconcile state machine (#111) - [x] Resume-path reconciliation (#115) - [x] Docs/contracts: cargo text is a hint, registry is truth (#117) - [x] BDD integration scenarios for all 3 outcomes (this PR) #99 is now fully closed.
…omes (#99 follow-on) (#119) Closes the last remaining #99 follow-on: end-to-end integration coverage for the three reconciliation outcomes the state machine can produce. ## Three scenarios added to engine::parallel::tests Each exercises publish_package() against a real tiny_http mock registry with a fake cargo binary configured to exit ambiguously (exit 1, empty stdout/stderr — which classify_publish_failure maps to CargoFailureClass::Ambiguous per existing unit tests). Readiness polling is disabled (readiness.enabled = false) so each reconcile site reduces to a single registry query — avoids flaky request counts that a full polling loop would produce. ### 1. reconcile_bdd_ambiguous_resolves_to_published - cargo fails ambiguously - entry + post-cargo quick checks return 404 - reconcile's single version_exists poll returns 200 - expected: PackageState::Published, attempts=1 (no second cargo) - verifies PublishReconciling + PublishReconciled{Published} events emitted to the on-disk event log ### 2. reconcile_bdd_ambiguous_resolves_to_not_published_then_retries - cargo fails ambiguously on every attempt - registry consistently returns 404 - reconcile resolves to NotPublished, falls through to normal retry - after max_attempts=2, package ends Failed - verifies PublishReconciled{NotPublished} event emitted - verifies cargo was actually retried (attempts=2) ### 3. reconcile_bdd_resume_from_ambiguous_state_skips_republish - state pre-seeded with PackageState::Ambiguous (simulating a prior interrupted run) - entry check returns 404 (bypasses "already published" short-circuit) - resume-path reconcile (from PR #115) fires BEFORE retry loop - single version_exists query returns 200 → Published - expected: attempts=0 (cargo NOT invoked) - verifies via SHIPPER_CARGO_ARGS_LOG that cargo binary was never called - verifies PublishReconciling + PublishReconciled{Published} events emitted ## Verification - cargo test -p shipper --lib -- reconcile_bdd: 3 pass - cargo test -p shipper --lib -- engine::parallel: 70 pass (no regressions) - cargo clippy --all-targets -- -D warnings: clean - cargo fmt --check: clean ## Why events are read from disk publish_package flushes event batches to events.jsonl and clears the in-memory log after each write. Assertions read via EventLog::read_from_file(&events_path) to see the full persisted stream. ## #99 tracker after this merge - [x] In-flight reconcile state machine (#111) - [x] Resume-path reconciliation (#115) - [x] Docs/contracts: cargo text is a hint, registry is truth (#117) - [x] BDD integration scenarios for all 3 outcomes (this PR) #99 is now fully closed.
… follow-on) Completes the third acceptance-criteria BDD scenario for #99. The existing suite already covered Published (registry confirms upload landed) and NotPublished (registry confirms upload didn't land); this adds the operator-actionable StillUnknown case where reconciliation itself can't reach a verdict because the registry is unhealthy. Setup drives every registry query on /api/v1/crates/demo/0.1.0 to 500, so version_exists bails with Err for non-200/404 statuses. reconcile_ambiguous_upload translates that Err into ReconciliationOutcome::StillUnknown. Asserts: - publish returns Err with "reconciliation inconclusive" message - package state is PackageState::Ambiguous (so resume's reconcile block will fire rather than a silent retry) - exactly one cargo invocation — StillUnknown must not blind-retry - PublishReconciling + PublishReconciled{ StillUnknown } events are persisted to events.jsonl for operator visibility Closes the last open AC on #99 acceptance list; reconciliation behavior itself was already implemented in PR #111 and PR #115.
… follow-on) (#144) Completes the third acceptance-criteria BDD scenario for #99. The existing suite already covered Published (registry confirms upload landed) and NotPublished (registry confirms upload didn't land); this adds the operator-actionable StillUnknown case where reconciliation itself can't reach a verdict because the registry is unhealthy. Setup drives every registry query on /api/v1/crates/demo/0.1.0 to 500, so version_exists bails with Err for non-200/404 statuses. reconcile_ambiguous_upload translates that Err into ReconciliationOutcome::StillUnknown. Asserts: - publish returns Err with "reconciliation inconclusive" message - package state is PackageState::Ambiguous (so resume's reconcile block will fire rather than a silent retry) - exactly one cargo invocation — StillUnknown must not blind-retry - PublishReconciling + PublishReconciled{ StillUnknown } events are persisted to events.jsonl for operator visibility Closes the last open AC on #99 acceptance list; reconciliation behavior itself was already implemented in PR #111 and PR #115.
Closes the last remaining safety gap from #99 after the in-flight reconciliation landed in #111.
The gap this closes
When a prior run left a package in `PackageState::Ambiguous` (reconciliation was inconclusive), a subsequent `shipper resume` would fall through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded — the exact scenario the "never blind retry" rule was designed to prevent.
What this adds
Both sequential (`engine::mod`) and parallel (`engine::parallel::publish`) publish paths detect `PackageState::Ambiguous` before any cargo activity and reconcile against registry truth first:
Implementation symmetry
Both emit `PublishReconciling` + `PublishReconciled` events so the event stream records resume-time reconciliation alongside any in-flight reconciliation from earlier runs.
Verification
Related