Skip to content

feat(reconcile): resume-path reconciliation for Ambiguous packages (#99 follow-on)#115

Merged
EffortlessSteven merged 1 commit into
mainfrom
feat/99-resume-reconcile-ambiguous
Apr 17, 2026
Merged

feat(reconcile): resume-path reconciliation for Ambiguous packages (#99 follow-on)#115
EffortlessSteven merged 1 commit into
mainfrom
feat/99-resume-reconcile-ambiguous

Conversation

@EffortlessSteven

Copy link
Copy Markdown
Member

Closes the last remaining safety gap from #99 after the in-flight reconciliation landed in #111.

The gap this closes

When a prior run left a package in `PackageState::Ambiguous` (reconciliation was inconclusive), a subsequent `shipper resume` would fall through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded — the exact scenario the "never blind retry" rule was designed to prevent.

What this adds

Both sequential (`engine::mod`) and parallel (`engine::parallel::publish`) publish paths detect `PackageState::Ambiguous` before any cargo activity and reconcile against registry truth first:

Outcome Action
`Published` mark Published, return success receipt; no republish
`NotPublished` clear state to Pending, fall through to normal publish flow
`StillUnknown` emit `PublishFailed` webhook + halt with operator-actionable error

Implementation symmetry

  • Sequential: new `sequential_reconcile` helper wraps `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec)` shape the parallel helper does
  • Parallel: reuses existing `reconcile_ambiguous_upload` from `engine::parallel::reconcile`

Both emit `PublishReconciling` + `PublishReconciled` events so the event stream records resume-time reconciliation alongside any in-flight reconciliation from earlier runs.

Verification

  • `cargo clippy --all-targets -- -D warnings` clean
  • `cargo fmt --check` clean
  • 192 engine tests pass (no regressions)

Related

… follow-on)

Closes the remaining safety gap from #111: when a prior run left a package
in PackageState::Ambiguous (reconciliation was inconclusive at that time),
a subsequent `shipper resume` previously fell through to blind cargo retry.
That risked re-uploading a crate whose original upload may have actually
succeeded.

## What this adds

Both the sequential (engine::mod) and parallel (engine::parallel::publish)
publish paths now detect `PackageState::Ambiguous` BEFORE any cargo
activity and reconcile against registry truth first.

Outcomes follow the same state machine as in-flight reconciliation:

- **Published** → mark Published, skip any further cargo work for this
  crate, continue/return success receipt with no republish.
- **NotPublished** → clear Ambiguous back to Pending (registry confirms
  no prior upload), fall through to the normal publish flow.
- **StillUnknown** → emit PublishFailed webhook + halt with an
  operator-actionable error; do NOT retry.

## Parallel + sequential symmetry

- Sequential: new `sequential_reconcile` helper in engine::mod wraps the
  full `RegistryClient::is_version_visible_with_backoff` and returns the
  same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the
  parallel path's helper. Called from the per-package state-check match
  arm.
- Parallel: reuses existing `reconcile_ambiguous_upload` from
  engine::parallel::reconcile. Called just before the retry-loop entry,
  returning a completed PackageResult early on Published or StillUnknown.

Both paths emit `PublishReconciling` and `PublishReconciled` events so
the event stream records the resume-time reconciliation alongside any
in-flight reconciliation from earlier runs.

## Test plan

- [x] `cargo clippy --all-targets -- -D warnings` clean
- [x] `cargo fmt --check` clean
- [x] 192 engine tests pass (no regressions)

BDD scenarios for resume-path reconciliation across all three outcomes
are a natural follow-up; the handler shape mirrors the in-flight path so
regressions here surface through existing integration tests that already
cover the outcome shape.

## Related

- Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the
  v0.3.0-rc.1 retrospective)
- Sibling PRs already merged: #111 (in-flight reconcile), #112 (events-
  as-truth), #113 (retry narration), #114 (registry-aware backoff)
- Master roadmap: #109
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@EffortlessSteven has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 5 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 5 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8da96f01-aba1-439d-8753-7aabd1caa580

📥 Commits

Reviewing files that changed from the base of the PR and between 4ee237f and 25e6669.

📒 Files selected for processing (2)
  • crates/shipper/src/engine/mod.rs
  • crates/shipper/src/engine/parallel/publish.rs
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/99-resume-reconcile-ambiguous

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@EffortlessSteven EffortlessSteven merged commit bb83ea1 into main Apr 17, 2026
15 of 16 checks passed
@EffortlessSteven EffortlessSteven deleted the feat/99-resume-reconcile-ambiguous branch April 17, 2026 05:38

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25e6669834

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +626 to +629
"{}@{}: reconciled as published on resume (no republish)",
p.name, p.version
));
continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve receipt entry after resume reconciliation success

When an Ambiguous package reconciles to Published in sequential mode, this branch immediately continues, which bypasses the normal receipt append and completion accounting later in run_publish. In resume scenarios, that makes receipt.packages omit a crate that was just resolved and can under-report PublishCompleted metrics (for a single-package resume, totals can become 0 even though state transitioned to published). This new path should emit a success receipt before leaving the loop.

Useful? React with 👍 / 👎.

Comment on lines +277 to +281
return PackagePublishResult {
result: Ok(PackageReceipt {
name: p.name.clone(),
version: p.version.clone(),
attempts: 0,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Record PackagePublished before early return on reconciled publish

In parallel mode, the resume-reconciliation Published branch updates state and returns a successful receipt without emitting EventType::PackagePublished. Other publish-success paths do emit that event, and the end-of-run consistency check derives published truth from PackagePublished; this branch can therefore create in_state_only drift whenever reconciliation resolves to published. Emit and flush PackagePublished before returning, mirroring the in-loop reconciliation success path.

Useful? React with 👍 / 👎.

@codecov

codecov Bot commented Apr 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 4.59184% with 187 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/shipper/src/engine/parallel/publish.rs 7.61% 97 Missing ⚠️
crates/shipper/src/engine/mod.rs 1.09% 90 Missing ⚠️

📢 Thoughts on this report? Let us know!

EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
The `sequential_reconcile` helper's docstring linked to
`crate::engine::parallel::reconcile::reconcile_ambiguous_upload` via
rustdoc intra-doc syntax, but the target is `pub(super)` inside
`engine::parallel::reconcile` — not resolvable from `engine::mod`.
`-Dwarnings` on the Documentation CI lane promoted this to an error.

Link was introduced in #115 (resume-path reconcile) but only tripped
CI now because the docs job had a later-run schedule. Replaced the
markdown link with plain text — the mirror-relationship is still
explained, just without a clickable doc link across a pub(super)
boundary.

Note: the Coverage (llvm-cov) check also failed on this run, with
`shipper-registry::context::tests::verify_ownership_propagates_
network_error` — a pre-existing environment-sensitive flake that
reproduces unrelated to this PR. Not addressing it here.
EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
…ar (#92) (#116)

* feat(preflight): slim workspace-verify event, strip ANSI, write sidecar (#92)

Addresses the "~2KB ANSI blob in events.jsonl" finding from the
v0.3.0-rc.1 retrospective. The `PreflightWorkspaceVerify` event no
longer dumps raw cargo dry-run stderr with embedded ANSI escapes into
the structured event stream.

## What changed

1. **New `strip_ansi` function** in `shipper-output-sanitizer` — a
   dependency-free byte-level scanner that removes CSI / OSC / two-byte
   escapes while preserving text and newlines. UTF-8 safe. Covered by
   6 unit tests (sgr, multi-code, no-op, cargo-style, UTF-8, OSC).

2. **Preflight event payload** now contains a slim summary: exit_code,
   sidecar path, and the ANSI-stripped tail-6 of stderr. Typical size
   drops from ~2KB to a few hundred bytes.

3. **New sidecar file** `<state_dir>/preflight_workspace_verify.txt`
   carries the full ANSI-stripped output. The Reporter surfaces the
   sidecar path with a byte count so operators know where to look.

## Schema stability

No change to the `EventType::PreflightWorkspaceVerify` variant shape —
`{ passed: bool, output: String }` is preserved. The `output` field
now carries a compact summary instead of a multi-KB blob. Schema-level
changes (structured `exit_code`/`elapsed_ms`/`output_path` fields) are
a bigger cross-cutting refactor and remain as a potential follow-up;
this PR captures the UX win with minimal disruption.

## Test plan

- [x] `cargo clippy --all-targets -- -D warnings` clean across
      shipper / shipper-output-sanitizer / shipper-types
- [x] `cargo fmt --check` clean
- [x] 6 new `strip_ansi` unit tests pass
- [x] 192 engine tests pass (no regressions)
- [x] No existing snapshot tests broke (event shape unchanged)

## Related

- Parent: #92 (Narrate — slim preflight evidence)
- Pillar umbrella: #103 (Narrate — now more complete)
- Master roadmap: #109

* fix(preflight): drop sidecar reporter.info to avoid snapshot churn

The initial #92 commit emitted `[info] full dry-run output written to
<path> (N bytes)` after writing the sidecar, which broke
e2e_expanded::preflight_allow_dirty_snapshot (and would break any
portability-sensitive snapshots since the byte count varies by line
endings / cargo output stability).

The sidecar path is deterministic (<state_dir>/preflight_workspace_
verify.txt) and documented; echoing it to stderr on the happy path
adds snapshot churn without giving operators new information they
can't discover on their own.

Keeps the warning on sidecar write failure (operator needs to know
when it DIDN'T land), drops the success-path info line.

* fix(docs): drop broken intra-doc link on sequential_reconcile (#92)

The `sequential_reconcile` helper's docstring linked to
`crate::engine::parallel::reconcile::reconcile_ambiguous_upload` via
rustdoc intra-doc syntax, but the target is `pub(super)` inside
`engine::parallel::reconcile` — not resolvable from `engine::mod`.
`-Dwarnings` on the Documentation CI lane promoted this to an error.

Link was introduced in #115 (resume-path reconcile) but only tripped
CI now because the docs job had a later-run schedule. Replaced the
markdown link with plain text — the mirror-relationship is still
explained, just without a clickable doc link across a pub(super)
boundary.

Note: the Coverage (llvm-cov) check also failed on this run, with
`shipper-registry::context::tests::verify_ownership_propagates_
network_error` — a pre-existing environment-sensitive flake that
reproduces unrelated to this PR. Not addressing it here.
EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
…low-on)

Closes one of the remaining #99 follow-ons identified in the
retrospectives: make it explicit, in the rustdoc that downstream tooling
and contributors will actually read, that cargo's stdout/stderr output is
a classification HINT for Shipper's retry loop — not the authoritative
answer about what happened.

## What this changes

- `ErrorClass` rustdoc (shipper-types): adds a "Classification is a hint,
  not truth" section explaining that pattern-matching cargo text drives
  retry scheduling but ambiguous outcomes MUST be reconciled against
  registry truth. Cross-refs ReconciliationOutcome and the reconcile
  module.
- `classify_cargo_failure` rustdoc (shipper::runtime::execution): states
  up-front "This is a hint, not authoritative truth" and points at the
  reconcile flow for the authoritative resolution on Ambiguous outcomes.
- `docs/explanation/why-shipper.md` gains a new section "Cargo stdout is
  a hint; the registry is the truth" making the same point at the product
  level — and connecting it to why Shipper can be safer than a naive
  shell-script `cargo publish` loop.

## Why this matters

Contributors reading `ErrorClass` and wondering "can I just pattern-match
more cargo strings?" now see the contract: cargo text is the fast-path
hint, registry queries are truth. The reconcile flow introduced in #111
(in-flight) and #115 (resume-path) is now documented as the authoritative
side of that contract, not just implemented.

## Scope

Pure docs/rustdoc change. No code changes. No schema changes. No
snapshot churn. Covered by existing tests.

- `cargo doc --no-deps -Dwarnings` clean
- `cargo test -p shipper-types` + `-p shipper --lib runtime::execution`
  pass (no regressions)

## Related

- Parent: #99 Reconcile
- Prior PRs: #111 (in-flight reconcile), #115 (resume-path reconcile)
- Pillar: Reconcile (#102)
EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
…low-on) (#117)

Closes one of the remaining #99 follow-ons identified in the
retrospectives: make it explicit, in the rustdoc that downstream tooling
and contributors will actually read, that cargo's stdout/stderr output is
a classification HINT for Shipper's retry loop — not the authoritative
answer about what happened.

## What this changes

- `ErrorClass` rustdoc (shipper-types): adds a "Classification is a hint,
  not truth" section explaining that pattern-matching cargo text drives
  retry scheduling but ambiguous outcomes MUST be reconciled against
  registry truth. Cross-refs ReconciliationOutcome and the reconcile
  module.
- `classify_cargo_failure` rustdoc (shipper::runtime::execution): states
  up-front "This is a hint, not authoritative truth" and points at the
  reconcile flow for the authoritative resolution on Ambiguous outcomes.
- `docs/explanation/why-shipper.md` gains a new section "Cargo stdout is
  a hint; the registry is the truth" making the same point at the product
  level — and connecting it to why Shipper can be safer than a naive
  shell-script `cargo publish` loop.

## Why this matters

Contributors reading `ErrorClass` and wondering "can I just pattern-match
more cargo strings?" now see the contract: cargo text is the fast-path
hint, registry queries are truth. The reconcile flow introduced in #111
(in-flight) and #115 (resume-path) is now documented as the authoritative
side of that contract, not just implemented.

## Scope

Pure docs/rustdoc change. No code changes. No schema changes. No
snapshot churn. Covered by existing tests.

- `cargo doc --no-deps -Dwarnings` clean
- `cargo test -p shipper-types` + `-p shipper --lib runtime::execution`
  pass (no regressions)

## Related

- Parent: #99 Reconcile
- Prior PRs: #111 (in-flight reconcile), #115 (resume-path reconcile)
- Pillar: Reconcile (#102)
EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
…omes (#99 follow-on)

Closes the last remaining #99 follow-on: end-to-end integration coverage
for the three reconciliation outcomes the state machine can produce.

## Three scenarios added to engine::parallel::tests

Each exercises publish_package() against a real tiny_http mock registry
with a fake cargo binary configured to exit ambiguously (exit 1, empty
stdout/stderr — which classify_publish_failure maps to
CargoFailureClass::Ambiguous per existing unit tests).

Readiness polling is disabled (readiness.enabled = false) so each
reconcile site reduces to a single registry query — avoids flaky request
counts that a full polling loop would produce.

### 1. reconcile_bdd_ambiguous_resolves_to_published

- cargo fails ambiguously
- entry + post-cargo quick checks return 404
- reconcile's single version_exists poll returns 200
- expected: PackageState::Published, attempts=1 (no second cargo)
- verifies PublishReconciling + PublishReconciled{Published} events
  emitted to the on-disk event log

### 2. reconcile_bdd_ambiguous_resolves_to_not_published_then_retries

- cargo fails ambiguously on every attempt
- registry consistently returns 404
- reconcile resolves to NotPublished, falls through to normal retry
- after max_attempts=2, package ends Failed
- verifies PublishReconciled{NotPublished} event emitted
- verifies cargo was actually retried (attempts=2)

### 3. reconcile_bdd_resume_from_ambiguous_state_skips_republish

- state pre-seeded with PackageState::Ambiguous (simulating a prior
  interrupted run)
- entry check returns 404 (bypasses "already published" short-circuit)
- resume-path reconcile (from PR #115) fires BEFORE retry loop
- single version_exists query returns 200 → Published
- expected: attempts=0 (cargo NOT invoked)
- verifies via SHIPPER_CARGO_ARGS_LOG that cargo binary was never called
- verifies PublishReconciling + PublishReconciled{Published} events emitted

## Verification

- cargo test -p shipper --lib -- reconcile_bdd: 3 pass
- cargo test -p shipper --lib -- engine::parallel: 70 pass (no regressions)
- cargo clippy --all-targets -- -D warnings: clean
- cargo fmt --check: clean

## Why events are read from disk

publish_package flushes event batches to events.jsonl and clears the
in-memory log after each write. Assertions read via
EventLog::read_from_file(&events_path) to see the full persisted stream.

## #99 tracker after this merge

- [x] In-flight reconcile state machine (#111)
- [x] Resume-path reconciliation (#115)
- [x] Docs/contracts: cargo text is a hint, registry is truth (#117)
- [x] BDD integration scenarios for all 3 outcomes (this PR)

#99 is now fully closed.
EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
…omes (#99 follow-on) (#119)

Closes the last remaining #99 follow-on: end-to-end integration coverage
for the three reconciliation outcomes the state machine can produce.

## Three scenarios added to engine::parallel::tests

Each exercises publish_package() against a real tiny_http mock registry
with a fake cargo binary configured to exit ambiguously (exit 1, empty
stdout/stderr — which classify_publish_failure maps to
CargoFailureClass::Ambiguous per existing unit tests).

Readiness polling is disabled (readiness.enabled = false) so each
reconcile site reduces to a single registry query — avoids flaky request
counts that a full polling loop would produce.

### 1. reconcile_bdd_ambiguous_resolves_to_published

- cargo fails ambiguously
- entry + post-cargo quick checks return 404
- reconcile's single version_exists poll returns 200
- expected: PackageState::Published, attempts=1 (no second cargo)
- verifies PublishReconciling + PublishReconciled{Published} events
  emitted to the on-disk event log

### 2. reconcile_bdd_ambiguous_resolves_to_not_published_then_retries

- cargo fails ambiguously on every attempt
- registry consistently returns 404
- reconcile resolves to NotPublished, falls through to normal retry
- after max_attempts=2, package ends Failed
- verifies PublishReconciled{NotPublished} event emitted
- verifies cargo was actually retried (attempts=2)

### 3. reconcile_bdd_resume_from_ambiguous_state_skips_republish

- state pre-seeded with PackageState::Ambiguous (simulating a prior
  interrupted run)
- entry check returns 404 (bypasses "already published" short-circuit)
- resume-path reconcile (from PR #115) fires BEFORE retry loop
- single version_exists query returns 200 → Published
- expected: attempts=0 (cargo NOT invoked)
- verifies via SHIPPER_CARGO_ARGS_LOG that cargo binary was never called
- verifies PublishReconciling + PublishReconciled{Published} events emitted

## Verification

- cargo test -p shipper --lib -- reconcile_bdd: 3 pass
- cargo test -p shipper --lib -- engine::parallel: 70 pass (no regressions)
- cargo clippy --all-targets -- -D warnings: clean
- cargo fmt --check: clean

## Why events are read from disk

publish_package flushes event batches to events.jsonl and clears the
in-memory log after each write. Assertions read via
EventLog::read_from_file(&events_path) to see the full persisted stream.

## #99 tracker after this merge

- [x] In-flight reconcile state machine (#111)
- [x] Resume-path reconciliation (#115)
- [x] Docs/contracts: cargo text is a hint, registry is truth (#117)
- [x] BDD integration scenarios for all 3 outcomes (this PR)

#99 is now fully closed.
EffortlessSteven added a commit that referenced this pull request Apr 18, 2026
… follow-on)

Completes the third acceptance-criteria BDD scenario for #99. The existing
suite already covered Published (registry confirms upload landed) and
NotPublished (registry confirms upload didn't land); this adds the
operator-actionable StillUnknown case where reconciliation itself can't
reach a verdict because the registry is unhealthy.

Setup drives every registry query on /api/v1/crates/demo/0.1.0 to 500,
so version_exists bails with Err for non-200/404 statuses.
reconcile_ambiguous_upload translates that Err into
ReconciliationOutcome::StillUnknown.

Asserts:
  - publish returns Err with "reconciliation inconclusive" message
  - package state is PackageState::Ambiguous (so resume's reconcile
    block will fire rather than a silent retry)
  - exactly one cargo invocation — StillUnknown must not blind-retry
  - PublishReconciling + PublishReconciled{ StillUnknown } events
    are persisted to events.jsonl for operator visibility

Closes the last open AC on #99 acceptance list; reconciliation behavior
itself was already implemented in PR #111 and PR #115.
EffortlessSteven added a commit that referenced this pull request Apr 18, 2026
… follow-on) (#144)

Completes the third acceptance-criteria BDD scenario for #99. The existing
suite already covered Published (registry confirms upload landed) and
NotPublished (registry confirms upload didn't land); this adds the
operator-actionable StillUnknown case where reconciliation itself can't
reach a verdict because the registry is unhealthy.

Setup drives every registry query on /api/v1/crates/demo/0.1.0 to 500,
so version_exists bails with Err for non-200/404 statuses.
reconcile_ambiguous_upload translates that Err into
ReconciliationOutcome::StillUnknown.

Asserts:
  - publish returns Err with "reconciliation inconclusive" message
  - package state is PackageState::Ambiguous (so resume's reconcile
    block will fire rather than a silent retry)
  - exactly one cargo invocation — StillUnknown must not blind-retry
  - PublishReconciling + PublishReconciled{ StillUnknown } events
    are persisted to events.jsonl for operator visibility

Closes the last open AC on #99 acceptance list; reconciliation behavior
itself was already implemented in PR #111 and PR #115.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant