feat(consistency): enforce events-as-truth invariant at end-of-run (#93)#112
Conversation
Turns the prose invariant in docs/INVARIANTS.md into code: events.jsonl
is authoritative, state.json is a projection, and the two must agree on
which packages were published. Any drift is now detected at end of
`shipper publish` (both parallel and sequential paths), emitted as a
structured event, and surfaced to the operator via the Reporter.
## Types (shipper-types)
- `StateEventDrift { in_events_only, in_state_only }` + `is_consistent()`.
Labels in `name@version` format consistent with the rest of the event
stream. Both lists empty iff the projection matches the truth.
- `EventType::StateEventDriftDetected { drift: StateEventDrift }` — a new
event variant that surfaces drift in events.jsonl for auditors and
tooling that consume the log.
## Implementation (shipper::state::consistency)
- `verify_events_state_consistency(events_path, state) -> Result<StateEventDrift>`:
reads events.jsonl, builds the `BTreeSet` of packages with
`PackagePublished` events, compares against the `BTreeSet` of packages
with `PackageState::Published` in the in-memory `ExecutionState`, and
returns the delta on each side.
- `format_drift_summary(drift) -> String`: human-readable one-or-three-line
rendering used by the Reporter.
## Wire-in (engine::run_publish)
Both the parallel and sequential end-of-run paths in `engine/mod.rs` now
run the consistency check immediately before emitting `ExecutionFinished`:
- Drift detected → warn via Reporter + record `StateEventDriftDetected`
event. The run is NOT failed — the publishes already happened. The goal
is legibility, not punishment.
- Check itself errored (I/O) → warn via Reporter, continue (don't mask
real success by failing on a diagnostic).
- Consistent → silent.
## Tests
7 unit tests in `state::consistency::tests` covering:
- consistent when state and events agree
- detects `in_events_only` (resume would duplicate)
- detects `in_state_only` (event log bypassed)
- empty state + empty events
- ignores non-Published states (Failed / Skipped / Pending)
- drift summary formatting (consistent + mixed-side)
125 engine tests continue to pass (no regressions). 206 targeted tests
across events/parallel/consistency all green.
## Scope
This PR closes #93's PR 1 (consistency check). Follow-up #93 PR 2 is
state-rebuild-from-events for corruption recovery, which is a separate
capability and a follow-up branch.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 39 minutes and 8 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef899acaa2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let events_published: BTreeSet<String> = log | ||
| .all_events() | ||
| .iter() | ||
| .filter(|e| matches!(e.event_type, EventType::PackagePublished { .. })) |
There was a problem hiding this comment.
Count all successful publish outcomes in drift check
This check only treats EventType::PackagePublished as proof of publication, but there are normal success paths that set PackageState::Published without emitting that event (for example, when a failed cargo exit is reconciled by a positive version_exists check in run_publish/parallel publish). In those runs, in_state_only will be reported even though no corruption occurred, producing noisy state_event_drift_detected warnings and reducing the signal of real drift.
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
… follow-on) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109
… follow-on) (#115) Closes the remaining safety gap from #111: when a prior run left a package in PackageState::Ambiguous (reconciliation was inconclusive at that time), a subsequent `shipper resume` previously fell through to blind cargo retry. That risked re-uploading a crate whose original upload may have actually succeeded. ## What this adds Both the sequential (engine::mod) and parallel (engine::parallel::publish) publish paths now detect `PackageState::Ambiguous` BEFORE any cargo activity and reconcile against registry truth first. Outcomes follow the same state machine as in-flight reconciliation: - **Published** → mark Published, skip any further cargo work for this crate, continue/return success receipt with no republish. - **NotPublished** → clear Ambiguous back to Pending (registry confirms no prior upload), fall through to the normal publish flow. - **StillUnknown** → emit PublishFailed webhook + halt with an operator-actionable error; do NOT retry. ## Parallel + sequential symmetry - Sequential: new `sequential_reconcile` helper in engine::mod wraps the full `RegistryClient::is_version_visible_with_backoff` and returns the same `(ReconciliationOutcome, Vec<ReadinessEvidence>)` shape as the parallel path's helper. Called from the per-package state-check match arm. - Parallel: reuses existing `reconcile_ambiguous_upload` from engine::parallel::reconcile. Called just before the retry-loop entry, returning a completed PackageResult early on Published or StillUnknown. Both paths emit `PublishReconciling` and `PublishReconciled` events so the event stream records the resume-time reconciliation alongside any in-flight reconciliation from earlier runs. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean - [x] `cargo fmt --check` clean - [x] 192 engine tests pass (no regressions) BDD scenarios for resume-path reconciliation across all three outcomes are a natural follow-up; the handler shape mirrors the in-flight path so regressions here surface through existing integration tests that already cover the outcome shape. ## Related - Parent: #99 Reconcile (the biggest non-cosmetic safety gap from the v0.3.0-rc.1 retrospective) - Sibling PRs already merged: #111 (in-flight reconcile), #112 (events- as-truth), #113 (retry narration), #114 (registry-aware backoff) - Master roadmap: #109
Closes #93 PR 1 — turns the prose invariant in `docs/INVARIANTS.md` into code.
Problem
The docs say:
But nothing in code enforced that the three agreed on which packages were published. A stale/corrupted `state.json` could silently mislead `shipper resume` — the kind of drift that causes duplicate uploads or refuses valid work.
What this adds
Types (`shipper-types`)
Implementation (`shipper::state::consistency`)
Wire-in (`engine::run_publish`)
Both the parallel and sequential end-of-run paths run the check immediately before emitting `ExecutionFinished`:
The run is not failed on drift — the publishes already happened. The goal is legibility and audit, not punishment.
Tests
7 unit tests in `state::consistency::tests`:
No regressions: 125 engine tests pass, 206 targeted state/parallel/events tests pass, full `cargo clippy --all-targets -- -D warnings` clean, `cargo fmt --check` clean.
Out of scope
Files
Net: 4 files, +348 / 0.
Related