feat(recover): synthetic rehearsal test + operator rehearsal playbook (#90) by EffortlessSteven · Pull Request #124 · EffortlessMetrics/shipper

EffortlessSteven · 2026-04-17T13:45:11Z

Summary

Close #90's testing gap in two layers:

Synthetic CI-guard test — crates/shipper-cli/tests/e2e_rehearse.rs.
Runs on every commit. Proves the write-side invariants (state.json,
events.jsonl NDJSON integrity, no duplicate publishes on resume) that
existing bdd_resume tests don't cover because they construct state
by hand rather than running a real publish.
Operator playbook — docs/how-to/run-recover-rehearsal.md.
Step-by-step for the once-per-RC crates.io-backed rehearsal the
synthetic can't substitute for (rate limits, sparse-index timing,
runner kill + artifact upload).

Why both

The synthetic test catches persistence bugs cheaply and repeatedly.
The real rehearsal catches the things mocks literally can't simulate —
crates.io's per-minute new-crate rate limit, sparse-index propagation
delay, the GitHub Actions `if: always()` artifact upload path after a
cancelled job, token/auth edge cases.

Running only one is insufficient:

Only synthetic → bugs that only manifest against real crates.io slip
through (see: ci: reduce PR proptest cost, add Ubuntu-only full-strength crypto lane #87, docs: add operator release runbook #88, ci: include .shipper/ hidden dir in release workflow artifact uploads #89 — all found by CI reality, not mocks).
Only real → bugs regress between RC rehearsals and waste a tag cycle.

The synthetic test

rehearsal_interrupted_publish_then_resume_preserves_invariants:

3-crate fixture with topological deps (c → b → a)
Mock registry with per-crate 404→200 flip + `never_flip` pin so
crate-c stays unpublished during run 1 even if cargo "exits 0"
(otherwise the reconcile path would upgrade the simulated failure
to Published and the scenario becomes meaningless)
Run 1: a/b succeed, c fails → realistic interrupted state
Run 2: resume with c succeeding
Asserts: state.json + events.jsonl coherent; no duplicate
PackagePublished; events.jsonl append-only (ExecutionStarted count = 2
confirms run 1's events survived resume)

The playbook

docs/how-to/run-recover-rehearsal.md — 9 steps + pass/fail rubric +
when-to-re-run. Linked from docs/README.md how-to index.

Findings noted during test development

Two potential improvements surfaced while writing the test, kept out of
scope for this PR:

Resume's "already published, skipping" path does NOT emit a
PackageSkipped event for packages whose state is already Published
(only emits for packages where `version_exists` short-circuits).
The audit trail is silent on the "state said done, we trusted it"
decision. Candidate enhancement for an Document and enforce the events-as-truth / state-as-projection invariant #93 follow-on.
In one path tested, the resume logged `Skipped` in stdout but
state.json kept showing `failed` for the affected package. Top-level
`updated_at` bumped (state file was re-written) but the package's
own `last_updated_at` didn't. Could indicate update_state not being
called on the Skipped path for a resumed Failed package — worth a
closer look.

Both warrant follow-up issues, not blockers here.

Test plan

`cargo test -p shipper-cli --test e2e_rehearse` passes on Windows.
`cargo clippy -p shipper-cli --all-targets --all-features -- -D warnings` clean.
`cargo fmt --all --check` clean.
CI multi-OS green (Linux/macOS fake-cargo script is the `sh`
variant; expect parity but verify).
Real rehearsal against a throwaway `v0.3.0-test-resume-YYYYMMDD`
tag (operator action — out of PR scope; see playbook).

…#90) Close #90's testing gap in two layers: a synthetic CI-guard test and an operator playbook for the real crates.io-backed rehearsal. ## Synthetic test (crates/shipper-cli/tests/e2e_rehearse.rs) The existing `bdd_resume` scenarios prove the resume *read* side: given a hand-constructed state file, assert resume does X. They do NOT prove the *write* side — that a real `shipper publish` run's persisted state is actually coherent when the run is interrupted. The new `rehearsal_interrupted_publish_then_resume_preserves_invariants` closes that gap end-to-end: 1. 3-crate workspace (a, b, c with c→b→a) — real topological ordering. 2. Mock registry with per-crate 404→200 flip + a `never_flip` pin so crate-c stays 404 during run 1 even after cargo "exits 0", preventing the reconcile path from upgrading a test-simulated failure to Published. 3. Run 1: fake cargo succeeds for a/b, fails for c → realistic interrupted end state. 4. Assert on persisted evidence: - state.json parses, a/b=published, c not published - events.jsonl is valid NDJSON (append-only integrity) - PackagePublished event count == succeeded crate count (no duplicates) 5. Run 2: shipper resume with c now succeeding. 6. Assert post-resume: - a/b stay Published (in-state idempotency) - c reaches Published or Skipped (both legitimate — the version_exists short-circuit is a valid end state) - PackagePublished total across runs is 2..=3 (never >3 — duplicates would mean resume re-published already-done crates) - ExecutionStarted event count == 2 (events.jsonl is append-only — truncation would drop events from run 1) This runs on every CI commit as a cheap regression guard. ## Real-rehearsal playbook (docs/how-to/run-recover-rehearsal.md) Step-by-step procedure for the actual crates.io-backed rehearsal that the synthetic test can't simulate: - throwaway `v0.3.0-test-resume-YYYYMMDD` tag - trigger release workflow; cancel after 2-3 crates published - collect shipper-state-final artifact; verify NDJSON + event/state consistency - dispatch resume; verify plan-ID validation, skip-already-published, no duplicate cargo publishes - spot-check crates.io visibility - yank the rehearsal versions (containment, not deletion) - pass/fail rubric + "when to re-run" policy The playbook is linked from docs/README.md's How-to index. ## Scope Synthetic test proves state/events invariants. Real rehearsal proves crates.io rate-limit interaction, sparse-index propagation, runner killed-mid-upload artifact preservation — things the mock can't reach. Both together = #90's "resume is demonstrated, not promised."

coderabbitai · 2026-04-17T13:45:20Z

Warning

Rate limit exceeded

@EffortlessSteven has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 19 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 19 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85c79bd5-17a5-4def-8a55-8997c0c3eb0a

📥 Commits

Reviewing files that changed from the base of the PR and between cc0dc68 and d3766fa.

📒 Files selected for processing (3)

crates/shipper-cli/tests/e2e_rehearse.rs
docs/README.md
docs/how-to/run-recover-rehearsal.md

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/90-recover-rehearsal

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a new end-to-end 'Recover rehearsal' test and documentation to verify that the publishing process can be safely resumed after an interruption. The changes include a synthetic test using a mock registry and a detailed manual procedure for testing against crates.io. Feedback was provided regarding a docstring that incorrectly mentions unimplemented event types, an inconsistent crate count in the documentation, and incorrect file paths in the maintenance section of the new guide. Additionally, it was suggested to make the crate list in the manual procedure dynamic to prevent the documentation from becoming brittle.

gemini-code-assist · 2026-04-17T13:46:27Z

+//!    - c reaches Published
+//!    - PackagePublished event count is exactly N_crates (one per crate,
+//!      across both runs combined)
+//!    - PackageSkipped events emitted for a/b during resume


The docstring mentions that PackageSkipped events are emitted for a/b during resume. However, the pull request description explicitly states that the resume path currently does not emit PackageSkipped events for packages already in the Published state, and that this is a candidate for a follow-on enhancement (#93). Additionally, the test code does not currently assert the presence of these events. This comment should be updated to reflect the current implementation to avoid misleading future maintainers.

gemini-code-assist · 2026-04-17T13:46:27Z

+- `CARGO_REGISTRY_TOKEN` with publish scope for all 12 crates already
+  configured (or Trusted Publishing registration complete — see
+  [run-in-github-actions.md](run-in-github-actions.md)).


There is an inconsistency between the stated number of crates and the list provided later in the document. Line 18 (and lines 123, 179) mentions "12 crates", but the loop in Step 8 (lines 139-142) and Step 9 (lines 153-156) lists 13 distinct crate names. Please verify the correct count and update the documentation accordingly.

gemini-code-assist · 2026-04-17T13:46:27Z

+for c in shipper shipper-cli shipper-core shipper-config shipper-types \
+         shipper-duration shipper-retry shipper-encrypt shipper-webhook \
+         shipper-registry shipper-sparse-index shipper-cargo-failure \
+         shipper-output-sanitizer; do


The list of crates in this playbook is hardcoded, which makes the documentation brittle as the workspace evolves. Consider suggesting a dynamic way to retrieve the list of workspace members (e.g., using cargo metadata --format-version 1 | jq -r '.workspace_members[] | split(" ")[0]') or adding a note for the operator to verify the current list of crates in the workspace.

gemini-code-assist · 2026-04-17T13:46:27Z

+- After any change to `crates/shipper-core/src/engine/`,
+  `crates/shipper-core/src/state/`, or `crates/shipper-core/src/runtime/`
+  that touches persistence, resumption, or reconciliation.


The paths referenced here (crates/shipper-core/src/...) appear to be incorrect based on the repository structure described in GEMINI.md and the provided file list, which indicate that the core logic resides in crates/shipper/src/. Please verify if shipper-core is the intended path or if it should be updated to crates/shipper/src/.

Suggested change

- After any change to `crates/shipper-core/src/engine/`,

`crates/shipper-core/src/state/`, or `crates/shipper-core/src/runtime/`

that touches persistence, resumption, or reconciliation.

- After any change to `crates/shipper/src/engine/`,

`crates/shipper/src/state/`, or `crates/shipper/src/runtime/`

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3766fae6d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T13:47:53Z

+    let published_total =
+        count_events_matching(&events_all, |e| event_type_matches(e, "PackagePublished"));
+    assert!(
+        (2..=3).contains(&published_total),


Enforce idempotency per crate, not by aggregate count

This assertion can miss a duplicate publish regression on resume: if crate-a (or crate-b) is incorrectly re-published while crate-c is skipped, the total PackagePublished count is still 3, which this check accepts. That means the test passes even though resume violated idempotency for an already-published crate. To actually guard the invariant described in the comments, the test should validate PackagePublished occurrences by package (e.g., crate-a@0.1.0 and crate-b@0.1.0 appear exactly once across both runs).

Useful? React with 👍 / 👎.

codecov · 2026-04-17T14:16:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

plus regression test for Failed->Skipped transition (#126) ## #125 — silent resume-skip When resume encounters a package whose state is already \`Published\` or \`Skipped\`, the sequential path short-circuited with a stdout info line and \`continue\`, without recording an event. Since \`events.jsonl\` is the **authoritative audit log** (state.json is a projection of it), this left a user-visible gap: an auditor reading only events.jsonl couldn't tell "resume looked at this package and trusted its terminal state" from "resume never touched this package." Surfaced by the synthetic rehearsal test (\`e2e_rehearse.rs\`, PR #124). **Fix:** emit a \`PackageSkipped { reason: \"resume: state already <s>\" }\` event in the already-complete match arm. No receipt entry is pushed — the receipt vector has always excluded already-terminal packages on the resume path, and callers depend on that shape (e.g. \`run_resume_runs_publish_when_state_exists\`). Only the audit-trail behavior changes. ## #126 — Failed->Skipped state drift The issue speculated that resume's stdout-said-Skipped but state.json-still-says-Failed could indicate \`update_state\` not firing on the version-exists short-circuit when the prior state was Failed. **Finding:** the current sequential code handles this correctly — \`reg.version_exists(p) \=\\= true\` triggers \`update_state(Skipped)\` which flushes to state.json, and a targeted test (\`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\`) passes without any code change. The observation in the issue may have been a stale-state-file artifact in the synthetic rehearsal fixture. Landing the test anyway as a regression guard: if a future refactor breaks this semantics (e.g. parallel mode taking a different path, or state write getting clobbered), the guard fires immediately. ## Tests Two new \`#[serial]\` tests in \`engine::tests\`: - \`resume_emits_package_skipped_event_for_already_published_state\` — seeds state with Published, runs \`run_publish\`, asserts \`PackageSkipped\` appears in events.jsonl. - \`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\` — seeds state with Failed{Ambiguous}, registry returns 200, asserts reloaded state.json shows Skipped. All 49 existing resume/publish tests remain green.

… (#130) plus regression test for Failed->Skipped transition (#126) ## #125 — silent resume-skip When resume encounters a package whose state is already \`Published\` or \`Skipped\`, the sequential path short-circuited with a stdout info line and \`continue\`, without recording an event. Since \`events.jsonl\` is the **authoritative audit log** (state.json is a projection of it), this left a user-visible gap: an auditor reading only events.jsonl couldn't tell "resume looked at this package and trusted its terminal state" from "resume never touched this package." Surfaced by the synthetic rehearsal test (\`e2e_rehearse.rs\`, PR #124). **Fix:** emit a \`PackageSkipped { reason: \"resume: state already <s>\" }\` event in the already-complete match arm. No receipt entry is pushed — the receipt vector has always excluded already-terminal packages on the resume path, and callers depend on that shape (e.g. \`run_resume_runs_publish_when_state_exists\`). Only the audit-trail behavior changes. ## #126 — Failed->Skipped state drift The issue speculated that resume's stdout-said-Skipped but state.json-still-says-Failed could indicate \`update_state\` not firing on the version-exists short-circuit when the prior state was Failed. **Finding:** the current sequential code handles this correctly — \`reg.version_exists(p) \=\\= true\` triggers \`update_state(Skipped)\` which flushes to state.json, and a targeted test (\`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\`) passes without any code change. The observation in the issue may have been a stale-state-file artifact in the synthetic rehearsal fixture. Landing the test anyway as a regression guard: if a future refactor breaks this semantics (e.g. parallel mode taking a different path, or state write getting clobbered), the guard fires immediately. ## Tests Two new \`#[serial]\` tests in \`engine::tests\`: - \`resume_emits_package_skipped_event_for_already_published_state\` — seeds state with Published, runs \`run_publish\`, asserts \`PackageSkipped\` appears in events.jsonl. - \`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\` — seeds state with Failed{Ambiguous}, registry returns 200, asserts reloaded state.json shows Skipped. All 49 existing resume/publish tests remain green.

Three-column reconciliation of what's actually merged vs each issue's acceptance checklist. Single source of truth for what closes each pillar issue. **Findings:** - **#90 Recover** — honestly closable. Code side is done (#124 + #130); operator-side real rehearsal is an ops action, not a code gap. - **#97 Prove tier 2** — 85% done. Rehearsal + visibility + hard gate + plan_id binding all landed (#127 + #133). Missing: install/smoke check (cargo install against the rehearsal registry / consumer build). One narrow follow-up PR closes it. - **#98 Remediate** — 60% done. Receipt schema + plan-yank (from-receipt) + yank primitive + fix-forward planning all landed (#121 + #132 + #134). Missing: plan-yank's --starting-crate graph mode, plan execution for yank + fix-forward. Two narrow follow-ups. Also captures the two review concerns on #122 (Trusted Publishing) that were addressed in a follow-up commit to that PR. Recommended next merge order and follow-up PRs spelled out at bottom.

gemini-code-assist Bot reviewed Apr 17, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 17, 2026

View reviewed changes

This was referenced Apr 17, 2026

Resume: already-Published skip path is silent in events.jsonl (#93 follow-on) #125

Closed

Resume: Failed→Skipped transition may leave stale 'failed' in state.json #126

Closed

EffortlessSteven mentioned this pull request Apr 17, 2026

fix(resume): silent-skip event + Failed→Skipped regression test (#125, #126) #130

Merged

4 tasks

EffortlessSteven merged commit cb6b955 into main Apr 18, 2026
20 checks passed

EffortlessSteven deleted the feat/90-recover-rehearsal branch April 18, 2026 00:03

EffortlessSteven mentioned this pull request Apr 18, 2026

docs: GAPS_CLOSEOUT audit — #90/#97/#98 against main #136

Merged

This was referenced Apr 19, 2026

Competency: Survive — gap audit and roadmap #101

Open

Roadmap: post-v0.3.0-rc.1 product scorecard across nine competencies #109

Open

EffortlessSteven mentioned this pull request Jun 24, 2026

deps(deps): bump reqwest from 0.13.3 to 0.13.4 #402

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(recover): synthetic rehearsal test + operator rehearsal playbook (#90)#124

feat(recover): synthetic rehearsal test + operator rehearsal playbook (#90)#124
EffortlessSteven merged 1 commit into
mainfrom
feat/90-recover-rehearsal

EffortlessSteven commented Apr 17, 2026

Uh oh!

coderabbitai Bot commented Apr 17, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Uh oh!

codecov Bot commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

EffortlessSteven commented Apr 17, 2026

Summary

Why both

The synthetic test

The playbook

Findings noted during test development

Test plan

Uh oh!

coderabbitai Bot commented Apr 17, 2026

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 17, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant