Skip to content

feat(recover): synthetic rehearsal test + operator rehearsal playbook (#90)#124

Merged
EffortlessSteven merged 1 commit into
mainfrom
feat/90-recover-rehearsal
Apr 18, 2026
Merged

feat(recover): synthetic rehearsal test + operator rehearsal playbook (#90)#124
EffortlessSteven merged 1 commit into
mainfrom
feat/90-recover-rehearsal

Conversation

@EffortlessSteven

Copy link
Copy Markdown
Member

Summary

Close #90's testing gap in two layers:

  1. Synthetic CI-guard testcrates/shipper-cli/tests/e2e_rehearse.rs.
    Runs on every commit. Proves the write-side invariants (state.json,
    events.jsonl NDJSON integrity, no duplicate publishes on resume) that
    existing bdd_resume tests don't cover because they construct state
    by hand rather than running a real publish.
  2. Operator playbookdocs/how-to/run-recover-rehearsal.md.
    Step-by-step for the once-per-RC crates.io-backed rehearsal the
    synthetic can't substitute for (rate limits, sparse-index timing,
    runner kill + artifact upload).

Why both

The synthetic test catches persistence bugs cheaply and repeatedly.
The real rehearsal catches the things mocks literally can't simulate —
crates.io's per-minute new-crate rate limit, sparse-index propagation
delay, the GitHub Actions `if: always()` artifact upload path after a
cancelled job, token/auth edge cases.

Running only one is insufficient:

The synthetic test

rehearsal_interrupted_publish_then_resume_preserves_invariants:

  • 3-crate fixture with topological deps (c → b → a)
  • Mock registry with per-crate 404→200 flip + `never_flip` pin so
    crate-c stays unpublished during run 1 even if cargo "exits 0"
    (otherwise the reconcile path would upgrade the simulated failure
    to Published and the scenario becomes meaningless)
  • Run 1: a/b succeed, c fails → realistic interrupted state
  • Run 2: resume with c succeeding
  • Asserts: state.json + events.jsonl coherent; no duplicate
    PackagePublished; events.jsonl append-only (ExecutionStarted count = 2
    confirms run 1's events survived resume)

The playbook

docs/how-to/run-recover-rehearsal.md — 9 steps + pass/fail rubric +
when-to-re-run. Linked from docs/README.md how-to index.

Findings noted during test development

Two potential improvements surfaced while writing the test, kept out of
scope for this PR:

  1. Resume's "already published, skipping" path does NOT emit a
    PackageSkipped event for packages whose state is already Published
    (only emits for packages where `version_exists` short-circuits).
    The audit trail is silent on the "state said done, we trusted it"
    decision. Candidate enhancement for an Document and enforce the events-as-truth / state-as-projection invariant #93 follow-on.
  2. In one path tested, the resume logged `Skipped` in stdout but
    state.json kept showing `failed` for the affected package. Top-level
    `updated_at` bumped (state file was re-written) but the package's
    own `last_updated_at` didn't. Could indicate update_state not being
    called on the Skipped path for a resumed Failed package — worth a
    closer look.

Both warrant follow-up issues, not blockers here.

Test plan

  • `cargo test -p shipper-cli --test e2e_rehearse` passes on Windows.
  • `cargo clippy -p shipper-cli --all-targets --all-features -- -D warnings` clean.
  • `cargo fmt --all --check` clean.
  • CI multi-OS green (Linux/macOS fake-cargo script is the `sh`
    variant; expect parity but verify).
  • Real rehearsal against a throwaway `v0.3.0-test-resume-YYYYMMDD`
    tag (operator action — out of PR scope; see playbook).

…#90)

Close #90's testing gap in two layers: a synthetic CI-guard test and an
operator playbook for the real crates.io-backed rehearsal.

## Synthetic test (crates/shipper-cli/tests/e2e_rehearse.rs)

The existing `bdd_resume` scenarios prove the resume *read* side: given
a hand-constructed state file, assert resume does X. They do NOT prove
the *write* side — that a real `shipper publish` run's persisted state
is actually coherent when the run is interrupted.

The new `rehearsal_interrupted_publish_then_resume_preserves_invariants`
closes that gap end-to-end:

1. 3-crate workspace (a, b, c with c→b→a) — real topological ordering.
2. Mock registry with per-crate 404→200 flip + a `never_flip` pin so
   crate-c stays 404 during run 1 even after cargo "exits 0", preventing
   the reconcile path from upgrading a test-simulated failure to Published.
3. Run 1: fake cargo succeeds for a/b, fails for c → realistic interrupted
   end state.
4. Assert on persisted evidence:
   - state.json parses, a/b=published, c not published
   - events.jsonl is valid NDJSON (append-only integrity)
   - PackagePublished event count == succeeded crate count (no duplicates)
5. Run 2: shipper resume with c now succeeding.
6. Assert post-resume:
   - a/b stay Published (in-state idempotency)
   - c reaches Published or Skipped (both legitimate — the version_exists
     short-circuit is a valid end state)
   - PackagePublished total across runs is 2..=3 (never >3 — duplicates
     would mean resume re-published already-done crates)
   - ExecutionStarted event count == 2 (events.jsonl is append-only —
     truncation would drop events from run 1)

This runs on every CI commit as a cheap regression guard.

## Real-rehearsal playbook (docs/how-to/run-recover-rehearsal.md)

Step-by-step procedure for the actual crates.io-backed rehearsal that
the synthetic test can't simulate:

- throwaway `v0.3.0-test-resume-YYYYMMDD` tag
- trigger release workflow; cancel after 2-3 crates published
- collect shipper-state-final artifact; verify NDJSON + event/state
  consistency
- dispatch resume; verify plan-ID validation, skip-already-published,
  no duplicate cargo publishes
- spot-check crates.io visibility
- yank the rehearsal versions (containment, not deletion)
- pass/fail rubric + "when to re-run" policy

The playbook is linked from docs/README.md's How-to index.

## Scope

Synthetic test proves state/events invariants. Real rehearsal proves
crates.io rate-limit interaction, sparse-index propagation, runner
killed-mid-upload artifact preservation — things the mock can't reach.

Both together = #90's "resume is demonstrated, not promised."
@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@EffortlessSteven has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 19 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 19 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85c79bd5-17a5-4def-8a55-8997c0c3eb0a

📥 Commits

Reviewing files that changed from the base of the PR and between cc0dc68 and d3766fa.

📒 Files selected for processing (3)
  • crates/shipper-cli/tests/e2e_rehearse.rs
  • docs/README.md
  • docs/how-to/run-recover-rehearsal.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/90-recover-rehearsal

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new end-to-end 'Recover rehearsal' test and documentation to verify that the publishing process can be safely resumed after an interruption. The changes include a synthetic test using a mock registry and a detailed manual procedure for testing against crates.io. Feedback was provided regarding a docstring that incorrectly mentions unimplemented event types, an inconsistent crate count in the documentation, and incorrect file paths in the maintenance section of the new guide. Additionally, it was suggested to make the crate list in the manual procedure dynamic to prevent the documentation from becoming brittle.

//! - c reaches Published
//! - PackagePublished event count is exactly N_crates (one per crate,
//! across both runs combined)
//! - PackageSkipped events emitted for a/b during resume

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstring mentions that PackageSkipped events are emitted for a/b during resume. However, the pull request description explicitly states that the resume path currently does not emit PackageSkipped events for packages already in the Published state, and that this is a candidate for a follow-on enhancement (#93). Additionally, the test code does not currently assert the presence of these events. This comment should be updated to reflect the current implementation to avoid misleading future maintainers.

Comment on lines +18 to +20
- `CARGO_REGISTRY_TOKEN` with publish scope for all 12 crates already
configured (or Trusted Publishing registration complete — see
[run-in-github-actions.md](run-in-github-actions.md)).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is an inconsistency between the stated number of crates and the list provided later in the document. Line 18 (and lines 123, 179) mentions "12 crates", but the loop in Step 8 (lines 139-142) and Step 9 (lines 153-156) lists 13 distinct crate names. Please verify the correct count and update the documentation accordingly.

Comment on lines +139 to +142
for c in shipper shipper-cli shipper-core shipper-config shipper-types \
shipper-duration shipper-retry shipper-encrypt shipper-webhook \
shipper-registry shipper-sparse-index shipper-cargo-failure \
shipper-output-sanitizer; do

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list of crates in this playbook is hardcoded, which makes the documentation brittle as the workspace evolves. Consider suggesting a dynamic way to retrieve the list of workspace members (e.g., using cargo metadata --format-version 1 | jq -r '.workspace_members[] | split(" ")[0]') or adding a note for the operator to verify the current list of crates in the workspace.

Comment on lines +190 to +192
- After any change to `crates/shipper-core/src/engine/`,
`crates/shipper-core/src/state/`, or `crates/shipper-core/src/runtime/`
that touches persistence, resumption, or reconciliation.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The paths referenced here (crates/shipper-core/src/...) appear to be incorrect based on the repository structure described in GEMINI.md and the provided file list, which indicate that the core logic resides in crates/shipper/src/. Please verify if shipper-core is the intended path or if it should be updated to crates/shipper/src/.

Suggested change
- After any change to `crates/shipper-core/src/engine/`,
`crates/shipper-core/src/state/`, or `crates/shipper-core/src/runtime/`
that touches persistence, resumption, or reconciliation.
- After any change to `crates/shipper/src/engine/`,
`crates/shipper/src/state/`, or `crates/shipper/src/runtime/`

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3766fae6d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

let published_total =
count_events_matching(&events_all, |e| event_type_matches(e, "PackagePublished"));
assert!(
(2..=3).contains(&published_total),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce idempotency per crate, not by aggregate count

This assertion can miss a duplicate publish regression on resume: if crate-a (or crate-b) is incorrectly re-published while crate-c is skipped, the total PackagePublished count is still 3, which this check accepts. That means the test passes even though resume violated idempotency for an already-published crate. To actually guard the invariant described in the comments, the test should validate PackagePublished occurrences by package (e.g., crate-a@0.1.0 and crate-b@0.1.0 appear exactly once across both runs).

Useful? React with 👍 / 👎.

@codecov

codecov Bot commented Apr 17, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

EffortlessSteven added a commit that referenced this pull request Apr 17, 2026
         plus regression test for Failed->Skipped transition (#126)

## #125 — silent resume-skip

When resume encounters a package whose state is already \`Published\` or
\`Skipped\`, the sequential path short-circuited with a stdout info line
and \`continue\`, without recording an event. Since \`events.jsonl\` is
the **authoritative audit log** (state.json is a projection of it),
this left a user-visible gap: an auditor reading only events.jsonl
couldn't tell "resume looked at this package and trusted its terminal
state" from "resume never touched this package." Surfaced by the
synthetic rehearsal test (\`e2e_rehearse.rs\`, PR #124).

**Fix:** emit a \`PackageSkipped { reason: \"resume: state already <s>\" }\`
event in the already-complete match arm. No receipt entry is pushed —
the receipt vector has always excluded already-terminal packages on
the resume path, and callers depend on that shape (e.g.
\`run_resume_runs_publish_when_state_exists\`). Only the audit-trail
behavior changes.

## #126 — Failed->Skipped state drift

The issue speculated that resume's stdout-said-Skipped but
state.json-still-says-Failed could indicate \`update_state\` not firing
on the version-exists short-circuit when the prior state was Failed.

**Finding:** the current sequential code handles this correctly —
\`reg.version_exists(p) \=\\= true\` triggers
\`update_state(Skipped)\` which flushes to state.json, and a targeted
test (\`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\`)
passes without any code change. The observation in the issue may have
been a stale-state-file artifact in the synthetic rehearsal fixture.

Landing the test anyway as a regression guard: if a future refactor
breaks this semantics (e.g. parallel mode taking a different path, or
state write getting clobbered), the guard fires immediately.

## Tests

Two new \`#[serial]\` tests in \`engine::tests\`:
- \`resume_emits_package_skipped_event_for_already_published_state\`
  — seeds state with Published, runs \`run_publish\`, asserts
  \`PackageSkipped\` appears in events.jsonl.
- \`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\`
  — seeds state with Failed{Ambiguous}, registry returns 200, asserts
  reloaded state.json shows Skipped.

All 49 existing resume/publish tests remain green.
@EffortlessSteven EffortlessSteven merged commit cb6b955 into main Apr 18, 2026
20 checks passed
@EffortlessSteven EffortlessSteven deleted the feat/90-recover-rehearsal branch April 18, 2026 00:03
EffortlessSteven added a commit that referenced this pull request Apr 18, 2026
… (#130)

plus regression test for Failed->Skipped transition (#126)

## #125 — silent resume-skip

When resume encounters a package whose state is already \`Published\` or
\`Skipped\`, the sequential path short-circuited with a stdout info line
and \`continue\`, without recording an event. Since \`events.jsonl\` is
the **authoritative audit log** (state.json is a projection of it),
this left a user-visible gap: an auditor reading only events.jsonl
couldn't tell "resume looked at this package and trusted its terminal
state" from "resume never touched this package." Surfaced by the
synthetic rehearsal test (\`e2e_rehearse.rs\`, PR #124).

**Fix:** emit a \`PackageSkipped { reason: \"resume: state already <s>\" }\`
event in the already-complete match arm. No receipt entry is pushed —
the receipt vector has always excluded already-terminal packages on
the resume path, and callers depend on that shape (e.g.
\`run_resume_runs_publish_when_state_exists\`). Only the audit-trail
behavior changes.

## #126 — Failed->Skipped state drift

The issue speculated that resume's stdout-said-Skipped but
state.json-still-says-Failed could indicate \`update_state\` not firing
on the version-exists short-circuit when the prior state was Failed.

**Finding:** the current sequential code handles this correctly —
\`reg.version_exists(p) \=\\= true\` triggers
\`update_state(Skipped)\` which flushes to state.json, and a targeted
test (\`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\`)
passes without any code change. The observation in the issue may have
been a stale-state-file artifact in the synthetic rehearsal fixture.

Landing the test anyway as a regression guard: if a future refactor
breaks this semantics (e.g. parallel mode taking a different path, or
state write getting clobbered), the guard fires immediately.

## Tests

Two new \`#[serial]\` tests in \`engine::tests\`:
- \`resume_emits_package_skipped_event_for_already_published_state\`
  — seeds state with Published, runs \`run_publish\`, asserts
  \`PackageSkipped\` appears in events.jsonl.
- \`resume_from_failed_ambiguous_updates_state_to_skipped_when_registry_visible\`
  — seeds state with Failed{Ambiguous}, registry returns 200, asserts
  reloaded state.json shows Skipped.

All 49 existing resume/publish tests remain green.
EffortlessSteven added a commit that referenced this pull request Apr 18, 2026
Three-column reconciliation of what's actually merged vs each issue's
acceptance checklist. Single source of truth for what closes each
pillar issue.

**Findings:**

- **#90 Recover** — honestly closable. Code side is done (#124 + #130);
  operator-side real rehearsal is an ops action, not a code gap.

- **#97 Prove tier 2** — 85% done. Rehearsal + visibility + hard gate +
  plan_id binding all landed (#127 + #133). Missing: install/smoke
  check (cargo install against the rehearsal registry / consumer
  build). One narrow follow-up PR closes it.

- **#98 Remediate** — 60% done. Receipt schema + plan-yank (from-receipt)
  + yank primitive + fix-forward planning all landed (#121 + #132 +
  #134). Missing: plan-yank's --starting-crate graph mode, plan
  execution for yank + fix-forward. Two narrow follow-ups.

Also captures the two review concerns on #122 (Trusted Publishing)
that were addressed in a follow-up commit to that PR.

Recommended next merge order and follow-up PRs spelled out at bottom.
EffortlessSteven added a commit that referenced this pull request Apr 18, 2026
Three-column reconciliation of what's actually merged vs each issue's
acceptance checklist. Single source of truth for what closes each
pillar issue.

**Findings:**

- **#90 Recover** — honestly closable. Code side is done (#124 + #130);
  operator-side real rehearsal is an ops action, not a code gap.

- **#97 Prove tier 2** — 85% done. Rehearsal + visibility + hard gate +
  plan_id binding all landed (#127 + #133). Missing: install/smoke
  check (cargo install against the rehearsal registry / consumer
  build). One narrow follow-up PR closes it.

- **#98 Remediate** — 60% done. Receipt schema + plan-yank (from-receipt)
  + yank primitive + fix-forward planning all landed (#121 + #132 +
  #134). Missing: plan-yank's --starting-crate graph mode, plan
  execution for yank + fix-forward. Two narrow follow-ups.

Also captures the two review concerns on #122 (Trusted Publishing)
that were addressed in a follow-up commit to that PR.

Recommended next merge order and follow-up PRs spelled out at bottom.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Verify resume path under real interruption (rehearsal procedure)

1 participant