docs(ci): audit and rewrite test-evidence-lanes (#189, docs-first) #234
#189 was filed with a richer lane-routing proposal (e.g. moving cross-platform from every-PR to labeled+nightly). That work is deferred because:

- Current PR wall-clock is ~24-28 min (recent CI runs from this rollout), close to the original 25-min target without aggressive routing.
- Shipper has Windows-specific code paths (path handling, process spawning, line endings). Moving Windows/macOS/aarch64 builds to nightly means platform regressions surface a day later instead of inside the PR, which is a real loss for a release-pipeline product.
- Coverage-removing routing changes are hard to undo: contributors stop expecting platform signal, regressions accumulate, then "fixing" the lanes becomes a multi-PR cleanup.

So #189 lands as a documentation pass: capture the actual current state of .github/workflows/ so future routing decisions have an accurate baseline.

Changes to docs/ci/test-evidence-lanes.md:

- Replace the conflated "Always-On" table with a complete workflow inventory: 10 workflows, ~28 jobs across them, columns for trigger / lane / required-for-merge.
- Per-job lane map for ci.yml (the load-bearing PR workflow), with predicates, observed wall-clock, and what each job proves. Previous doc was missing fuzz-smoke, cross-platform, release-build, crypto-proptests-heavy, and policy entirely; conflated `lint` into separate `fmt`/`clippy` rows.
- Policy gates section now lists all eight xtask checks (file-policy, generated, executable, dependency-surface, workflow, process, network, lint-policy) plus the release-time no-panic check, with the PR each was introduced in.
- Advisory/Routed section adds droid-review and droid (was missing).
- Scheduled section corrects mutation to "Sunday 04:00 UTC" (was "nightly") and adds droid-security-scan.
- Release Proof: full release.yml job table including the policy-gate job added in #229.
- New "Routing Changes Deferred to Follow-Up PRs" section enumerates concrete movements with rationale: release-build to release-only, path-filtered fuzz-smoke, split cross-platform so only Linux is every-PR.

No workflow files changed. Coverage-removing routing decisions deserve their own focused PRs.

CHANGELOG.md: added entry under Documentation.
docs/ci/test-evidence-lanes.md: ~180 lines rewritten.
Droid finished @EffortlessSteven's task. Droid review complete. Validation Pass 2 finished. Results: 6/6 candidates approved. All findings cross-checked against actual workflow files.
No security issues found. Batched review submitted.
## Workflow Inventory

### Always-On (Every PR and Push)
Every workflow under `.github/workflows/` and the lane each one occupies. The audit dates from the #189 docs pass.
[P1] Workflow Inventory — architecture-guard.yml trigger omits path filter
The doc claims architecture-guard.yml triggers on push + pull_request without path restrictions. The actual file (architecture-guard.yml lines 3-12) restricts both triggers to only fire on path changes under crates/shipper-core/src/** or crates/shipper/src/**. A push to main that only touches documentation or CI configs does NOT trigger this workflow. The doc's Trigger column should reflect this path filtering, e.g.: push (main, path-filtered to crates/shipper-core/src/** + crates/shipper/src/**) + pull_request (path-filtered).
| `security` | every PR | ~1 min | `cargo audit` against the current advisory database. |
| `docs` | every PR | ~1 min | `cargo doc --workspace --no-deps` clean under `-D warnings` (catches `rustdoc::invalid-html-tags` and friends). |
| `bdd` | every PR | ~2 min | Cucumber scenarios for core publish/resume/reconcile flows. |
| `fuzz-smoke` | every PR except `schedule` | ~10 min | Six fuzz targets at low-energy: parser, encrypt, sanitizer, plan, state, events. |
[P1] ci.yml job table — fuzz-smoke target count and names are inaccurate
Line 46 describes six fuzz targets as: parser, encrypt, sanitizer, plan, state, events. The actual ci.yml fuzz-smoke job (ci.yml lines ~347-351) runs exactly five cargo-fuzz commands: load_state, resolve_token, schema_version, release_levels, redact_output. The doc's "six" count is wrong, and none of the listed names (parser, encrypt, sanitizer, plan, state, events) match the actual targets. The six names in the doc appear to be conflated with the separate fuzz.yml nightly targets (load_state, resolve_token, encrypt_decrypt, retry_strategy, types_serialization, redact_output).
| `droid-review.yml` | `pull_request` | Advisory (same-repo + bot guard) | Advisory |
| `droid.yml` | `issues` + `pull_request` (command-triggered) | Advisory (trusted-actor guard) | Advisory |
| `droid-security-scan.yml` | `schedule` + `workflow_dispatch` | Scheduled (Mon 08:00 UTC) | Advisory |
| `fuzz.yml` | `schedule` + `workflow_dispatch` | Nightly | Advisory |
[P2] Workflow Inventory — coverage.yml trigger omits path filter
The doc lists coverage.yml trigger as push (main) without path restrictions. The actual coverage.yml (lines 4-11) has push + pull_request triggers with an explicit path filter: only paths matching Cargo.toml, Cargo.lock, crates/**, fuzz/**, .github/workflows/coverage.yml, or codecov.yml. The doc omits this, making it appear coverage runs on all main pushes when it does not.
|---|---|---|---|
| `lint` | every PR | ~1 min | `cargo fmt --check` + `cargo clippy --workspace --all-targets -- -D warnings`. |
| `policy` | every PR | ~1 min | All seven xtask policy checks in `--mode blocking-allowlist`, plus `policy-report`. See `docs/policy/NON_RUST_ROLLOUT.md`. |
| `test` (nextest, 3-OS matrix) | every PR | ~17 min (longest leg) | Unit and integration tests pass on Ubuntu, Windows, macOS. Doc-tests run alongside. |
[P2] Workflow Inventory — ripr.yml trigger omits path filter
The doc lists ripr.yml trigger as pull_request + workflow_dispatch. The actual ripr.yml (lines 4-11) restricts pull_request to only fire on paths matching crates/**, xtask/**, Cargo.{toml,lock}, ripr.toml, policy/ripr-suppressions.toml, or .github/workflows/ripr.yml. A docs-only PR does not trigger ripr. The path filter should be documented.
**Recent observed PR wall-clock:** 24–28 min (CI runs from this rollout). Critical path is `test` on macOS plus `fuzz-smoke`; everything else fits inside that window.

## Policy Gates (xtask-Enforced, Inside `ci.yml`'s `policy` Job)
[P3] ci.yml job table — bdd and release-build predicates understate their dependencies
The doc says bdd and release-build have predicate 'every PR'. In the actual ci.yml, bdd has needs: [lint] and release-build has needs: [lint, test]. These jobs only run when their dependencies succeed. While technically still 'every PR' in the sense they run on every PR that passes lint, the simple 'every PR' framing without dependency context could mislead a contributor expecting them to run independently.
@@ -130,3 +162,19 @@ These crates receive the most rigorous mutation coverage because they handle rea
| `shipper-registry` | Registry API interactions |
[P3] Trust-Critical Crates table — shipper-registry listed but excluded from mutation coverage
The Trust-Critical Crates table includes shipper-registry, correctly noting it handles 'Registry API interactions'. However, mutation.yml's mutants-weekly job only mutates shipper-duration, shipper-types, and shipper-config — shipper-registry is absent. The doc's 'Expanding to full trust-critical surface is a future rollout step' comment is accurate but easy to miss. A reader could reasonably conclude shipper-registry is covered by the weekly mutation lane when it is not.
…) (#237)

Stabilises `concurrent_version_exists_checks` (and any other test using `with_multi_server`) on slow macOS CI runners. Hit three times in a single rollout session (#233, #234, #236) as `version_exists: registry request failed -> operation timed out` against the local tiny_http mock.

Root cause: the helper's accept loop blocked on `handler(req)` until each response was fully written before returning to `recv_timeout`. With 5 concurrent reqwest clients hitting the same loopback socket, the remaining clients sat in the kernel's TCP backlog long enough to exceed reqwest's default OS-level connect timeout. Windows and Linux runners process the queue fast enough to mask the bug; macOS does not.

Fix: spawn one worker thread per accepted request and let the accept loop return to `recv_timeout` immediately. The accept loop still serialises on `recv_timeout` (tiny_http requires that), but handlers run in parallel, so the kernel's listen queue drains as fast as connections arrive.

Other changes:
- `recv_timeout` bumped from 30s to 60s for additional headroom.
- Trait bound on the handler closure goes from `Fn + Send + 'static` to `Fn + Send + Sync + 'static` (required to wrap the handler in `Arc` for clone-into-workers). All existing call sites use closures that already satisfy `Sync`.
- The accept thread joins worker threads before returning so any panic in a handler surfaces in CI.

`cargo test -p shipper-registry --lib` passes 258/258 locally (Windows). The fix targets a macOS-specific timing bug, so CI is the real verification.
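The shape of the fix described in this commit (one worker thread per accepted request, the handler shared through `Arc`, workers joined before returning) can be sketched roughly as follows. This is a minimal illustration and not the actual `with_multi_server` helper: a `std::sync::mpsc` channel stands in for the tiny_http server, and `serve_concurrently`, its parameters, and the `expected` request count are hypothetical names introduced here.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical accept loop. `rx` is a stand-in for the mock server's
/// request source; the real helper would poll the tiny_http server instead.
/// Returns the number of requests that were handed to a worker.
fn serve_concurrently<R, F>(
    rx: Receiver<R>,
    expected: usize,
    timeout: Duration,
    handler: F,
) -> usize
where
    R: Send + 'static,
    // `Sync` is needed because the same closure is shared by every worker.
    F: Fn(R) + Send + Sync + 'static,
{
    let handler = Arc::new(handler);
    let mut workers: Vec<JoinHandle<()>> = Vec::with_capacity(expected);

    while workers.len() < expected {
        match rx.recv_timeout(timeout) {
            Ok(req) => {
                // Hand the request to its own thread so the loop can go
                // straight back to waiting for the next one.
                let handler = Arc::clone(&handler);
                workers.push(thread::spawn(move || (*handler)(req)));
            }
            // Stop accepting on timeout or when all senders are gone.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }

    let handled = workers.len();
    // Join every worker so a panic inside a handler fails the caller
    // instead of being silently dropped.
    for worker in workers {
        worker.join().expect("handler thread panicked");
    }
    handled
}
```

Because the loop only waits for and dispatches requests, a slow handler no longer delays acceptance of the next request, which is the property the commit relies on for slow runners.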
Summary
Docs-first PR for #189. Captures the actual current state of `.github/workflows/` so future routing decisions have an accurate baseline. No workflow files changed.

Why deferred from aggressive routing
#189 was filed with a richer lane-routing proposal (e.g. moving `cross-platform` from every-PR to labeled+nightly). That work is deferred because:
- Current PR wall-clock is ~24-28 min, close to the original 25-min target without aggressive routing.
- Shipper has Windows-specific code paths (path handling, process spawning, line endings), so moving platform builds to nightly would surface regressions a day later instead of inside the PR.
- Coverage-removing routing changes are hard to undo.

What the doc now captures (was missing/stale)
- `ci.yml`: predicates, wall-clock, what each job proves. Previously elided `fuzz-smoke`, `cross-platform`, `release-build`, `crypto-proptests-heavy`, `policy`; conflated `lint` into two rows.
- `release.yml` job table including the `policy-gate` job from feat(harden): no-panic check + release CI gate (#187 PR 8b) #229.

Test plan
- `cargo fmt --all -- --check` (unchanged source) clean.
- `ls .github/workflows/*.yml` and each file's `on:` block.
- `ci.yml` job table cross-checked against the actual job names in the workflow file.

Refs #189.