feat(preflight): slim workspace-verify event, strip ANSI, write sidecar (#92)#116
Conversation
…ar (#92) Addresses the "~2KB ANSI blob in events.jsonl" finding from the v0.3.0-rc.1 retrospective. The `PreflightWorkspaceVerify` event no longer dumps raw cargo dry-run stderr with embedded ANSI escapes into the structured event stream. ## What changed 1. **New `strip_ansi` function** in `shipper-output-sanitizer` — a dependency-free byte-level scanner that removes CSI / OSC / two-byte escapes while preserving text and newlines. UTF-8 safe. Covered by 6 unit tests (sgr, multi-code, no-op, cargo-style, UTF-8, OSC). 2. **Preflight event payload** now contains a slim summary: exit_code, sidecar path, and the ANSI-stripped tail-6 of stderr. Typical size drops from ~2KB to a few hundred bytes. 3. **New sidecar file** `<state_dir>/preflight_workspace_verify.txt` carries the full ANSI-stripped output. The Reporter surfaces the sidecar path with a byte count so operators know where to look. ## Schema stability No change to the `EventType::PreflightWorkspaceVerify` variant shape — `{ passed: bool, output: String }` is preserved. The `output` field now carries a compact summary instead of a multi-KB blob. Schema-level changes (structured `exit_code`/`elapsed_ms`/`output_path` fields) are a bigger cross-cutting refactor and remain as a potential follow-up; this PR captures the UX win with minimal disruption. ## Test plan - [x] `cargo clippy --all-targets -- -D warnings` clean across shipper / shipper-output-sanitizer / shipper-types - [x] `cargo fmt --check` clean - [x] 6 new `strip_ansi` unit tests pass - [x] 192 engine tests pass (no regressions) - [x] No existing snapshot tests broke (event shape unchanged) ## Related - Parent: #92 (Narrate — slim preflight evidence) - Pillar umbrella: #103 (Narrate — now more complete) - Master roadmap: #109
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 26 minutes and 29 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughA new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98aeaaf905
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _ => i += 2, | ||
| } | ||
| } else { | ||
| // Non-ESC byte — append as UTF-8-safe codepoint. | ||
| let ch = s[i..].chars().next().unwrap_or('\0'); |
There was a problem hiding this comment.
Avoid advancing into UTF-8 continuation bytes
strip_ansi assumes any non-CSI/OSC escape is always two ASCII bytes (_ => i += 2), but if input contains ESC followed by a non-ASCII UTF-8 byte (for example "\x1b😀"), i moves into the middle of a codepoint and the next s[i..] slice panics. Since run_preflight now sanitizes dry-run output through this function, a malformed or unusual escape sequence can crash preflight instead of recording the workspace-verify event.
Useful? React with 👍 / 👎.
The initial #92 commit emitted `[info] full dry-run output written to <path> (N bytes)` after writing the sidecar, which broke e2e_expanded::preflight_allow_dirty_snapshot (and would break any portability-sensitive snapshots since the byte count varies by line endings / cargo output stability). The sidecar path is deterministic (<state_dir>/preflight_workspace_ verify.txt) and documented; echoing it to stderr on the happy path adds snapshot churn without giving operators new information they can't discover on their own. Keeps the warning on sidecar write failure (operator needs to know when it DIDN'T land), drops the success-path info line.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/shipper-output-sanitizer/src/lib.rs`:
- Around line 29-67: The loop in strip_ansi preserves a trailing lone ESC (0x1b)
because the escape-handling only runs when i + 1 < bytes.len(); fix by
explicitly handling the case bytes[i] == 0x1b && i + 1 >= bytes.len():
consume/skip that byte (e.g. i += 1 or break) instead of falling through to the
UTF-8 branch, so the raw control byte is dropped; update the match/if around
bytes[i] checks in the strip_ansi loop (variables: bytes, i, s, out) to detect
and handle this trailing-ESC case.
In `@crates/shipper/src/engine/mod.rs`:
- Around line 147-152: The code constructs full_stripped from output.stdout_tail
and output.stderr_tail after strip_ansi but does not redact secrets; call the
existing redact_sensitive function on the sanitized outputs before composing
full_stripped (i.e., replace
shipper_output_sanitizer::strip_ansi(&output.stdout_tail) and
strip_ansi(&output.stderr_tail) with
redact_sensitive(shipper_output_sanitizer::strip_ansi(&output.stdout_tail)) and
redact_sensitive(shipper_output_sanitizer::strip_ansi(&output.stderr_tail))) so
any tokens like CARGO_REGISTRY_TOKEN are removed prior to persisting the sidecar
payload.
- Around line 153-178: The summary currently always includes the deterministic
sidecar_path even when the write failed; change the logic around
sidecar_path/write to record the write result (e.g., a boolean like
wrote_sidecar or an Option<String>) when calling std::fs::write (the
reporter.warn path stays), and use that to build the summary string in place of
always referencing sidecar_path.display() inside the format!—emit the actual
path only when the write succeeded, otherwise emit a clear token such as
"sidecar=unavailable" (or similar) in the summary constructed for the workspace
dry-run (symbols to update: sidecar_path, the std::fs::write call,
reporter.warn, tail_summary, and the summary format).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 094f6aff-4504-4585-bfae-3d1f49ca3faf
📒 Files selected for processing (2)
crates/shipper-output-sanitizer/src/lib.rscrates/shipper/src/engine/mod.rs
| while i < bytes.len() { | ||
| if bytes[i] == 0x1b && i + 1 < bytes.len() { | ||
| match bytes[i + 1] { | ||
| // CSI: \x1b[ ... <final byte 0x40..=0x7e> | ||
| b'[' => { | ||
| i += 2; | ||
| while i < bytes.len() { | ||
| let b = bytes[i]; | ||
| i += 1; | ||
| if (0x40..=0x7e).contains(&b) { | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| // OSC: \x1b] ... BEL (0x07) or ST (\x1b\\) | ||
| b']' => { | ||
| i += 2; | ||
| while i < bytes.len() { | ||
| if bytes[i] == 0x07 { | ||
| i += 1; | ||
| break; | ||
| } | ||
| if bytes[i] == 0x1b && i + 1 < bytes.len() && bytes[i + 1] == b'\\' { | ||
| i += 2; | ||
| break; | ||
| } | ||
| i += 1; | ||
| } | ||
| } | ||
| // Two-byte escape (e.g. \x1b(B, \x1b=, …) — skip next byte. | ||
| _ => i += 2, | ||
| } | ||
| } else { | ||
| // Non-ESC byte — append as UTF-8-safe codepoint. | ||
| let ch = s[i..].chars().next().unwrap_or('\0'); | ||
| let len = ch.len_utf8(); | ||
| out.push(ch); | ||
| i += len; | ||
| } |
There was a problem hiding this comment.
Handle a trailing lone ESC byte explicitly.
strip_ansi() currently preserves a final bare \x1b because the escape path only runs when i + 1 < bytes.len(). That breaks the function’s documented contract and can leak raw control bytes when a captured tail is truncated mid-sequence.
💡 Proposed fix
while i < bytes.len() {
- if bytes[i] == 0x1b && i + 1 < bytes.len() {
+ if bytes[i] == 0x1b {
+ if i + 1 >= bytes.len() {
+ i += 1;
+ continue;
+ }
match bytes[i + 1] {
// CSI: \x1b[ ... <final byte 0x40..=0x7e>
b'[' => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/shipper-output-sanitizer/src/lib.rs` around lines 29 - 67, The loop in
strip_ansi preserves a trailing lone ESC (0x1b) because the escape-handling only
runs when i + 1 < bytes.len(); fix by explicitly handling the case bytes[i] ==
0x1b && i + 1 >= bytes.len(): consume/skip that byte (e.g. i += 1 or break)
instead of falling through to the UTF-8 branch, so the raw control byte is
dropped; update the match/if around bytes[i] checks in the strip_ansi loop
(variables: bytes, i, s, out) to detect and handle this trailing-ESC case.
| let full_stripped = format!( | ||
| "workspace dry-run: exit_code={}\n\n--- stdout ---\n{}\n\n--- stderr ---\n{}\n", | ||
| output.exit_code, | ||
| shipper_output_sanitizer::strip_ansi(&output.stdout_tail), | ||
| shipper_output_sanitizer::strip_ansi(&output.stderr_tail), | ||
| ); |
There was a problem hiding this comment.
Redact sensitive values before persisting the sidecar.
This sidecar strips ANSI but still writes raw cargo output to disk. If cargo emits a bearer header, CARGO_REGISTRY_TOKEN=..., or a tokenized URL, this change just moves the secret from events.jsonl into a deterministic file under the state dir. Run redact_sensitive() over the stripped stdout/stderr before formatting the sidecar payload.
💡 Proposed fix
+ let sanitized_stdout = shipper_output_sanitizer::redact_sensitive(
+ &shipper_output_sanitizer::strip_ansi(&output.stdout_tail),
+ );
+ let sanitized_stderr = shipper_output_sanitizer::redact_sensitive(
+ &shipper_output_sanitizer::strip_ansi(&output.stderr_tail),
+ );
let full_stripped = format!(
"workspace dry-run: exit_code={}\n\n--- stdout ---\n{}\n\n--- stderr ---\n{}\n",
output.exit_code,
- shipper_output_sanitizer::strip_ansi(&output.stdout_tail),
- shipper_output_sanitizer::strip_ansi(&output.stderr_tail),
+ sanitized_stdout,
+ sanitized_stderr,
);As per coding guidelines, "Token resolution follows Cargo conventions: CARGO_REGISTRY_TOKEN → CARGO_REGISTRIES_<NAME>_TOKEN → $CARGO_HOME/credentials.toml. Tokens are opaque strings and must never be logged".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/shipper/src/engine/mod.rs` around lines 147 - 152, The code constructs
full_stripped from output.stdout_tail and output.stderr_tail after strip_ansi
but does not redact secrets; call the existing redact_sensitive function on the
sanitized outputs before composing full_stripped (i.e., replace
shipper_output_sanitizer::strip_ansi(&output.stdout_tail) and
strip_ansi(&output.stderr_tail) with
redact_sensitive(shipper_output_sanitizer::strip_ansi(&output.stdout_tail)) and
redact_sensitive(shipper_output_sanitizer::strip_ansi(&output.stderr_tail))) so
any tokens like CARGO_REGISTRY_TOKEN are removed prior to persisting the sidecar
payload.
| let sidecar_path = state_dir.join("preflight_workspace_verify.txt"); | ||
| if let Some(parent) = sidecar_path.parent() { | ||
| let _ = std::fs::create_dir_all(parent); | ||
| } | ||
| if let Err(e) = std::fs::write(&sidecar_path, &full_stripped) { | ||
| reporter.warn(&format!( | ||
| "failed to write preflight workspace-verify sidecar at {}: {e}", | ||
| sidecar_path.display() | ||
| )); | ||
| } | ||
| // The sidecar path is deterministic (<state_dir>/preflight_ | ||
| // workspace_verify.txt); documented in the runbook. Keeping | ||
| // the success path quiet avoids churn in operator output | ||
| // snapshots and byte-count variability across platforms. | ||
| // Slim summary for the event log: exit code + tail of | ||
| // ANSI-stripped stderr (the interesting signal). | ||
| let tail_summary = shipper_output_sanitizer::tail_lines( | ||
| &shipper_output_sanitizer::strip_ansi(&output.stderr_tail), | ||
| 6, | ||
| ); | ||
| let summary = format!( | ||
| "workspace dry-run: exit_code={}; sidecar={}; stderr_tail_summary={:?}", | ||
| output.exit_code, | ||
| sidecar_path.display(), | ||
| tail_summary | ||
| ); |
There was a problem hiding this comment.
Don’t emit a sidecar= path when the write failed.
The failure path warns, but the summary still advertises the deterministic path as if the file exists. That leaves consumers with a broken reference instead of a truthful “sidecar unavailable” status.
💡 Proposed fix
- if let Err(e) = std::fs::write(&sidecar_path, &full_stripped) {
- reporter.warn(&format!(
- "failed to write preflight workspace-verify sidecar at {}: {e}",
- sidecar_path.display()
- ));
- }
+ let sidecar_ref = match std::fs::write(&sidecar_path, &full_stripped) {
+ Ok(()) => sidecar_path.display().to_string(),
+ Err(e) => {
+ reporter.warn(&format!(
+ "failed to write preflight workspace-verify sidecar at {}: {e}",
+ sidecar_path.display()
+ ));
+ "<write failed>".to_string()
+ }
+ };
@@
let summary = format!(
"workspace dry-run: exit_code={}; sidecar={}; stderr_tail_summary={:?}",
output.exit_code,
- sidecar_path.display(),
+ sidecar_ref,
tail_summary
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let sidecar_path = state_dir.join("preflight_workspace_verify.txt"); | |
| if let Some(parent) = sidecar_path.parent() { | |
| let _ = std::fs::create_dir_all(parent); | |
| } | |
| if let Err(e) = std::fs::write(&sidecar_path, &full_stripped) { | |
| reporter.warn(&format!( | |
| "failed to write preflight workspace-verify sidecar at {}: {e}", | |
| sidecar_path.display() | |
| )); | |
| } | |
| // The sidecar path is deterministic (<state_dir>/preflight_ | |
| // workspace_verify.txt); documented in the runbook. Keeping | |
| // the success path quiet avoids churn in operator output | |
| // snapshots and byte-count variability across platforms. | |
| // Slim summary for the event log: exit code + tail of | |
| // ANSI-stripped stderr (the interesting signal). | |
| let tail_summary = shipper_output_sanitizer::tail_lines( | |
| &shipper_output_sanitizer::strip_ansi(&output.stderr_tail), | |
| 6, | |
| ); | |
| let summary = format!( | |
| "workspace dry-run: exit_code={}; sidecar={}; stderr_tail_summary={:?}", | |
| output.exit_code, | |
| sidecar_path.display(), | |
| tail_summary | |
| ); | |
| let sidecar_path = state_dir.join("preflight_workspace_verify.txt"); | |
| if let Some(parent) = sidecar_path.parent() { | |
| let _ = std::fs::create_dir_all(parent); | |
| } | |
| let sidecar_ref = match std::fs::write(&sidecar_path, &full_stripped) { | |
| Ok(()) => sidecar_path.display().to_string(), | |
| Err(e) => { | |
| reporter.warn(&format!( | |
| "failed to write preflight workspace-verify sidecar at {}: {e}", | |
| sidecar_path.display() | |
| )); | |
| "<write failed>".to_string() | |
| } | |
| }; | |
| // The sidecar path is deterministic (<state_dir>/preflight_ | |
| // workspace_verify.txt); documented in the runbook. Keeping | |
| // the success path quiet avoids churn in operator output | |
| // snapshots and byte-count variability across platforms. | |
| // Slim summary for the event log: exit code + tail of | |
| // ANSI-stripped stderr (the interesting signal). | |
| let tail_summary = shipper_output_sanitizer::tail_lines( | |
| &shipper_output_sanitizer::strip_ansi(&output.stderr_tail), | |
| 6, | |
| ); | |
| let summary = format!( | |
| "workspace dry-run: exit_code={}; sidecar={}; stderr_tail_summary={:?}", | |
| output.exit_code, | |
| sidecar_ref, | |
| tail_summary | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/shipper/src/engine/mod.rs` around lines 153 - 178, The summary
currently always includes the deterministic sidecar_path even when the write
failed; change the logic around sidecar_path/write to record the write result
(e.g., a boolean like wrote_sidecar or an Option<String>) when calling
std::fs::write (the reporter.warn path stays), and use that to build the summary
string in place of always referencing sidecar_path.display() inside the
format!—emit the actual path only when the write succeeded, otherwise emit a
clear token such as "sidecar=unavailable" (or similar) in the summary
constructed for the workspace dry-run (symbols to update: sidecar_path, the
std::fs::write call, reporter.warn, tail_summary, and the summary format).
The `sequential_reconcile` helper's docstring linked to `crate::engine::parallel::reconcile::reconcile_ambiguous_upload` via rustdoc intra-doc syntax, but the target is `pub(super)` inside `engine::parallel::reconcile` — not resolvable from `engine::mod`. `-Dwarnings` on the Documentation CI lane promoted this to an error. Link was introduced in #115 (resume-path reconcile) but only tripped CI now because the docs job had a later-run schedule. Replaced the markdown link with plain text — the mirror-relationship is still explained, just without a clickable doc link across a pub(super) boundary. Note: the Coverage (llvm-cov) check also failed on this run, with `shipper-registry::context::tests::verify_ownership_propagates_ network_error` — a pre-existing environment-sensitive flake that reproduces unrelated to this PR. Not addressing it here.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Closes #92 — the "~2KB ANSI blob in events.jsonl" finding from the v0.3.0-rc.1 retrospective.
Problem
`PreflightWorkspaceVerify` events dumped cargo dry-run stderr with embedded ANSI escapes (
\x1b[1m\x1b[92m…) into the structured event stream. Typical payload was ~2KB per event, dominating `events.jsonl` and making downstream tooling parse color codes.Fix
Three-part change, no schema break:
New `strip_ansi` in `shipper-output-sanitizer` — dependency-free, UTF-8-safe, handles CSI / OSC / two-byte escapes. 6 unit tests.
Event payload is slim: the `output` field now carries a compact summary (exit code + sidecar path + ANSI-stripped stderr tail-6). Size drops from ~2KB → a few hundred bytes.
Full output in sidecar: ANSI-stripped full cargo dry-run output is written to `<state_dir>/preflight_workspace_verify.txt`. The Reporter surfaces the path + byte count so operators know where to look.
Schema stability
`EventType::PreflightWorkspaceVerify { passed, output }` shape unchanged. No destructuring patterns break; no snapshot tests broke. Structured fields (`exit_code`/`elapsed_ms`/`output_path`) are a bigger cross-cutting refactor deferred as a potential follow-up.
Verification
Files
Net: 2 files, +192 / -31.
Related