fix(auto-reply): preserve post-compaction failure context in user-facing recovery message (#67750)#87691
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed June 3, 2026, 4:28 AM ET / 08:28 UTC. Summary PR surface: Source +104, Tests +143. Total +247 across 2 files. Reproducibility: yes. source-level: current main tracks completed compaction and structured fallback attempts but can still resolve a post-compaction mixed timeout/billing failure to billing or generic copy without compaction context; no live timeout was reproduced in this read-only review. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the narrow formatter and regression tests after real behavior proof or a maintainer proof override confirms the post-compaction recovery message in an actual gateway/channel path. Do we have a high-confidence way to reproduce the issue? Yes, source-level: current main tracks completed compaction and structured fallback attempts but can still resolve a post-compaction mixed timeout/billing failure to billing or generic copy without compaction context; no live timeout was reproduced in this read-only review. Is this the best way to solve the issue? Yes, as a narrow fix: the formatter is the right owner boundary because the fallback engine already exposes structured attempts and the issue is about final user-facing recovery copy, not fallback selection or session reset policy. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 34c3827290a7. Label changesLabel justifications:
Evidence reviewedPR surface: Source +104, Tests +143. Total +247 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
d2813b8 to
5b8a9e2
Compare
Summary
What problem does this PR solve?
When auto-compaction succeeds inside a turn and the retried run still fails (e.g.
LLM idle timeout (120s), fallback exhausted with billing-skipped candidates),runAgentTurnWithFallback's catch path collapses the failure into either the generic⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.copy or a bare cause-specific message (e.g.BILLING_ERROR_USER_MESSAGE). Either way, the user loses the fact that compaction was attempted, that the timeout was on the retried turn, and which fallback candidates failed for which reason — exactly the chain in the bug report atsrc/auto-reply/reply/agent-runner-execution.ts.Why does this matter now?
#67750was filed against the user-facing chain insrc/auto-reply/reply/agent-runner-execution.ts. The same catch block is on the hot path for every messaging surface (embedded harness runs and Codex / Pi-style CLI dispatch), so the same erasure happens across channels.What is the intended outcome?
The user-facing message preserves the post-compaction signal already tracked in
autoCompactionCountand the per-attempt reason chain already carried onFallbackSummaryError.attempts. Cause-specific copy (billing, rate-limit, timeout, oauth) is preserved verbatim so existing actionable hints still surface; the generic catch-all gains a structuredAuto-compaction succeeded, but the retried turn still failed — ...prefix plus a concrete next step instead of bare/new.What is intentionally out of scope?
src/agents/model-fallback.ts) or toFallbackSummaryError's shape.src/agents/embedded-agent-subscribe.ts) — theattemptCompactionCountincrement in the outer execution layer (src/auto-reply/reply/agent-runner-execution.ts:2535,2631) is already the source of truth.HEARTBEAT_EXTERNAL_RUN_FAILURE_TEXT) and control-UI surfaces keep their existing copy contracts.What does success look like?
Three new vitest cases under
runAgentTurnWithFallbackcover: (1) post-compactionFallbackSummaryErrorwith timeout + billing-skip attempts surfaces both compaction context and the per-attempt summary; (2) post-compaction plain failure surfaces the structured prefix plus the generic-mode guidance; (3) failure without a successful compaction still produces the plainGENERIC_EXTERNAL_RUN_FAILURE_TEXT.What should reviewers focus on?
autoCompactionCount > 0 && !params.isHeartbeat && !shouldSurfaceToControlUi, so existing copy tests (rate-limit, billing-only, gateway-restart, Codex usage-limit, missing-API-key, CLI subprocess timeout) keep their exact strings.POST_COMPACTION_ATTEMPT_SUMMARY_MAX_CHARS = 240) is sized to keep the prefix concise on transports with text-length caps.Linked context
Which issue does this close?
Closes #67750
Which issues, PRs, or discussions are related?
Related #67750
Was this requested by a maintainer or owner?
No — addressing the labelled
P2/clawsweeper:fix-shape-clear/clawsweeper:queueable-fix/clawsweeper:source-reprotriage on #67750.Real behavior proof (required for external PRs)
/newfallback (or a bareBILLING_ERROR_USER_MESSAGE) and erases the compaction-succeeded signal that is already tracked inautoCompactionCountand onFallbackSummaryError.attempts.runEmbeddedAgent+runWithModelFallbackseam that drive the exact failure shape from the bug report (embedded run emitscompaction phase=end completed=true, then throws an LLM idle timeout; fallback layer rejects with aFallbackSummaryErrorcontainingreason: "timeout"plusreason: "billing"skip entries). Theclawsweeper:source-reprolabel on the issue already covers the source-of-truth chain atsrc/auto-reply/reply/agent-runner-execution.ts:2660-2811.node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts(149 tests, all passing).pnpm tsgo:coreandpnpm tsgo:core:test(clean).pnpm format:check src/auto-reply/reply/agent-runner-execution.ts src/auto-reply/reply/agent-runner-execution.test.ts(clean).node scripts/run-oxlint.mjs src/auto-reply/reply/agent-runner-execution.ts src/auto-reply/reply/agent-runner-execution.test.ts(clean).surfaces post-compaction context when the retried turn times out (#67750)asserts the rendered text containsAuto-compaction succeeded, retains the cause-specificbillingcopy, and includes the per-attempt summaryopenai-codex/gpt-5.4 timed outplusanthropic/claude-opus-4-7 skipped (billing). Testsurfaces post-compaction context for plain failures without a fallback summaryasserts the generic-mode guidance ("The context was compacted but no candidate could finish the turn ...") instead of the bare/newcopy.Test Files 1 passed (1) / Tests 149 passed (149).clawsweeper:source-reproprecisely because the failure chain is observable in source plus emitted log lines (see the verbatim repro evidence in the issue body), and the diff exercises the exactautoCompactionCount > 0 && FallbackSummaryErrorseam from that chain.src/auto-reply/reply/agent-runner-execution.ts:2795-2805would resolvefallbackTextto eitherBILLING_ERROR_USER_MESSAGE(whenhasBillingAttemptSummaryis true) orGENERIC_EXTERNAL_RUN_FAILURE_TEXT(when no specific classifier matches) — both of which dropautoCompactionCountand theFallbackSummaryError.attemptscause chain on the floor.Tests and validation
Which commands did you run?
node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts(149 passed)node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/agent-runner-helpers.test.ts src/auto-reply/reply/agent-runner.misc.runreplyagent.test.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.ts(194 passed)node scripts/run-vitest.mjs src/auto-reply/reply/followup-runner.test.ts(55 passed — sibling-surface check; followup-runner sharesautoCompactionCountbut uses a different non-user-facing failure path)pnpm tsgo:coreandpnpm tsgo:core:test(clean)pnpm format:check(clean)What regression coverage was added or updated?
Three new cases in
src/auto-reply/reply/agent-runner-execution.test.tsunderdescribe("runAgentTurnWithFallback"):surfaces post-compaction context when the retried turn times out (#67750)— full repro of the bug-report failure chain (compaction succeeds; retried turn timeout; fallback exhausted with timeout + billing-skip attempts). Asserts the rendered text preserves compaction context, the resolved cause-specific copy, and the per-attempt summary.surfaces post-compaction context for plain failures without a fallback summary— covers the generic catch-all branch when the error is a plainError(noFallbackSummaryError), confirming the structured prefix and the specific next-step guidance replace the bare/newfallback.does not add post-compaction context when no compaction succeeded— regression guard so the wrap stays scoped toautoCompactionCount > 0and ordinary unknown errors still resolve toGENERIC_EXTERNAL_RUN_FAILURE_TEXT.What failed before this fix, if known?
Test #1 above would have produced
result.payload.text === "billing"(the mockedBILLING_ERROR_USER_MESSAGE), with no mention of compaction or the per-attempt timeout. Test #2 would have producedresult.payload.text === GENERIC_RUN_FAILURE_TEXT, with no mention of compaction succeeding.If no test was added, why not?
N/A — tests added.
Risk checklist
Did user-visible behavior change? (
Yes/No)Yes — only on the post-auto-compaction failure path. The pre-compaction failure path (
autoCompactionCount === 0) returns the existingfallbackTextunchanged, and existing tests for billing, rate-limit, gateway-restart, Codex usage-limit, missing-API-key, CLI subprocess timeout, and the generic copy all still assert their exact strings.Did config, environment, or migration behavior change? (
Yes/No)No.
Did security, auth, secrets, network, or tool execution behavior change? (
Yes/No)No.
What is the highest-risk area?
User-facing copy on the post-compaction failure path. Wrong wording on a hot error path can confuse users about whether the message they sent was processed.
How is that risk mitigated?
autoCompactionCount > 0,!params.isHeartbeat,!shouldSurfaceToControlUi), so heartbeat probes and the control-UI logs path keep their existing copy.BILLING_ERROR_USER_MESSAGE, rate-limit cooldown, Codex usage-limit, missing-API-key, OAuth re-auth, CLI timeout) is preserved verbatim — the wrap only prepends/appends, never replaces.agent-runner-execution.test.tscases still pass (including everysurfaces Xcopy contract).Current review state
What is the next action?
Maintainer review.
What is still waiting on author, maintainer, CI, or external proof?
CI run on the new commit.
Which bot or reviewer comments were addressed?
None yet — first revision.