Preserve agent hard timeout attribution#87902
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 2:49 PM ET / 18:49 UTC. Summary PR surface: Source +1256, Tests +3961. Total +5217 across 66 files. Reproducibility: yes. The PR body gives before/after Crabbox proof for the reported timeout path, and source inspection confirms current main lacks the PR's hard-timeout attribution guard and blocked-timeout preservation path. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land only after a maintainer explicitly accepts the cross-surface timeout contract and confirms the remaining physical cleanup work stays tracked separately in #76962. Do we have a high-confidence way to reproduce the issue? Yes. The PR body gives before/after Crabbox proof for the reported timeout path, and source inspection confirms current main lacks the PR's hard-timeout attribution guard and blocked-timeout preservation path. Is this the best way to solve the issue? Yes, with maintainer sign-off. The cross-path implementation matches the bug shape better than a single agent.wait patch, but the externally observed terminal-status contract needs explicit maintainer acceptance before merge. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 4e2d9b0b760d. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +1256, Tests +3961. Total +5217 across 66 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
963d1cb to
9d05f97
Compare
0642411 to
b16bf36
Compare
|
Closing this because the replacement fix landed in #88162 as acb0e9c. That change keeps the core idea from this PR, but moves terminal run outcome precedence into one shared helper and wires the projections through that instead of carrying the larger repeated-surface patch. Thanks for finding the timeout/cancel/completion race across the surfaces. |
Summary
Fixes #87444 by preserving the distinction between a real run timeout and ordinary abort/cancel/error signals after an agent run crosses its configured hard deadline.
The change carries hard-timeout attribution through the Gateway wait/dedupe paths, subagent lifecycle reconciliation, task registry state, SDK normalization, OpenResponses projection, ACP relay, TUI projection, and persisted session recovery. It also keeps non-timeout aborts distinct, so user/RPC cancellation still normalizes as cancelled instead of timed out.
Related coverage: this also partially addresses #76962 by ensuring a terminal subagent timeout announcement is no longer held behind a still-active embedded child run. It does not claim to fully fix #76962's physical
claude -pprocess cleanup, exec approval queue cancellation, or late direct-output leak; those remain separate proof/fix surfaces.LoC churn: application code
+1457/-248; non-application code, including tests and test helpers,+4784/-776.Why This Shape
The reported failure was one symptom of a broader invariant: once a run has hit its hard timeout, later cleanup signals must not erase or reclassify that timeout. Several surfaces observe the same run at different times, so the fix is intentionally cross-path rather than limited to the original
agent.waitsymptom.The highest-risk sibling behavior is cancellation. This PR adds coverage that cancelled/RPC-stopped runs remain cancelled while hard-timeout runs remain timed out.
The #76962 overlap is intentionally narrow: terminal timeout announcement now proceeds from the already-final timeout outcome instead of waiting for the embedded child process to settle. That removes one path where timeout delivery could be delayed by a still-active child, while leaving process termination and late-output suppression to the dedicated zombie-process cleanup path.
Practical Impact
This change makes timeout observable and sticky at the orchestration layer. A subagent that crosses its configured hard timeout should be reported as timed out across parent waits, Gateway/SDK/OpenResponses callers, task state, channel delivery, and recovered session state.
That means parent turns can unblock and progress sooner with the right terminal reason instead of waiting behind child cleanup or receiving an ambiguous pending, aborted, cancelled, or generic error state. More accurate logs are a side effect; the main behavior is that user-facing status and parent workflows no longer depend on the still-unwinding child process finishing cleanup first.
This does not introduce process cleanup enforcement. It does not forcibly kill stuck child processes, cancel exec approval queues, suppress every possible late direct-output leak, or guarantee physical
claude -pprocess cleanup. The enforced contract here is the parent-facing run outcome, not OS/process reaping.Verification
$autoreview: clean, zero accepted/actionable findings.git diff --check: clean.run_3eff1d4287bd: 31 files passed, 153 tests passed, 909 skipped.run_3eff1d4287bd: all 8 release-candidate scenarios passed.run_3eff1d4287bd: built an npm tarball from this branch, installed that package in a Docker environment, and passed the Telegram round-trip channel test using a Convex-leasedtelegramcredential.run_68e8283ae722: latestorigin/mainreproduced the subagent timeout leaves zombie claude -p → late output emitted directly to user transport (bypasses parent agent) #76962 sub-symptom by calling the embedded-child wait on a terminal timeout; this branch passed the same before/after probe by announcing without that wait.The live matrix covered:
agent.waitand task terminal state.runs.create,run.wait, and cancellation normalization.Real behavior proof
Behavior addressed: Subagent hard run timeouts now remain terminal timeouts across Gateway wait/dedupe, task registry, SDK, OpenResponses, persisted session reconstruction, and channel delivery, instead of being lost or misclassified after late abort/rejection/completion signals. The branch also prevents terminal timeout announcements from waiting behind an active embedded child run, partially addressing #76962.
Real environment tested: AWS Crabbox Linux
c7a.8xlarge, provideraws, leasecbx_ab678695114e, slugharbor-crab, runrun_3eff1d4287bd; related #76962 probe also ran on AWS Crabboxrun_68e8283ae722.Exact steps or command run after this patch: Ran autoreview, then ran a release-candidate proof matrix on AWS Crabbox against this branch. The matrix started a real Gateway with live OpenAI model access, exercised the reported subagent timeout behavior and sibling API paths, built the branch into an npm package tarball, installed that tarball in Docker, and verified Telegram channel round-trip delivery. A separate before/after Crabbox probe compared latest
origin/mainwith this branch for the #76962 terminal-timeout announcement sub-symptom.Evidence after fix: The targeted regression suite passed; the live Gateway/API matrix passed all 8 scenarios; the package-installed Telegram Docker E2E passed; and the #76962-related probe passed on this branch after reproducing on latest
origin/main.Observed result after fix: The reported path returned a terminal timeout with no active child runs remaining, delivered completion state,
agent.waitstatustimeout, and timeout outcome preserved after late aborted/rejection signals. For the #76962-related probe, terminal timeout announcement proceeded without waiting for the still-active embedded child run.What was not tested: The final live matrix used
openai/gpt-5.4-minibecause earliergpt-5.5attempts hit OpenAI TPM limits. The #76962 probe did not prove physicalclaude -pprocess termination, exec approval queue cancellation, or late direct-output suppression.