fix(agents): extend terminal outcome projections#88162
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 8:46 PM ET / 00:46 UTC. Summary PR surface: Source +152, Tests +628. Total +780 across 18 files. Reproducibility: yes. from source inspection, though I did not execute it: current main has separate SDK, task, and session projection logic that does not apply the shared timeout/cancel precedence everywhere. The PR adds focused regression tests for those projection paths. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the shared-helper projection fix only after maintainer approval of the SDK/session-state semantics and real after-fix behavior proof showing the timeout attribution path outside unit tests. Do we have a high-confidence way to reproduce the issue? Yes from source inspection, though I did not execute it: current main has separate SDK, task, and session projection logic that does not apply the shared timeout/cancel precedence everywhere. The PR adds focused regression tests for those projection paths. Is this the best way to solve the issue? Mostly yes: routing the projections through the shared terminal outcome helper matches repository policy and removes drift. The remaining gap is merge proof, not a line-level implementation defect I can point to in the final diff. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against a341ae27ece9. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +152, Tests +628. Total +780 across 18 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2e6ecb6f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| input.status === "timeout" && | ||
| (isHardAgentRunTimeoutPhase(timeoutPhase) || providerStarted === true); | ||
| isHardAgentRunTimeoutPhase(timeoutPhase) || | ||
| ((input.status === "timeout" || input.status === "error") && providerStarted === true); |
There was a problem hiding this comment.
Require timeout attribution before marking errors as timed out
When a lifecycle/error or agent.wait payload includes providerStarted: true for a non-timeout provider failure (for example an auth, rate-limit, or model error after the request reached the provider) but does not include a hard timeoutPhase, this branch now classifies the run as hard_timeout. providerStarted only says the provider request began, so this makes task/session/SDK projections report those failures as timeouts instead of failures; the hard-timeout path should require an actual timeout status/phase rather than status === "error" plus providerStarted.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 790dff7dde
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (hasHardTimeoutMetadata(data, true)) { | ||
| return "run.timed_out"; |
There was a problem hiding this comment.
Require timeout attribution before timed-out events
Fresh evidence: the SDK event normalizer duplicates the providerStarted-only timeout check here, so fixing the shared terminal helper would not correct normalized events. For lifecycle error events from a provider request that started but failed for a non-timeout reason (auth, rate limit, model error, etc.), passing true makes hasHardTimeoutMetadata return true solely from providerStarted === true, causing subscribers to see run.timed_out instead of run.failed; this branch should require an actual timeout status/phase rather than provider-started metadata alone.
Useful? React with 👍 / 👎.
2622b0b to
ae2c0cd
Compare
|
Verification before merge:
|
ae2c0cd to
13a24e9
Compare
|
Final rebased-head verification:
|
* fix(agents): extend terminal outcome projections * fix(agents): align terminal outcome follow-up checks * fix(agents): satisfy terminal outcome mapper lint * test(scripts): isolate websocket open timers * test(security): drive sandbox browser timeout timers * test(scripts): drive gh-read timeout timers * test(agents): isolate code mode timers * fix(agents): preserve hard timeouts on wait surfaces * fix(agents): require timeout attribution for provider errors * fix(sdk): require timeout attribution for provider errors * fix(scripts): preserve changelog parse cause
* fix(agents): extend terminal outcome projections * fix(agents): align terminal outcome follow-up checks * fix(agents): satisfy terminal outcome mapper lint * test(scripts): isolate websocket open timers * test(security): drive sandbox browser timeout timers * test(scripts): drive gh-read timeout timers * test(agents): isolate code mode timers * fix(agents): preserve hard timeouts on wait surfaces * fix(agents): require timeout attribution for provider errors * fix(sdk): require timeout attribution for provider errors * fix(scripts): preserve changelog parse cause
* fix(agents): extend terminal outcome projections * fix(agents): align terminal outcome follow-up checks * fix(agents): satisfy terminal outcome mapper lint * test(scripts): isolate websocket open timers * test(security): drive sandbox browser timeout timers * test(scripts): drive gh-read timeout timers * test(agents): isolate code mode timers * fix(agents): preserve hard timeouts on wait surfaces * fix(agents): require timeout attribution for provider errors * fix(sdk): require timeout attribution for provider errors * fix(scripts): preserve changelog parse cause
Summary
Real behavior proof
Behavior addressed: provider-attributed agent run timeouts remain timeouts across task, session, SDK wait, and SDK event projections.
Real environment tested: local source checkout on Node/pnpm.
Exact steps or command run after this patch: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose
Evidence after fix: 4 Vitest shards passed: unit-fast, unit, gateway, and tasks.
Observed result after fix: provider timeout metadata maps to timed_out/timeout while pending retry-grace and successful provider-started snapshots remain non-terminal/completed.
What was not tested: full repository CI locally; GitHub CI will cover the broader matrix.
Verification