fix(agents): extend terminal outcome projections by steipete · Pull Request #88162 · openclaw/openclaw

steipete · 2026-05-29T23:27:45Z

Summary

Routes task and session lifecycle terminal projections through the shared agent-run terminal outcome helper.
Extends timeout attribution precedence so provider timeout metadata wins over rpc/cancel/error/blocked surfaces without turning plain provider-started success into timeout.
Aligns SDK wait and normalized event surfaces with the same timeout/cancel precedence.

Real behavior proof

Behavior addressed: provider-attributed agent run timeouts remain timeouts across task, session, SDK wait, and SDK event projections.
Real environment tested: local source checkout on Node/pnpm.
Exact steps or command run after this patch: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose
Evidence after fix: 4 Vitest shards passed: unit-fast, unit, gateway, and tasks.
Observed result after fix: provider timeout metadata maps to timed_out/timeout while pending retry-grace and successful provider-started snapshots remain non-terminal/completed.
What was not tested: full repository CI locally; GitHub CI will cover the broader matrix.

Verification

pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose
node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
pnpm --filter @openclaw/sdk build
git diff --check
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main

clawsweeper · 2026-05-29T23:29:34Z

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 8:46 PM ET / 00:46 UTC.

Summary
The branch routes agent terminal timeout/cancel precedence through the shared helper across task records, session lifecycle persistence, gateway wait snapshots, SDK wait results, SDK normalized events, and related regression/timer-isolation tests.

PR surface: Source +152, Tests +628. Total +780 across 18 files.

Reproducibility: yes. from source inspection, though I did not execute it: current main has separate SDK, task, and session projection logic that does not apply the shared timeout/cancel precedence everywhere. The PR adds focused regression tests for those projection paths.

Review metrics: 1 noteworthy metric.

Terminal projection surfaces: 4 surfaces changed. SDK wait/events plus task and session persistence now share timeout/cancel precedence, which changes public or persisted status outputs before merge.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] Add redacted real behavior proof for the provider-attributed timeout projection path outside tests.
[P1] Have a maintainer explicitly accept the SDK and persisted session/task status compatibility impact before merge.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR body provides after-fix test and build commands only; it still needs redacted real runtime output, terminal proof, logs, screenshot, recording, or linked artifact showing the projection behavior outside tests. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

[P1] The PR intentionally changes public SDK wait/event statuses and persisted task/session terminal states, so maintainers should accept the compatibility and session-state semantics before merge.
[P1] Contributor proof is still mock/test-only; an external PR should show a real after-fix run or redacted runtime output for the changed projection path before merging.

Maintainer options:

Prove the real projection path first (recommended)
Have the contributor or a maintainer add redacted terminal output, runtime logs, a linked artifact, or a screenshot/recording that shows a provider-attributed timeout becoming timeout/timed_out across the affected runtime surface outside tests.
Accept the status semantics deliberately
Maintainers can choose to merge after explicitly accepting that public SDK statuses and persisted task/session terminal states will change for these timeout/cancel precedence cases.

Next step before merge

[P1] The remaining blockers are contributor real-behavior proof and maintainer acceptance of compatibility/session-state semantics; there is no narrow automated code repair left from this review.

Security
Cleared: No security or supply-chain concern found; the diff changes runtime status projection logic and tests without dependencies, workflows, secrets, lockfiles, or package resolution changes.

Review details

Best possible solution:

Land the shared-helper projection fix only after maintainer approval of the SDK/session-state semantics and real after-fix behavior proof showing the timeout attribution path outside unit tests.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection, though I did not execute it: current main has separate SDK, task, and session projection logic that does not apply the shared timeout/cancel precedence everywhere. The PR adds focused regression tests for those projection paths.

Is this the best way to solve the issue?

Mostly yes: routing the projections through the shared terminal outcome helper matches repository policy and removes drift. The remaining gap is merge proof, not a line-level implementation defect I can point to in the final diff.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a341ae27ece9.

Label changes

Label changes:

add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

P2: This is a normal-priority agent/gateway bug fix with limited but real SDK and persisted-state blast radius.
merge-risk: 🚨 compatibility: The PR changes public SDK wait/event status classification for timeout/cancel/provider-error combinations.
merge-risk: 🚨 session-state: The PR changes persisted task and session terminal status projection, so stale or wrong precedence could mis-record run state.
rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides after-fix test and build commands only; it still needs redacted real runtime output, terminal proof, logs, screenshot, recording, or linked artifact showing the projection behavior outside tests. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Source +152, Tests +628. Total +780 across 18 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	7	191	39	+152
Tests	11	640	12	+628
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	18	831	51	+780

Acceptance criteria:

[P2] Contributor or maintainer adds redacted real runtime proof for provider-attributed timeout projection outside tests.
[P1] Review the PR's existing focused test commands before merge: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose.
[P1] Run the normal relevant CI/check lane for agent, gateway, SDK, and task surfaces before landing.

What I checked:

Root and scoped policy read: Root AGENTS.md was read fully, and scoped guides for src/agents, src/gateway, src/gateway/server-methods, scripts, and test were read; the root policy explicitly calls agent terminal state a shared-helper surface and treats session state / SDK status changes as compatibility-sensitive. (AGENTS.md:53, a341ae27ece9)
Exact merge diff inspected: The PR merge ref changes 18 files with focused agent/gateway/task/SDK/test updates, not the stale-head workboard churn visible in a direct head-to-main diff. (a95855889f15)
Central helper behavior inspected: Current main's terminal helper is the canonical timeout/cancel precedence point; the PR adjusts hard-timeout ordering so explicit timeoutPhase can outrank blocked/cancel metadata while providerStarted alone no longer turns non-timeout errors into timeouts. (src/agents/agent-run-terminal-outcome.ts:74, a95855889f15)
Projection surfaces inspected: The PR applies the shared terminal outcome mapping to SDK wait/event normalization, agent wait normalization, gateway dedupe snapshots, persisted session lifecycle state, and task lifecycle updates with regression coverage for provider timeout and provider non-timeout errors. (src/tasks/task-registry.ts:369, a95855889f15)
Real behavior proof still test-only: The PR body's Real behavior proof section lists local Vitest shards, tsgo, SDK build, diff check, and autoreview, but no redacted runtime logs, terminal output artifact, screenshot, recording, or linked artifact from a real after-fix run outside tests. (ae2c0cd47424)
Feature history: The current shared helper and gateway snapshot normalization were introduced by the recent terminal-precedence commit, which makes this PR a direct follow-up on the same behavior path. (src/agents/agent-run-terminal-outcome.ts:74, b1e5c9d7fa16)

Likely related people:

steipete: Authored the current main terminal outcome helper centralization and the surrounding SDK/session/task projection code that this PR follows up on. (role: recent area contributor; confidence: high; commits: b1e5c9d7fa16, 44e31f7c6a; files: src/agents/agent-run-terminal-outcome.ts, src/gateway/server-methods/agent-wait-dedupe.ts, src/gateway/session-lifecycle-state.ts)
keshavbotagent: Recent history search shows adjacent provider timeout/fallback work in the agent paths, but the direct terminal projection ownership trail is weaker than the helper author trail. (role: adjacent recent contributor; confidence: low; commits: 5f89fbe669; files: src/agents)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2e6ecb6f1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T23:30:46Z

-    input.status === "timeout" &&
-    (isHardAgentRunTimeoutPhase(timeoutPhase) || providerStarted === true);
+    isHardAgentRunTimeoutPhase(timeoutPhase) ||
+    ((input.status === "timeout" || input.status === "error") && providerStarted === true);


Require timeout attribution before marking errors as timed out

When a lifecycle/error or agent.wait payload includes providerStarted: true for a non-timeout provider failure (for example an auth, rate-limit, or model error after the request reached the provider) but does not include a hard timeoutPhase, this branch now classifies the run as hard_timeout. providerStarted only says the provider request began, so this makes task/session/SDK projections report those failures as timeouts instead of failures; the hard-timeout path should require an actual timeout status/phase rather than status === "error" plus providerStarted.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 790dff7dde

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-30T00:07:19Z

+      if (hasHardTimeoutMetadata(data, true)) {
+        return "run.timed_out";


Require timeout attribution before timed-out events

Fresh evidence: the SDK event normalizer duplicates the providerStarted-only timeout check here, so fixing the shared terminal helper would not correct normalized events. For lifecycle error events from a provider request that started but failed for a non-timeout reason (auth, rate limit, model error, etc.), passing true makes hasHardTimeoutMetadata return true solely from providerStarted === true, causing subscribers to see run.timed_out instead of run.failed; this branch should require an actual timeout status/phase rather than provider-started metadata alone.

Useful? React with 👍 / 👎.

steipete · 2026-05-30T01:02:19Z

Verification before merge:

Local focused tests: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/agents/run-wait.test.ts src/gateway/server-methods/agent-wait-dedupe.test.ts src/tasks/task-registry.test.ts src/gateway/session-lifecycle-state.test.ts packages/sdk/src/index.test.ts -- --reporter=verbose
Local timer follow-up tests: pnpm test src/security/audit-sandbox-browser.test.ts test/scripts/e2e-websocket-open.test.ts test/scripts/mcp-websocket-open.test.ts test/scripts/gh-read.test.ts src/agents/code-mode.test.ts -- --reporter=verbose
SDK/core proof: pnpm --filter @openclaw/sdk build; node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
Lint/diff proof: pnpm exec oxlint packages/sdk/src/client.ts packages/sdk/src/normalize.ts packages/sdk/src/index.test.ts src/agents/agent-run-terminal-outcome.ts src/agents/agent-run-terminal-outcome.test.ts; git diff --check
Auto-review: clean, no accepted/actionable findings after follow-up fixes.
CI: reran failed checks-node-core-runtime-infra-state; workflow run 26669530907, job 78610829050, passed. PR merge state is CLEAN; only remaining non-success rollup entries are stale cancelled superseded auto-response and Real behavior proof runs, with fresh passing replacements on the same head SHA.

steipete · 2026-05-30T01:12:55Z

Final rebased-head verification:

Rebased onto current origin/main after GitHub rejected merge because the base branch moved.
Local proof after rebase: pnpm test packages/sdk/src/index.test.ts src/agents/agent-run-terminal-outcome.test.ts -- --reporter=verbose; pnpm exec oxlint packages/sdk/src/client.ts packages/sdk/src/normalize.ts packages/sdk/src/index.test.ts src/agents/agent-run-terminal-outcome.ts src/agents/agent-run-terminal-outcome.test.ts && git diff --check; node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo; pnpm lint --threads=8.
Fixed the new origin/main lint blocker in scripts/package-changelog.mjs by preserving the caught error as cause.
Current PR head f5fe18b6a93dc5f4b40e061e1c1abf2062a9389f is CLEAN; current-head checks passed. Remaining cancelled rollup entry is a stale superseded Real behavior proof run from the prior pushed head.

* fix(agents): extend terminal outcome projections * fix(agents): align terminal outcome follow-up checks * fix(agents): satisfy terminal outcome mapper lint * test(scripts): isolate websocket open timers * test(security): drive sandbox browser timeout timers * test(scripts): drive gh-read timeout timers * test(agents): isolate code mode timers * fix(agents): preserve hard timeouts on wait surfaces * fix(agents): require timeout attribution for provider errors * fix(sdk): require timeout attribution for provider errors * fix(scripts): preserve changelog parse cause

openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling size: L maintainer Maintainer-authored PR labels May 29, 2026

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

steipete requested a review from a team as a code owner May 29, 2026 23:57

chatgpt-codex-connector Bot reviewed May 30, 2026

View reviewed changes

steipete force-pushed the steipete/followup-agent-terminal-surfaces branch from 2622b0b to ae2c0cd Compare May 30, 2026 00:41

clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 30, 2026

steipete added 9 commits May 30, 2026 02:02

fix(agents): extend terminal outcome projections

b6d7a5d

fix(agents): align terminal outcome follow-up checks

8857542

fix(agents): satisfy terminal outcome mapper lint

e3e37fd

test(scripts): isolate websocket open timers

0915bd5

test(security): drive sandbox browser timeout timers

283171a

test(scripts): drive gh-read timeout timers

4557bd6

test(agents): isolate code mode timers

669adf5

fix(agents): preserve hard timeouts on wait surfaces

67d3e1e

fix(agents): require timeout attribution for provider errors

684a931

fix(sdk): require timeout attribution for provider errors

13a24e9

steipete force-pushed the steipete/followup-agent-terminal-surfaces branch from ae2c0cd to 13a24e9 Compare May 30, 2026 01:02

fix(scripts): preserve changelog parse cause

f5fe18b

openclaw-barnacle Bot added the scripts Repository scripts label May 30, 2026

steipete merged commit acb0e9c into main May 30, 2026
111 of 112 checks passed

steipete deleted the steipete/followup-agent-terminal-surfaces branch May 30, 2026 01:13

Haderach-Ram mentioned this pull request May 30, 2026

Ecosystem Digest — 2026-05-30 Haderach-Ram/openclaw-radar#23

Open

github-actions Bot mentioned this pull request May 30, 2026

📡 Upstream Digest — 2026-05-30 02:26 UTC curtismercier/openclaw-mods#975

Open

steipete mentioned this pull request May 30, 2026

Preserve agent hard timeout attribution #87902

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): extend terminal outcome projections#88162

fix(agents): extend terminal outcome projections#88162
steipete merged 11 commits into
mainfrom
steipete/followup-agent-terminal-surfaces

steipete commented May 29, 2026

Uh oh!

clawsweeper Bot commented May 29, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Uh oh!

steipete commented May 30, 2026

Uh oh!

steipete commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if (hasHardTimeoutMetadata(data, true)) {
		return "run.timed_out";

Uh oh!

Conversation

steipete commented May 29, 2026

Summary

Real behavior proof

Verification

Uh oh!

clawsweeper Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

steipete commented May 30, 2026

Uh oh!

steipete commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

clawsweeper Bot commented May 29, 2026 •

edited

Loading