Skip to content

fix(agents): extend terminal outcome projections#88162

Merged
steipete merged 11 commits into
mainfrom
steipete/followup-agent-terminal-surfaces
May 30, 2026
Merged

fix(agents): extend terminal outcome projections#88162
steipete merged 11 commits into
mainfrom
steipete/followup-agent-terminal-surfaces

Conversation

@steipete

Copy link
Copy Markdown
Contributor

Summary

  • Routes task and session lifecycle terminal projections through the shared agent-run terminal outcome helper.
  • Extends timeout attribution precedence so provider timeout metadata wins over rpc/cancel/error/blocked surfaces without turning plain provider-started success into timeout.
  • Aligns SDK wait and normalized event surfaces with the same timeout/cancel precedence.

Real behavior proof

Behavior addressed: provider-attributed agent run timeouts remain timeouts across task, session, SDK wait, and SDK event projections.
Real environment tested: local source checkout on Node/pnpm.
Exact steps or command run after this patch: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose
Evidence after fix: 4 Vitest shards passed: unit-fast, unit, gateway, and tasks.
Observed result after fix: provider timeout metadata maps to timed_out/timeout while pending retry-grace and successful provider-started snapshots remain non-terminal/completed.
What was not tested: full repository CI locally; GitHub CI will cover the broader matrix.

Verification

  • pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose
  • node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
  • pnpm --filter @openclaw/sdk build
  • git diff --check
  • /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling size: L maintainer Maintainer-authored PR labels May 29, 2026
@clawsweeper

clawsweeper Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 8:46 PM ET / 00:46 UTC.

Summary
The branch routes agent terminal timeout/cancel precedence through the shared helper across task records, session lifecycle persistence, gateway wait snapshots, SDK wait results, SDK normalized events, and related regression/timer-isolation tests.

PR surface: Source +152, Tests +628. Total +780 across 18 files.

Reproducibility: yes. from source inspection, though I did not execute it: current main has separate SDK, task, and session projection logic that does not apply the shared timeout/cancel precedence everywhere. The PR adds focused regression tests for those projection paths.

Review metrics: 1 noteworthy metric.

  • Terminal projection surfaces: 4 surfaces changed. SDK wait/events plus task and session persistence now share timeout/cancel precedence, which changes public or persisted status outputs before merge.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Add redacted real behavior proof for the provider-attributed timeout projection path outside tests.
  • [P1] Have a maintainer explicitly accept the SDK and persisted session/task status compatibility impact before merge.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body provides after-fix test and build commands only; it still needs redacted real runtime output, terminal proof, logs, screenshot, recording, or linked artifact showing the projection behavior outside tests. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The PR intentionally changes public SDK wait/event statuses and persisted task/session terminal states, so maintainers should accept the compatibility and session-state semantics before merge.
  • [P1] Contributor proof is still mock/test-only; an external PR should show a real after-fix run or redacted runtime output for the changed projection path before merging.

Maintainer options:

  1. Prove the real projection path first (recommended)
    Have the contributor or a maintainer add redacted terminal output, runtime logs, a linked artifact, or a screenshot/recording that shows a provider-attributed timeout becoming timeout/timed_out across the affected runtime surface outside tests.
  2. Accept the status semantics deliberately
    Maintainers can choose to merge after explicitly accepting that public SDK statuses and persisted task/session terminal states will change for these timeout/cancel precedence cases.

Next step before merge

  • [P1] The remaining blockers are contributor real-behavior proof and maintainer acceptance of compatibility/session-state semantics; there is no narrow automated code repair left from this review.

Security
Cleared: No security or supply-chain concern found; the diff changes runtime status projection logic and tests without dependencies, workflows, secrets, lockfiles, or package resolution changes.

Review details

Best possible solution:

Land the shared-helper projection fix only after maintainer approval of the SDK/session-state semantics and real after-fix behavior proof showing the timeout attribution path outside unit tests.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection, though I did not execute it: current main has separate SDK, task, and session projection logic that does not apply the shared timeout/cancel precedence everywhere. The PR adds focused regression tests for those projection paths.

Is this the best way to solve the issue?

Mostly yes: routing the projections through the shared terminal outcome helper matches repository policy and removes drift. The remaining gap is merge proof, not a line-level implementation defect I can point to in the final diff.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a341ae27ece9.

Label changes

Label changes:

  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority agent/gateway bug fix with limited but real SDK and persisted-state blast radius.
  • merge-risk: 🚨 compatibility: The PR changes public SDK wait/event status classification for timeout/cancel/provider-error combinations.
  • merge-risk: 🚨 session-state: The PR changes persisted task and session terminal status projection, so stale or wrong precedence could mis-record run state.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides after-fix test and build commands only; it still needs redacted real runtime output, terminal proof, logs, screenshot, recording, or linked artifact showing the projection behavior outside tests. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +152, Tests +628. Total +780 across 18 files.

View PR surface stats
Area Files Added Removed Net
Source 7 191 39 +152
Tests 11 640 12 +628
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 18 831 51 +780

Acceptance criteria:

  • [P2] Contributor or maintainer adds redacted real runtime proof for provider-attributed timeout projection outside tests.
  • [P1] Review the PR's existing focused test commands before merge: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/tasks/task-registry.test.ts packages/sdk/src/index.test.ts src/gateway/session-lifecycle-state.test.ts -- --reporter=verbose.
  • [P1] Run the normal relevant CI/check lane for agent, gateway, SDK, and task surfaces before landing.

What I checked:

  • Root and scoped policy read: Root AGENTS.md was read fully, and scoped guides for src/agents, src/gateway, src/gateway/server-methods, scripts, and test were read; the root policy explicitly calls agent terminal state a shared-helper surface and treats session state / SDK status changes as compatibility-sensitive. (AGENTS.md:53, a341ae27ece9)
  • Exact merge diff inspected: The PR merge ref changes 18 files with focused agent/gateway/task/SDK/test updates, not the stale-head workboard churn visible in a direct head-to-main diff. (a95855889f15)
  • Central helper behavior inspected: Current main's terminal helper is the canonical timeout/cancel precedence point; the PR adjusts hard-timeout ordering so explicit timeoutPhase can outrank blocked/cancel metadata while providerStarted alone no longer turns non-timeout errors into timeouts. (src/agents/agent-run-terminal-outcome.ts:74, a95855889f15)
  • Projection surfaces inspected: The PR applies the shared terminal outcome mapping to SDK wait/event normalization, agent wait normalization, gateway dedupe snapshots, persisted session lifecycle state, and task lifecycle updates with regression coverage for provider timeout and provider non-timeout errors. (src/tasks/task-registry.ts:369, a95855889f15)
  • Real behavior proof still test-only: The PR body's Real behavior proof section lists local Vitest shards, tsgo, SDK build, diff check, and autoreview, but no redacted runtime logs, terminal output artifact, screenshot, recording, or linked artifact from a real after-fix run outside tests. (ae2c0cd47424)
  • Feature history: The current shared helper and gateway snapshot normalization were introduced by the recent terminal-precedence commit, which makes this PR a direct follow-up on the same behavior path. (src/agents/agent-run-terminal-outcome.ts:74, b1e5c9d7fa16)

Likely related people:

  • steipete: Authored the current main terminal outcome helper centralization and the surrounding SDK/session/task projection code that this PR follows up on. (role: recent area contributor; confidence: high; commits: b1e5c9d7fa16, 44e31f7c6a; files: src/agents/agent-run-terminal-outcome.ts, src/gateway/server-methods/agent-wait-dedupe.ts, src/gateway/session-lifecycle-state.ts)
  • keshavbotagent: Recent history search shows adjacent provider timeout/fallback work in the agent paths, but the direct terminal projection ownership trail is weaker than the helper author trail. (role: adjacent recent contributor; confidence: low; commits: 5f89fbe669; files: src/agents)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2e6ecb6f1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

input.status === "timeout" &&
(isHardAgentRunTimeoutPhase(timeoutPhase) || providerStarted === true);
isHardAgentRunTimeoutPhase(timeoutPhase) ||
((input.status === "timeout" || input.status === "error") && providerStarted === true);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require timeout attribution before marking errors as timed out

When a lifecycle/error or agent.wait payload includes providerStarted: true for a non-timeout provider failure (for example an auth, rate-limit, or model error after the request reached the provider) but does not include a hard timeoutPhase, this branch now classifies the run as hard_timeout. providerStarted only says the provider request began, so this makes task/session/SDK projections report those failures as timeouts instead of failures; the hard-timeout path should require an actual timeout status/phase rather than status === "error" plus providerStarted.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 29, 2026
@steipete steipete requested a review from a team as a code owner May 29, 2026 23:57

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 790dff7dde

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/sdk/src/normalize.ts Outdated
Comment on lines +89 to +90
if (hasHardTimeoutMetadata(data, true)) {
return "run.timed_out";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require timeout attribution before timed-out events

Fresh evidence: the SDK event normalizer duplicates the providerStarted-only timeout check here, so fixing the shared terminal helper would not correct normalized events. For lifecycle error events from a provider request that started but failed for a non-timeout reason (auth, rate limit, model error, etc.), passing true makes hasHardTimeoutMetadata return true solely from providerStarted === true, causing subscribers to see run.timed_out instead of run.failed; this branch should require an actual timeout status/phase rather than provider-started metadata alone.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 30, 2026
@steipete steipete force-pushed the steipete/followup-agent-terminal-surfaces branch from 2622b0b to ae2c0cd Compare May 30, 2026 00:41
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 30, 2026
@steipete

Copy link
Copy Markdown
Contributor Author

Verification before merge:

  • Local focused tests: pnpm test src/agents/agent-run-terminal-outcome.test.ts src/agents/run-wait.test.ts src/gateway/server-methods/agent-wait-dedupe.test.ts src/tasks/task-registry.test.ts src/gateway/session-lifecycle-state.test.ts packages/sdk/src/index.test.ts -- --reporter=verbose
  • Local timer follow-up tests: pnpm test src/security/audit-sandbox-browser.test.ts test/scripts/e2e-websocket-open.test.ts test/scripts/mcp-websocket-open.test.ts test/scripts/gh-read.test.ts src/agents/code-mode.test.ts -- --reporter=verbose
  • SDK/core proof: pnpm --filter @openclaw/sdk build; node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
  • Lint/diff proof: pnpm exec oxlint packages/sdk/src/client.ts packages/sdk/src/normalize.ts packages/sdk/src/index.test.ts src/agents/agent-run-terminal-outcome.ts src/agents/agent-run-terminal-outcome.test.ts; git diff --check
  • Auto-review: clean, no accepted/actionable findings after follow-up fixes.
  • CI: reran failed checks-node-core-runtime-infra-state; workflow run 26669530907, job 78610829050, passed. PR merge state is CLEAN; only remaining non-success rollup entries are stale cancelled superseded auto-response and Real behavior proof runs, with fresh passing replacements on the same head SHA.

@steipete steipete force-pushed the steipete/followup-agent-terminal-surfaces branch from ae2c0cd to 13a24e9 Compare May 30, 2026 01:02
@openclaw-barnacle openclaw-barnacle Bot added the scripts Repository scripts label May 30, 2026
@steipete

Copy link
Copy Markdown
Contributor Author

Final rebased-head verification:

  • Rebased onto current origin/main after GitHub rejected merge because the base branch moved.
  • Local proof after rebase: pnpm test packages/sdk/src/index.test.ts src/agents/agent-run-terminal-outcome.test.ts -- --reporter=verbose; pnpm exec oxlint packages/sdk/src/client.ts packages/sdk/src/normalize.ts packages/sdk/src/index.test.ts src/agents/agent-run-terminal-outcome.ts src/agents/agent-run-terminal-outcome.test.ts && git diff --check; node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo; pnpm lint --threads=8.
  • Fixed the new origin/main lint blocker in scripts/package-changelog.mjs by preserving the caught error as cause.
  • Current PR head f5fe18b6a93dc5f4b40e061e1c1abf2062a9389f is CLEAN; current-head checks passed. Remaining cancelled rollup entry is a stale superseded Real behavior proof run from the prior pushed head.

@steipete steipete merged commit acb0e9c into main May 30, 2026
111 of 112 checks passed
@steipete steipete deleted the steipete/followup-agent-terminal-surfaces branch May 30, 2026 01:13
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 30, 2026
* fix(agents): extend terminal outcome projections

* fix(agents): align terminal outcome follow-up checks

* fix(agents): satisfy terminal outcome mapper lint

* test(scripts): isolate websocket open timers

* test(security): drive sandbox browser timeout timers

* test(scripts): drive gh-read timeout timers

* test(agents): isolate code mode timers

* fix(agents): preserve hard timeouts on wait surfaces

* fix(agents): require timeout attribution for provider errors

* fix(sdk): require timeout attribution for provider errors

* fix(scripts): preserve changelog parse cause
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
* fix(agents): extend terminal outcome projections

* fix(agents): align terminal outcome follow-up checks

* fix(agents): satisfy terminal outcome mapper lint

* test(scripts): isolate websocket open timers

* test(security): drive sandbox browser timeout timers

* test(scripts): drive gh-read timeout timers

* test(agents): isolate code mode timers

* fix(agents): preserve hard timeouts on wait surfaces

* fix(agents): require timeout attribution for provider errors

* fix(sdk): require timeout attribution for provider errors

* fix(scripts): preserve changelog parse cause
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix(agents): extend terminal outcome projections

* fix(agents): align terminal outcome follow-up checks

* fix(agents): satisfy terminal outcome mapper lint

* test(scripts): isolate websocket open timers

* test(security): drive sandbox browser timeout timers

* test(scripts): drive gh-read timeout timers

* test(agents): isolate code mode timers

* fix(agents): preserve hard timeouts on wait surfaces

* fix(agents): require timeout attribution for provider errors

* fix(sdk): require timeout attribution for provider errors

* fix(scripts): preserve changelog parse cause
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime maintainer Maintainer-authored PR merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. scripts Repository scripts size: L status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant