Skip to content

Preserve agent hard timeout attribution#87902

Closed
joshavant wants to merge 14 commits into
mainfrom
fix/subagent-run-timeout-deadline
Closed

Preserve agent hard timeout attribution#87902
joshavant wants to merge 14 commits into
mainfrom
fix/subagent-run-timeout-deadline

Conversation

@joshavant

@joshavant joshavant commented May 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #87444 by preserving the distinction between a real run timeout and ordinary abort/cancel/error signals after an agent run crosses its configured hard deadline.

The change carries hard-timeout attribution through the Gateway wait/dedupe paths, subagent lifecycle reconciliation, task registry state, SDK normalization, OpenResponses projection, ACP relay, TUI projection, and persisted session recovery. It also keeps non-timeout aborts distinct, so user/RPC cancellation still normalizes as cancelled instead of timed out.

Related coverage: this also partially addresses #76962 by ensuring a terminal subagent timeout announcement is no longer held behind a still-active embedded child run. It does not claim to fully fix #76962's physical claude -p process cleanup, exec approval queue cancellation, or late direct-output leak; those remain separate proof/fix surfaces.

LoC churn: application code +1457/-248; non-application code, including tests and test helpers, +4784/-776.

Why This Shape

The reported failure was one symptom of a broader invariant: once a run has hit its hard timeout, later cleanup signals must not erase or reclassify that timeout. Several surfaces observe the same run at different times, so the fix is intentionally cross-path rather than limited to the original agent.wait symptom.

The highest-risk sibling behavior is cancellation. This PR adds coverage that cancelled/RPC-stopped runs remain cancelled while hard-timeout runs remain timed out.

The #76962 overlap is intentionally narrow: terminal timeout announcement now proceeds from the already-final timeout outcome instead of waiting for the embedded child process to settle. That removes one path where timeout delivery could be delayed by a still-active child, while leaving process termination and late-output suppression to the dedicated zombie-process cleanup path.

Practical Impact

This change makes timeout observable and sticky at the orchestration layer. A subagent that crosses its configured hard timeout should be reported as timed out across parent waits, Gateway/SDK/OpenResponses callers, task state, channel delivery, and recovered session state.

That means parent turns can unblock and progress sooner with the right terminal reason instead of waiting behind child cleanup or receiving an ambiguous pending, aborted, cancelled, or generic error state. More accurate logs are a side effect; the main behavior is that user-facing status and parent workflows no longer depend on the still-unwinding child process finishing cleanup first.

This does not introduce process cleanup enforcement. It does not forcibly kill stuck child processes, cancel exec approval queues, suppress every possible late direct-output leak, or guarantee physical claude -p process cleanup. The enforced contract here is the parent-facing run outcome, not OS/process reaping.

Verification

  • $autoreview: clean, zero accepted/actionable findings.
  • git diff --check: clean.
  • Targeted regression suite on AWS Crabbox run_3eff1d4287bd: 31 files passed, 153 tests passed, 909 skipped.
  • Live Gateway/API matrix on AWS Crabbox run_3eff1d4287bd: all 8 release-candidate scenarios passed.
  • Package/channel proof on AWS Crabbox run_3eff1d4287bd: built an npm tarball from this branch, installed that package in a Docker environment, and passed the Telegram round-trip channel test using a Convex-leased telegram credential.
  • Related issue probe on AWS Crabbox run_68e8283ae722: latest origin/main reproduced the subagent timeout leaves zombie claude -p → late output emitted directly to user transport (bypasses parent agent) #76962 sub-symptom by calling the embedded-child wait on a terminal timeout; this branch passed the same before/after probe by announcing without that wait.

The live matrix covered:

Real behavior proof

Behavior addressed: Subagent hard run timeouts now remain terminal timeouts across Gateway wait/dedupe, task registry, SDK, OpenResponses, persisted session reconstruction, and channel delivery, instead of being lost or misclassified after late abort/rejection/completion signals. The branch also prevents terminal timeout announcements from waiting behind an active embedded child run, partially addressing #76962.

Real environment tested: AWS Crabbox Linux c7a.8xlarge, provider aws, lease cbx_ab678695114e, slug harbor-crab, run run_3eff1d4287bd; related #76962 probe also ran on AWS Crabbox run_68e8283ae722.

Exact steps or command run after this patch: Ran autoreview, then ran a release-candidate proof matrix on AWS Crabbox against this branch. The matrix started a real Gateway with live OpenAI model access, exercised the reported subagent timeout behavior and sibling API paths, built the branch into an npm package tarball, installed that tarball in Docker, and verified Telegram channel round-trip delivery. A separate before/after Crabbox probe compared latest origin/main with this branch for the #76962 terminal-timeout announcement sub-symptom.

Evidence after fix: The targeted regression suite passed; the live Gateway/API matrix passed all 8 scenarios; the package-installed Telegram Docker E2E passed; and the #76962-related probe passed on this branch after reproducing on latest origin/main.

Observed result after fix: The reported path returned a terminal timeout with no active child runs remaining, delivered completion state, agent.wait status timeout, and timeout outcome preserved after late aborted/rejection signals. For the #76962-related probe, terminal timeout announcement proceeded without waiting for the still-active embedded child run.

What was not tested: The final live matrix used openai/gpt-5.4-mini because earlier gpt-5.5 attempts hit OpenAI TPM limits. The #76962 probe did not prove physical claude -p process termination, exec approval queue cancellation, or late direct-output suppression.

@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime agents Agent runtime and tooling size: XL maintainer Maintainer-authored PR labels May 29, 2026
@clawsweeper

clawsweeper Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 2:49 PM ET / 18:49 UTC.

Summary
The branch preserves hard-timeout attribution across Gateway wait/dedupe, subagent lifecycle, task state, SDK/OpenResponses, ACP/TUI projections, channel delivery, and persisted recovery paths while keeping ordinary cancellation distinct.

PR surface: Source +1256, Tests +3961. Total +5217 across 66 files.

Reproducibility: yes. The PR body gives before/after Crabbox proof for the reported timeout path, and source inspection confirms current main lacks the PR's hard-timeout attribution guard and blocked-timeout preservation path.

Review metrics: 1 noteworthy metric.

  • Externally observed terminal projections: 7 projections changed. Gateway wait, SDK, OpenResponses, ACP, TUI, task registry, and subagent/channel delivery can expose different terminal status values after merge, so compatibility review matters beyond green tests.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦀 challenger crab
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P2] Merging intentionally changes externally observed terminal status semantics across Gateway wait, SDK events/status, OpenResponses, ACP/TUI, task state, persisted recovery, and channel delivery; existing clients may now see timeout/timed_out/failed where they previously saw cancellation, error, pending, or completed.
  • [P2] The branch deliberately allows terminal timeout announcement to proceed before all embedded child cleanup is proven complete; physical process cleanup, exec approval queue cancellation, and late direct-output suppression remain in subagent timeout leaves zombie claude -p → late output emitted directly to user transport (bypasses parent agent) #76962.
  • [P2] The protected maintainer label means green checks and strong proof do not replace explicit maintainer acceptance of the timeout-vs-cancellation contract.

Maintainer options:

  1. Accept the new timeout contract (recommended)
    A maintainer can land this if they explicitly accept timeout as the sticky terminal outcome across Gateway, SDK/OpenResponses, UI, task, session, and channel projections while leaving process cleanup follow-up separate.
  2. Narrow the merge surface
    If the status-contract change is too broad, ask the author to reduce the branch to the minimum agent.wait/subagent timeout repair and move OpenResponses, TUI, and delivery projections into follow-up PRs.
  3. Pause for full zombie cleanup
    If maintainers want timeout attribution and physical child-process cleanup solved together, hold this PR until the remaining work in subagent timeout leaves zombie claude -p → late output emitted directly to user transport (bypasses parent agent) #76962 is included or explicitly split.

Next step before merge

  • [P2] Manual maintainer review is the remaining action because the protected label and cross-surface terminal-status semantics require human acceptance, not a narrow automated repair.

Security
Cleared: The diff changes runtime/status handling and tests, with no dependency, workflow, lockfile, secret-handling, or supply-chain surface that introduces a concrete security concern.

Review details

Best possible solution:

Land only after a maintainer explicitly accepts the cross-surface timeout contract and confirms the remaining physical cleanup work stays tracked separately in #76962.

Do we have a high-confidence way to reproduce the issue?

Yes. The PR body gives before/after Crabbox proof for the reported timeout path, and source inspection confirms current main lacks the PR's hard-timeout attribution guard and blocked-timeout preservation path.

Is this the best way to solve the issue?

Yes, with maintainer sign-off. The cross-path implementation matches the bug shape better than a single agent.wait patch, but the externally observed terminal-status contract needs explicit maintainer acceptance before merge.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 4e2d9b0b760d.

Label changes

Label changes:

  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦀 challenger crab and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The PR targets a broken agent/subagent timeout workflow that can block parent progress and channel delivery for real users.
  • merge-risk: 🚨 compatibility: Existing clients and integrations may observe timeout/timed_out/failed instead of cancellation, generic error, pending, or completed after upgrade.
  • merge-risk: 🚨 session-state: The diff changes persisted subagent recovery, task state, and terminal lifecycle reconciliation for runs that may still have embedded cleanup in progress.
  • merge-risk: 🚨 message-delivery: Subagent timeout announcements and channel delivery can now proceed from the terminal timeout outcome before embedded child cleanup fully settles.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦀 challenger crab and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes structured after-fix real behavior proof from AWS Crabbox, live Gateway/API runs, Docker package installation, Telegram round-trip delivery, and a before/after related-issue probe.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-fix real behavior proof from AWS Crabbox, live Gateway/API runs, Docker package installation, Telegram round-trip delivery, and a before/after related-issue probe.
Evidence reviewed

PR surface:

Source +1256, Tests +3961. Total +5217 across 66 files.

View PR surface stats
Area Files Added Removed Net
Source 37 1510 254 +1256
Tests 29 4730 769 +3961
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 66 6240 1023 +5217

What I checked:

  • Repository policy applied: Root AGENTS.md and the relevant scoped guides for src/agents, embedded runner, src/gateway/server-methods, and src/tui were read; they frame provider routing, session state, delivery, and externally observed terminal status as compatibility-sensitive review surfaces. (AGENTS.md:1, 4e2d9b0b760d)
  • Protected maintainer review state: The provided live PR context has the protected maintainer label plus existing compatibility, session-state, and message-delivery merge-risk labels, so this workflow must keep it open for explicit maintainer handling. (b16bf36df45c)
  • Hard-timeout attribution helper: The PR adds a shared helper that treats preflight, provider, and post_turn timeout phases as hard timeout attribution, giving sibling surfaces one contract to consume. (src/agents/run-timeout-attribution.ts:23, b16bf36df45c)
  • Gateway wait semantics changed: The PR records hard timeout lifecycle snapshots and lets agent.wait surface pending hard-timeout snapshots rather than losing them behind later cleanup signals. (src/gateway/server-methods/agent-job.ts:267, b16bf36df45c)
  • Subagent completion semantics changed: Subagent wait reconciliation now completes explicit hard timeouts as timeout outcomes with timeout metadata, while still allowing earlier successful completion evidence to correct provisional timeouts. (src/agents/subagent-registry-run-manager.ts:369, b16bf36df45c)
  • Announcement no longer waits behind terminal timeout: The announce flow treats an already-terminal timeout outcome as the contract and skips waiting for a still-active embedded child before delivery. (src/agents/subagent-announce.ts:272, b16bf36df45c)

Likely related people:

  • MonkeyLeeT: Authored the current-main subagent runTimeoutSeconds enforcement commit that this PR builds on for explicit timeout contracts. (role: recent area contributor; confidence: high; commits: 8a60f39221db; files: src/agents/subagent-registry.ts)
  • martingarramon: Recently changed agent-job timeout/error grace behavior and adjacent wait/dedupe surfaces central to this PR. (role: recent area contributor; confidence: high; commits: 039fcbaa4c76; files: src/gateway/server-methods/agent-job.ts, src/gateway/server-methods/agent-wait-dedupe.ts, src/agents/run-wait.ts)
  • steipete: Recent commits and merges cover subagent delivery state, aborted lifecycle handling, and gateway wait/test stabilization around the same terminal-state surface. (role: adjacent owner by history; confidence: high; commits: d73f3ac85d5f, fc4bd448b611, 8df01a8683de; files: src/agents/subagent-registry.ts, src/gateway/server-methods/agent-job.ts)
  • vincentkoc: Recently worked on agent wait timeout attribution, which overlaps the Gateway wait semantics changed here. (role: recent area contributor; confidence: medium; commits: 77c3bdb3ca0a; files: src/gateway/server-methods/agent-job.ts)
  • joshavant: Beyond authoring this PR, prior merged work touched subagent targeting behavior in the same agent/subagent area. (role: current proposer and prior adjacent contributor; confidence: medium; commits: 00da318350e2; files: src/agents/subagent-spawn.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 29, 2026
@joshavant joshavant force-pushed the fix/subagent-run-timeout-deadline branch from 963d1cb to 9d05f97 Compare May 29, 2026 06:06
@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 29, 2026
@joshavant joshavant force-pushed the fix/subagent-run-timeout-deadline branch from 0642411 to b16bf36 Compare May 29, 2026 18:42
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 29, 2026
@steipete steipete self-assigned this May 29, 2026
@steipete

Copy link
Copy Markdown
Contributor

Closing this because the replacement fix landed in #88162 as acb0e9c.

That change keeps the core idea from this PR, but moves terminal run outcome precedence into one shared helper and wires the projections through that instead of carrying the larger repeated-surface patch. Thanks for finding the timeout/cancel/completion race across the surfaces.

@steipete steipete closed this May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling app: web-ui App: web-ui gateway Gateway runtime maintainer Maintainer-authored PR merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: XL status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

2 participants