fix(agents): enforce subagent run timeouts#87489
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 11:36 PM ET / 03:36 UTC. Summary PR surface: Source +250, Tests +915. Total +1165 across 6 files. Reproducibility: yes. Current main's source path retries plain Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land this after exact-head CI passes and maintainers accept the hard-deadline subagent semantics, while keeping manual kill/reap behavior tracked separately in #87445. Do we have a high-confidence way to reproduce the issue? Yes. Current main's source path retries plain Is this the best way to solve the issue? Yes. The patch fixes the existing documented AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against cb085ec5f1f5. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +250, Tests +915. Total +1165 across 6 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
f8233cf to
26ce8bc
Compare
|
ClawSweeper PR egg: ✨ hatched 🥚 common Cosmic Branchling. Rarity: 🥚 common. Trait: finds missing screenshots. DetailsShare on X: post this hatch
About:
|
26ce8bc to
057f3d4
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 057f3d4685
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
057f3d4 to
4177a29
Compare
4177a29 to
3e8fe49
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e8fe49490
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
3e8fe49 to
1ae2de0
Compare
|
@clawsweeper re-review Addressed the deadline-before-wait finding on the latest head: restored/retried explicit |
74971aa to
50f00ea
Compare
50f00ea to
8c09f04
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c09f041f7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return isFreshForRun(sessionEntry, params.notBeforeMs) | ||
| ? freshSessionStartedAt(sessionEntry, params.notBeforeMs) | ||
| : undefined; |
There was a problem hiding this comment.
Honor running session starts for run deadlines
When agent.wait returns a plain timeout after the child has only emitted a lifecycle start, the persisted session entry is still status: "running" with startedAt but no terminal endedAt/updatedAt after the run. This new helper therefore always returns undefined because isFreshForRun requires a terminal timestamp, so the run-manager fallback computes the explicit timeout from the registry createdAt instead of the observed child start and can time out/announce runs early whenever the child actually starts later than registration.
Useful? React with 👍 / 👎.
|
Behavior addressed: explicit subagent runTimeoutSeconds is now treated as the terminal deadline across agent.wait, lifecycle, session-store reconciliation, sweeper recovery, pending delivery retry, and in-flight cleanup paths. Successful waits without startedAt now use the fresh session-store start before deadline checks, so restored/racy completions do not get misclassified. Real environment tested: local focused Vitest/type/lint proof on the rebased branch, Codex autoreview, and GitHub PR CI on head 8c09f04. Exact steps or command run after this patch:
Evidence after fix: focused local tests passed, check:test-types passed, targeted oxlint passed, diff check passed, autoreview clean with no accepted/actionable findings, and GitHub checks passed including CI, CodeQL Critical Quality, OpenGrep, security high, dependency guard, workflow sanity, and real behavior proof. Observed result after fix: explicit run timeout outcomes are stable at the configured deadline, late success/error sources cannot overwrite already-published deadline timeouts, restored successful waits use the session-store start when agent.wait lacks one, and cron schedule identity matches runtime stagger normalization. What was not tested: no live provider/subagent execution was run; coverage is unit/CI/autoreview plus GitHub PR gates. |
Summary
sessions_spawnacceptedrunTimeoutSeconds, but a plain non-terminalagent.waittimeout could keep the subagent run active after the stored deadline.Linked context
Which issue does this close?
Closes #87444
Which issues, PRs, or discussions are related?
Related #87445
Was this requested by a maintainer or owner?
Requested through the maintainer workflow for #87444.
Real behavior proof (required for external PRs)
Behavior addressed:
sessions_spawnchild runs with explicitrunTimeoutSecondsno longer remain active after the stored deadline whenagent.waitonly reports a non-terminal timeout; restored/retried waits now use only the remaining stored deadline budget before completing the timeout path.Real environment tested: local OpenClaw source worktree running a real gateway/server and real
agentRPC against a loopback OpenAI Responses-compatible provider harness. The parent model turn calledsessions_spawn, thensessions_yield; the child provider request was held open pastrunTimeoutSeconds: 1.Exact steps or command run after this patch:
node --import tsx /private/tmp/openclaw-issue-87444-proof.mts;node scripts/run-vitest.mjs src/agents/subagent-registry.test.ts src/agents/subagent-announce.timeout.test.ts;node scripts/run-vitest.mjs src/agents/openclaw-tools.subagents.sessions-spawn.lifecycle.test.ts src/gateway/server-methods/agent.test.ts;git diff --check. After the latest deadline-before-wait amendment, rerannode scripts/run-vitest.mjs src/agents/subagent-registry.test.ts src/agents/subagent-announce.timeout.test.ts,node scripts/run-vitest.mjs src/agents/openclaw-tools.subagents.sessions-spawn.lifecycle.test.ts src/gateway/server-methods/agent.test.ts, andgit diff --check.Evidence after fix: copied terminal output from the real gateway proof run:
Observed result after fix: the real gateway run accepted
sessions_spawnwithrunTimeoutSeconds: 1, the held-open child recordedoutcomeStatus: "timeout"at exactlyelapsedMs: 1000,activeChildrenAfterTimeoutwas0, and the parent continuation produced the redacted wake tokenISSUE_87444_PARENT_WOKE_C7A617.What was not tested: external hosted provider credentials, external channel delivery, GitHub Actions CI completion, and the separate manual kill/reap path tracked by #87445.
Proof limitations or environment constraints: the provider endpoint was a loopback deterministic OpenAI Responses-compatible harness so no API keys or external messages were used; the exercised OpenClaw path was still the real gateway, real agent RPC, real
sessions_spawn, realsessions_yield, registry timeout completion, and parent wake dispatch.Before evidence (optional but encouraged): before the original registry change, the focused regression kept
completedRun?.endedAtundefined after a plainagent.waittimeout. Before the deadline-before-wait amendment, a restored run 59s into a 60s timeout still passed a full 60s budget toagent.wait.Tests and validation
Which commands did you run?
node --import tsx /private/tmp/openclaw-issue-87444-proof.mtsnode scripts/run-vitest.mjs src/agents/subagent-registry.test.ts src/agents/subagent-announce.timeout.test.tsnode scripts/run-vitest.mjs src/agents/openclaw-tools.subagents.sessions-spawn.lifecycle.test.ts src/gateway/server-methods/agent.test.tsgit diff --checkWhat regression coverage was added or updated?
Added registry coverage for non-terminal
agent.waittimeouts at and near the stored explicit run deadline, restored-run coverage proving a resumed run near its deadline waits only for the remaining budget, and announce coverage provingcleanup: "delete"timeout cleanup remains retryable while the embedded child request is still active.What failed before this fix, if known?
The new restored-run regression failed before the latest production change because a run restored 59s into a 60s timeout still passed
timeoutMs: 60000toagent.waitinstead oftimeoutMs: 1000. ClawSweeper also found that the previous announce branch could mark delete cleanup complete while deletion was skipped.If no test was added, why not?
N/A; focused regression tests were added.
Risk checklist
Did user-visible behavior change? (
Yes/No)Yes.
Did config, environment, or migration behavior change? (
Yes/No)No.
Did security, auth, secrets, network, or tool execution behavior change? (
Yes/No)No.
What is the highest-risk area?
Prematurely marking a still-running child as timed out if the gateway wait path returns a plain timeout before the configured child deadline, or finalizing delete cleanup while the child is still active.
How is that risk mitigated?
Fresh subagent waits keep the existing initial wait budget. Resumed/retried waits are capped to the stored deadline, session-store reconciliation still wins before synthetic timeout completion, and focused tests cover pre-deadline retry, boundary timeout completion, restored near-deadline wait capping, and delete-cleanup retryability.
Current review state
What is the next action?
Wait for GitHub CI and ClawSweeper re-review on the updated head/body.
What is still waiting on author, maintainer, CI, or external proof?
GitHub CI has not completed on the latest forced-pushed head; external hosted provider/channel proof was not run.
Which bot or reviewer comments were addressed?
Addressed ClawSweeper’s deadline-before-wait finding, real-behavior proof request, and cleanup-delete finding.