Summary
A subagent wrapper can time out while the actual child CLI run has not started yet. In the observed case, the child started a few seconds after the parent wait timed out, completed successfully, and emitted a useful final answer, but the parent/requester remained in a stale waiting/timed-out state until manually poked.
This matches user-visible reports where Jarvis appears to be waiting on a subagent that is already done, or only notices completion after a follow-up prompt.
Evidence
Example label: pwa-backtest-v2-lumbergh-fixes
Shared run id: f69b3958-5826-4d11-ba2f-1c9dd3d7a811
tasks/runs.sqlite shows two rows for the same run id:
| task_id | runtime | agent_id | status | created | started | ended | start lag | run time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| eaad5c31-9f18-4df2-9323-53df32ba49dc | subagent | main | timed_out | 2026-05-16 17:34:42 MDT | 2026-05-16 17:34:42 MDT | 2026-05-16 17:39:42 MDT | 0.8s | 299.4s |
| 48ae44e4-d071-4a18-9c82-e562baa33cda | cli | billy | succeeded | 2026-05-16 17:34:42 MDT | 2026-05-16 17:39:45 MDT | 2026-05-16 17:40:32 MDT | 303.1s | 47.2s |
The parent wrapper timed out at 17:39:42, while the child CLI run did not start until 17:39:45 and then completed successfully at 17:40:32.
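The ordering can be re-checked from the reported timestamps with a minimal Python sketch. It uses the second-resolution values shown in the rows, so the results are rounded relative to the reported 303.1s and 47.2s; the variable names are illustrative only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def ts(value: str) -> datetime:
    """Parse a second-resolution timestamp (all values share the MDT zone)."""
    return datetime.strptime(value, FMT)


# Values copied from the two rows for run id f69b3958-...
parent_ended = ts("2026-05-16 17:39:42")
child_created = ts("2026-05-16 17:34:42")
child_started = ts("2026-05-16 17:39:45")
child_ended = ts("2026-05-16 17:40:32")

# Time the child spent waiting before it actually started.
start_lag = (child_started - child_created).total_seconds()
# Time the child spent actively running.
run_time = (child_ended - child_started).total_seconds()
# True when the child only began after the parent had already timed out.
started_after_parent_timeout = child_started > parent_ended

print(start_lag, run_time, started_after_parent_timeout)
```

The child's start lag alone exceeds the 300s parent window, which is why the wrapper timed out before any child work had begun.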
Relevant log/session lines:
logs/gateway.log:295605: 2026-05-16T17:39:42.225-06:00 [ws] ⇄ res ✓ agent.wait 300012ms ...
agents/billy/sessions/3ba0944a-194f-4237-b0f6-042c02a71fd4.jsonl:54: prompt timeout recorded at 2026-05-16T23:39:42.796Z
agents/billy/sessions/3ba0944a-194f-4237-b0f6-042c02a71fd4.jsonl:71: final assistant message at 2026-05-16T23:40:31.369Z
logs/gateway.log:295647: 2026-05-16T17:40:32.483-06:00 Both fixes are complete and verified...
The final child output included ## Task Complete and listed both completed fixes, so this was not a failed child run.
Actual Behavior
The parent subagent task is marked timed_out after ~300s wall-clock time from wrapper start.
The child runtime can remain queued or blocked for nearly the full parent timeout window, then start after the wrapper has already timed out.
When the child completes successfully after the parent timeout, the requester/Jarvis does not reliably reconcile that late success into the parent state or notify the requester.
Expected Behavior
The requester should not lose successful child results because of child start delay.
The parent subagent lifecycle should distinguish between:
- queued/not-started time
- active child runtime
- child completed after parent wait timeout
If a child completes after the parent wrapper timed out, OpenClaw should reconcile the parent task state and deliver a late completion/update to the requester, or at minimum surface a clear late_success_after_parent_timeout state.
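One way to picture the distinction is a small classifier over the child row and the parent timeout instant. This is a sketch, not OpenClaw code: `ChildOutcome` and `classify_child` are hypothetical names, and only the `late_success_after_parent_timeout` label comes from this report.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class ChildOutcome(Enum):
    # Hypothetical lifecycle labels; only the last value is named in this report.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    LATE_SUCCESS = "late_success_after_parent_timeout"


def classify_child(
    status: str,
    started: Optional[datetime],
    ended: Optional[datetime],
    parent_timed_out_at: Optional[datetime],
) -> ChildOutcome:
    """Separate queued time, active runtime, and late completion."""
    if started is None:
        return ChildOutcome.QUEUED      # still waiting to start
    if ended is None:
        return ChildOutcome.RUNNING     # active child runtime
    finished_late = parent_timed_out_at is not None and ended > parent_timed_out_at
    if status == "succeeded" and finished_late:
        return ChildOutcome.LATE_SUCCESS
    return ChildOutcome.COMPLETED


# The observed case: child ended 17:40:32, parent timed out 17:39:42.
observed = classify_child(
    "succeeded",
    datetime(2026, 5, 16, 17, 39, 45),
    datetime(2026, 5, 16, 17, 40, 32),
    datetime(2026, 5, 16, 17, 39, 42),
)
print(observed.value)
```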
Suggested Fixes
- Start the subagent execution timeout when the child runtime actually starts, or maintain separate queue/start timeout and active execution timeout budgets.
- If parent wait times out, keep a watcher/reconciler subscribed to the child run id so late succeeded/failed states are propagated.
- Reconcile parent rows where runtime='subagent' status='timed_out' and a child row with the same run_id later reaches succeeded.
- Emit a requester-visible notification when late child success arrives after the initial wait timed out.
- Update UI state so Subagent: <label> does not remain as stale waiting when the child row is already terminal.
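The row-reconciliation idea can be sketched against an in-memory SQLite database. The table name `runs` and its exact schema are assumptions made for illustration (the real layout of tasks/runs.sqlite is not shown here), and writing `late_success_after_parent_timeout` into `status` is just one possible way to surface the state.

```python
import sqlite3

RUN_ID = "f69b3958-5826-4d11-ba2f-1c9dd3d7a811"
LATE_STATE = "late_success_after_parent_timeout"


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table mirroring the two observed rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        "task_id TEXT PRIMARY KEY, run_id TEXT, runtime TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        [
            ("eaad5c31-9f18-4df2-9323-53df32ba49dc", RUN_ID, "subagent", "timed_out"),
            ("48ae44e4-d071-4a18-9c82-e562baa33cda", RUN_ID, "cli", "succeeded"),
        ],
    )
    return conn


def reconcile_late_successes(conn: sqlite3.Connection) -> int:
    """Mark timed-out subagent rows whose sibling row (same run_id) succeeded."""
    cur = conn.execute(
        """
        UPDATE runs
           SET status = ?
         WHERE runtime = 'subagent'
           AND status = 'timed_out'
           AND EXISTS (
               SELECT 1 FROM runs AS child
                WHERE child.run_id = runs.run_id
                  AND child.task_id != runs.task_id
                  AND child.status = 'succeeded'
           )
        """,
        (LATE_STATE,),
    )
    conn.commit()
    return cur.rowcount  # number of parent rows reconciled


conn = make_demo_db()
changed = reconcile_late_successes(conn)
parent_status = conn.execute(
    "SELECT status FROM runs WHERE runtime = 'subagent'"
).fetchone()[0]
print(changed, parent_status)
```

Running the reconciler a second time changes nothing, so a periodic sweep like this would be safe to repeat.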
Environment
Observed locally on OpenClaw 2026.5.12 on macOS, with agents.defaults.subagents.runTimeoutSeconds=300.
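To illustrate why a single 300s budget measured from wrapper start is fragile, the sketch below compares it with separate queue and execution budgets, using the observed start lag and run time. `TimeoutBudgets`, `queue_seconds`, and the 600s queue value are hypothetical and are not existing OpenClaw settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutBudgets:
    # Hypothetical split of the single run timeout into two budgets.
    queue_seconds: float  # max time allowed before the child starts
    run_seconds: float    # max time allowed for active child execution


def single_budget_times_out(start_lag: float, run_time: float, timeout: float) -> bool:
    """One clock from wrapper start: queue time eats into the run budget."""
    return start_lag + run_time > timeout


def split_budget_times_out(start_lag: float, run_time: float, budgets: TimeoutBudgets) -> bool:
    """Two clocks: the execution budget only starts when the child starts."""
    return start_lag > budgets.queue_seconds or run_time > budgets.run_seconds


# Observed values from the report: 303.1s start lag, 47.2s run time.
observed_lag, observed_run = 303.1, 47.2
single = single_budget_times_out(observed_lag, observed_run, timeout=300.0)
split = split_budget_times_out(
    observed_lag, observed_run, TimeoutBudgets(queue_seconds=600.0, run_seconds=300.0)
)
print(single, split)
```

Under these assumed budgets the observed run would not have timed out, because the 47.2s of actual execution is well inside the 300s execution budget.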
Post-upgrade local patch status (2026-05-19)
Upgraded local install from 2026.5.12 to 2026.5.18 (50a2481).
The previous local subagent-timeout-reconciliation patch is no longer being carried as a local delta:
- Reapply script status: unchanged on /opt/homebrew/lib/node_modules/openclaw/dist/subagent-registry-Bu5qGLSl.js.
- The reconciliation marker/behavior was already present in the upgraded bundle.
Post-upgrade smoke checks passed:
- Gateway and CLI version: 2026.5.18.
- openclaw status --deep: Gateway reachable, Discord OK, Telegram OK, event loop healthy.
- openclaw channels status --json: all enabled Discord accounts connected (main, farber, lumbergh, maverick, scout) and Telegram connected.
- openclaw tasks list --status running --json: 0 running tasks.
Interpretation: this issue appears covered upstream in 2026.5.18; no local patch is being carried for this anymore.