Bug: Session deadlock on lost toolResult
Summary
When a tool execution result is lost (network/IPC failure, process killed, timeout), the JSONL session log ends with a toolCall message but no corresponding toolResult. On the next message, the agent loop blocks forever waiting for a result that will never arrive.
Reproduction
- Agent issues an exec tool call that launches a background process with &
- Before the toolResult is written to JSONL, the connection is lost or the process is interrupted
- Session JSONL ends with a role: assistant message containing type: toolCall
- Any subsequent message into the session is silently dropped; the agent never processes it
- Log shows typing TTL reached (2m); stopping typing indicator and nothing else
Observed in logs
2026-03-24T11:37:41-07:00: typing TTL reached (2m); stopping typing indicator
The session file had 61 lines; the last line was an assistant toolCall for exec with no toolResult following it. Session status showed running indefinitely.
Root Cause
The agent loop likely reads pending toolCalls from JSONL and waits for their results before processing new inbound messages. When a result is lost (IPC drop, timeout), there is no recovery path — the session deadlocks silently.
Proposed Fixes
1. Session startup guard (highest priority)
On session load/resume, scan JSONL for any toolCall entries with no matching toolResult. Auto-inject a synthetic error toolResult before allowing new messages in:
{
  "role": "toolResult",
  "toolCallId": "<dangling-id>",
  "toolName": "<name>",
  "content": [{"type": "text", "text": "[tool execution lost — session was interrupted]"}],
  "isError": true
}
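The scan described in this fix could look roughly like the TypeScript sketch below. It is not taken from the OpenClaw source: the function names are hypothetical, and it assumes that each JSONL line wraps the message under a "message" key (as in the workaround line in this report) and that a toolCall content part carries its identifier in "id" and its tool name in "name". It uses the shorter "[tool execution lost]" text from the manual workaround.

```typescript
// Hypothetical sketch: find toolCalls that never received a toolResult.
// Assumed (not confirmed) shapes: { id, message: { role, content: [{ type, id, name }] } }.
interface DanglingCall {
  toolCallId: string;
  toolName: string;
  parentId?: string; // id of the JSONL entry that issued the call
}

function findDanglingToolCalls(jsonl: string): DanglingCall[] {
  const pending = new Map<string, DanglingCall>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or corrupt line
    }
    const msg = entry?.message;
    if (!msg) continue;
    if (msg.role === "assistant" && Array.isArray(msg.content)) {
      for (const part of msg.content) {
        if (part?.type === "toolCall" && typeof part.id === "string") {
          pending.set(part.id, {
            toolCallId: part.id,
            toolName: String(part.name ?? "unknown"),
            parentId: entry.id,
          });
        }
      }
    } else if (msg.role === "toolResult" && typeof msg.toolCallId === "string") {
      pending.delete(msg.toolCallId); // this call was answered
    }
  }
  return Array.from(pending.values());
}

// Build the synthetic error result for one dangling call.
function syntheticErrorResult(call: DanglingCall) {
  return {
    role: "toolResult",
    toolCallId: call.toolCallId,
    toolName: call.toolName,
    content: [{ type: "text", text: "[tool execution lost]" }],
    isError: true,
  };
}
```

On load, the session would call findDanglingToolCalls on the file contents and persist one syntheticErrorResult per returned entry before accepting new inbound messages.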
2. exec tool hard timeout + guaranteed write-back
The exec tool should:
- Have a hard deadline (suggested: 30s default, configurable)
- Write the toolResult to JSONL before returning to the caller, even on failure/timeout
- Never silently drop results on network/IPC failure
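One way to get both the hard deadline and the guaranteed write-back is to wrap every tool run in a helper that always produces exactly one toolResult. The sketch below is an illustration under assumptions, not the existing exec implementation: runWithWriteBack, the run callback, and the persist callback are hypothetical names, and persist stands in for whatever appends to the session JSONL.

```typescript
// Hypothetical sketch: hard deadline plus write-back on every outcome.
interface ToolResult {
  role: "toolResult";
  toolCallId: string;
  toolName: string;
  content: { type: "text"; text: string }[];
  isError: boolean;
}

async function runWithWriteBack(
  toolCallId: string,
  toolName: string,
  run: (signal: AbortSignal) => Promise<string>,
  persist: (result: ToolResult) => void | Promise<void>,
  timeoutMs = 30_000, // the 30s default suggested in this report
): Promise<ToolResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // lets `run` stop its work if it honours the signal
      reject(new Error(`timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });

  let text = "";
  let isError = false;
  try {
    text = await Promise.race([run(controller.signal), deadline]);
  } catch (err) {
    isError = true;
    const reason = err instanceof Error ? err.message : String(err);
    text = `[tool execution failed: ${reason}]`;
  } finally {
    clearTimeout(timer);
  }

  const result: ToolResult = {
    role: "toolResult",
    toolCallId,
    toolName,
    content: [{ type: "text", text }],
    isError,
  };
  await persist(result); // written before the caller ever sees the result
  return result;
}
```

Because the result is built after the try/catch, success, thrown errors, and timeouts all reach persist. If persist itself fails, the error propagates to the caller instead of being dropped silently.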
3. Background commands (&) should return immediately
If a command is backgrounded, the tool should return immediately with the PID rather than wait indefinitely.
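A rough sketch of that behaviour using Node's child_process.spawn follows. The helper names and the trailing-& heuristic are assumptions made for illustration; a real implementation would need proper shell parsing. Note that with shell: true the returned PID is that of the shell, not necessarily of the command it launches.

```typescript
import { spawn } from "node:child_process";

// Naive heuristic (assumption): a single trailing "&" means "run in background".
// "a && b" and "a &&" are not treated as backgrounded.
function isBackgrounded(command: string): boolean {
  return /(^|[^&])&\s*$/.test(command);
}

function stripTrailingAmpersand(command: string): string {
  return command.replace(/&\s*$/, "").trim();
}

// Start the command detached with no stdio pipes, so nothing keeps the tool
// call waiting, and report the PID straight away.
function startBackground(command: string): { pid: number | undefined; text: string } {
  const child = spawn(stripTrailingAmpersand(command), {
    shell: true,
    detached: true,
    stdio: "ignore",
  });
  child.on("error", (err) => console.error(`background spawn failed: ${err.message}`));
  child.unref(); // do not keep the parent's event loop alive for this child
  return { pid: child.pid, text: `[started in background, pid ${child.pid}]` };
}
```

The exec tool could check isBackgrounded first and, when it returns true, write the text from startBackground as the toolResult instead of waiting for the process to exit.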
Workaround (manual)
Append to the JSONL a synthetic toolResult line whose toolCallId matches the dangling toolCall:
{"type":"message","id":"<uuid>","parentId":"<dangling-msg-id>","timestamp":"...","message":{"role":"toolResult","toolCallId":"<dangling-id>","toolName":"exec","content":[{"type":"text","text":"[tool execution lost]"}],"isError":true,"timestamp":...}}
This unblocks the session immediately without a full reset.
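The manual edit can also be scripted. The sketch below is a hypothetical helper, not an OpenClaw tool: it builds a line in the shape shown above and takes care that the file ends with a newline before appending. Both timestamp values are supplied by the caller, because their exact format is elided in this report; copy the format used by neighbouring lines in the session file.

```typescript
import { randomUUID } from "node:crypto";
import { appendFileSync, readFileSync } from "node:fs";

interface SyntheticFields {
  parentId: string; // id of the dangling assistant message
  toolCallId: string; // id of the dangling toolCall
  toolName: string;
  timestamp: string | number; // envelope timestamp; format is an assumption
  messageTimestamp: string | number; // inner timestamp; format is an assumption
}

function buildSyntheticEntry(f: SyntheticFields) {
  return {
    type: "message",
    id: randomUUID(),
    parentId: f.parentId,
    timestamp: f.timestamp,
    message: {
      role: "toolResult",
      toolCallId: f.toolCallId,
      toolName: f.toolName,
      content: [{ type: "text", text: "[tool execution lost]" }],
      isError: true,
      timestamp: f.messageTimestamp,
    },
  };
}

// Text to append: one JSON line, preceded by a newline only if the file
// does not already end with one (for example after an interrupted write).
function textToAppend(existing: string, entry: object): string {
  const separator = existing === "" || existing.endsWith("\n") ? "" : "\n";
  return separator + JSON.stringify(entry) + "\n";
}

function repairSessionFile(sessionPath: string, f: SyntheticFields): void {
  const existing = readFileSync(sessionPath, "utf8");
  appendFileSync(sessionPath, textToAppend(existing, buildSyntheticEntry(f)));
}
```

Back up the session file before running anything like this, since a malformed line could leave the session unreadable.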
Impact
- Session silently stops responding to all messages
- No error surfaced to user — looks like the bot is just slow
- The only recovery currently is a full session reset (loses context)
- Reproducible whenever exec tool result is lost mid-flight
Environment
- OpenClaw version: current (March 2026)
- macOS Darwin 25.3.0 arm64
- Node v25.6.1