
LLM idle timeout error silently dropped when agentRunStarted is true #84945

@nicken

Description


Bug Description

When an LLM idle timeout occurs after the agent has started (e.g., after tool calls), the error is written to the session log but never broadcast to connected clients. Users see no error feedback — the response silently stops.

Reproduction

  1. Start an agent session via the gateway (e.g., through an ACP bridge / Ki-Agents)
  2. The agent begins processing: it reads skills and makes tool calls (so agentRunStarted = true)
  3. On a subsequent LLM call, the model produces no tokens within the idle timeout window (default 120s)
  4. The timeout error is logged to the session JSONL file but never reaches the client
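Step 3 depends on how an idle timeout differs from a total-duration timeout. The window restarts every time a token arrives, so it fires only when the model goes silent. The sketch below is a hypothetical model of that watchdog, not OpenClaw's run.ts. The function name consumeWithIdleTimeout and the ReplyPayload shape are assumptions. The one behavior it copies from this issue is that a timeout produces an error payload and the function returns normally instead of throwing.

```typescript
// Hypothetical payload shape, modeled on the { text, isError } payload described in this issue.
interface ReplyPayload {
  text: string;
  isError?: boolean;
}

// Consume a token stream, giving up if no token arrives within idleMs.
// The idle window restarts after every token. On timeout this RETURNS an
// error payload rather than throwing (the behavior attributed to run.ts).
async function consumeWithIdleTimeout(
  tokens: AsyncIterable<string>,
  idleMs: number,
): Promise<ReplyPayload> {
  const iterator = tokens[Symbol.asyncIterator]();
  let text = "";
  for (;;) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), idleMs);
    });
    const next = await Promise.race([iterator.next(), timeout]);
    clearTimeout(timer);
    if (next === "timeout") {
      // Ask the source to stop. This is not awaited, because a stalled source may never settle.
      void iterator.return?.();
      const seconds = Math.round(idleMs / 1000);
      return { text: `LLM idle timeout (${seconds}s): no response from model`, isError: true };
    }
    if (next.done) return { text };
    text += next.value;
  }
}
```

Because the error is returned as an ordinary value, any caller that relies on a rejected promise to detect failure will treat the run as successful.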

Session Log Evidence

{"type":"custom","customType":"openclaw:prompt-error","data":{"error":"LLM idle timeout (120s): no response from model | LLM idle timeout (120s): no response from model",...}}

The session ends here; no final or error event is broadcast.
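A session log can be scanned for this symptom by looking for prompt-error entries in which the client never saw an error. The helper below is a hypothetical sketch; the name findPromptErrors is an assumption. It models only the fields visible in the log excerpt above (type, customType, data.error) and ignores the rest of each entry, which the issue elides as "...".

```typescript
// Only the fields visible in the issue's log excerpt are modeled.
interface SessionLogEntry {
  type?: string;
  customType?: string;
  data?: { error?: string };
}

// Return the error string of every openclaw:prompt-error entry in a session JSONL body.
function findPromptErrors(jsonl: string): string[] {
  const errors: string[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip partially written or corrupt lines
    }
    if (parsed === null || typeof parsed !== "object") continue;
    const entry = parsed as SessionLogEntry;
    const error = entry.data?.error;
    if (entry.type === "custom" && entry.customType === "openclaw:prompt-error" && typeof error === "string") {
      errors.push(error);
    }
  }
  return errors;
}
```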

Root Cause

File: src/gateway/server-methods/chat.ts (.then() handler, ~line 2692 in main)

if (!agentRunStarted) {
  // Agent never started → processes deliveredReplies, broadcasts final/error ✅
  broadcastChatFinal(...);
} else if (!hasBeforeAgentRunGate) {
  // Agent started → only updates transcript, NO broadcast ❌
  await emitUserTranscriptUpdate();
}

The timeout error flows like this:

  1. run.ts handles the timeout by returning an error payload ({ text: "...", isError: true }) rather than throwing an exception
  2. The error payload is collected in deliveredReplies via the deliver callback
  3. The .then() handler checks agentRunStarted; since the agent had already started (it made tool calls), it's true
  4. The code only calls emitUserTranscriptUpdate() (and only when hasBeforeAgentRunGate is false); neither broadcastChatError() nor broadcastChatFinal() is called
  5. Meanwhile, .catch() (which does call broadcastChatError()) is never reached, because run.ts returned normally instead of throwing
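The five steps above can be reproduced in a small, self-contained model. It is a hypothetical simplification: simulateChatRun, Broadcast, and the inline run function are assumptions, and the broadcasts array stands in for broadcastChatFinal() and broadcastChatError(). The model shows that when the timeout is returned as a payload and agentRunStarted is true, nothing is broadcast. It also shows that the same timeout, if thrown, would reach .catch().

```typescript
// Hypothetical, simplified model of the chat.ts flow described in this issue.
interface ReplyPayload { text?: string; isError?: boolean }
interface DeliveredReply { payload: ReplyPayload }
type Broadcast = { state: "final" | "error"; message?: string };

const TIMEOUT_TEXT = "LLM idle timeout (120s): no response from model";

function simulateChatRun(agentRunStarted: boolean, timeoutMode: "returns" | "throws"): Promise<Broadcast[]> {
  const deliveredReplies: DeliveredReply[] = [];
  const broadcasts: Broadcast[] = [];
  // Stand-in for run.ts. In "returns" mode it delivers an error payload and resolves normally.
  const run = async (deliver: (payload: ReplyPayload) => void): Promise<void> => {
    if (timeoutMode === "throws") throw new Error(TIMEOUT_TEXT);
    deliver({ text: TIMEOUT_TEXT, isError: true });
  };
  return run((payload) => deliveredReplies.push({ payload }))
    .then(() => {
      if (!agentRunStarted) {
        // Rough stand-in for the branch that processes deliveredReplies.
        const failed = deliveredReplies.find((entry) => entry.payload.isError);
        broadcasts.push(failed ? { state: "error", message: failed.payload.text } : { state: "final" });
      }
      // Otherwise only the transcript would be updated; deliveredReplies is never inspected.
    })
    .catch((err: unknown) => {
      // Reached only when run() rejects.
      broadcasts.push({ state: "error", message: err instanceof Error ? err.message : String(err) });
    })
    .then(() => broadcasts);
}
```

For example, simulateChatRun(true, "returns") resolves to an empty array, which is the silent drop reported here.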

Result: The error payload sits in deliveredReplies but is never broadcast. Connected clients (ACP bridges, etc.) never receive any error event.

Expected Behavior

Clients should receive a state: "error" chat event carrying the timeout error message, as they do in other error scenarios.

Suggested Fix

In the .then() handler, when agentRunStarted = true, check deliveredReplies for payloads with isError: true. If found, call broadcastChatError() to notify connected clients:

} else {
  // Agent started — check for error payloads that weren't streamed
  const errorPayloads = deliveredReplies
    .filter((entry) => entry.payload.isError);
  if (errorPayloads.length > 0) {
    const errorMsg = errorPayloads
      .map((entry) => entry.payload.text)
      .filter(Boolean)
      .join(" | ");
    broadcastChatError({
      context,
      runId: clientRunId,
      sessionKey,
      errorMessage: errorMsg,
    });
  } else if (!hasBeforeAgentRunGate) {
    await emitUserTranscriptUpdate().catch(...);
  }
}
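The check in the suggested fix can be extracted into a pure function so it can be unit-tested without a gateway. In the sketch below, the helper name collectErrorMessage and the reduced payload types are assumptions. The fallback text for error payloads that carry no text is also an assumption. Without it, .filter(Boolean) followed by .join(" | ") could send an empty errorMessage.

```typescript
// Hypothetical payload shapes, reduced to the fields the suggested fix reads.
interface ReplyPayload { text?: string; isError?: boolean }
interface DeliveredReply { payload: ReplyPayload }

// Returns the message to pass to broadcastChatError(), or undefined when no error payload was delivered.
function collectErrorMessage(deliveredReplies: readonly DeliveredReply[]): string | undefined {
  const errorPayloads = deliveredReplies.filter((entry) => entry.payload.isError === true);
  if (errorPayloads.length === 0) return undefined;
  const message = errorPayloads
    .map((entry) => entry.payload.text)
    // Explicit type predicate: keeps the array typed as string[] and drops empty texts.
    .filter((text): text is string => typeof text === "string" && text.length > 0)
    .join(" | ");
  // Assumed fallback so clients never receive an empty error string.
  return message.length > 0 ? message : "Agent run failed without an error message";
}
```

In the else branch, the handler would call broadcastChatError({ context, runId: clientRunId, sessionKey, errorMessage }) when the helper returns a string, and otherwise fall through to the existing emitUserTranscriptUpdate() path.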

Environment

  • OpenClaw version: main branch (bde07ddb)
  • Model: glm-5-turbo (via anthropic-messages API)
  • Connection: ACP bridge (Ki-Agents gateway)
  • Idle timeout: 120s (default)

Metadata


Assignees

Labels

  • P1 - High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear - ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix - ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro - ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss - Channel message delivery can be lost, duplicated, or misrouted.
  • issue-rating: 🦞 diamond lobster - Very strong issue quality with high-confidence source-level or clear reproduction.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
