[Bug]: Agent stalls indefinitely when model emits stopReason="stop" with no toolCall — only thinking block generated

### Bug type

Behavior bug (incorrect output/state without crash)

### Beta release blocker

No

### Summary

When the Qwen3.6-35B-A3B model terminates generation with stopReason="stop" inside a thinking block without emitting any toolCall, the embedded agent enters an infinite silent stall — no tools execute, no JSONL entries are appended, no gateway logs fire.

### Steps to reproduce

1. Run an embedded agent via Matrix direct chat with context window ~72k and a tool-rich system prompt (file I/O, subprocess management).
2. Let the agent accumulate tool output over multiple turns (~39 toolUse cycles) until context pressure increases toward the overflow threshold (~45k+ tokens).
3. Trigger a turn where the model enters extended thinking (>250 chars of internal reasoning) while planning to execute tools.
4. Observe: the model generates stopReason="stop" without any subsequent toolCall token stream — only a thinking block containing embedded `<function=...>` XML as plain text. No JSONL entries are appended after this, no gateway logs fire, and the session freezes indefinitely.

Observed in session aa55eb34-e588-439d-a780-359d5e0de27c (1 June 2026): 3 independent occurrences among 55 assistant entries over ~1h 24m.
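
To check other sessions for the same pattern, a scan over the session JSONL along the following lines may help. This is a minimal sketch: the entry shape (`role`, `stopReason`, `content[].type`) and the function names are assumptions modelled on the terms used in this report, not the confirmed OpenClaw schema.

```typescript
// Hypothetical entry shape: field names mirror the terms used in this report
// (stopReason, toolCall, thinking); the real session schema may differ.
interface ContentBlock {
  type: string;
}

interface SessionEntry {
  role?: string;
  stopReason?: string;
  content?: ContentBlock[];
}

// Parse one NDJSON line; return null for blank or malformed lines.
function parseEntry(line: string): SessionEntry | null {
  if (line.trim() === "") return null;
  try {
    return JSON.parse(line) as SessionEntry;
  } catch {
    return null;
  }
}

// Return the 1-based line numbers of assistant entries that ended with
// stopReason "stop" and contain nothing but thinking blocks.
function findStallCandidates(ndjson: string): number[] {
  const hits: number[] = [];
  ndjson.split("\n").forEach((line, index) => {
    const entry = parseEntry(line);
    if (entry === null || typeof entry !== "object") return;
    if (entry.role !== "assistant" || entry.stopReason !== "stop") return;
    const blocks = Array.isArray(entry.content) ? entry.content : [];
    const thinkingOnly =
      blocks.length > 0 && blocks.every((b) => b.type === "thinking");
    if (thinkingOnly) hits.push(index + 1);
  });
  return hits;
}
```

Passing the full contents of a session file as a string yields the candidate line numbers, which can then be compared against the gateway log timestamps.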

### Expected behavior

If an assistant response contains only a thinking block with no toolCall (regardless of stopReason), the gateway should detect this as an incomplete turn and either:
- Auto-retry with a corrective prompt, or
- Surface a clear error state with logging (livenessState=abandoned)
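
The two options above could be combined into a bounded retry that falls back to an explicit error state. The sketch below is illustrative only: `AssistantTurn`, `TurnAction`, `decideTurnAction`, the retry limit, and the corrective prompt wording are all hypothetical and not taken from the OpenClaw codebase.

```typescript
// Hypothetical turn shape; only the block types matter for this check.
interface AssistantTurn {
  stopReason?: string;
  content: { type: string }[];
}

type TurnAction =
  | { kind: "accept" }
  | { kind: "retry"; correctivePrompt: string }
  | { kind: "abandon"; livenessState: "abandoned"; reason: string };

// Accept any turn that carries a non-thinking block (text or toolCall).
// Otherwise retry a bounded number of times, then surface an error state.
function decideTurnAction(
  turn: AssistantTurn,
  retriesSoFar: number,
  maxRetries = 2,
): TurnAction {
  const hasOutput = turn.content.some((b) => b.type !== "thinking");
  if (hasOutput) return { kind: "accept" };

  if (retriesSoFar < maxRetries) {
    return {
      kind: "retry",
      correctivePrompt:
        "Your last response contained only reasoning. " +
        "Emit the tool call or a final answer.",
    };
  }
  return {
    kind: "abandon",
    livenessState: "abandoned",
    reason:
      `thinking-only turn (stopReason=${String(turn.stopReason)}) ` +
      `after ${retriesSoFar} retries`,
  };
}
```

The check deliberately ignores `stopReason`, matching the "regardless of stopReason" wording above. The abandon branch is where a log entry would be emitted so the stall is no longer silent.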

### Actual behavior

The session remains silently idle. In the three observed stalls it was resolved by user intervention (~64s, short thinking block), by auto-compaction after a context overflow (~191s, longer block), or not at all (permanent stall with no recovery path). No gateway log entries appear after the thinking-only JSONL entry until resolution occurs, or ever in the permanent case. The session appears healthy in dashboards despite being completely frozen.

### OpenClaw version

2026.5.28

### Operating system

Ubuntu Server 24.04 LTS (kernel 6.8.0-124-generic x64)

### Install method

git clone — updated via `cd ~/openclaw && git pull && git fetch origin --tags && git checkout <tag> && pnpm build && openclaw gateway start/stop`

### Model

llama/Qwen3.6-35B-A3B-UD-MTP-Q4_K_M (Qwen3.6-35B-A3B)

### Provider / routing chain

OpenClaw → local llama.cpp

### Additional provider/model setup details

Context window: 72k, reserveTokensFloor: 20000. Channel: Matrix direct chat. Model runs locally via llama.cpp with Qwen3.6-35B-A3B in Q4_K_M quantization. The model shows a recurring failure mode in which extended thinking blocks (>250 chars) end in a premature EOS (stopReason="stop") without emitting the toolUse token stream, even when the reasoning text contains explicit plans to execute tools. In those cases the `<function=...>` XML syntax appears inside the thinking text as plain text, not as structured tool calls.

This is distinct from truncated-response hangs: the model hits its own clean EOS, so truncation detectors do not trigger. The response appears structurally valid to the gateway (clean stopReason + text content → treated as completed turn).
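
One possible signal for this case is the presence of `<function=...>` markers inside the thinking text. The sketch below assumes the markers have the form seen in the stall entries (`<function=read>`, `<function=exec>`, `<function=process>`); the function name and the allowed character set for tool names are assumptions.

```typescript
// Collect tool names from <function=NAME> markers that the model wrote as
// plain text inside a thinking block instead of emitting a structured call.
function extractEmbeddedFunctionNames(thinking: string): string[] {
  // Assumed name alphabet: letters, digits, underscore, dot, hyphen.
  const pattern = /<function=([A-Za-z0-9_.-]+)>/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(thinking)) !== null) {
    const name = match[1];
    if (name !== undefined) names.push(name);
  }
  return names;
}

// Example input shaped like the stall entries described in this report.
const exampleThinking =
  "I need to check the config first.\n<function=read>\nThen run it: <function=exec>";
console.log(extractEmbeddedFunctionNames(exampleThinking));
```

A non-empty result on a turn that has no structured toolCall is a strong hint that the model intended to call a tool and stopped early.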

### Logs, screenshots, and evidence

Stall entries in session aa55eb34 (55 assistant entries / 124 JSONL lines):

| Entry | JSONL line | Timestamp (UTC) | stopReason | Has toolCall? | Thinking length | Resolution |
|-------|------------|-----------------|------------|---------------|-----------------|------------|
| [13]  | 14         | 2026-06-01T14:11:29.359Z | "stop" | No | 275 chars (`<function=read>`) | Resolved after ~64s by user message "setze fort" |
| [53]  | 54         | 2026-06-01T15:20:10.806Z | "stop" | No | 586 chars (`<function=exec>`) | Resolved by auto-compaction (~191s after overflow precheck) |
| [123] | 124        | 2026-06-01T15:32:51.707Z | "stop" | No | 264 chars (`<function=process>`) | Permanent stall: no further JSONL entries, never resolved |

Note: Entries [62] and [102] are gateway re-serialization artifacts from compaction of Entries [13] and [53], not independent inference runs.

Log excerpts (gateway and llama.cpp times are in CEST, UTC+2):

```text
Gateway log, context-overflow-precheck triggered at 17:20:40 CEST:
  compactionAttempts=0 (first ever in this session)
  estimatedPromptTokens=65345, promptBudgetBeforeReserve=51680, overflowTokens=13665

Gateway log, auto-compaction completed at 17:23:51 CEST (191s gap):
  auto-compaction succeeded; retrying prompt (truncated 9 tool result(s))

llama.cpp slot release at 17:20:10 CEST:
  slot release: id 0 | task 90849 | stop processing: n_tokens=45923
```

The ~1.254s inference duration at the slot release is consistent with normal completion of a long thinking block, not an inference stall.

Entry [123] at 17:32:51 CEST: no gateway log activity after this point. Context had decreased after compaction, so no overflow was detected and no recovery was triggered.

Evidence available: session JSONL trajectory (aa55eb34-e588-439d-a780-359d5e0de27c.jsonl, 124 NDJSON lines), gateway journalctl logs with context-overflow-precheck entries, llama.cpp slot release logs.
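
The figures in the log excerpts can be cross-checked with a few lines of arithmetic. The sketch below only recomputes values quoted above; `secondsBetween` is a helper introduced here for illustration.

```typescript
// Values copied from the gateway log excerpt in this report.
const estimatedPromptTokens = 65345;
const promptBudgetBeforeReserve = 51680;
const overflowTokens = estimatedPromptTokens - promptBudgetBeforeReserve;

// Difference between two ISO 8601 timestamps, in seconds.
function secondsBetween(startIso: string, endIso: string): number {
  return (Date.parse(endIso) - Date.parse(startIso)) / 1000;
}

// Entry [53] is logged in UTC; the gateway log times are CEST (UTC+2).
const stallEntryUtc = "2026-06-01T15:20:10.806Z";
const precheckCest = "2026-06-01T17:20:40+02:00";
const compactionDoneCest = "2026-06-01T17:23:51+02:00";

const stallToPrecheck = secondsBetween(stallEntryUtc, precheckCest);
const precheckToCompaction = secondsBetween(precheckCest, compactionDoneCest);

console.log({ overflowTokens, stallToPrecheck, precheckToCompaction });
```

The overflow comes out at 13665 tokens and the precheck-to-compaction gap at 191 seconds, matching the log. The precheck itself fired roughly 29 seconds after the thinking-only entry [53] was written.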

### Impact and severity

Affected: Embedded agents using models prone to the stopReason="stop" without toolCall failure mode (observed with Qwen3.6-35B-A3B)
Severity: Critical. Sessions can freeze permanently with no automatic recovery, zero error logging, and no user-visible indication of failure
Frequency: Recurring under context pressure (>~40k tokens accumulated in session); observed 3 independent occurrences out of 55 assistant turns (~5.5%) in session aa55eb34
Consequence: Agent stops responding; manual re-triggering required. For automated/long-running workflows, this means silent data loss or state drift.

### Additional information

This is a second failure pattern distinct from the truncation bug reported in #89051 (and addressed by PR #89160). That bug covers truncated API responses where stopReason is missing or "length"; in that path, `resolveIncompleteTurnPayloadText` was silently returning success. The present bug covers a clean EOS (stopReason="stop") with no toolCall emitted. Both produce identical observable behavior but require different gateway-layer fixes.
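
The distinction between the two patterns can be expressed as a small classifier. This is a sketch under assumptions: `TurnSummary`, `TurnClass`, and `classifyTurn` are hypothetical names, and the real gateway may represent turns differently.

```typescript
type TurnClass = "complete" | "truncated" | "cleanStopWithoutToolCall";

// Hypothetical summary of an assistant turn as seen by the gateway.
interface TurnSummary {
  stopReason?: string;
  hasToolCall: boolean;
  hasVisibleText: boolean; // any non-thinking text addressed to the user
}

function classifyTurn(turn: TurnSummary): TurnClass {
  // Truncation path (#89051): stopReason missing or "length".
  if (turn.stopReason === undefined || turn.stopReason === "length") {
    return "truncated";
  }
  // Pattern in this report: clean EOS, but nothing except thinking.
  if (turn.stopReason === "stop" && !turn.hasToolCall && !turn.hasVisibleText) {
    return "cleanStopWithoutToolCall";
  }
  return "complete";
}
```

Keeping the two cases as separate classes makes it possible to log and handle them independently, even though they look the same from the outside.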

Root cause hypothesis:
- Model layer: Extended thinking blocks trigger premature EOS without emitting the toolUse token stream
- Gateway layer: No validation for stopReason="stop" + no toolCall case; the response appears structurally valid (has clean stopReason + text content → treated as completed turn)

Related: #89051 (original silent stall report), #87692 (silent abort without error log), PR #89160 (truncation path fix)
