Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
When the Qwen3.6-35B-A3B model ends generation with stopReason="stop" inside a thinking block without emitting any toolCall, the embedded agent stalls silently and, in the worst case, indefinitely: no tools execute, no JSONL entries are appended, and no gateway logs fire.
Steps to reproduce
- Run an embedded agent via Matrix direct chat with context window ~72k and a tool-rich system prompt (file I/O, subprocess management).
- Let the agent accumulate tool output over multiple turns (~39 toolUse cycles) until context pressure increases toward the overflow threshold (~45k+ tokens).
- Trigger a turn where the model enters extended thinking (>250 chars of internal reasoning) while planning to execute tools.
- Observe: the model generates stopReason="stop" without any subsequent toolCall token stream; the response holds only a thinking block containing embedded <function=...> XML as plain text. No JSONL entries are appended after this, no gateway logs fire, and the session freezes indefinitely.
Reproducible in session aa55eb34-e588-439d-a780-359d5e0de27c (1 June 2026), occurring 3 independent times out of 55 assistant entries over ~1h 24m.
Expected behavior
If an assistant response contains only a thinking block with no toolCall (regardless of stopReason), the gateway should detect this as an incomplete turn and either:
- Auto-retry with a corrective prompt, or
- Surface a clear error state with logging (livenessState=abandoned)
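The two options above could be combined into a single decision step. The following is a minimal sketch, not the gateway's actual code: the message shape, `isThinkingOnlyTurn`, `decideTurnAction`, the retry limit, and the corrective prompt wording are all hypothetical names chosen for illustration.

```typescript
// Hypothetical message shape; the real gateway types may differ.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string };

interface AssistantMessage {
  stopReason: string;
  content: ContentBlock[];
}

type TurnAction =
  | { kind: "accept" }
  | { kind: "retry"; prompt: string }
  | { kind: "abandon"; livenessState: "abandoned" };

// True when the response holds a thinking block and nothing else of
// substance: no toolCall and no non-empty text. stopReason is deliberately
// not consulted, matching "regardless of stopReason".
function isThinkingOnlyTurn(msg: AssistantMessage): boolean {
  const hasThinking = msg.content.some((b) => b.type === "thinking");
  const hasToolCall = msg.content.some((b) => b.type === "toolCall");
  const hasText = msg.content.some(
    (b) => b.type === "text" && b.text.trim().length > 0,
  );
  return hasThinking && !hasToolCall && !hasText;
}

// Retry with a corrective prompt first; once the (assumed) retry limit is
// used up, surface an explicit abandoned state instead of idling silently.
function decideTurnAction(
  msg: AssistantMessage,
  retriesSoFar: number,
  maxRetries = 1,
): TurnAction {
  if (!isThinkingOnlyTurn(msg)) return { kind: "accept" };
  if (retriesSoFar < maxRetries) {
    return {
      kind: "retry",
      prompt: "Your last response ended without a tool call. Emit the tool call now.",
    };
  }
  return { kind: "abandon", livenessState: "abandoned" };
}
```

Bounding the retries keeps a model that repeats the failure from looping forever; the abandon branch is where a log line would be emitted.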
Actual behavior
The session remains silently idle. In the observed cases it recovered only through user intervention (after ~64s, short thinking block) or through auto-compaction following a context overflow (completing ~191s after the overflow precheck, longer thinking block); in the third case it never recovered (permanent stall, no recovery path). No gateway log entries appear after the thinking-only JSONL entry until resolution occurs (or ever, in permanent stalls). The session appears healthy in dashboards despite being completely frozen.
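To make such a freeze visible in dashboards, a liveness check could compare the age of the last JSONL entry against an idle limit. This is a rough sketch under stated assumptions: `LastEntry`, `livenessAt`, and the 30-second limit are hypothetical and do not come from OpenClaw.

```typescript
type Liveness = "active" | "stalled";

// Hypothetical summary of the newest JSONL entry in a session.
interface LastEntry {
  timestamp: string; // ISO 8601, as in the JSONL trajectory
  thinkingOnly: boolean; // assistant entry with thinking but no toolCall
}

// A session counts as stalled when its newest entry is a thinking-only
// assistant turn and nothing has been appended for longer than idleLimitMs.
function livenessAt(
  last: LastEntry,
  nowMs: number,
  idleLimitMs: number,
): Liveness {
  const idleMs = nowMs - Date.parse(last.timestamp);
  return last.thinkingOnly && idleMs > idleLimitMs ? "stalled" : "active";
}

// Entry [13] from this report, checked 64 seconds later with an assumed
// 30-second idle limit.
const entry13: LastEntry = {
  timestamp: "2026-06-01T14:11:29.359Z",
  thinkingOnly: true,
};
const checkedAt = Date.parse(entry13.timestamp) + 64_000;
const state = livenessAt(entry13, checkedAt, 30_000);
```

Gating on `thinkingOnly` avoids flagging sessions that are merely waiting for the next user message after a completed turn.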
OpenClaw version
2026.5.28
Operating system
Ubuntu Server 24.04 LTS (kernel 6.8.0-124-generic x64)
Install method
git clone — updated via cd ~/openclaw && git pull && git fetch origin --tags && git checkout <tag> && pnpm build && openclaw gateway start/stop
Model
llama/Qwen3.6-35B-A3B-UD-MTP-Q4_K_M (Qwen3.6-35B-A3B)
Provider / routing chain
OpenClaw → local llama.cpp
Additional provider/model setup details
Context window: 72k, reserveTokensFloor: 20000. Channel: Matrix direct chat. Model runs locally via llama.cpp with Qwen3.6-35B-A3B in Q4_K_M quantization. The model has a reproducible failure mode in which extended thinking blocks (>250 chars) can trigger premature EOS (stopReason="stop") without emitting the toolUse token stream, even when the reasoning text contains explicit plans to execute tools, written as embedded <function=...> XML in plain text rather than as structured tool calls.
This is distinct from truncated-response hangs: the model hits its own clean EOS, so truncation detectors do not trigger. The response appears structurally valid to the gateway (clean stopReason + text content → treated as completed turn).
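Because the tool intent survives as plain text inside the thinking block, a gateway-side check could look for that markup as an extra signal. A minimal sketch follows; `findEmbeddedFunctionNames` is a hypothetical helper, and the pattern assumes the markup always has the `<function=name>` form seen in the stall entries of this session.

```typescript
// Returns the tool names that appear as <function=name> markup in a
// thinking block's plain text, in order of appearance.
function findEmbeddedFunctionNames(thinking: string): string[] {
  // Assumed name syntax: a letter or underscore, then word chars, dots, dashes.
  const pattern = /<function=([A-Za-z_][A-Za-z0-9_.-]*)>/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(thinking)) !== null) {
    names.push(match[1]);
  }
  return names;
}

const sample = "I should inspect the config first. <function=read>";
const embedded = findEmbeddedFunctionNames(sample); // ["read"]
```

A non-empty result together with stopReason="stop" and no toolCall would strongly suggest the model intended a tool call that never reached the structured stream.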
Logs, screenshots, and evidence
Stall entries in session aa55eb34 (55 assistant entries / 124 JSONL lines):
| Entry | JSONL Line | Timestamp (UTC) | stopReason | Has toolCall? | Thinking length | Resolution |
|-------|------------|-----------------|------------|---------------|-----------------|------------|
| [13] | 14 | 2026-06-01T14:11:29.359Z | "stop" | ❌ No | 275 chars (<function=read>) | Resolved ~64s by user message "setze fort" |
| [53] | 54 | 2026-06-01T15:20:10.806Z | "stop" | ❌ No | 586 chars (<function=exec>) | Resolved by auto-compaction (~191s after overflow precheck) |
| [123] | 124 | 2026-06-01T15:32:51.707Z | "stop" | ❌ No | 264 chars (<function=process>) | Permanent stall — no further JSONL entries, never resolved |
(Note: Entries [62] and [102] are gateway re-serialization artifacts from compaction of Entries [13] and [53], not independent inference runs.)
Gateway log — context-overflow-precheck triggered at 17:20:40 CEST:
compactionAttempts=0 (first ever in this session)
estimatedPromptTokens=65345, promptBudgetBeforeReserve=51680, overflowTokens=13665
Gateway log — auto-compaction completed at 17:23:51 CEST (191s gap):
auto-compaction succeeded; retrying prompt (truncated 9 tool result(s))
llama.cpp slot release at 17:20:10 CEST:
slot release: id 0 | task 90849 | stop processing: n_tokens=45923
~1.254s inference duration — consistent with normal completion of a long thinking block, not an inference stall.
Entry [123] at 17:32:51 CEST: no gateway log activity after this point. Context had decreased after compaction, so no overflow was detected and no recovery was triggered.
Evidence available: session JSONL trajectory (aa55eb34-e588-439d-a780-359d5e0de27c.jsonl, 124 NDJSON lines), gateway journalctl logs with context-overflow-precheck entries, llama.cpp slot release logs.
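The precheck figures in the gateway log above are internally consistent, which the following sketch reproduces. It is an illustration only: `overflowTokens` is a hypothetical helper, and reading "72k" as 70 × 1024 = 71680 tokens is an assumption made because 71680 minus the 20000-token reserve equals the logged promptBudgetBeforeReserve of 51680. Whether the gateway derives the budget this way is not confirmed.

```typescript
// Tokens by which the estimated prompt exceeds the budget; 0 when it fits.
function overflowTokens(
  estimatedPromptTokens: number,
  promptBudget: number,
): number {
  return Math.max(0, estimatedPromptTokens - promptBudget);
}

const contextWindow = 70 * 1024; // 71680, assumed reading of "72k"
const reserveTokensFloor = 20000;
const promptBudget = contextWindow - reserveTokensFloor; // 51680

// Values from the context-overflow-precheck log line.
const loggedOverflow = overflowTokens(65345, promptBudget); // 13665

// Hypothetical post-compaction estimate below the budget: no overflow,
// hence no compaction and no accidental recovery, as with Entry [123].
const afterCompaction = overflowTokens(40000, promptBudget); // 0
```

This is why compaction rescued Entry [53] but not Entry [123]: recovery was a side effect of the overflow path, not a response to the stall itself.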
Impact and severity
Affected: Embedded agents using models prone to the stopReason="stop" without toolCall failure mode (observed with Qwen3.6-35B-A3B)
Severity: Critical. Sessions can freeze permanently with no reliable automatic recovery, zero error logging, and no user-visible indication of failure
Frequency: Recurring but intermittent under context pressure (>~40k tokens accumulated in session); observed 3 independent occurrences out of 55 assistant turns (~5.5%) in session aa55eb34
Consequence: Agent stops responding; manual re-triggering required. For automated/long-running workflows, this means silent data loss or state drift.
Additional information
This is a second failure pattern distinct from the truncation bug reported in #89051 (and addressed by PR #89160). That bug covers truncated API responses where stopReason is missing or "length"; in that path, resolveIncompleteTurnPayloadText was silently returning success. The present bug covers clean EOS (stopReason="stop") with no toolCall emitted. Both produce identical observable behavior but require different gateway-layer fixes.
Root cause hypothesis:
- Model layer: Extended thinking blocks trigger premature EOS without emitting the toolUse token stream
- Gateway layer: No validation for stopReason="stop" + no toolCall case; the response appears structurally valid (has clean stopReason + text content → treated as completed turn)
Related: #89051 (original silent stall report), #87692 (silent abort without error log), PR #89160 (truncation path fix)