Summary
The post-tool empty-response retry guard in TurnStateTracker only counts
consecutive thinking-only/empty responses, and that counter is reset on
every tool-call batch. A reasoning model that alternates tool calls with
thinking-only responses (emits reasoning_content but no reply text and no
tool calls) therefore never reaches the cap, and the turn spins until the only
remaining ceiling — MaxToolIterationsPerTurn (default 60) — is hit. Each
iteration can carry up to 3 thinking-only retries (MaxPostToolEmptyRetries), often
with very large reasoning blobs, which in turn drives a cycle of context overflow → compaction → re-reasoning.
Observed on a self-hosted OpenAI-compatible provider (llama.cpp) running
Qwen3.6-35B-A3B, driven from Discord. Symptom in the daemon log, repeated
dozens of times within a single turn:
LLM produced ThinkingOnly response (137902 chars) — retrying with nudge
While this runs, the single local inference slot is monopolized and the channel
goes silent until the daemon is manually restarted.
Root cause
TurnStateTracker tracks _postToolEmptyResponseCount and fails the turn after
MaxPostToolEmptyRetries (3):
// TurnStateTracker.cs (EvaluateEmptyResponse, post-tool branch)
_postToolEmptyResponseCount++;
if (_postToolEmptyResponseCount > MaxPostToolEmptyRetries)
    return new EmptyResponseAction.Fail(...);
return new EmptyResponseAction.Retry(hasThinking ? ThinkingOnlyNudge : ...);
But every tool-call batch resets that counter:
// LlmSessionActor.cs (HandleToolCallResponse)
// Model produced tool calls — reset empty-response guards so they can
// fire again if the model stalls later in the chain.
_turnState.ResetEmptyResponseGuards();
// TurnStateTracker.cs
public void ResetEmptyResponseGuards()
{
    _postToolEmptyResponseCount = 0;
    _preToolEmptyResponseCount = 0;
    ForceNoToolsActive = false;
}
So the "3 strikes" rule applies only to consecutive thinking-only responses. The real-world loop interleaves tool calls between them, so the count never gets that far:
tool call → ResetEmptyResponseGuards() → counter = 0
thinking-only → counter = 1 → retry
thinking-only → counter = 2 → retry
tool call → ResetEmptyResponseGuards() → counter = 0 ← cap never reached
thinking-only → counter = 1 → ...
The reset is correct in spirit (genuine tool progress means "not stuck right now"), but there is no per-turn backstop that survives it, so the alternating pattern is unbounded except by MaxToolIterationsPerTurn.
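To make the unbounded pattern concrete, here is a minimal, self-contained simulation. It is only a sketch: ConsecutiveGuardSketch and AlternatingLoopDemo are hypothetical stand-ins that copy the increment, compare, and reset logic quoted above, not the real TurnStateTracker or LlmSessionActor. It assumes the default cap of 3 and models MaxToolIterationsPerTurn as a plain loop bound.

```csharp
using System;

// Hypothetical stand-in for the consecutive post-tool guard (not the real class).
public sealed class ConsecutiveGuardSketch
{
    public const int MaxPostToolEmptyRetries = 3;
    private int _postToolEmptyResponseCount;

    // Returns true when the guard would fail the turn.
    public bool OnEmptyResponse()
    {
        _postToolEmptyResponseCount++;
        return _postToolEmptyResponseCount > MaxPostToolEmptyRetries;
    }

    // Mirrors the reset that runs on every tool-call batch.
    public void OnToolCallBatch() => _postToolEmptyResponseCount = 0;
}

public static class AlternatingLoopDemo
{
    // Simulates "tool call, then N thinking-only responses" repeated until the
    // iteration ceiling. Returns how many thinking-only responses were retried,
    // or -1 if the consecutive guard failed the turn first.
    public static int Run(int maxToolIterations, int emptiesPerIteration)
    {
        var guard = new ConsecutiveGuardSketch();
        int retried = 0;
        for (int i = 0; i < maxToolIterations; i++)
        {
            guard.OnToolCallBatch();
            for (int j = 0; j < emptiesPerIteration; j++)
            {
                if (guard.OnEmptyResponse()) return -1;
                retried++;
            }
        }
        return retried;
    }
}
```

Under these assumptions, AlternatingLoopDemo.Run(60, 3) returns 180: the guard never fires, and only the iteration ceiling ends the turn. Four thinking-only responses in a row (Run(60, 4)) do trip the guard, which is why the existing check only helps when the model stalls without emitting any tool call in between.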
Why this matters beyond one model
The trigger is at the response-classification layer (LlmResponseKind.ThinkingOnly / Empty), not anything model-specific. Any provider that streams a separate reasoning channel and occasionally emits reasoning-without-answer-or-tools (Qwen3.x, DeepSeek-R1-style, etc.) can fall into this. Self-hosted single-slot backends feel it worst because the runaway turn blocks all other traffic on that endpoint.
Proposed fix
Add a cumulative, per-turn thinking-only/empty counter that is reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards() or ResetToolCounters(). Keep the existing consecutive counter for fast-fail; the cumulative counter is the backstop the tool-reset cannot evade.
Suggested shape (open to maintainer preference):
- New field _cumulativeEmptyResponsesThisTurn, incremented in EvaluateEmptyResponse for both ThinkingOnly and Empty, reset only in ResetForNewTurn().
- New config knob Session.MaxEmptyResponsesPerTurn (mirrors MaxToolIterationsPerTurn), default ~10, surfaced in SessionConfig + netclaw-config.v1.schema.json.
- At the cap, prefer graceful escalation over a hard fail — mirror the existing ToolBudgetStatus.Exhausted path (disable tools + ask for a final answer for one last attempt), then fail if that is still empty.
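The shape above could look roughly like the following. This is a hedged sketch, not a patch against the real code: TurnGuardSketch and EmptyResponseDecision are hypothetical names, the real EmptyResponseAction types and nudge strings are left out, and "at the cap" is read here as "on the first empty response past the cap", which is one possible interpretation.

```csharp
using System;

// Hypothetical outcome type; the real code returns EmptyResponseAction instances.
public enum EmptyResponseDecision { Retry, FinalAttemptWithoutTools, Fail }

// Hypothetical sketch of a tracker with both a consecutive and a cumulative guard.
public sealed class TurnGuardSketch
{
    private readonly int _maxConsecutive;   // plays the role of MaxPostToolEmptyRetries
    private readonly int _maxPerTurn;       // plays the role of the proposed MaxEmptyResponsesPerTurn
    private int _consecutive;
    private int _cumulativeEmptyResponsesThisTurn;
    private bool _finalAttemptUsed;

    public TurnGuardSketch(int maxConsecutive = 3, int maxPerTurn = 10)
    {
        _maxConsecutive = maxConsecutive;
        _maxPerTurn = maxPerTurn;
    }

    // Called for both ThinkingOnly and Empty responses.
    public EmptyResponseDecision EvaluateEmptyResponse()
    {
        _consecutive++;
        _cumulativeEmptyResponsesThisTurn++;

        // Existing fast-fail on consecutive stalls.
        if (_consecutive > _maxConsecutive) return EmptyResponseDecision.Fail;

        // Cumulative backstop: escalate once, then fail if still empty.
        if (_cumulativeEmptyResponsesThisTurn > _maxPerTurn)
        {
            if (_finalAttemptUsed) return EmptyResponseDecision.Fail;
            _finalAttemptUsed = true;
            return EmptyResponseDecision.FinalAttemptWithoutTools;
        }

        return EmptyResponseDecision.Retry;
    }

    // Tool-call batches reset only the consecutive counter.
    public void ResetEmptyResponseGuards() => _consecutive = 0;

    // Only a new turn clears the cumulative counter and the escalation flag.
    public void ResetForNewTurn()
    {
        _consecutive = 0;
        _cumulativeEmptyResponsesThisTurn = 0;
        _finalAttemptUsed = false;
    }
}
```

With the assumed defaults, a turn that alternates one tool call with two thinking-only responses gets ten retries, then one tools-disabled final attempt on the eleventh empty response, then a failure on the twelfth. A genuine tool chain that never stalls is unaffected, since neither counter moves.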
Environment
netclaw 0.23.0 (daemon, self-hosted), Discord channel.
Provider: local OpenAI-compatible (llama.cpp), Qwen3.6-35B-A3B.
Single local inference slot, so a runaway turn blocks the channel.