
ThinkingOnly/empty-response retry cap is reset by every tool call, allowing unbounded reasoning loops #1346

@nilact


Summary

The post-tool empty-response retry guard in TurnStateTracker only counts
consecutive thinking-only/empty responses, and every tool-call batch resets
that counter. A reasoning model that alternates tool calls with thinking-only
responses (it emits reasoning_content but no reply text and no tool calls)
therefore never reaches the cap, and the turn spins until it hits the only
remaining ceiling, MaxToolIterationsPerTurn (default 60). Each tool iteration
can carry up to 3 thinking-only retries (roughly 60 × 3 = 180 per turn in the
worst case), many of them very large reasoning blobs, which in turn drive
context overflow → compaction → re-reasoning.

Observed on a self-hosted OpenAI-compatible provider (llama.cpp) running
Qwen3.6-35B-A3B, driven from Discord. Symptom in the daemon log, repeated
dozens of times within a single turn:

LLM produced ThinkingOnly response (137902 chars) — retrying with nudge

While this runs, the single local inference slot is monopolized and the channel
goes silent until the daemon is manually restarted.

Root cause

TurnStateTracker tracks _postToolEmptyResponseCount and fails the turn once that
count exceeds MaxPostToolEmptyRetries (3), i.e. on the fourth thinking-only/empty
response without an intervening reset:

// TurnStateTracker.cs (EvaluateEmptyResponse, post-tool branch)
_postToolEmptyResponseCount++;
if (_postToolEmptyResponseCount > MaxPostToolEmptyRetries)
    return new EmptyResponseAction.Fail(...);
return new EmptyResponseAction.Retry(hasThinking ? ThinkingOnlyNudge : ...);
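To make the guard's behavior concrete, here is a minimal C# model of just the post-tool branch quoted above, with nothing resetting the counter. It is a sketch, not netclaw's code: `PostToolGuard`, the simplified `EmptyResponseAction` records, and the nudge/fail strings are hypothetical stand-ins; only the increment-then-compare logic is taken from the excerpt.

```csharp
using System;

// Hypothetical, simplified stand-ins for netclaw's result types.
public abstract record EmptyResponseAction
{
    public sealed record Retry(string Nudge) : EmptyResponseAction;
    public sealed record Fail(string Reason) : EmptyResponseAction;
}

// Models only the post-tool branch of EvaluateEmptyResponse (hypothetical class).
public sealed class PostToolGuard
{
    private const int MaxPostToolEmptyRetries = 3;
    private int _postToolEmptyResponseCount;

    public EmptyResponseAction Evaluate(bool hasThinking)
    {
        // Same order as the excerpt: increment first, then compare to the cap.
        _postToolEmptyResponseCount++;
        if (_postToolEmptyResponseCount > MaxPostToolEmptyRetries)
            return new EmptyResponseAction.Fail("post-tool empty-response cap exceeded");
        return new EmptyResponseAction.Retry(hasThinking ? "thinking-only nudge" : "empty-response nudge");
    }
}
```

In isolation the guard works as intended: three `Retry` results, then `Fail` on the fourth consecutive empty response.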

But every tool-call batch resets that counter:

// LlmSessionActor.cs (HandleToolCallResponse)

// Model produced tool calls — reset empty-response guards so they can
// fire again if the model stalls later in the chain.
_turnState.ResetEmptyResponseGuards();

// TurnStateTracker.cs
public void ResetEmptyResponseGuards()
{
    _postToolEmptyResponseCount = 0;
    _preToolEmptyResponseCount = 0;
    ForceNoToolsActive = false;
}

So the three-retry cap only bounds runs of thinking-only/empty responses with no tool call in between. The real-world loop interleaves the two:

tool call → ResetEmptyResponseGuards() → counter = 0
thinking-only → counter = 1 → retry
thinking-only → counter = 2 → retry
tool call → ResetEmptyResponseGuards() → counter = 0 ← cap never reached
thinking-only → counter = 1 → ...

The reset is correct in spirit (genuine tool progress means "not stuck right now"), but there is no per-turn backstop that survives it, so the alternating pattern is unbounded except by MaxToolIterationsPerTurn.
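The interleaving above can be replayed with a small simulation. It is a hypothetical model (`TurnStateModel` and `InterleavingDemo` are illustrative names), assuming each tool iteration is one tool-call batch followed by a fixed number of thinking-only responses, and reusing the post-tool counter and reset semantics quoted earlier.

```csharp
using System;

// Hypothetical model of the counter/reset interaction; not netclaw's code.
sealed class TurnStateModel
{
    public const int MaxPostToolEmptyRetries = 3;
    private int _postToolEmptyResponseCount;

    // Returns true if the guard would fail the turn on this response.
    public bool OnEmptyResponse()
    {
        _postToolEmptyResponseCount++;
        return _postToolEmptyResponseCount > MaxPostToolEmptyRetries;
    }

    // Mirrors ResetEmptyResponseGuards(): runs on every tool-call batch.
    public void ResetEmptyResponseGuards() => _postToolEmptyResponseCount = 0;
}

static class InterleavingDemo
{
    // Replays "tool call, then N thinking-only responses" per tool iteration and
    // reports how many nudged retries happened and whether the guard ever fired.
    public static (int Retries, bool GuardFired) Run(int toolIterations, int thinkingOnlyPerIteration)
    {
        var state = new TurnStateModel();
        int retries = 0;
        for (int i = 0; i < toolIterations; i++)
        {
            state.ResetEmptyResponseGuards(); // the tool call wipes the counter
            for (int j = 0; j < thinkingOnlyPerIteration; j++)
            {
                if (state.OnEmptyResponse()) return (retries, true);
                retries++;
            }
        }
        return (retries, false); // stopped only by the iteration ceiling
    }
}
```

`InterleavingDemo.Run(60, 2)` returns `(120, false)`: 120 nudged retries across 60 tool iterations, and the guard never fires. Even `Run(60, 3)` completes with `(180, false)`; only four thinking-only responses with no tool call between them, as in `Run(1, 4)`, trip the cap.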

Why this matters beyond one model

The trigger is at the response-classification layer (LlmResponseKind.ThinkingOnly / Empty), not anything model-specific. Any provider that streams a separate reasoning channel and occasionally emits reasoning-without-answer-or-tools (Qwen3.x, DeepSeek-R1-style, etc.) can fall into this. Self-hosted single-slot backends feel it worst because the runaway turn blocks all other traffic on that endpoint.
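For reference, the classification that routes a response into this path can be sketched as below. This assumes a response exposes a separate reasoning channel, reply text, and a list of tool calls; `LlmResponse`, `ResponseClassifier`, and the `Text`/`ToolCalls` enum members are hypothetical, and only `ThinkingOnly` and `Empty` correspond to the `LlmResponseKind` values named in the issue.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical mirror of LlmResponseKind; only ThinkingOnly and Empty come from the issue.
enum LlmResponseKind { Text, ToolCalls, ThinkingOnly, Empty }

// Hypothetical, simplified view of one completed (streamed) response.
sealed record LlmResponse(string? ReasoningContent, string? Text, IReadOnlyList<string> ToolCalls);

static class ResponseClassifier
{
    public static LlmResponseKind Classify(LlmResponse r)
    {
        if (r.ToolCalls.Count > 0) return LlmResponseKind.ToolCalls;
        if (!string.IsNullOrWhiteSpace(r.Text)) return LlmResponseKind.Text;
        // Reasoning channel populated, but no reply text and no tool calls.
        if (!string.IsNullOrWhiteSpace(r.ReasoningContent)) return LlmResponseKind.ThinkingOnly;
        return LlmResponseKind.Empty;
    }
}
```

Nothing here depends on the model: any backend that fills the reasoning channel while leaving the other two empty lands in `ThinkingOnly`.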

Proposed fix

Add a cumulative, per-turn thinking-only/empty counter that is reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards() or ResetToolCounters(). Keep the existing consecutive counter for fast-fail; the cumulative counter is the backstop the tool-reset cannot evade.
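A minimal sketch of the proposed backstop, assuming the field and method names suggested below. `TurnStateTrackerSketch` is a hypothetical class, not a patch to the real `TurnStateTracker`: it keeps the consecutive counter for fast-fail and adds a cumulative counter that `ResetEmptyResponseGuards()` deliberately leaves alone. The `>` comparison mirrors the existing post-tool check.

```csharp
using System;

// Hypothetical sketch of the proposed per-turn backstop; not netclaw's code.
sealed class TurnStateTrackerSketch
{
    private const int MaxPostToolEmptyRetries = 3;
    private readonly int _maxEmptyResponsesPerTurn;

    private int _postToolEmptyResponseCount;       // consecutive; cleared by tool calls
    private int _cumulativeEmptyResponsesThisTurn; // per turn; cleared only on a new turn

    public TurnStateTrackerSketch(int maxEmptyResponsesPerTurn) =>
        _maxEmptyResponsesPerTurn = maxEmptyResponsesPerTurn;

    // Counts a ThinkingOnly or Empty response; returns true if either cap is exceeded.
    public bool OnEmptyResponse()
    {
        _postToolEmptyResponseCount++;
        _cumulativeEmptyResponsesThisTurn++;
        return _postToolEmptyResponseCount > MaxPostToolEmptyRetries
            || _cumulativeEmptyResponsesThisTurn > _maxEmptyResponsesPerTurn;
    }

    // Tool progress clears only the consecutive counter.
    public void ResetEmptyResponseGuards() => _postToolEmptyResponseCount = 0;

    // A new turn clears both counters.
    public void ResetForNewTurn()
    {
        _postToolEmptyResponseCount = 0;
        _cumulativeEmptyResponsesThisTurn = 0;
    }
}
```

With a cap of 10 and the alternating pattern of one tool call followed by two thinking-only responses, the turn now stops on the 11th thinking-only response instead of running on to `MaxToolIterationsPerTurn`.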

Suggested shape (open to maintainer preference):

    • New field _cumulativeEmptyResponsesThisTurn, incremented in EvaluateEmptyResponse for both ThinkingOnly and Empty, reset only in ResetForNewTurn().
    • New config knob Session.MaxEmptyResponsesPerTurn (mirrors MaxToolIterationsPerTurn), default ~10, surfaced in SessionConfig + netclaw-config.v1.schema.json.
    • At the cap, prefer graceful escalation over a hard fail — mirror the existing ToolBudgetStatus.Exhausted path (disable tools + ask for a final answer for one last attempt), then fail if that is still empty.
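The escalation in the last bullet could look roughly like this. `SessionConfigSketch`, `EmptyOutcome`, and `EscalatingEmptyGuard` are hypothetical; only the cumulative counter is modeled, the default of 10 comes from the suggestion above, and how the real `ToolBudgetStatus.Exhausted` path disables tools is assumed rather than shown.

```csharp
using System;

// Hypothetical config shape: MaxEmptyResponsesPerTurn sits next to MaxToolIterationsPerTurn.
sealed record SessionConfigSketch(int MaxToolIterationsPerTurn = 60, int MaxEmptyResponsesPerTurn = 10);

// Hypothetical outcome of evaluating one thinking-only/empty response.
enum EmptyOutcome { RetryWithNudge, FinalAnswerNoTools, Fail }

sealed class EscalatingEmptyGuard
{
    private readonly int _cap;
    private int _cumulative;          // per turn; tool calls do not reset it
    private bool _finalAttemptIssued; // set once the tools-disabled attempt is sent

    public EscalatingEmptyGuard(SessionConfigSketch config) => _cap = config.MaxEmptyResponsesPerTurn;

    public bool ToolsDisabled => _finalAttemptIssued;

    public EmptyOutcome OnEmptyResponse()
    {
        if (_finalAttemptIssued) return EmptyOutcome.Fail; // last attempt was still empty
        _cumulative++;
        if (_cumulative <= _cap) return EmptyOutcome.RetryWithNudge;
        // Cap exceeded: disable tools and ask for a final answer one last time,
        // analogous to the tool-budget exhaustion path.
        _finalAttemptIssued = true;
        return EmptyOutcome.FinalAnswerNoTools;
    }
}
```

Under the default, ten thinking-only responses are retried with a nudge, the eleventh triggers one tools-disabled request for a final answer, and a further empty response fails the turn.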

Environment

netclaw 0.23.0 (daemon, self-hosted), Discord channel.
Provider: local OpenAI-compatible (llama.cpp), Qwen3.6-35B-A3B.
Single local inference slot, so a runaway turn blocks the channel.
