ThinkingOnly/empty-response retry cap is reset by every tool call, allowing unbounded reasoning loops #1346

## Summary

The post-tool empty-response retry guard in `TurnStateTracker` only counts
**consecutive** thinking-only/empty responses, and that counter is reset on
**every** tool-call batch. A reasoning model that alternates tool calls with
thinking-only responses (emits `reasoning_content` but no reply text and no
tool calls) therefore never reaches the cap, and the turn spins until it hits
the only remaining ceiling, `MaxToolIterationsPerTurn` (default 60). Each
iteration can absorb up to ~3 thinking-only retries, many of them carrying very
large reasoning blobs, which in turn drives a cycle of context overflow →
compaction → re-reasoning.

Observed on a self-hosted OpenAI-compatible provider (llama.cpp) running
Qwen3.6-35B-A3B, driven from Discord. Symptom in the daemon log, repeated
dozens of times within a single turn:

`LLM produced ThinkingOnly response (137902 chars) — retrying with nudge`

While this runs, the single local inference slot is monopolized and the channel
goes silent until the daemon is manually restarted.

## Root cause

`TurnStateTracker` tracks `_postToolEmptyResponseCount` and fails the turn after
`MaxPostToolEmptyRetries` (3):

```csharp
// TurnStateTracker.cs (EvaluateEmptyResponse, post-tool branch)
_postToolEmptyResponseCount++;
if (_postToolEmptyResponseCount > MaxPostToolEmptyRetries)
    return new EmptyResponseAction.Fail(...);
return new EmptyResponseAction.Retry(hasThinking ? ThinkingOnlyNudge : ...);
```

But every tool-call batch resets that counter:

```csharp
// LlmSessionActor.cs (HandleToolCallResponse)

// Model produced tool calls — reset empty-response guards so they can
// fire again if the model stalls later in the chain.
_turnState.ResetEmptyResponseGuards();

// TurnStateTracker.cs
public void ResetEmptyResponseGuards()
{
    _postToolEmptyResponseCount = 0;
    _preToolEmptyResponseCount = 0;
    ForceNoToolsActive = false;
}
```

So the "three strikes" rule applies only to consecutive thinking-only responses, whereas the real-world loop interleaves them with tool calls:

> tool call        → ResetEmptyResponseGuards() → counter = 0
> thinking-only    → counter = 1 → retry
> thinking-only    → counter = 2 → retry
> tool call        → ResetEmptyResponseGuards() → counter = 0   ← cap never reached
> thinking-only    → counter = 1 → ...

The reset is correct in spirit (genuine tool progress means "not stuck right now"), but there is no per-turn backstop that survives it, so the alternating pattern is unbounded except by `MaxToolIterationsPerTurn`.
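
The interleaving above can be reproduced with a small simulation. This is a minimal sketch, not the real code: `ConsecutiveOnlyTracker` and `AlternatingLoopDemo` are hypothetical stand-ins that model only the consecutive counter and its reset, using the cap of 3 and the 60-iteration ceiling quoted in this issue.

```csharp
using System;

// Hypothetical, stripped-down stand-in for TurnStateTracker:
// only the consecutive post-tool counter and its reset are modeled.
public sealed class ConsecutiveOnlyTracker
{
    public const int MaxPostToolEmptyRetries = 3;
    private int _postToolEmptyResponseCount;

    // Returns true when the guard would fail the turn.
    public bool OnEmptyResponse()
    {
        _postToolEmptyResponseCount++;
        return _postToolEmptyResponseCount > MaxPostToolEmptyRetries;
    }

    public void ResetEmptyResponseGuards() => _postToolEmptyResponseCount = 0;
}

public static class AlternatingLoopDemo
{
    // Simulates `toolIterations` tool-call batches, each followed by
    // `emptyPerIteration` thinking-only responses. Returns how many
    // thinking-only responses were retried, or -1 if the guard fired.
    public static int Run(int toolIterations, int emptyPerIteration)
    {
        var tracker = new ConsecutiveOnlyTracker();
        var absorbed = 0;
        for (var i = 0; i < toolIterations; i++)
        {
            tracker.ResetEmptyResponseGuards(); // a tool-call batch resets the guard
            for (var j = 0; j < emptyPerIteration; j++)
            {
                if (tracker.OnEmptyResponse()) return -1;
                absorbed++;
            }
        }
        return absorbed;
    }

    public static void Main()
    {
        Console.WriteLine(Run(60, 3)); // 180: the cap is never reached
        Console.WriteLine(Run(60, 4)); // -1: four in a row does trip the guard
    }
}
```

Under these assumptions, a single turn can retry up to 60 × 3 = 180 thinking-only responses before `MaxToolIterationsPerTurn` ends it.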

### Why this matters beyond one model

The trigger is at the response-classification layer (`LlmResponseKind.ThinkingOnly` / `Empty`), not anything model-specific. Any provider that streams a separate reasoning channel and occasionally emits reasoning-without-answer-or-tools (Qwen3.x, DeepSeek-R1-style, etc.) can fall into this. Self-hosted single-slot backends feel it worst because the runaway turn blocks all other traffic on that endpoint.

## Proposed fix

Add a cumulative, per-turn thinking-only/empty counter that is reset only in `ResetForNewTurn()`, never in `ResetEmptyResponseGuards()` or `ResetToolCounters()`. Keep the existing consecutive counter for fast-fail; the cumulative counter is the backstop the tool-reset cannot evade.

Suggested shape (open to maintainer preference):

1. New field `_cumulativeEmptyResponsesThisTurn`, incremented in `EvaluateEmptyResponse` for both `ThinkingOnly` and `Empty`, reset only in `ResetForNewTurn()`.
2. New config knob `Session.MaxEmptyResponsesPerTurn` (mirrors `MaxToolIterationsPerTurn`), default ~10, surfaced in `SessionConfig` + `netclaw-config.v1.schema.json`.
3. At the cap, prefer graceful escalation over a hard fail: mirror the existing `ToolBudgetStatus.Exhausted` path (disable tools + ask for a final answer for one last attempt), then fail if that is still empty.
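
One possible shape for the three points above, as a rough sketch rather than a patch. `TurnGuardSketch`, `EmptyResponseDecision`, and `_finalAnswerAttempted` are hypothetical names invented for illustration; the real `TurnStateTracker` and `EmptyResponseAction` types have more state than is shown here. It also assumes "at the cap" means the escalation fires on the response that brings the cumulative count up to the limit.

```csharp
using System;

// Hypothetical decision type; ForceFinalAnswer stands for
// "disable tools and ask for a final answer for one last attempt".
public enum EmptyResponseDecision { Retry, ForceFinalAnswer, Fail }

// Hypothetical sketch of the proposed guard, not the real TurnStateTracker.
public sealed class TurnGuardSketch
{
    private readonly int _maxPostToolEmptyRetries;
    private readonly int _maxEmptyResponsesPerTurn;

    private int _postToolEmptyResponseCount;        // existing consecutive counter
    private int _cumulativeEmptyResponsesThisTurn;  // proposed per-turn backstop
    private bool _finalAnswerAttempted;

    public TurnGuardSketch(int maxPostToolEmptyRetries = 3, int maxEmptyResponsesPerTurn = 10)
    {
        _maxPostToolEmptyRetries = maxPostToolEmptyRetries;
        _maxEmptyResponsesPerTurn = maxEmptyResponsesPerTurn;
    }

    public EmptyResponseDecision EvaluateEmptyResponse()
    {
        _postToolEmptyResponseCount++;
        _cumulativeEmptyResponsesThisTurn++;

        // Existing fast-fail on consecutive empty responses.
        if (_postToolEmptyResponseCount > _maxPostToolEmptyRetries)
            return EmptyResponseDecision.Fail;

        // Backstop: escalate once at the cap, then fail if still empty.
        if (_cumulativeEmptyResponsesThisTurn >= _maxEmptyResponsesPerTurn)
        {
            if (_finalAnswerAttempted) return EmptyResponseDecision.Fail;
            _finalAnswerAttempted = true;
            return EmptyResponseDecision.ForceFinalAnswer;
        }

        return EmptyResponseDecision.Retry;
    }

    // Tool progress clears only the consecutive counter;
    // the cumulative counter is deliberately left untouched.
    public void ResetEmptyResponseGuards() => _postToolEmptyResponseCount = 0;

    public void ResetForNewTurn()
    {
        _postToolEmptyResponseCount = 0;
        _cumulativeEmptyResponsesThisTurn = 0;
        _finalAnswerAttempted = false;
    }
}

public static class TurnGuardDemo
{
    public static void Main()
    {
        var guard = new TurnGuardSketch();
        var last = EmptyResponseDecision.Retry;
        // Alternating pattern: each tool call is followed by two empty responses.
        for (var i = 0; i < 5; i++)
        {
            guard.ResetEmptyResponseGuards();
            guard.EvaluateEmptyResponse();
            last = guard.EvaluateEmptyResponse();
        }
        Console.WriteLine(last);                          // ForceFinalAnswer (10th empty response)
        Console.WriteLine(guard.EvaluateEmptyResponse()); // Fail (final attempt was also empty)
    }
}
```

With the defaults assumed here, the alternating pattern ends after at most 11 empty responses instead of running until `MaxToolIterationsPerTurn`.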

## Environment

- netclaw 0.23.0 (daemon, self-hosted), Discord channel.
- Provider: local OpenAI-compatible (llama.cpp), Qwen3.6-35B-A3B.
- Single local inference slot, so a runaway turn blocks the channel.
