fix: bound per-turn empty/thinking-only response loops (#1346) by Aaronontheweb · Pull Request #1358 · netclaw-dev/netclaw

Aaronontheweb · 2026-06-08T13:22:13Z

Summary

Fixes #1346. A reasoning model that interleaves genuine tool calls with thinking-only/empty responses could spin until MaxToolIterationsPerTurn (60), because the consecutive empty-response guard is reset by ResetEmptyResponseGuards() on every tool batch and so never reaches its cap. Observed on self-hosted Qwen3 (llama.cpp) over Discord: repeated ThinkingOnly retries monopolizing the inference slot until manual restart.

What changed

Cumulative per-turn backstop. TurnStateTracker now tracks a cumulative empty/thinking-only counter reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards() or ResetToolCounters() — so the interleaved tool/empty pattern still terminates. At the ceiling the turn escalates once with tools disabled (RetryWithoutTools), mirroring the tool-budget Exhausted path, then fails.
Config knob. Session.MaxEmptyResponsesPerTurn (default 10), clamped to >= 1 at bind, with a schema entry (default included for doctor --fix migration) and docs.
Finish-reason awareness. A length-truncated thinking-only response was cut off mid-output, not refused — it now gets a brevity nudge instead of the counterproductive "stop thinking and answer now" scold. truncated is threaded from response.FinishReason at the session and sub-agent call sites.
Tests. TurnStateTracker unit tests (cumulative ceiling survives tool-batch resets, truncation nudge, reset clears the counter) plus an actor-level EmptyResponseEscalationTests that pins the RetryWithoutTools → forceNoTools wiring end-to-end.

Validation

dotnet test src/Netclaw.Actors.Tests/... (TurnStateTracker, EmptyResponseEscalation, MaxToolIteration, ToolLoopCompaction, SubAgentActor) — pass
dotnet test src/Netclaw.Configuration.Tests/... — 410 pass
dotnet test src/Netclaw.Cli.Tests/... --filter Doctor — 143 pass (schema)
dotnet slopwatch analyze — 0 issues
Add-FileHeaders.ps1 -Verify — clean

Known limitation (not addressed here)

MaxToolIterationsPerTurn is best-effort once a model stops producing tool-callable responses: after the tool-budget Exhausted path disables tools, a thinking-only reply routes through the empty-response Retry arm (FireLlmCall() with default forceNoTools:false), re-enabling tools. The new cumulative ceiling bounds this to ~MaxEmptyResponsesPerTurn extra tool rounds (previously unbounded). Real containment for dangerous tools remains the approval/ACL layer. Worth revisiting only for metered/non-approval-gated tools; the fix would unify tool-exposure and the force-no-tools violation check (LlmSessionActor ~705/2644) onto one predicate.

Aaronontheweb

The cap on thinking responses is done poorly and needs to be more tolerant of thinking models doing thinking IMHO.

Aaronontheweb · 2026-06-08T13:41:56Z

        "Your last response contained only reasoning and no reply to the user. "
        + "Stop thinking and write your answer now as a normal assistant message.";

+    // A length-truncated response was cut off mid-output by the provider's token


the nudges look good

) A reasoning model that interleaves genuine tool calls with thinking-only or empty responses could spin until MaxToolIterationsPerTurn (60), because the consecutive empty-response guard is reset by ResetEmptyResponseGuards() on every tool batch and so never reaches its cap. - Add a cumulative per-turn empty/thinking-only counter in TurnStateTracker that is reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards() or ResetToolCounters() — so the interleaved pattern still terminates. At the ceiling, escalate once with tools disabled (RetryWithoutTools) before failing, mirroring the tool-budget Exhausted path. - Add Session.MaxEmptyResponsesPerTurn config knob (default 10) plus schema entry with a default value for doctor --fix migration. - Make empty/thinking-only handling finish-reason aware: a length-truncated response was cut off mid-output, not refused, so it gets a brevity nudge instead of the counterproductive "stop thinking and answer now" scold. Thread truncation from response.FinishReason at both the session and sub-agent call sites. - Add TurnStateTracker regression tests (cumulative ceiling survives tool batch resets, truncation nudge branch, ResetForNewTurn clears the counter) and a MaxEmptyResponsesPerTurn default test.

- Clamp Session.MaxEmptyResponsesPerTurn to >= 1 at bind time, matching the sibling timeout fields. The schema minimum is only enforced by the advisory `netclaw doctor` check, so an out-of-range 0/negative would otherwise escalate-and-fail every turn on the first empty response. - Drop TurnStateTracker.DefaultMaxEmptyResponsesPerTurn and the default parameter value; make maxEmptyResponsesPerTurn a required argument, matching the MaxToolIterationsPerTurn precedent (no tracker-side const, value flows from the caller). The sub-agent path carries its own independent ceiling const, the way it already carries DefaultMaxToolIterations, instead of a hand-synced duplicate of the session default. - Add EmptyResponseEscalationTests: drive repeated thinking-only responses through the session actor and assert the escalation call is issued with tools disabled, then the turn fails — pinning the RetryWithoutTools -> forceNoTools wiring that the tracker unit tests cannot reach. - Document MaxEmptyResponsesPerTurn in docs/spec/configuration.md.

…ive guards only The cumulative counter penalised thinking models that legitimately emit thinking-only responses interleaved with tool calls (think → tool → think → tool). The consecutive pre-tool and post-tool counters already handle the stuck-in-a-loop case since they only reset when genuine tool work happens via ResetEmptyResponseGuards(). Removes MaxEmptyResponsesPerTurn from SessionConfig, the JSON schema, and both actor call sites.

Pre-tool: 2 → 5, post-tool: 3 → 8. The previous limits were too aggressive for reasoning models that legitimately produce several thinking-only responses in a row before answering or calling tools.

LlmSessionIntegrationTests hardcoded empty response counts matching the old MaxPreToolEmptyRetries=2 and MaxPostToolEmptyRetries=3. Updated to match 5 and 8 respectively.

Aaronontheweb · 2026-06-08T17:20:22Z

LGTM - we'll have to see how it performs in the beta channel

Aaronontheweb force-pushed the investigate/thinking-only-classification branch from 086f001 to e1b3469 Compare June 8, 2026 13:23

Aaronontheweb added bug Something isn't working sessions LLM session actor, turn lifecycle, pipelines reliability Retries, resilience, graceful degradation config Configuration issues, netclaw doctor, schema validation. labels Jun 8, 2026

Aaronontheweb commented Jun 8, 2026

View reviewed changes

Aaronontheweb added 4 commits June 8, 2026 16:43

fix: raise consecutive empty-response limits to tolerate thinking models

401ae87

Pre-tool: 2 → 5, post-tool: 3 → 8. The previous limits were too aggressive for reasoning models that legitimately produce several thinking-only responses in a row before answering or calling tools.

Aaronontheweb force-pushed the investigate/thinking-only-classification branch from ae4df17 to 401ae87 Compare June 8, 2026 16:44

fix: update integration tests for raised empty-response limits

91e2867

LlmSessionIntegrationTests hardcoded empty response counts matching the old MaxPreToolEmptyRetries=2 and MaxPostToolEmptyRetries=3. Updated to match 5 and 8 respectively.

Aaronontheweb enabled auto-merge (squash) June 8, 2026 17:20

Merge branch 'dev' into investigate/thinking-only-classification

7e6095e

Aaronontheweb merged commit 13c9cc5 into netclaw-dev:dev Jun 8, 2026
14 of 15 checks passed

Aaronontheweb mentioned this pull request Jun 8, 2026

Release 0.24.0-beta.1 #1364

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: bound per-turn empty/thinking-only response loops (#1346)#1358

fix: bound per-turn empty/thinking-only response loops (#1346)#1358
Aaronontheweb merged 6 commits into
netclaw-dev:devfrom
Aaronontheweb:investigate/thinking-only-classification

Aaronontheweb commented Jun 8, 2026

Uh oh!

Aaronontheweb left a comment

Uh oh!

Aaronontheweb Jun 8, 2026

Uh oh!

Uh oh!

Aaronontheweb commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aaronontheweb commented Jun 8, 2026

Summary

What changed

Validation

Known limitation (not addressed here)

Uh oh!

Aaronontheweb left a comment

Choose a reason for hiding this comment

Uh oh!

Aaronontheweb Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Aaronontheweb commented Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant