Skip to content

fix: bound per-turn empty/thinking-only response loops (#1346)#1358

Merged
Aaronontheweb merged 6 commits into
netclaw-dev:devfrom
Aaronontheweb:investigate/thinking-only-classification
Jun 8, 2026
Merged

fix: bound per-turn empty/thinking-only response loops (#1346)#1358
Aaronontheweb merged 6 commits into
netclaw-dev:devfrom
Aaronontheweb:investigate/thinking-only-classification

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Summary

Fixes #1346. A reasoning model that interleaves genuine tool calls with thinking-only/empty responses could spin until MaxToolIterationsPerTurn (60), because the consecutive empty-response guard is reset by ResetEmptyResponseGuards() on every tool batch and so never reaches its cap. Observed on self-hosted Qwen3 (llama.cpp) over Discord: repeated ThinkingOnly retries monopolizing the inference slot until manual restart.

What changed

  • Cumulative per-turn backstop. TurnStateTracker now tracks a cumulative empty/thinking-only counter reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards() or ResetToolCounters() — so the interleaved tool/empty pattern still terminates. At the ceiling the turn escalates once with tools disabled (RetryWithoutTools), mirroring the tool-budget Exhausted path, then fails.
  • Config knob. Session.MaxEmptyResponsesPerTurn (default 10), clamped to >= 1 at bind, with a schema entry (default included for doctor --fix migration) and docs.
  • Finish-reason awareness. A length-truncated thinking-only response was cut off mid-output, not refused — it now gets a brevity nudge instead of the counterproductive "stop thinking and answer now" scold. truncated is threaded from response.FinishReason at the session and sub-agent call sites.
  • Tests. TurnStateTracker unit tests (cumulative ceiling survives tool-batch resets, truncation nudge, reset clears the counter) plus an actor-level EmptyResponseEscalationTests that pins the RetryWithoutTools → forceNoTools wiring end-to-end.

Validation

  • dotnet test src/Netclaw.Actors.Tests/... (TurnStateTracker, EmptyResponseEscalation, MaxToolIteration, ToolLoopCompaction, SubAgentActor) — pass
  • dotnet test src/Netclaw.Configuration.Tests/... — 410 pass
  • dotnet test src/Netclaw.Cli.Tests/... --filter Doctor — 143 pass (schema)
  • dotnet slopwatch analyze — 0 issues
  • Add-FileHeaders.ps1 -Verify — clean

Known limitation (not addressed here)

MaxToolIterationsPerTurn is best-effort once a model stops producing tool-callable responses: after the tool-budget Exhausted path disables tools, a thinking-only reply routes through the empty-response Retry arm (FireLlmCall() with default forceNoTools:false), re-enabling tools. The new cumulative ceiling bounds this to ~MaxEmptyResponsesPerTurn extra tool rounds (previously unbounded). Real containment for dangerous tools remains the approval/ACL layer. Worth revisiting only for metered/non-approval-gated tools; the fix would unify tool-exposure and the force-no-tools violation check (LlmSessionActor ~705/2644) onto one predicate.

@Aaronontheweb Aaronontheweb force-pushed the investigate/thinking-only-classification branch from 086f001 to e1b3469 Compare June 8, 2026 13:23
@Aaronontheweb Aaronontheweb added bug Something isn't working sessions LLM session actor, turn lifecycle, pipelines reliability Retries, resilience, graceful degradation config Configuration issues, netclaw doctor, schema validation. labels Jun 8, 2026

@Aaronontheweb Aaronontheweb left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cap on thinking responses is done poorly and needs to be more tolerant of thinking models doing thinking IMHO.

"Your last response contained only reasoning and no reply to the user. "
+ "Stop thinking and write your answer now as a normal assistant message.";

// A length-truncated response was cut off mid-output by the provider's token

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the nudges look good

Comment thread src/Netclaw.Actors/Sessions/Handlers/TurnStateTracker.cs Outdated
)

A reasoning model that interleaves genuine tool calls with thinking-only
or empty responses could spin until MaxToolIterationsPerTurn (60), because
the consecutive empty-response guard is reset by ResetEmptyResponseGuards()
on every tool batch and so never reaches its cap.

- Add a cumulative per-turn empty/thinking-only counter in TurnStateTracker
  that is reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards()
  or ResetToolCounters() — so the interleaved pattern still terminates. At the
  ceiling, escalate once with tools disabled (RetryWithoutTools) before failing,
  mirroring the tool-budget Exhausted path.
- Add Session.MaxEmptyResponsesPerTurn config knob (default 10) plus schema
  entry with a default value for doctor --fix migration.
- Make empty/thinking-only handling finish-reason aware: a length-truncated
  response was cut off mid-output, not refused, so it gets a brevity nudge
  instead of the counterproductive "stop thinking and answer now" scold.
  Thread truncation from response.FinishReason at both the session and
  sub-agent call sites.
- Add TurnStateTracker regression tests (cumulative ceiling survives tool
  batch resets, truncation nudge branch, ResetForNewTurn clears the counter)
  and a MaxEmptyResponsesPerTurn default test.
- Clamp Session.MaxEmptyResponsesPerTurn to >= 1 at bind time, matching the
  sibling timeout fields. The schema minimum is only enforced by the advisory
  `netclaw doctor` check, so an out-of-range 0/negative would otherwise
  escalate-and-fail every turn on the first empty response.
- Drop TurnStateTracker.DefaultMaxEmptyResponsesPerTurn and the default
  parameter value; make maxEmptyResponsesPerTurn a required argument, matching
  the MaxToolIterationsPerTurn precedent (no tracker-side const, value flows
  from the caller). The sub-agent path carries its own independent ceiling
  const, the way it already carries DefaultMaxToolIterations, instead of a
  hand-synced duplicate of the session default.
- Add EmptyResponseEscalationTests: drive repeated thinking-only responses
  through the session actor and assert the escalation call is issued with tools
  disabled, then the turn fails — pinning the RetryWithoutTools -> forceNoTools
  wiring that the tracker unit tests cannot reach.
- Document MaxEmptyResponsesPerTurn in docs/spec/configuration.md.
…ive guards only

The cumulative counter penalised thinking models that legitimately emit
thinking-only responses interleaved with tool calls (think → tool → think →
tool). The consecutive pre-tool and post-tool counters already handle the
stuck-in-a-loop case since they only reset when genuine tool work happens
via ResetEmptyResponseGuards(). Removes MaxEmptyResponsesPerTurn from
SessionConfig, the JSON schema, and both actor call sites.
Pre-tool: 2 → 5, post-tool: 3 → 8. The previous limits were too
aggressive for reasoning models that legitimately produce several
thinking-only responses in a row before answering or calling tools.
@Aaronontheweb Aaronontheweb force-pushed the investigate/thinking-only-classification branch from ae4df17 to 401ae87 Compare June 8, 2026 16:44
LlmSessionIntegrationTests hardcoded empty response counts matching the
old MaxPreToolEmptyRetries=2 and MaxPostToolEmptyRetries=3. Updated to
match 5 and 8 respectively.
@Aaronontheweb Aaronontheweb enabled auto-merge (squash) June 8, 2026 17:20
@Aaronontheweb

Copy link
Copy Markdown
Collaborator Author

LGTM - we'll have to see how it performs in the beta channel

@Aaronontheweb Aaronontheweb merged commit 13c9cc5 into netclaw-dev:dev Jun 8, 2026
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working config Configuration issues, netclaw doctor, schema validation. reliability Retries, resilience, graceful degradation sessions LLM session actor, turn lifecycle, pipelines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ThinkingOnly/empty-response retry cap is reset by every tool call, allowing unbounded reasoning loops

1 participant