fix: bound per-turn empty/thinking-only response loops (#1346)#1358
Merged
Aaronontheweb merged 6 commits intoJun 8, 2026
Merged
Conversation
086f001 to
e1b3469
Compare
Aaronontheweb
commented
Jun 8, 2026
Aaronontheweb
left a comment
Collaborator
Author
There was a problem hiding this comment.
The cap on thinking responses is done poorly and needs to be more tolerant of thinking models doing thinking IMHO.
| "Your last response contained only reasoning and no reply to the user. " | ||
| + "Stop thinking and write your answer now as a normal assistant message."; | ||
|
|
||
| // A length-truncated response was cut off mid-output by the provider's token |
Collaborator
Author
There was a problem hiding this comment.
the nudges look good
) A reasoning model that interleaves genuine tool calls with thinking-only or empty responses could spin until MaxToolIterationsPerTurn (60), because the consecutive empty-response guard is reset by ResetEmptyResponseGuards() on every tool batch and so never reaches its cap. - Add a cumulative per-turn empty/thinking-only counter in TurnStateTracker that is reset only in ResetForNewTurn() — never in ResetEmptyResponseGuards() or ResetToolCounters() — so the interleaved pattern still terminates. At the ceiling, escalate once with tools disabled (RetryWithoutTools) before failing, mirroring the tool-budget Exhausted path. - Add Session.MaxEmptyResponsesPerTurn config knob (default 10) plus schema entry with a default value for doctor --fix migration. - Make empty/thinking-only handling finish-reason aware: a length-truncated response was cut off mid-output, not refused, so it gets a brevity nudge instead of the counterproductive "stop thinking and answer now" scold. Thread truncation from response.FinishReason at both the session and sub-agent call sites. - Add TurnStateTracker regression tests (cumulative ceiling survives tool batch resets, truncation nudge branch, ResetForNewTurn clears the counter) and a MaxEmptyResponsesPerTurn default test.
- Clamp Session.MaxEmptyResponsesPerTurn to >= 1 at bind time, matching the sibling timeout fields. The schema minimum is only enforced by the advisory `netclaw doctor` check, so an out-of-range 0/negative would otherwise escalate-and-fail every turn on the first empty response. - Drop TurnStateTracker.DefaultMaxEmptyResponsesPerTurn and the default parameter value; make maxEmptyResponsesPerTurn a required argument, matching the MaxToolIterationsPerTurn precedent (no tracker-side const, value flows from the caller). The sub-agent path carries its own independent ceiling const, the way it already carries DefaultMaxToolIterations, instead of a hand-synced duplicate of the session default. - Add EmptyResponseEscalationTests: drive repeated thinking-only responses through the session actor and assert the escalation call is issued with tools disabled, then the turn fails — pinning the RetryWithoutTools -> forceNoTools wiring that the tracker unit tests cannot reach. - Document MaxEmptyResponsesPerTurn in docs/spec/configuration.md.
…ive guards only The cumulative counter penalised thinking models that legitimately emit thinking-only responses interleaved with tool calls (think → tool → think → tool). The consecutive pre-tool and post-tool counters already handle the stuck-in-a-loop case since they only reset when genuine tool work happens via ResetEmptyResponseGuards(). Removes MaxEmptyResponsesPerTurn from SessionConfig, the JSON schema, and both actor call sites.
Pre-tool: 2 → 5, post-tool: 3 → 8. The previous limits were too aggressive for reasoning models that legitimately produce several thinking-only responses in a row before answering or calling tools.
ae4df17 to
401ae87
Compare
LlmSessionIntegrationTests hardcoded empty response counts matching the old MaxPreToolEmptyRetries=2 and MaxPostToolEmptyRetries=3. Updated to match 5 and 8 respectively.
Collaborator
Author
|
LGTM - we'll have to see how it performs in the beta channel |
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1346. A reasoning model that interleaves genuine tool calls with thinking-only/empty responses could spin until
MaxToolIterationsPerTurn(60), because the consecutive empty-response guard is reset byResetEmptyResponseGuards()on every tool batch and so never reaches its cap. Observed on self-hosted Qwen3 (llama.cpp) over Discord: repeatedThinkingOnlyretries monopolizing the inference slot until manual restart.What changed
TurnStateTrackernow tracks a cumulative empty/thinking-only counter reset only inResetForNewTurn()— never inResetEmptyResponseGuards()orResetToolCounters()— so the interleaved tool/empty pattern still terminates. At the ceiling the turn escalates once with tools disabled (RetryWithoutTools), mirroring the tool-budgetExhaustedpath, then fails.Session.MaxEmptyResponsesPerTurn(default 10), clamped to >= 1 at bind, with a schema entry (default included fordoctor --fixmigration) and docs.truncatedis threaded fromresponse.FinishReasonat the session and sub-agent call sites.TurnStateTrackerunit tests (cumulative ceiling survives tool-batch resets, truncation nudge, reset clears the counter) plus an actor-levelEmptyResponseEscalationTeststhat pins theRetryWithoutTools → forceNoToolswiring end-to-end.Validation
dotnet test src/Netclaw.Actors.Tests/...(TurnStateTracker, EmptyResponseEscalation, MaxToolIteration, ToolLoopCompaction, SubAgentActor) — passdotnet test src/Netclaw.Configuration.Tests/...— 410 passdotnet test src/Netclaw.Cli.Tests/... --filter Doctor— 143 pass (schema)dotnet slopwatch analyze— 0 issuesAdd-FileHeaders.ps1 -Verify— cleanKnown limitation (not addressed here)
MaxToolIterationsPerTurnis best-effort once a model stops producing tool-callable responses: after the tool-budgetExhaustedpath disables tools, a thinking-only reply routes through the empty-responseRetryarm (FireLlmCall()with defaultforceNoTools:false), re-enabling tools. The new cumulative ceiling bounds this to ~MaxEmptyResponsesPerTurnextra tool rounds (previously unbounded). Real containment for dangerous tools remains the approval/ACL layer. Worth revisiting only for metered/non-approval-gated tools; the fix would unify tool-exposure and the force-no-tools violation check (LlmSessionActor~705/2644) onto one predicate.