feat(core): detect tool validation retry loops and inject stop directive#3178
Conversation
When a tool call repeatedly fails schema validation with the same error (e.g., ask_user_question with malformed params), the model enters an infinite retry loop burning tokens. After 3 consecutive validation failures for the same tool, inject a strong stop directive into the error response telling the model to change approach. The retry counter resets when: - Any tool call succeeds (clears all counters) - A different tool is called (clears previous tool's counter) Adds unit tests for the retry detection behavior. Fixes the class of loop where the model repeatedly generates the same malformed tool call parameters and never recovers.
EAGAIN (resource temporarily unavailable) is a transient non-blocking read error from node-pty that should be treated as expected rather than crashing the process. Also fix the settings schema generation script that failed under Bun by using node --import tsx/esm instead of npx tsx.
Addressed all feedback from PR QwenLM#3178: 1. Fixed validation retry counter keying to use both tool name and error message to prevent different validation errors on the same tool from accumulating 2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors rather than all EAGAIN errors globally 3. Fixed validation retry counter reset behavior to only clear counters for the specific tool that succeeded, not all counters
Addressed all feedback from PR QwenLM#3178: 1. Fixed validation retry counter keying to use both tool name and error message to prevent different validation errors on the same tool from accumulating 2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors rather than all EAGAIN errors globally 3. Fixed validation retry counter reset behavior to only clear counters for the specific tool that succeeded, not all counters
Addressed all feedback from PR QwenLM#3178: 1. Fixed validation retry counter keying to use both tool name and error message to prevent different validation errors on the same tool from accumulating 2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors rather than all EAGAIN errors globally 3. Fixed validation retry counter reset behavior to only clear counters for the specific tool that succeeded, not all counters
…tive Resolves lint failure from unused VALIDATION_RETRY_LOOP_THRESHOLD and RETRY_LOOP_STOP_DIRECTIVE declarations after review feedback inlined literals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| // If the same tool name that previously failed validation appears again, | ||
| // keep tracking; otherwise reset. | ||
| if (this.validationRetryCounts.size > 0) { | ||
| const prevTool = this.validationRetryCounts.keys().next().value; |
There was a problem hiding this comment.
[Critical] The batch-level check extracts prevTool as the full key from validationRetryCounts (e.g., "strictStringTool:params/value must be string") but compares it against r.name (e.g., "strictStringTool"). These never match, so hasPrevFailingTool is always false and validationRetryCounts.clear() is called on every subsequent call — resetting the counter before it can reach the threshold. The directive never fires.
| const prevTool = this.validationRetryCounts.keys().next().value; | |
| const prevKeys = [...this.validationRetryCounts.keys()]; | |
| const prevTools = new Set(prevKeys.map(k => k.split(':')[0])); | |
| const hasPrevFailingTool = requestsToProcess.some( | |
| (r) => prevTools.has(r.name), | |
| ); |
— qwen3.6-plus via Qwen Code /review
|
|
||
| switch (newStatus) { | ||
| case 'success': { | ||
| // Successful execution only resets retry state for this tool |
There was a problem hiding this comment.
[Critical] this.validationRetryCounts.delete(currentCall.request.name) uses the raw tool name as the key, but map entries are keyed as "toolName:errorMessage". This .delete() is a no-op — stale counters accumulate indefinitely.
| // Successful execution only resets retry state for this tool | |
| const prefix = `${currentCall.request.name}:`; | |
| for (const [key] of this.validationRetryCounts) { | |
| if (key.startsWith(prefix)) { | |
| this.validationRetryCounts.delete(key); | |
| } | |
| } |
— qwen3.6-plus via Qwen Code /review
| this.validationRetryCounts.set(errorKey, count); | ||
|
|
||
| const finalError = | ||
| count >= VALIDATION_RETRY_LOOP_THRESHOLD |
There was a problem hiding this comment.
[Suggestion] The injected directive is advisory text only — there is no enforcement mechanism. If the model ignores it, validation failures continue indefinitely with no upper bound. Consider adding a hard cap (e.g., after 5-6 failures, stop calling the tool entirely for this turn) or telemetry logging when the directive is ignored.
— qwen3.6-plus via Qwen Code /review
Validation retry counters are keyed by "<toolName>:<errorMessage>", but two call sites treated them as plain tool names — so the directive never fired and per-tool counters never cleared on success. - Batch-level reset (`_schedule`): derive prior tool names from keys before checking against incoming requests, instead of comparing incoming `r.name` to a full `"name:msg"` key (which never matches). - Success path: introduce `clearRetryCountsForTool(name)` that deletes by `"<name>:"` prefix and reuse it from both the success branch and the post-validation-pass cleanup. - Add a regression test covering: fail → fail → succeed → fail → fail must not trip the retry-loop directive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@euxaristia small PR-hygiene request — the description only covers the retry-loop detection change, but the diff also includes two unrelated fixes:
These are explained in commit
Option 2 is cleaner since they're logically independent from the retry-loop work, but either is fine. No need to block on this — just want the merge record to be self-describing. Thanks! |
wenshao
left a comment
There was a problem hiding this comment.
Verified against HEAD f34c22a:
- Both critical issues on counter key handling are fixed (
clearRetryCountsForTool+indexOf(':')toolname extraction in the batch-level reset) - Counter now keyed by
(tool, errorMessage)so unrelated validation mistakes don't accumulate - EAGAIN suppression narrowed to read-path
- Per-tool counter reset no longer wipes other tools' history
- Regression test for
fail → fail → succeed → fail → failis in place
The hard-cap / short-circuit enforcement (my earlier suggestion #6) is out of scope for this PR — tracking as a follow-up. LGTM.
…eight, model-switch event (QwenLM#3178, QwenLM#3721, QwenLM#3667) (#199) * feat(core): detect tool validation retry loops and inject stop directive (QwenLM#3178) Primary change: prevent the model from burning tokens in an infinite retry loop when a tool call repeatedly fails schema validation with the same error (observed with ask_user_question and a malformed `questions` parameter retrying 10+ times with the same validation error). - Track consecutive validation failures per (tool name, error message) pair in CoreToolScheduler via a `validationRetryCounts` Map. - After 3 consecutive failures for the same (tool, error) pair, append a RETRY LOOP DETECTED directive to the error response instructing the model to stop, re-examine the schema, try a fundamentally different approach, or surface the issue to the user. - Reset per-tool counters when the tool invocation succeeds; reset globally when an incoming batch shares no tool name with any previously failing tool; reset the per-tool counter when the tool returns a different validation error so unrelated mistakes do not accumulate toward the threshold. - Distinct from LoopDetectionService, which tracks model-behavior loops (repeated thoughts, stagnant actions); this change catches tool-API misuse loops at the scheduler layer. Piggyback fixes bundled in the same PR: - packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts: treat PTY `EAGAIN` on the read path as an expected read error alongside `EIO`, avoiding noisy surface-level failures from transient non-blocking reads. - scripts/build.js: switch the settings-schema generation step from `npx tsx` to `node --import tsx/esm` for Bun compatibility. Tests: - Unit tests in coreToolScheduler.test.ts cover: directive injection on the 3rd consecutive failure, counter reset when a different tool is called, and counter reset after a successful invocation of the same tool (fail → fail → succeed → fail → fail must not trip the directive). * fix(cli): bound SubAgent display by visual height to prevent flicker (QwenLM#3721) Cherry-picked from QwenLM/qwen-code: cae0927 Brings AgentExecutionDisplay's `availableHeight` budget enforcement — header/section overhead is subtracted, prompt + tool-list budgets get the rest. Adds shared `sliceTextByVisualHeight()` utility (lifted from ToolMessage), measure-flicker test harness, and a subagent-flicker regression integration test. Adaptations: - Skipped upstream's 90-line AppContainer layout block (`useFeedbackDialog`, `dialogsVisible`, `controlsHeight`, `availableTerminalHeight`, `setShellExecutionConfig`) — our HEAD already has equivalent layout machinery at lines 995-1053. Upstream's block depends on un-ported features (memory dialog, hooks dialog, bg tasks, rewind selector, delete dialog). - Skipped 4 upstream `AppContainer.test.tsx` tests asserting refreshStatic / handleClearScreen behavior — our refresh logic differs. - Removed unused `isInitialMount` ref — upstream PR intentionally removed the width-driven refreshStatic effect that used it. - Defaulted `isFocused` to `true` to match upstream's prop default. - Skipped `sanitizeSensitiveText` import in textUtils.test.ts (function not present in our fork). Local lint-staged + CI lint disagree on `vitest/no-conditional-expect` in the new sliceTextByVisualHeight tests; CI is the authoritative gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): refresh static header on model switch (QwenLM#3667) Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com> Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: pomelo <czynwu@outlook.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ive (QwenLM#3178) Primary change: prevent the model from burning tokens in an infinite retry loop when a tool call repeatedly fails schema validation with the same error (observed with ask_user_question and a malformed `questions` parameter retrying 10+ times with the same validation error). - Track consecutive validation failures per (tool name, error message) pair in CoreToolScheduler via a `validationRetryCounts` Map. - After 3 consecutive failures for the same (tool, error) pair, append a RETRY LOOP DETECTED directive to the error response instructing the model to stop, re-examine the schema, try a fundamentally different approach, or surface the issue to the user. - Reset per-tool counters when the tool invocation succeeds; reset globally when an incoming batch shares no tool name with any previously failing tool; reset the per-tool counter when the tool returns a different validation error so unrelated mistakes do not accumulate toward the threshold. - Distinct from LoopDetectionService, which tracks model-behavior loops (repeated thoughts, stagnant actions); this change catches tool-API misuse loops at the scheduler layer. Piggyback fixes bundled in the same PR: - packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts: treat PTY `EAGAIN` on the read path as an expected read error alongside `EIO`, avoiding noisy surface-level failures from transient non-blocking reads. - scripts/build.js: switch the settings-schema generation step from `npx tsx` to `node --import tsx/esm` for Bun compatibility. Tests: - Unit tests in coreToolScheduler.test.ts cover: directive injection on the 3rd consecutive failure, counter reset when a different tool is called, and counter reset after a successful invocation of the same tool (fail → fail → succeed → fail → fail must not trip the directive).
Summary
Prevents the model from entering an infinite retry loop when a tool call repeatedly fails schema validation with the same error (e.g.,
ask_user_questionwith malformedquestionsparameter).Problem
When the model generates a tool call with invalid parameters, the validation error is returned as a function response. The model then retries with slightly different formatting but the same fundamental mistake, burning tokens in an infinite loop. This was observed with
ask_user_questionwhere malformed JSON caused 10+ retries with the sameParameter "questions" must be an arrayerror.Solution
CoreToolSchedulerChanges
packages/core/src/core/coreToolScheduler.ts: AddvalidationRetryCountsMap, batch-level retry tracking in_schedule(), directive injection on threshold, counter reset on successpackages/core/src/core/coreToolScheduler.test.ts: Unit tests for retry detection and counter reset behaviorHistory note
This addresses a distinct class of loop from the existing
LoopDetectionService(which tracks model behavior like repetitive thoughts and action stagnation). This fix catches tool API misuse loops where the same malformed tool call repeats at the scheduler level.