feat(core): detect tool validation retry loops and inject stop directive by euxaristia · Pull Request #3178 · QwenLM/qwen-code

euxaristia · 2026-04-12T23:37:25Z

Summary

Prevents the model from entering an infinite retry loop when a tool call repeatedly fails schema validation with the same error (e.g., ask_user_question with malformed questions parameter).

Problem

When the model generates a tool call with invalid parameters, the validation error is returned as a function response. The model then retries with slightly different formatting but the same fundamental mistake, burning tokens in an infinite loop. This was observed with ask_user_question where malformed JSON caused 10+ retries with the same Parameter "questions" must be an array error.

Solution

Track consecutive validation failures per tool name in CoreToolScheduler
After 3 consecutive failures for the same tool, inject a strong stop directive into the error response:

⚠️ RETRY LOOP DETECTED: This tool call has failed validation multiple times with the same error. STOP retrying the same approach...
Reset the counter when any tool succeeds or a different tool is called

Changes

packages/core/src/core/coreToolScheduler.ts: Add validationRetryCounts Map, batch-level retry tracking in _schedule(), directive injection on threshold, counter reset on success
packages/core/src/core/coreToolScheduler.test.ts: Unit tests for retry detection and counter reset behavior

History note

This addresses a distinct class of loop from the existing LoopDetectionService (which tracks model behavior like repetitive thoughts and action stagnation). This fix catches tool API misuse loops where the same malformed tool call repeats at the scheduler level.

When a tool call repeatedly fails schema validation with the same error (e.g., ask_user_question with malformed params), the model enters an infinite retry loop burning tokens. After 3 consecutive validation failures for the same tool, inject a strong stop directive into the error response telling the model to change approach. The retry counter resets when: - Any tool call succeeds (clears all counters) - A different tool is called (clears previous tool's counter) Adds unit tests for the retry detection behavior. Fixes the class of loop where the model repeatedly generates the same malformed tool call parameters and never recovers.

EAGAIN (resource temporarily unavailable) is a transient non-blocking read error from node-pty that should be treated as expected rather than crashing the process. Also fix the settings schema generation script that failed under Bun by using node --import tsx/esm instead of npx tsx.

Addressed all feedback from PR QwenLM#3178: 1. Fixed validation retry counter keying to use both tool name and error message to prevent different validation errors on the same tool from accumulating 2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors rather than all EAGAIN errors globally 3. Fixed validation retry counter reset behavior to only clear counters for the specific tool that succeeded, not all counters

…tive Resolves lint failure from unused VALIDATION_RETRY_LOOP_THRESHOLD and RETRY_LOOP_STOP_DIRECTIVE declarations after review feedback inlined literals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wenshao · 2026-04-15T03:59:40Z

+      // If the same tool name that previously failed validation appears again,
+      // keep tracking; otherwise reset.
+      if (this.validationRetryCounts.size > 0) {
+        const prevTool = this.validationRetryCounts.keys().next().value;


[Critical] The batch-level check extracts prevTool as the full key from validationRetryCounts (e.g., "strictStringTool:params/value must be string") but compares it against r.name (e.g., "strictStringTool"). These never match, so hasPrevFailingTool is always false and validationRetryCounts.clear() is called on every subsequent call — resetting the counter before it can reach the threshold. The directive never fires.

Suggested change

const prevTool = this.validationRetryCounts.keys().next().value;

const prevKeys = [...this.validationRetryCounts.keys()];

const prevTools = new Set(prevKeys.map(k => k.split(':')[0]));

const hasPrevFailingTool = requestsToProcess.some(

(r) => prevTools.has(r.name),

);

— qwen3.6-plus via Qwen Code /review

wenshao · 2026-04-15T03:59:40Z


      switch (newStatus) {
        case 'success': {
+          // Successful execution only resets retry state for this tool


[Critical] this.validationRetryCounts.delete(currentCall.request.name) uses the raw tool name as the key, but map entries are keyed as "toolName:errorMessage". This .delete() is a no-op — stale counters accumulate indefinitely.

Suggested change

// Successful execution only resets retry state for this tool

const prefix = `${currentCall.request.name}:`;

for (const [key] of this.validationRetryCounts) {

if (key.startsWith(prefix)) {

this.validationRetryCounts.delete(key);

}

}

— qwen3.6-plus via Qwen Code /review

wenshao · 2026-04-15T03:59:40Z

+          this.validationRetryCounts.set(errorKey, count);
+
+          const finalError =
+            count >= VALIDATION_RETRY_LOOP_THRESHOLD


[Suggestion] The injected directive is advisory text only — there is no enforcement mechanism. If the model ignores it, validation failures continue indefinitely with no upper bound. Consider adding a hard cap (e.g., after 5-6 failures, stop calling the tool entirely for this turn) or telemetry logging when the directive is ignored.

— qwen3.6-plus via Qwen Code /review

Validation retry counters are keyed by "<toolName>:<errorMessage>", but two call sites treated them as plain tool names — so the directive never fired and per-tool counters never cleared on success. - Batch-level reset (`_schedule`): derive prior tool names from keys before checking against incoming requests, instead of comparing incoming `r.name` to a full `"name:msg"` key (which never matches). - Success path: introduce `clearRetryCountsForTool(name)` that deletes by `"<name>:"` prefix and reuse it from both the success branch and the post-validation-pass cleanup. - Add a regression test covering: fail → fail → succeed → fail → fail must not trip the retry-loop directive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wenshao · 2026-04-17T23:13:49Z

@euxaristia small PR-hygiene request — the description only covers the retry-loop detection change, but the diff also includes two unrelated fixes:

packages/cli/index.ts + packages/core/src/services/shellExecutionService.ts: treat PTY EAGAIN as an expected read error
scripts/build.js: switch schema generation from npx tsx to node --import tsx/esm (Bun compatibility)

These are explained in commit d375dd1, so the history isn't opaque, but they're invisible at the PR-description level. Could you either:

Update the PR body to note these two piggyback fixes, or
Split them into a separate PR

Option 2 is cleaner since they're logically independent from the retry-loop work, but either is fine. No need to block on this — just want the merge record to be self-describing. Thanks!

wenshao

Verified against HEAD f34c22a:

Both critical issues on counter key handling are fixed (clearRetryCountsForTool + indexOf(':') toolname extraction in the batch-level reset)
Counter now keyed by (tool, errorMessage) so unrelated validation mistakes don't accumulate
EAGAIN suppression narrowed to read-path
Per-tool counter reset no longer wipes other tools' history
Regression test for fail → fail → succeed → fail → fail is in place

The hard-cap / short-circuit enforcement (my earlier suggestion #6) is out of scope for this PR — tracking as a follow-up. LGTM.

…eight, model-switch event (QwenLM#3178, QwenLM#3721, QwenLM#3667) (#199) * feat(core): detect tool validation retry loops and inject stop directive (QwenLM#3178) Primary change: prevent the model from burning tokens in an infinite retry loop when a tool call repeatedly fails schema validation with the same error (observed with ask_user_question and a malformed `questions` parameter retrying 10+ times with the same validation error). - Track consecutive validation failures per (tool name, error message) pair in CoreToolScheduler via a `validationRetryCounts` Map. - After 3 consecutive failures for the same (tool, error) pair, append a RETRY LOOP DETECTED directive to the error response instructing the model to stop, re-examine the schema, try a fundamentally different approach, or surface the issue to the user. - Reset per-tool counters when the tool invocation succeeds; reset globally when an incoming batch shares no tool name with any previously failing tool; reset the per-tool counter when the tool returns a different validation error so unrelated mistakes do not accumulate toward the threshold. - Distinct from LoopDetectionService, which tracks model-behavior loops (repeated thoughts, stagnant actions); this change catches tool-API misuse loops at the scheduler layer. Piggyback fixes bundled in the same PR: - packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts: treat PTY `EAGAIN` on the read path as an expected read error alongside `EIO`, avoiding noisy surface-level failures from transient non-blocking reads. - scripts/build.js: switch the settings-schema generation step from `npx tsx` to `node --import tsx/esm` for Bun compatibility. Tests: - Unit tests in coreToolScheduler.test.ts cover: directive injection on the 3rd consecutive failure, counter reset when a different tool is called, and counter reset after a successful invocation of the same tool (fail → fail → succeed → fail → fail must not trip the directive). * fix(cli): bound SubAgent display by visual height to prevent flicker (QwenLM#3721) Cherry-picked from QwenLM/qwen-code: cae0927 Brings AgentExecutionDisplay's `availableHeight` budget enforcement — header/section overhead is subtracted, prompt + tool-list budgets get the rest. Adds shared `sliceTextByVisualHeight()` utility (lifted from ToolMessage), measure-flicker test harness, and a subagent-flicker regression integration test. Adaptations: - Skipped upstream's 90-line AppContainer layout block (`useFeedbackDialog`, `dialogsVisible`, `controlsHeight`, `availableTerminalHeight`, `setShellExecutionConfig`) — our HEAD already has equivalent layout machinery at lines 995-1053. Upstream's block depends on un-ported features (memory dialog, hooks dialog, bg tasks, rewind selector, delete dialog). - Skipped 4 upstream `AppContainer.test.tsx` tests asserting refreshStatic / handleClearScreen behavior — our refresh logic differs. - Removed unused `isInitialMount` ref — upstream PR intentionally removed the width-driven refreshStatic effect that used it. - Defaulted `isFocused` to `true` to match upstream's prop default. - Skipped `sanitizeSensitiveText` import in textUtils.test.ts (function not present in our fork). Local lint-staged + CI lint disagree on `vitest/no-conditional-expect` in the new sliceTextByVisualHeight tests; CI is the authoritative gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cli): refresh static header on model switch (QwenLM#3667) Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com> Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: pomelo <czynwu@outlook.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

…ive (QwenLM#3178) Primary change: prevent the model from burning tokens in an infinite retry loop when a tool call repeatedly fails schema validation with the same error (observed with ask_user_question and a malformed `questions` parameter retrying 10+ times with the same validation error). - Track consecutive validation failures per (tool name, error message) pair in CoreToolScheduler via a `validationRetryCounts` Map. - After 3 consecutive failures for the same (tool, error) pair, append a RETRY LOOP DETECTED directive to the error response instructing the model to stop, re-examine the schema, try a fundamentally different approach, or surface the issue to the user. - Reset per-tool counters when the tool invocation succeeds; reset globally when an incoming batch shares no tool name with any previously failing tool; reset the per-tool counter when the tool returns a different validation error so unrelated mistakes do not accumulate toward the threshold. - Distinct from LoopDetectionService, which tracks model-behavior loops (repeated thoughts, stagnant actions); this change catches tool-API misuse loops at the scheduler layer. Piggyback fixes bundled in the same PR: - packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts: treat PTY `EAGAIN` on the read path as an expected read error alongside `EIO`, avoiding noisy surface-level failures from transient non-blocking reads. - scripts/build.js: switch the settings-schema generation step from `npx tsx` to `node --import tsx/esm` for Bun compatibility. Tests: - Unit tests in coreToolScheduler.test.ts cover: directive injection on the 3rd consecutive failure, counter reset when a different tool is called, and counter reset after a successful invocation of the same tool (fail → fail → succeed → fail → fail must not trip the directive).

euxaristia added 2 commits April 12, 2026 19:36

wenshao reviewed Apr 13, 2026

View reviewed changes

Comment thread packages/core/src/core/coreToolScheduler.ts Outdated

Comment thread packages/cli/index.ts Outdated

Comment thread packages/core/src/core/coreToolScheduler.ts Outdated

github-actions Bot mentioned this pull request Apr 13, 2026

📊 AI CLI 工具社区动态日报 2026-04-13 gsscsd/big_model_radar#177

Open

fix(core): use extracted constants for retry loop threshold and direc…

8456487

…tive Resolves lint failure from unused VALIDATION_RETRY_LOOP_THRESHOLD and RETRY_LOOP_STOP_DIRECTIVE declarations after review feedback inlined literals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wenshao requested changes Apr 15, 2026

View reviewed changes

wenshao approved these changes Apr 17, 2026

View reviewed changes

github-actions Bot mentioned this pull request Apr 18, 2026

📊 AI CLI 工具社区动态日报 2026-04-18 gsscsd/big_model_radar#203

Open

wenshao merged commit 5facd87 into QwenLM:main Apr 18, 2026
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): detect tool validation retry loops and inject stop directive#3178

feat(core): detect tool validation retry loops and inject stop directive#3178
wenshao merged 5 commits into
QwenLM:mainfrom
euxaristia:feat/tool-validation-retry-loop-detection

euxaristia commented Apr 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao Apr 15, 2026

Uh oh!

wenshao Apr 15, 2026

Uh oh!

wenshao Apr 15, 2026

Uh oh!

wenshao commented Apr 17, 2026

Uh oh!

wenshao left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        const prevTool = this.validationRetryCounts.keys().next().value;
+        const prevKeys = [...this.validationRetryCounts.keys()];
+        const prevTools = new Set(prevKeys.map(k => k.split(':')[0]));
+        const hasPrevFailingTool = requestsToProcess.some(
+          (r) => prevTools.has(r.name),
+        );

-          // Successful execution only resets retry state for this tool
+          const prefix = `${currentCall.request.name}:`;
+          for (const [key] of this.validationRetryCounts) {
+            if (key.startsWith(prefix)) {
+              this.validationRetryCounts.delete(key);
+            }
+          }

Conversation

euxaristia commented Apr 12, 2026

Summary

Problem

Solution

Changes

History note

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

wenshao Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

wenshao Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

wenshao commented Apr 17, 2026

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants