Skip to content

feat(core): detect tool validation retry loops and inject stop directive#3178

Merged
wenshao merged 5 commits into
QwenLM:mainfrom
euxaristia:feat/tool-validation-retry-loop-detection
Apr 18, 2026
Merged

feat(core): detect tool validation retry loops and inject stop directive#3178
wenshao merged 5 commits into
QwenLM:mainfrom
euxaristia:feat/tool-validation-retry-loop-detection

Conversation

@euxaristia

Copy link
Copy Markdown
Contributor

Summary

Prevents the model from entering an infinite retry loop when a tool call repeatedly fails schema validation with the same error (e.g., ask_user_question with malformed questions parameter).

Problem

When the model generates a tool call with invalid parameters, the validation error is returned as a function response. The model then retries with slightly different formatting but the same fundamental mistake, burning tokens in an infinite loop. This was observed with ask_user_question where malformed JSON caused 10+ retries with the same Parameter "questions" must be an array error.

Solution

  • Track consecutive validation failures per tool name in CoreToolScheduler
  • After 3 consecutive failures for the same tool, inject a strong stop directive into the error response:

    ⚠️ RETRY LOOP DETECTED: This tool call has failed validation multiple times with the same error. STOP retrying the same approach...

  • Reset the counter when any tool succeeds or a different tool is called

Changes

  • packages/core/src/core/coreToolScheduler.ts: Add validationRetryCounts Map, batch-level retry tracking in _schedule(), directive injection on threshold, counter reset on success
  • packages/core/src/core/coreToolScheduler.test.ts: Unit tests for retry detection and counter reset behavior

History note

This addresses a distinct class of loop from the existing LoopDetectionService (which tracks model behavior like repetitive thoughts and action stagnation). This fix catches tool API misuse loops where the same malformed tool call repeats at the scheduler level.

When a tool call repeatedly fails schema validation with the same error
(e.g., ask_user_question with malformed params), the model enters an
infinite retry loop burning tokens. After 3 consecutive validation
failures for the same tool, inject a strong stop directive into the
error response telling the model to change approach.

The retry counter resets when:
- Any tool call succeeds (clears all counters)
- A different tool is called (clears previous tool's counter)

Adds unit tests for the retry detection behavior.

Fixes the class of loop where the model repeatedly generates the same
malformed tool call parameters and never recovers.
EAGAIN (resource temporarily unavailable) is a transient non-blocking
read error from node-pty that should be treated as expected rather than
crashing the process. Also fix the settings schema generation script
that failed under Bun by using node --import tsx/esm instead of npx tsx.
Comment thread packages/core/src/core/coreToolScheduler.ts Outdated
Comment thread packages/cli/index.ts Outdated
Comment thread packages/core/src/core/coreToolScheduler.ts Outdated
Addressed all feedback from PR QwenLM#3178:

1. Fixed validation retry counter keying to use both tool name and error message
   to prevent different validation errors on the same tool from accumulating

2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors
   rather than all EAGAIN errors globally

3. Fixed validation retry counter reset behavior to only clear counters for
   the specific tool that succeeded, not all counters
euxaristia added a commit to euxaristia/qwen-code that referenced this pull request Apr 13, 2026
Addressed all feedback from PR QwenLM#3178:

1. Fixed validation retry counter keying to use both tool name and error message
   to prevent different validation errors on the same tool from accumulating

2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors
   rather than all EAGAIN errors globally

3. Fixed validation retry counter reset behavior to only clear counters for
   the specific tool that succeeded, not all counters
euxaristia added a commit to euxaristia/qwen-code that referenced this pull request Apr 13, 2026
Addressed all feedback from PR QwenLM#3178:

1. Fixed validation retry counter keying to use both tool name and error message
   to prevent different validation errors on the same tool from accumulating

2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors
   rather than all EAGAIN errors globally

3. Fixed validation retry counter reset behavior to only clear counters for
   the specific tool that succeeded, not all counters
…tive

Resolves lint failure from unused VALIDATION_RETRY_LOOP_THRESHOLD and
RETRY_LOOP_STOP_DIRECTIVE declarations after review feedback inlined
literals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// If the same tool name that previously failed validation appears again,
// keep tracking; otherwise reset.
if (this.validationRetryCounts.size > 0) {
const prevTool = this.validationRetryCounts.keys().next().value;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The batch-level check extracts prevTool as the full key from validationRetryCounts (e.g., "strictStringTool:params/value must be string") but compares it against r.name (e.g., "strictStringTool"). These never match, so hasPrevFailingTool is always false and validationRetryCounts.clear() is called on every subsequent call — resetting the counter before it can reach the threshold. The directive never fires.

Suggested change
const prevTool = this.validationRetryCounts.keys().next().value;
const prevKeys = [...this.validationRetryCounts.keys()];
const prevTools = new Set(prevKeys.map(k => k.split(':')[0]));
const hasPrevFailingTool = requestsToProcess.some(
(r) => prevTools.has(r.name),
);

— qwen3.6-plus via Qwen Code /review


switch (newStatus) {
case 'success': {
// Successful execution only resets retry state for this tool

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] this.validationRetryCounts.delete(currentCall.request.name) uses the raw tool name as the key, but map entries are keyed as "toolName:errorMessage". This .delete() is a no-op — stale counters accumulate indefinitely.

Suggested change
// Successful execution only resets retry state for this tool
const prefix = `${currentCall.request.name}:`;
for (const [key] of this.validationRetryCounts) {
if (key.startsWith(prefix)) {
this.validationRetryCounts.delete(key);
}
}

— qwen3.6-plus via Qwen Code /review

this.validationRetryCounts.set(errorKey, count);

const finalError =
count >= VALIDATION_RETRY_LOOP_THRESHOLD

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The injected directive is advisory text only — there is no enforcement mechanism. If the model ignores it, validation failures continue indefinitely with no upper bound. Consider adding a hard cap (e.g., after 5-6 failures, stop calling the tool entirely for this turn) or telemetry logging when the directive is ignored.

— qwen3.6-plus via Qwen Code /review

Validation retry counters are keyed by "<toolName>:<errorMessage>", but
two call sites treated them as plain tool names — so the directive never
fired and per-tool counters never cleared on success.

- Batch-level reset (`_schedule`): derive prior tool names from keys
  before checking against incoming requests, instead of comparing
  incoming `r.name` to a full `"name:msg"` key (which never matches).
- Success path: introduce `clearRetryCountsForTool(name)` that deletes
  by `"<name>:"` prefix and reuse it from both the success branch and
  the post-validation-pass cleanup.
- Add a regression test covering: fail → fail → succeed → fail → fail
  must not trip the retry-loop directive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wenshao

wenshao commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

@euxaristia small PR-hygiene request — the description only covers the retry-loop detection change, but the diff also includes two unrelated fixes:

  • packages/cli/index.ts + packages/core/src/services/shellExecutionService.ts: treat PTY EAGAIN as an expected read error
  • scripts/build.js: switch schema generation from npx tsx to node --import tsx/esm (Bun compatibility)

These are explained in commit d375dd1, so the history isn't opaque, but they're invisible at the PR-description level. Could you either:

  1. Update the PR body to note these two piggyback fixes, or
  2. Split them into a separate PR

Option 2 is cleaner since they're logically independent from the retry-loop work, but either is fine. No need to block on this — just want the merge record to be self-describing. Thanks!

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified against HEAD f34c22a:

  • Both critical issues on counter key handling are fixed (clearRetryCountsForTool + indexOf(':') toolname extraction in the batch-level reset)
  • Counter now keyed by (tool, errorMessage) so unrelated validation mistakes don't accumulate
  • EAGAIN suppression narrowed to read-path
  • Per-tool counter reset no longer wipes other tools' history
  • Regression test for fail → fail → succeed → fail → fail is in place

The hard-cap / short-circuit enforcement (my earlier suggestion #6) is out of scope for this PR — tracking as a follow-up. LGTM.

@wenshao wenshao merged commit 5facd87 into QwenLM:main Apr 18, 2026
13 checks passed
mabry1985 added a commit to protoLabsAI/protoCLI that referenced this pull request May 2, 2026
…eight, model-switch event (QwenLM#3178, QwenLM#3721, QwenLM#3667) (#199)

* feat(core): detect tool validation retry loops and inject stop directive (QwenLM#3178)

Primary change: prevent the model from burning tokens in an infinite retry
loop when a tool call repeatedly fails schema validation with the same
error (observed with ask_user_question and a malformed `questions`
parameter retrying 10+ times with the same validation error).

- Track consecutive validation failures per (tool name, error message)
  pair in CoreToolScheduler via a `validationRetryCounts` Map.
- After 3 consecutive failures for the same (tool, error) pair, append a
  RETRY LOOP DETECTED directive to the error response instructing the
  model to stop, re-examine the schema, try a fundamentally different
  approach, or surface the issue to the user.
- Reset per-tool counters when the tool invocation succeeds; reset
  globally when an incoming batch shares no tool name with any
  previously failing tool; reset the per-tool counter when the tool
  returns a different validation error so unrelated mistakes do not
  accumulate toward the threshold.
- Distinct from LoopDetectionService, which tracks model-behavior loops
  (repeated thoughts, stagnant actions); this change catches tool-API
  misuse loops at the scheduler layer.

Piggyback fixes bundled in the same PR:

- packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts:
  treat PTY `EAGAIN` on the read path as an expected read error alongside
  `EIO`, avoiding noisy surface-level failures from transient
  non-blocking reads.
- scripts/build.js: switch the settings-schema generation step from
  `npx tsx` to `node --import tsx/esm` for Bun compatibility.

Tests:

- Unit tests in coreToolScheduler.test.ts cover: directive injection on
  the 3rd consecutive failure, counter reset when a different tool is
  called, and counter reset after a successful invocation of the same
  tool (fail → fail → succeed → fail → fail must not trip the directive).

* fix(cli): bound SubAgent display by visual height to prevent flicker (QwenLM#3721)

Cherry-picked from QwenLM/qwen-code: cae0927

Brings AgentExecutionDisplay's `availableHeight` budget enforcement —
header/section overhead is subtracted, prompt + tool-list budgets get
the rest. Adds shared `sliceTextByVisualHeight()` utility (lifted from
ToolMessage), measure-flicker test harness, and a subagent-flicker
regression integration test.

Adaptations:
- Skipped upstream's 90-line AppContainer layout block (`useFeedbackDialog`,
  `dialogsVisible`, `controlsHeight`, `availableTerminalHeight`,
  `setShellExecutionConfig`) — our HEAD already has equivalent layout
  machinery at lines 995-1053. Upstream's block depends on un-ported
  features (memory dialog, hooks dialog, bg tasks, rewind selector,
  delete dialog).
- Skipped 4 upstream `AppContainer.test.tsx` tests asserting
  refreshStatic / handleClearScreen behavior — our refresh logic differs.
- Removed unused `isInitialMount` ref — upstream PR intentionally
  removed the width-driven refreshStatic effect that used it.
- Defaulted `isFocused` to `true` to match upstream's prop default.
- Skipped `sanitizeSensitiveText` import in textUtils.test.ts (function
  not present in our fork).

Local lint-staged + CI lint disagree on `vitest/no-conditional-expect`
in the new sliceTextByVisualHeight tests; CI is the authoritative gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): refresh static header on model switch (QwenLM#3667)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com>
Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pomelo <czynwu@outlook.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
…ive (QwenLM#3178)

Primary change: prevent the model from burning tokens in an infinite retry
loop when a tool call repeatedly fails schema validation with the same
error (observed with ask_user_question and a malformed `questions`
parameter retrying 10+ times with the same validation error).

- Track consecutive validation failures per (tool name, error message)
  pair in CoreToolScheduler via a `validationRetryCounts` Map.
- After 3 consecutive failures for the same (tool, error) pair, append a
  RETRY LOOP DETECTED directive to the error response instructing the
  model to stop, re-examine the schema, try a fundamentally different
  approach, or surface the issue to the user.
- Reset per-tool counters when the tool invocation succeeds; reset
  globally when an incoming batch shares no tool name with any
  previously failing tool; reset the per-tool counter when the tool
  returns a different validation error so unrelated mistakes do not
  accumulate toward the threshold.
- Distinct from LoopDetectionService, which tracks model-behavior loops
  (repeated thoughts, stagnant actions); this change catches tool-API
  misuse loops at the scheduler layer.

Piggyback fixes bundled in the same PR:

- packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts:
  treat PTY `EAGAIN` on the read path as an expected read error alongside
  `EIO`, avoiding noisy surface-level failures from transient
  non-blocking reads.
- scripts/build.js: switch the settings-schema generation step from
  `npx tsx` to `node --import tsx/esm` for Bun compatibility.

Tests:

- Unit tests in coreToolScheduler.test.ts cover: directive injection on
  the 3rd consecutive failure, counter reset when a different tool is
  called, and counter reset after a successful invocation of the same
  tool (fail → fail → succeed → fail → fail must not trip the directive).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants