Skip to content

feat(core): enhanced loop detection with stagnation + validation-retry checks#3236

Merged
wenshao merged 4 commits into
QwenLM:mainfrom
euxaristia:feat/enhanced-loop-detection
Apr 19, 2026
Merged

feat(core): enhanced loop detection with stagnation + validation-retry checks#3236
wenshao merged 4 commits into
QwenLM:mainfrom
euxaristia:feat/enhanced-loop-detection

Conversation

@euxaristia

@euxaristia euxaristia commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Detect tool validation retry loops and inject a stop directive to break the cycle
  • Enhance loop detection with read-file, action-stagnation, and repetitive-thought checks
  • Repetitive-thought detection now hooks GeminiEventType.Thought events (the model's actual reasoning channel) rather than substring-matching hedge phrases in prose Content, and resets across tool-call boundaries so one prompt can drive many roundtrips without false positives

Test plan

  • loopDetectionService.test.ts passes (41 tests)
  • coreToolScheduler.test.ts passes
  • Manual: trigger a validation-retry loop and confirm stop directive injection
  • Manual: confirm thought/action-stagnation/read-file detection fires

Comment thread packages/core/src/services/loopDetectionService.ts
Comment thread packages/core/src/core/coreToolScheduler.ts
Comment thread packages/core/src/services/loopDetectionService.ts Outdated
@euxaristia euxaristia force-pushed the feat/enhanced-loop-detection branch from fa14ed9 to 5b1ffd9 Compare April 18, 2026 14:12

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] packages/sdk-java/qwencode/src/main/java/com/alibaba/qwen/code/cli/protocol/data/AssistantContent.java:49 renames the public nested Java SDK interface from ThingkingAssistantContent to ThinkingAssistantContent without a compatibility alias. This is a breaking public API change for downstream SDK consumers that still import or reference the old type name, and their code will stop compiling after upgrade. Please keep a deprecated compatibility alias for at least one deprecation cycle.

[Critical] packages/core/src/index.ts:77-116 stops exporting runtime tool class values such as EditTool, ShellTool, ExitPlanModeTool, and TodoWriteTool, and now re-exports them as export type only. External TypeScript/JavaScript consumers importing these symbols from @qwen-code/qwen-code-core for runtime usage will break because those named value exports no longer exist on the package entrypoint. Please preserve the prior value exports from the root barrel or add a compatibility layer/deprecation period instead of converting them to type-only exports in one release.

— gpt-5.4 via Qwen Code /review

@euxaristia

Copy link
Copy Markdown
Contributor Author

Both "Critical" items in this review reference files that aren't in this PR's diff.

Verified against GET /repos/QwenLM/qwen-code/pulls/3236/files — head 5b1ffd9, 6 files changed, +577/−16:

  • .gitignore
  • packages/core/src/core/coreToolScheduler.ts (+ test)
  • packages/core/src/services/loopDetectionService.ts (+ test)
  • packages/core/src/telemetry/types.ts
  1. packages/sdk-java/.../AssistantContent.java:49 — the entire packages/sdk-java/ tree is untouched by this PR. No rename of ThingkingAssistantContentThinkingAssistantContent happens here.
  2. packages/core/src/index.ts:77-116packages/core/src/index.ts is not in the changed-files list. No exports for EditTool, ShellTool, ExitPlanModeTool, or TodoWriteTool are altered by this PR.

For context, the three inline comments from the earlier 02:50 UTC review pass (repetitive-thoughts window, isReadLikeTool substring match, per-tool validationRetryCounts pruning) were all addressed in commit 5b1ffd9 before this second review ran.

— Claude Opus 4.7 (xhigh) via Claude Code

@wenshao

wenshao commented Apr 18, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the iterations — the per-tool validationRetryCounts pruning and the READ_LIKE_TOOL_NAMES exact-set + read_/list_ prefix fallback are clearly better than the earlier pass. A few concerns remain after a close re-read at head 5b1ffd9:

1. thoughtHistory is never cleared between tool calls within a single prompt

In addAndCheck, ToolCallRequest calls resetContentTracking() which wipes streamContentHistory / contentStats, but thoughtHistory is only cleared in reset(promptId). And reset(promptId) only runs on UserQuery / Cron / Notification (see packages/core/src/core/client.ts around loopDetector.reset(prompt_id)). One user prompt can drive many tool roundtrips in a single turn, so thoughtHistory accumulates across the whole chain. Sequence like:

  • Content "I should check..." (push)
  • ToolCallRequest (real progress)
  • Content "I should check..." (push)
  • ToolCallRequest
  • Content "I should check..." (push) → third identical entry → REPETITIVE_THOUGHTS fires

even though the intervening tool calls represent real progress. Either clear thoughtHistory inside resetContentTracking(), or scope repetition to a single content stream.

2. trackThoughtPatterns hooks Content, not Thought

In this codebase, model thinking is yielded as a distinct GeminiEventType.Thought event (see packages/core/src/core/turn.ts, getThoughtText(resp)yield { type: Thought, ... }), separate from GeminiEventType.Content. The new detector only runs on Content events, so it never actually sees thought text — only hedged phrasing (should, might, could, etc.) in regular output. If the intent is "detect repetitive reasoning", Thought events need to be handled too. If the intent is "detect repeated hedge prefixes in prose output", the name and the inline comment referring to "internal reasoning" are misleading.

3. READ_FILE_LOOP fires on legitimate initial exploration

With FILE_READ_THRESHOLD = 5 and FILE_READ_WINDOW = 10, a very common opening pattern — list_directory followed by four read_file calls — hits the threshold on the 5th tool call (all 5 are read-like). "Explain the architecture of X" or "summarize this directory" tasks routinely start with exactly this shape and will trip the detector on their first productive moves. Options: raise the threshold (e.g. 8 / 15), or exempt early windows that contain only read-like tools (fresh exploration vs. post-action re-reading).

4. PR description mentions changes not present in the diff

The TLDR lists "Fix EAGAIN pty errors and schema generation build", but the 6 changed files at head 5b1ffd9 (.gitignore, coreToolScheduler.ts + test, loopDetectionService.ts + test, telemetry/types.ts) don't touch pty handling or schema generation. Either the description is stale or those commits are missing.


Minor: .gitignore gains benchmark_test_dir/, which doesn't seem related to loop detection — worth a separate PR or dropping.

euxaristia and others added 3 commits April 19, 2026 00:55
…ation checks

Adds three new detectors to LoopDetectionService, complementing the
existing CONSECUTIVE_IDENTICAL_TOOL_CALLS detector:

- REPETITIVE_THOUGHTS: fires when the same thought-like phrase appears
  THOUGHT_REPEAT_THRESHOLD times across the tracked content history.
  Runs on Content events (previously only on ToolCallRequest, so it
  could never fire).

- READ_FILE_LOOP: fires when FILE_READ_THRESHOLD read-like tools (read,
  cat, view, list) appear within a bounded window.

- ACTION_STAGNATION: fires when the same tool *name* is called
  STAGNATION_THRESHOLD times consecutively regardless of arguments.
  Distinct from CONSECUTIVE_IDENTICAL_TOOL_CALLS (same name AND args)
  and from READ_FILE_LOOP (catches thrashing with varying params).

Shared helper isReadLikeTool() deduplicates read-category checks.
- checkRepetitiveThoughts: scope to a consecutive window of the last
  THOUGHT_REPEAT_THRESHOLD thoughts instead of counting across the full
  retained history, so a long-running session that revisits an earlier
  phrase after genuine progress is no longer flagged as a loop.
- isReadLikeTool: replace loose substring matching on 'read'/'cat'/
  'view'/'list' with an exact-name allowlist (read_file, read_many_files,
  list_directory) plus a read_/list_ prefix fallback for MCP-provided
  tools. Prevents unrelated tools like 'review' or 'listener_bind' from
  contributing to READ_FILE_LOOP detection.
- coreToolScheduler._schedule: prune validationRetryCounts per-tool
  rather than wholesale. Stale counters for tools absent from the
  current batch were surviving and could fire RETRY LOOP DETECTED
  prematurely the next time an unrelated tool was used.

Adds regression tests for each case.
… tool calls

- trackThoughtPatterns previously hooked GeminiEventType.Content and
  flagged hedge phrases ("should", "might", etc.) via a substring regex.
  That never actually sampled the model's reasoning channel, because
  structured thinking is emitted as GeminiEventType.Thought. Replace the
  Content hook with a proper Thought handler that stores a
  (subject, description) signature.
- thoughtHistory was only cleared on reset(promptId), which runs on new
  user prompts. Within a single prompt, the model can drive many tool
  roundtrips; a thought like "Inspect the schema" reappearing three
  times across those roundtrips incorrectly fired REPETITIVE_THOUGHTS
  despite real progress in between. Clear thoughtHistory at every
  ToolCallRequest so repetition only fires within one contiguous
  reasoning stream.
- Update tests to use Thought events, add a regression for the
  cross-roundtrip leak, and assert hedge phrases in Content no longer
  influence thought detection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@euxaristia euxaristia force-pushed the feat/enhanced-loop-detection branch from 7d46d1e to 78c8f43 Compare April 19, 2026 04:55
wenshao
wenshao previously approved these changes Apr 19, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review

@wenshao

wenshao commented Apr 19, 2026

Copy link
Copy Markdown
Collaborator

E2E-verified that the READ_FILE_LOOP false-positive is real, not hypothetical. On the PR head with model.skipLoopDetection: false:

  • Prompt: Summarize what this project is and briefly describe what each source file does. Be concise.
  • Target: a 4-file toy project (README.md, package.json, src/{config,index,utils}.ts)
  • Model: gpt-5.4 via DashScope OpenAI-compatible

The model issued 5 parallel read_file calls in one assistant turn. Instrumenting LoopDetectionService.addAndCheck shows the 5th call trips it:

[LOOPDET] name=read_file recent=5 readLikes=5 sameNameStreak=5
          toolCallLoop=false readFileLoop=true actionStagnation=false

The non-interactive run then exits with empty stdout — no visible reason surfaced to the user.

Two small asks before merge:

  1. Raise FILE_READ_THRESHOLD / FILE_READ_WINDOW to something like 8 / 15, or add a cold-start exemption so the first window of read-likes (which is almost always legitimate opening exploration on a new project) doesn't count as loop evidence. Update the corresponding test to match.
  2. Non-interactive mode currently fails silently when a loop is detected. At minimum, print a one-liner to stderr so users know what happened and which detector fired.

Separately: one behavioral note worth surfacing — packages/cli/src/config/config.ts:1148 defaults skipLoopDetection to true, so none of the new detectors run for default users. Not asking to flip that here, but it means this PR's detector logic is only exercised by users who opt in, and the unit tests enshrine thresholds that break on the first real-world opt-in workload.

Everything else — per-tool validationRetryCounts pruning, Thought-based repetition detection, exact-name + read_/list_ prefix matching for read-like tools, the cross-turn history clearing on ToolCallRequest — looks good.

…eractive mode

Addresses PR QwenLM#3236 review feedback:

1. Raise FILE_READ_THRESHOLD/FILE_READ_WINDOW to 8/15 and add a
   cold-start exemption (hasSeenNonReadTool gate) so opening
   exploration (list_directory + parallel read_file calls) no longer
   trips READ_FILE_LOOP on its first productive move.

2. In non-interactive TEXT mode, print a one-liner to stderr when a
   loop is detected, naming which detector fired. Previously the run
   exited silently with empty stdout. Routed via a new
   getLastLoopType() on LoopDetectionService and an optional
   value.loopType on ServerGeminiLoopDetectedEvent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full trace of how each earlier review concern landed. All blockers closed out — posting APPROVED to supersede the two earlier CHANGES_REQUESTED.

From the 2026-04-18 19:02 re-review (head 5b1ffd9)

  1. thoughtHistory never cleared between tool calls within a single prompt → fixed in 78c8f43: thoughtHistory is now cleared inside resetContentTracking(), so ToolCallRequest resets it the same way it resets streamContentHistory. A user prompt that drives multiple tool roundtrips no longer accumulates hedge-phrase history across boundaries.

  2. trackThoughtPatterns hooks Content, not Thought → fixed in 78c8f43: repetition detection now subscribes to GeminiEventType.Thought (the model's actual reasoning channel surfaced by parseThought(resp) in turn.ts), not substring matching in prose Content. The detector's name and intent now match its wiring.

  3. READ_FILE_LOOP fires on legitimate initial exploration → fixed in b1bd1a6: FILE_READ_THRESHOLD raised 5→8, FILE_READ_WINDOW 10→15, plus a hasSeenNonReadTool cold-start gate so the typical opening list_directory + parallel read_file pattern (e.g. "summarize this project") no longer trips on its first productive move. Tests updated with a primeNonReadTool() helper that explicitly documents the gate semantics.

  4. PR description mentions pty / schema-gen changes not in diff — noted, nit only. Worth a cleanup pass before merge but not blocking.

From the 2026-04-19 07:51 E2E verification

  1. Raise threshold or add cold-start exemption → same b1bd1a6 fix as above.

  2. Non-interactive mode silently exits on loop detection → fixed in b1bd1a6:

    • LoopDetectionService.getLastLoopType() exposes which detector fired
    • ServerGeminiLoopDetectedEvent carries optional value: { loopType }
    • nonInteractiveCli.ts TEXT mode emits a one-liner to stderr naming the detector (READ_FILE_LOOP, REPETITIVE_THOUGHTS, etc.) with a hint pointing at model.skipLoopDetection to disable

Also verified in b1bd1a6 that the change surface stays within loop detection + non-interactive output surfacing — no unrelated drive-bys (6 files, +198/−9, all in packages/core/src/services/, packages/core/src/core/{client,turn}.ts, packages/core/src/index.ts export line, packages/cli/src/nonInteractiveCli.ts).

AI-hallucinated items in earlier CHANGES_REQUESTED

The older review at 2026-04-18 16:39 flagged a ThingkingAssistantContentThinkingAssistantContent rename in packages/sdk-java/... as a Critical breaking change. That file is not in this PR's diff — the 6 changed files are in packages/core/ and packages/cli/, with no touch on sdk-java. Already established as a fabricated concern; calling it out here so the supersession rationale is on record.

CI

  • 2 Windows failures on SettingsDialog > should keep restart prompt when switching scopes — file not touched by this PR, same Windows-flakiness pattern as other recent unrelated PRs
  • Other 10 checks green

LGTM.

@wenshao wenshao merged commit c175fd3 into QwenLM:main Apr 19, 2026
22 of 24 checks passed
@tanzhenxin tanzhenxin added the TBD To Be Discussed label Apr 22, 2026
mabry1985 added a commit to protoLabsAI/protoCLI that referenced this pull request May 2, 2026
…y checks (QwenLM#3236) (#196)

Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com>
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

TBD To Be Discussed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants