feat(core): enhanced loop detection with stagnation + validation-retry checks by euxaristia · Pull Request #3236 · QwenLM/qwen-code

euxaristia · 2026-04-13T19:17:49Z

Summary

Detect tool validation retry loops and inject a stop directive to break the cycle
Enhance loop detection with read-file, action-stagnation, and repetitive-thought checks
Repetitive-thought detection now hooks GeminiEventType.Thought events (the model's actual reasoning channel) rather than substring-matching hedge phrases in prose Content, and resets across tool-call boundaries so one prompt can drive many roundtrips without false positives

Test plan

loopDetectionService.test.ts passes (41 tests)
coreToolScheduler.test.ts passes
Manual: trigger a validation-retry loop and confirm stop directive injection
Manual: confirm thought/action-stagnation/read-file detection fires

wenshao

[Critical] packages/sdk-java/qwencode/src/main/java/com/alibaba/qwen/code/cli/protocol/data/AssistantContent.java:49 renames the public nested Java SDK interface from ThingkingAssistantContent to ThinkingAssistantContent without a compatibility alias. This is a breaking public API change for downstream SDK consumers that still import or reference the old type name, and their code will stop compiling after upgrade. Please keep a deprecated compatibility alias for at least one deprecation cycle.

[Critical] packages/core/src/index.ts:77-116 stops exporting runtime tool class values such as EditTool, ShellTool, ExitPlanModeTool, and TodoWriteTool, and now re-exports them as export type only. External TypeScript/JavaScript consumers importing these symbols from @qwen-code/qwen-code-core for runtime usage will break because those named value exports no longer exist on the package entrypoint. Please preserve the prior value exports from the root barrel or add a compatibility layer/deprecation period instead of converting them to type-only exports in one release.

— gpt-5.4 via Qwen Code /review

euxaristia · 2026-04-18T17:46:52Z

Both "Critical" items in this review reference files that aren't in this PR's diff.

Verified against GET /repos/QwenLM/qwen-code/pulls/3236/files — head 5b1ffd9, 6 files changed, +577/−16:

.gitignore
packages/core/src/core/coreToolScheduler.ts (+ test)
packages/core/src/services/loopDetectionService.ts (+ test)
packages/core/src/telemetry/types.ts

packages/sdk-java/.../AssistantContent.java:49 — the entire packages/sdk-java/ tree is untouched by this PR. No rename of ThingkingAssistantContent → ThinkingAssistantContent happens here.
packages/core/src/index.ts:77-116 — packages/core/src/index.ts is not in the changed-files list. No exports for EditTool, ShellTool, ExitPlanModeTool, or TodoWriteTool are altered by this PR.

For context, the three inline comments from the earlier 02:50 UTC review pass (repetitive-thoughts window, isReadLikeTool substring match, per-tool validationRetryCounts pruning) were all addressed in commit 5b1ffd9 before this second review ran.

— Claude Opus 4.7 (xhigh) via Claude Code

wenshao · 2026-04-18T19:02:30Z

Thanks for the iterations — the per-tool validationRetryCounts pruning and the READ_LIKE_TOOL_NAMES exact-set + read_/list_ prefix fallback are clearly better than the earlier pass. A few concerns remain after a close re-read at head 5b1ffd9:

1. thoughtHistory is never cleared between tool calls within a single prompt

In addAndCheck, ToolCallRequest calls resetContentTracking() which wipes streamContentHistory / contentStats, but thoughtHistory is only cleared in reset(promptId). And reset(promptId) only runs on UserQuery / Cron / Notification (see packages/core/src/core/client.ts around loopDetector.reset(prompt_id)). One user prompt can drive many tool roundtrips in a single turn, so thoughtHistory accumulates across the whole chain. Sequence like:

Content "I should check..." (push)
ToolCallRequest (real progress)
Content "I should check..." (push)
ToolCallRequest
Content "I should check..." (push) → third identical entry → REPETITIVE_THOUGHTS fires

even though the intervening tool calls represent real progress. Either clear thoughtHistory inside resetContentTracking(), or scope repetition to a single content stream.

2. trackThoughtPatterns hooks Content, not Thought

In this codebase, model thinking is yielded as a distinct GeminiEventType.Thought event (see packages/core/src/core/turn.ts, getThoughtText(resp) → yield { type: Thought, ... }), separate from GeminiEventType.Content. The new detector only runs on Content events, so it never actually sees thought text — only hedged phrasing (should, might, could, etc.) in regular output. If the intent is "detect repetitive reasoning", Thought events need to be handled too. If the intent is "detect repeated hedge prefixes in prose output", the name and the inline comment referring to "internal reasoning" are misleading.

3. READ_FILE_LOOP fires on legitimate initial exploration

With FILE_READ_THRESHOLD = 5 and FILE_READ_WINDOW = 10, a very common opening pattern — list_directory followed by four read_file calls — hits the threshold on the 5th tool call (all 5 are read-like). "Explain the architecture of X" or "summarize this directory" tasks routinely start with exactly this shape and will trip the detector on their first productive moves. Options: raise the threshold (e.g. 8 / 15), or exempt early windows that contain only read-like tools (fresh exploration vs. post-action re-reading).

4. PR description mentions changes not present in the diff

The TLDR lists "Fix EAGAIN pty errors and schema generation build", but the 6 changed files at head 5b1ffd9 (.gitignore, coreToolScheduler.ts + test, loopDetectionService.ts + test, telemetry/types.ts) don't touch pty handling or schema generation. Either the description is stale or those commits are missing.

Minor: .gitignore gains benchmark_test_dir/, which doesn't seem related to loop detection — worth a separate PR or dropping.

…ation checks Adds three new detectors to LoopDetectionService, complementing the existing CONSECUTIVE_IDENTICAL_TOOL_CALLS detector: - REPETITIVE_THOUGHTS: fires when the same thought-like phrase appears THOUGHT_REPEAT_THRESHOLD times across the tracked content history. Runs on Content events (previously only on ToolCallRequest, so it could never fire). - READ_FILE_LOOP: fires when FILE_READ_THRESHOLD read-like tools (read, cat, view, list) appear within a bounded window. - ACTION_STAGNATION: fires when the same tool *name* is called STAGNATION_THRESHOLD times consecutively regardless of arguments. Distinct from CONSECUTIVE_IDENTICAL_TOOL_CALLS (same name AND args) and from READ_FILE_LOOP (catches thrashing with varying params). Shared helper isReadLikeTool() deduplicates read-category checks.

- checkRepetitiveThoughts: scope to a consecutive window of the last THOUGHT_REPEAT_THRESHOLD thoughts instead of counting across the full retained history, so a long-running session that revisits an earlier phrase after genuine progress is no longer flagged as a loop. - isReadLikeTool: replace loose substring matching on 'read'/'cat'/ 'view'/'list' with an exact-name allowlist (read_file, read_many_files, list_directory) plus a read_/list_ prefix fallback for MCP-provided tools. Prevents unrelated tools like 'review' or 'listener_bind' from contributing to READ_FILE_LOOP detection. - coreToolScheduler._schedule: prune validationRetryCounts per-tool rather than wholesale. Stale counters for tools absent from the current batch were surviving and could fire RETRY LOOP DETECTED prematurely the next time an unrelated tool was used. Adds regression tests for each case.

… tool calls - trackThoughtPatterns previously hooked GeminiEventType.Content and flagged hedge phrases ("should", "might", etc.) via a substring regex. That never actually sampled the model's reasoning channel, because structured thinking is emitted as GeminiEventType.Thought. Replace the Content hook with a proper Thought handler that stores a (subject, description) signature. - thoughtHistory was only cleared on reset(promptId), which runs on new user prompts. Within a single prompt, the model can drive many tool roundtrips; a thought like "Inspect the schema" reappearing three times across those roundtrips incorrectly fired REPETITIVE_THOUGHTS despite real progress in between. Clear thoughtHistory at every ToolCallRequest so repetition only fires within one contiguous reasoning stream. - Update tests to use Thought events, add a regression for the cross-roundtrip leak, and assert hedge phrases in Content no longer influence thought detection. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wenshao

No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review

wenshao · 2026-04-19T07:51:23Z

E2E-verified that the READ_FILE_LOOP false-positive is real, not hypothetical. On the PR head with model.skipLoopDetection: false:

Prompt: Summarize what this project is and briefly describe what each source file does. Be concise.
Target: a 4-file toy project (README.md, package.json, src/{config,index,utils}.ts)
Model: gpt-5.4 via DashScope OpenAI-compatible

The model issued 5 parallel read_file calls in one assistant turn. Instrumenting LoopDetectionService.addAndCheck shows the 5th call trips it:

[LOOPDET] name=read_file recent=5 readLikes=5 sameNameStreak=5
          toolCallLoop=false readFileLoop=true actionStagnation=false

The non-interactive run then exits with empty stdout — no visible reason surfaced to the user.

Two small asks before merge:

Raise FILE_READ_THRESHOLD / FILE_READ_WINDOW to something like 8 / 15, or add a cold-start exemption so the first window of read-likes (which is almost always legitimate opening exploration on a new project) doesn't count as loop evidence. Update the corresponding test to match.
Non-interactive mode currently fails silently when a loop is detected. At minimum, print a one-liner to stderr so users know what happened and which detector fired.

Separately: one behavioral note worth surfacing — packages/cli/src/config/config.ts:1148 defaults skipLoopDetection to true, so none of the new detectors run for default users. Not asking to flip that here, but it means this PR's detector logic is only exercised by users who opt in, and the unit tests enshrine thresholds that break on the first real-world opt-in workload.

Everything else — per-tool validationRetryCounts pruning, Thought-based repetition detection, exact-name + read_/list_ prefix matching for read-like tools, the cross-turn history clearing on ToolCallRequest — looks good.

…eractive mode Addresses PR QwenLM#3236 review feedback: 1. Raise FILE_READ_THRESHOLD/FILE_READ_WINDOW to 8/15 and add a cold-start exemption (hasSeenNonReadTool gate) so opening exploration (list_directory + parallel read_file calls) no longer trips READ_FILE_LOOP on its first productive move. 2. In non-interactive TEXT mode, print a one-liner to stderr when a loop is detected, naming which detector fired. Previously the run exited silently with empty stdout. Routed via a new getLastLoopType() on LoopDetectionService and an optional value.loopType on ServerGeminiLoopDetectedEvent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wenshao

Full trace of how each earlier review concern landed. All blockers closed out — posting APPROVED to supersede the two earlier CHANGES_REQUESTED.

From the 2026-04-18 19:02 re-review (head `5b1ffd9`)

thoughtHistory never cleared between tool calls within a single prompt → fixed in 78c8f43: thoughtHistory is now cleared inside resetContentTracking(), so ToolCallRequest resets it the same way it resets streamContentHistory. A user prompt that drives multiple tool roundtrips no longer accumulates hedge-phrase history across boundaries.
trackThoughtPatterns hooks Content, not Thought → fixed in 78c8f43: repetition detection now subscribes to GeminiEventType.Thought (the model's actual reasoning channel surfaced by parseThought(resp) in turn.ts), not substring matching in prose Content. The detector's name and intent now match its wiring.
READ_FILE_LOOP fires on legitimate initial exploration → fixed in b1bd1a6: FILE_READ_THRESHOLD raised 5→8, FILE_READ_WINDOW 10→15, plus a hasSeenNonReadTool cold-start gate so the typical opening list_directory + parallel read_file pattern (e.g. "summarize this project") no longer trips on its first productive move. Tests updated with a primeNonReadTool() helper that explicitly documents the gate semantics.
PR description mentions pty / schema-gen changes not in diff — noted, nit only. Worth a cleanup pass before merge but not blocking.

From the 2026-04-19 07:51 E2E verification

Raise threshold or add cold-start exemption → same b1bd1a6 fix as above.
Non-interactive mode silently exits on loop detection → fixed in b1bd1a6:
- LoopDetectionService.getLastLoopType() exposes which detector fired
- ServerGeminiLoopDetectedEvent carries optional value: { loopType }
- nonInteractiveCli.ts TEXT mode emits a one-liner to stderr naming the detector (READ_FILE_LOOP, REPETITIVE_THOUGHTS, etc.) with a hint pointing at model.skipLoopDetection to disable

Also verified in b1bd1a6 that the change surface stays within loop detection + non-interactive output surfacing — no unrelated drive-bys (6 files, +198/−9, all in packages/core/src/services/, packages/core/src/core/{client,turn}.ts, packages/core/src/index.ts export line, packages/cli/src/nonInteractiveCli.ts).

AI-hallucinated items in earlier CHANGES_REQUESTED

The older review at 2026-04-18 16:39 flagged a ThingkingAssistantContent → ThinkingAssistantContent rename in packages/sdk-java/... as a Critical breaking change. That file is not in this PR's diff — the 6 changed files are in packages/core/ and packages/cli/, with no touch on sdk-java. Already established as a fabricated concern; calling it out here so the supersession rationale is on record.

CI

2 Windows failures on SettingsDialog > should keep restart prompt when switching scopes — file not touched by this PR, same Windows-flakiness pattern as other recent unrelated PRs
Other 10 checks green

LGTM.

…y checks (QwenLM#3236) (#196) Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com>

…y checks (QwenLM#3236)

euxaristia force-pushed the feat/enhanced-loop-detection branch 2 times, most recently from 055c756 to fa14ed9 Compare April 14, 2026 01:17

github-actions Bot mentioned this pull request Apr 14, 2026

📊 AI CLI 工具社区动态日报 2026-04-14 gsscsd/big_model_radar#183

Open

wenshao requested changes Apr 18, 2026

View reviewed changes

Comment thread packages/core/src/services/loopDetectionService.ts

Comment thread packages/core/src/core/coreToolScheduler.ts

Comment thread packages/core/src/services/loopDetectionService.ts Outdated

euxaristia force-pushed the feat/enhanced-loop-detection branch from fa14ed9 to 5b1ffd9 Compare April 18, 2026 14:12

wenshao requested changes Apr 18, 2026

View reviewed changes

euxaristia and others added 3 commits April 19, 2026 00:55

euxaristia force-pushed the feat/enhanced-loop-detection branch from 7d46d1e to 78c8f43 Compare April 19, 2026 04:55

wenshao previously approved these changes Apr 19, 2026

View reviewed changes

euxaristia dismissed wenshao’s stale review via b1bd1a6 April 19, 2026 09:39

wenshao approved these changes Apr 19, 2026

View reviewed changes

wenshao merged commit c175fd3 into QwenLM:main Apr 19, 2026
22 of 24 checks passed

tanzhenxin added the TBD To Be Discussed label Apr 22, 2026

This was referenced May 2, 2026

upstream port: enhanced loop detection (#3236) protoLabsAI/protoCLI#185

Closed

feat(core): backport enhanced loop detection (#3236) protoLabsAI/protoCLI#196

Merged

mabry1985 added a commit to protoLabsAI/protoCLI that referenced this pull request May 2, 2026

feat(core): enhanced loop detection with stagnation + validation-retr…

7249945

…y checks (QwenLM#3236) (#196) Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com>

mabry1985 mentioned this pull request May 2, 2026

feat(cli,core): backport upstream batch — retry directive, SubAgent height, model-switch event (#3178, #3721, #3667) protoLabsAI/protoCLI#199

Merged

5 tasks

xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026

feat(core): enhanced loop detection with stagnation + validation-retr…

daf69b9

…y checks (QwenLM#3236)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): enhanced loop detection with stagnation + validation-retry checks#3236

feat(core): enhanced loop detection with stagnation + validation-retry checks#3236
wenshao merged 4 commits into
QwenLM:mainfrom
euxaristia:feat/enhanced-loop-detection

euxaristia commented Apr 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Uh oh!

euxaristia commented Apr 18, 2026

Uh oh!

wenshao commented Apr 18, 2026

Uh oh!

wenshao left a comment

Uh oh!

wenshao commented Apr 19, 2026

Uh oh!

wenshao left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

euxaristia commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

euxaristia commented Apr 18, 2026

Uh oh!

wenshao commented Apr 18, 2026

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

wenshao commented Apr 19, 2026

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

From the 2026-04-18 19:02 re-review (head 5b1ffd9)

From the 2026-04-19 07:51 E2E verification

AI-hallucinated items in earlier CHANGES_REQUESTED

CI

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

euxaristia commented Apr 13, 2026 •

edited

Loading

From the 2026-04-18 19:02 re-review (head `5b1ffd9`)