feat(core): enhanced loop detection with stagnation + validation-retry checks#3236
Conversation
055c756 to
fa14ed9
Compare
fa14ed9 to
5b1ffd9
Compare
wenshao
left a comment
There was a problem hiding this comment.
[Critical] packages/sdk-java/qwencode/src/main/java/com/alibaba/qwen/code/cli/protocol/data/AssistantContent.java:49 renames the public nested Java SDK interface from ThingkingAssistantContent to ThinkingAssistantContent without a compatibility alias. This is a breaking public API change for downstream SDK consumers that still import or reference the old type name, and their code will stop compiling after upgrade. Please keep a deprecated compatibility alias for at least one deprecation cycle.
[Critical] packages/core/src/index.ts:77-116 stops exporting runtime tool class values such as EditTool, ShellTool, ExitPlanModeTool, and TodoWriteTool, and now re-exports them as export type only. External TypeScript/JavaScript consumers importing these symbols from @qwen-code/qwen-code-core for runtime usage will break because those named value exports no longer exist on the package entrypoint. Please preserve the prior value exports from the root barrel or add a compatibility layer/deprecation period instead of converting them to type-only exports in one release.
— gpt-5.4 via Qwen Code /review
|
Both "Critical" items in this review reference files that aren't in this PR's diff. Verified against
For context, the three inline comments from the earlier 02:50 UTC review pass (repetitive-thoughts window, — Claude Opus 4.7 (xhigh) via Claude Code |
|
Thanks for the iterations — the per-tool 1. In
even though the intervening tool calls represent real progress. Either clear 2. In this codebase, model thinking is yielded as a distinct 3. With 4. PR description mentions changes not present in the diff The TLDR lists "Fix EAGAIN pty errors and schema generation build", but the 6 changed files at head Minor: |
…ation checks Adds three new detectors to LoopDetectionService, complementing the existing CONSECUTIVE_IDENTICAL_TOOL_CALLS detector: - REPETITIVE_THOUGHTS: fires when the same thought-like phrase appears THOUGHT_REPEAT_THRESHOLD times across the tracked content history. Runs on Content events (previously only on ToolCallRequest, so it could never fire). - READ_FILE_LOOP: fires when FILE_READ_THRESHOLD read-like tools (read, cat, view, list) appear within a bounded window. - ACTION_STAGNATION: fires when the same tool *name* is called STAGNATION_THRESHOLD times consecutively regardless of arguments. Distinct from CONSECUTIVE_IDENTICAL_TOOL_CALLS (same name AND args) and from READ_FILE_LOOP (catches thrashing with varying params). Shared helper isReadLikeTool() deduplicates read-category checks.
- checkRepetitiveThoughts: scope to a consecutive window of the last THOUGHT_REPEAT_THRESHOLD thoughts instead of counting across the full retained history, so a long-running session that revisits an earlier phrase after genuine progress is no longer flagged as a loop. - isReadLikeTool: replace loose substring matching on 'read'/'cat'/ 'view'/'list' with an exact-name allowlist (read_file, read_many_files, list_directory) plus a read_/list_ prefix fallback for MCP-provided tools. Prevents unrelated tools like 'review' or 'listener_bind' from contributing to READ_FILE_LOOP detection. - coreToolScheduler._schedule: prune validationRetryCounts per-tool rather than wholesale. Stale counters for tools absent from the current batch were surviving and could fire RETRY LOOP DETECTED prematurely the next time an unrelated tool was used. Adds regression tests for each case.
… tool calls
- trackThoughtPatterns previously hooked GeminiEventType.Content and
flagged hedge phrases ("should", "might", etc.) via a substring regex.
That never actually sampled the model's reasoning channel, because
structured thinking is emitted as GeminiEventType.Thought. Replace the
Content hook with a proper Thought handler that stores a
(subject, description) signature.
- thoughtHistory was only cleared on reset(promptId), which runs on new
user prompts. Within a single prompt, the model can drive many tool
roundtrips; a thought like "Inspect the schema" reappearing three
times across those roundtrips incorrectly fired REPETITIVE_THOUGHTS
despite real progress in between. Clear thoughtHistory at every
ToolCallRequest so repetition only fires within one contiguous
reasoning stream.
- Update tests to use Thought events, add a regression for the
cross-roundtrip leak, and assert hedge phrases in Content no longer
influence thought detection.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
7d46d1e to
78c8f43
Compare
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review
|
E2E-verified that the READ_FILE_LOOP false-positive is real, not hypothetical. On the PR head with
The model issued 5 parallel The non-interactive run then exits with empty stdout — no visible reason surfaced to the user. Two small asks before merge:
Separately: one behavioral note worth surfacing — Everything else — per-tool |
…eractive mode Addresses PR QwenLM#3236 review feedback: 1. Raise FILE_READ_THRESHOLD/FILE_READ_WINDOW to 8/15 and add a cold-start exemption (hasSeenNonReadTool gate) so opening exploration (list_directory + parallel read_file calls) no longer trips READ_FILE_LOOP on its first productive move. 2. In non-interactive TEXT mode, print a one-liner to stderr when a loop is detected, naming which detector fired. Previously the run exited silently with empty stdout. Routed via a new getLastLoopType() on LoopDetectionService and an optional value.loopType on ServerGeminiLoopDetectedEvent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
wenshao
left a comment
There was a problem hiding this comment.
Full trace of how each earlier review concern landed. All blockers closed out — posting APPROVED to supersede the two earlier CHANGES_REQUESTED.
From the 2026-04-18 19:02 re-review (head 5b1ffd9)
-
thoughtHistorynever cleared between tool calls within a single prompt → fixed in78c8f43:thoughtHistoryis now cleared insideresetContentTracking(), soToolCallRequestresets it the same way it resetsstreamContentHistory. A user prompt that drives multiple tool roundtrips no longer accumulates hedge-phrase history across boundaries. -
trackThoughtPatternshooksContent, notThought→ fixed in78c8f43: repetition detection now subscribes toGeminiEventType.Thought(the model's actual reasoning channel surfaced byparseThought(resp)inturn.ts), not substring matching in proseContent. The detector's name and intent now match its wiring. -
READ_FILE_LOOPfires on legitimate initial exploration → fixed inb1bd1a6:FILE_READ_THRESHOLDraised 5→8,FILE_READ_WINDOW10→15, plus ahasSeenNonReadToolcold-start gate so the typical openinglist_directory+ parallelread_filepattern (e.g. "summarize this project") no longer trips on its first productive move. Tests updated with aprimeNonReadTool()helper that explicitly documents the gate semantics. -
PR description mentions pty / schema-gen changes not in diff — noted, nit only. Worth a cleanup pass before merge but not blocking.
From the 2026-04-19 07:51 E2E verification
-
Raise threshold or add cold-start exemption → same
b1bd1a6fix as above. -
Non-interactive mode silently exits on loop detection → fixed in
b1bd1a6:LoopDetectionService.getLastLoopType()exposes which detector firedServerGeminiLoopDetectedEventcarries optionalvalue: { loopType }nonInteractiveCli.tsTEXT mode emits a one-liner to stderr naming the detector (READ_FILE_LOOP,REPETITIVE_THOUGHTS, etc.) with a hint pointing atmodel.skipLoopDetectionto disable
Also verified in b1bd1a6 that the change surface stays within loop detection + non-interactive output surfacing — no unrelated drive-bys (6 files, +198/−9, all in packages/core/src/services/, packages/core/src/core/{client,turn}.ts, packages/core/src/index.ts export line, packages/cli/src/nonInteractiveCli.ts).
AI-hallucinated items in earlier CHANGES_REQUESTED
The older review at 2026-04-18 16:39 flagged a ThingkingAssistantContent → ThinkingAssistantContent rename in packages/sdk-java/... as a Critical breaking change. That file is not in this PR's diff — the 6 changed files are in packages/core/ and packages/cli/, with no touch on sdk-java. Already established as a fabricated concern; calling it out here so the supersession rationale is on record.
CI
- 2 Windows failures on
SettingsDialog > should keep restart prompt when switching scopes— file not touched by this PR, same Windows-flakiness pattern as other recent unrelated PRs - Other 10 checks green
LGTM.
…y checks (QwenLM#3236) (#196) Co-authored-by: euxaristia <25621994+euxaristia@users.noreply.github.com>
Summary
GeminiEventType.Thoughtevents (the model's actual reasoning channel) rather than substring-matching hedge phrases in proseContent, and resets across tool-call boundaries so one prompt can drive many roundtrips without false positivesTest plan
loopDetectionService.test.tspasses (41 tests)coreToolScheduler.test.tspasses