fix(core): allow thought-only responses in GeminiChat stream validation #3251
📋 Review Summary
This PR fixes a critical bug where reasoning/thinking models (e.g.,
🔍 General Feedback
🎯 Specific Feedback
🟢 Medium
🔵 Low
✅ Highlights
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
Models using thinking/reasoning modes may emit only thought content without explicit text output. The stream validation previously rejected these as 'empty' responses. Now accepts responses that contain either text content or thought content when a finish reason is present.
(cherry picked from commit a0b1391)
365f0ef to 6026843
wenshao
left a comment
No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review
yiliang114
left a comment
Main fix looks right to me. The current head also already addressed the earlier stale-comment nit in the test file.
I also did a quick local pass on the GeminiChat path and the related thought-only subagent behavior, and the targeted tests, typecheck, and lint looked good. No further comments from me.
This PR doesn't fix the error for me. I keep getting
> /status
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Status │
│ │
│ Qwen Code 0.15.6-nightly.20260504.e617f20d1 (d4baaf9e0) │
│ Runtime Node.js v22.22.2 / npm 10.9.7 │
│ OS linux x64 (6.8.0-110-generic) │
│ │
│ Auth API Key - openai │
│ Base URL http://127.0.0.1:11434/v1 │
│ Model qwen3.6:27b │
Ollama service for self hosted qwen3.6:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_FLASH_ATTENTION=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
Here are my relevant qwen settings:
...
"env": {
  "OLLAMA_API_KEY": "***"
},
"modelProviders": {
  "openai": [
    {
      "id": "qwen3.6:27b",
      "name": "Local Qwen3.6 27B",
      "envKey": "OLLAMA_API_KEY",
      "baseUrl": "http://127.0.0.1:11434/v1",
      "generationConfig": {
        "timeout": 300000,
        "maxRetries": 1,
        "contextWindowSize": 262144,
        "modalities": {
          "image": true
        },
        "samplingParams": {
          "temperature": 0.2
        }
      }
    }
  ]
},
"security": {
  "auth": {
    "selectedType": "openai"
  }
},
"model": {
  "name": "qwen3.6:27b"
},
"$version": 3,
....
Also, qwen just occasionally stops executing without any error at all, but I suspect it's related to the error above as well.
…on (QwenLM#3251)
Models using thinking/reasoning modes may emit only thought content without explicit text output. The stream validation previously rejected these as 'empty' responses. Now accepts responses that contain either text content or thought content when a finish reason is present.
(cherry picked from commit e8db8679305370de91e466475e8e672b2e782499)
Co-authored-by: mingholy.lmh <mingholy.lmh@alibaba-inc.com>
Problem
Subagents running against thinking/reasoning models (qwen3-thinking, qwen-plus-thinking, etc.) hit
Failed to run subagent: Model stream ended with empty response text.
on code-review-style tasks like /review's undirected audit.
The existing transient-stream retry budget (INVALID_STREAM_RETRY_CONFIG.maxRetries = 2) retries twice, but the same prompt + same model consistently produces the same thought-only response, so all three attempts fail.
Root cause
packages/core/src/core/geminiChat.ts:939 validates that a successful stream has either a tool call or non-empty contentText. contentText is built from consolidatedHistoryParts, which filters out parts where part.thought === true.
So when a reasoning model emits only thought: true parts and finishes with a valid finishReason: STOP, contentText is empty and the validator throws InvalidStreamError('NO_RESPONSE_TEXT'). This is a legitimate response shape for thinking-mode providers: they emit internal reasoning and sometimes end without producing a distinct visible-text part.
Fix
Accept responses that contain thought text when no visible-text content is present. The validator now asks "is there any content?" instead of "is there visible text?".
Truly empty responses (no text AND no thought AND no tool call) still throw; only the "thought-only is empty" misclassification is fixed.
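For readers without the diff at hand, the change in validation logic can be sketched roughly as follows. This is a minimal illustration, not the code in geminiChat.ts: the Part and StreamSummary shapes, the joinText and runWithRetries helpers, the InvalidStreamError constructor shape, and the NO_FINISH_REASON label are all assumptions made for the example. Only the thought flag, the NO_RESPONSE_TEXT label, and the retry count of 2 come from the description above.

```typescript
// Hypothetical, simplified shapes. The real types in geminiChat.ts are richer.
interface Part {
  text?: string;
  thought?: boolean;
}

interface StreamSummary {
  parts: Part[];
  hasToolCall: boolean;
  finishReason?: string;
}

// 'NO_FINISH_REASON' is an assumed label; 'NO_RESPONSE_TEXT' is from the PR text.
type InvalidStreamType = 'NO_FINISH_REASON' | 'NO_RESPONSE_TEXT';

// Stand-in for the InvalidStreamError named above; the constructor shape is assumed.
class InvalidStreamError extends Error {
  type: InvalidStreamType;
  constructor(type: InvalidStreamType) {
    super(`Invalid stream: ${type}`);
    this.name = 'InvalidStreamError';
    this.type = type;
  }
}

// Concatenate the text of parts whose thought flag matches `wantThought`.
function joinText(parts: Part[], wantThought: boolean): string {
  return parts
    .filter((p) => (p.thought === true) === wantThought)
    .map((p) => p.text ?? '')
    .join('')
    .trim();
}

// Old behaviour: only a tool call or visible (non-thought) text counts.
function validateOld(s: StreamSummary): void {
  if (!s.finishReason) throw new InvalidStreamError('NO_FINISH_REASON');
  if (!s.hasToolCall && joinText(s.parts, false) === '') {
    throw new InvalidStreamError('NO_RESPONSE_TEXT');
  }
}

// New behaviour: thought text also counts as content.
function validateNew(s: StreamSummary): void {
  if (!s.finishReason) throw new InvalidStreamError('NO_FINISH_REASON');
  const hasContent =
    s.hasToolCall ||
    joinText(s.parts, false) !== '' ||
    joinText(s.parts, true) !== '';
  if (!hasContent) throw new InvalidStreamError('NO_RESPONSE_TEXT');
}

// Validate the same deterministic response up to 1 + maxRetries times.
// Unlike the real retry path, this sketch treats any thrown error as retryable.
function runWithRetries(
  validate: (s: StreamSummary) => void,
  stream: StreamSummary,
  maxRetries: number,
): { ok: boolean; attempts: number } {
  let attempts = 0;
  for (let i = 0; i <= maxRetries; i++) {
    attempts++;
    try {
      validate(stream);
      return { ok: true, attempts };
    } catch {
      // Same prompt and model yield the same response, so the retry changes nothing.
    }
  }
  return { ok: false, attempts };
}

const thoughtOnly: StreamSummary = {
  parts: [{ text: 'Let me check the diff...', thought: true }],
  hasToolCall: false,
  finishReason: 'STOP',
};
const trulyEmpty: StreamSummary = { parts: [], hasToolCall: false, finishReason: 'STOP' };

console.log(runWithRetries(validateOld, thoughtOnly, 2));
console.log(runWithRetries(validateNew, thoughtOnly, 2));
console.log(runWithRetries(validateNew, trulyEmpty, 2));
```

In this sketch the old check rejects the thought-only response on all three attempts, the new check accepts it on the first, and a response with no text, no thought, and no tool call is still rejected.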
Attribution
This is cherry-picked from the previously abandoned PR #2636 by @Mingholy (commit a0b13911f), which was closed on 2026-03-26 without human review despite being a correct fix for two user-reported issues (#2530 and #1700). The authorship on the commit is preserved via git cherry-pick -x. Reopening here so the fix is not lost; it matches a production failure hit today on a /review subagent.
Test plan
npx vitest run packages/core/src/core/geminiChat.test.ts (46 tests pass: 43 pre-existing + 3 new thought-only coverage cases)
The three new test cases (from mingholy's original PR) cover:
The existing "empty response" test is retained but restructured to assert that truly empty responses (no text, no thoughts, no tool call) still throw — regression-proofing the negative case.
Related issues