TL;DR
pipeline.ts:362-365 only rewrites enable_thinking when the field already exists on the request body. The default OpenAI-compatible request body never pre-populates this qwen3-specific extension, so the check never fires, and thinkingConfig.includeThoughts === false (set on every side-query by sideQuery.ts:applyThinkingDefault) silently fails to disable thinking on qwen3-series models.
The DeepSeek path next to it (pipeline.ts:384-386) does this correctly via a hostname-gated unconditional typed['thinking'] = { type: 'disabled' }. There's no equivalent for qwen3 / DashScope.
Repro / evidence (production data)
tool-use-summary side-queries on qwen3.5-flash (a hybrid thinking model with thinking enabled by default) show 24–95× output bloat against the maxOutputTokens: 60 budget. The visible output is only ~3–6 tokens (a 30-char git-commit-subject label, the kind of output that 60-token budget was sized for), but output_token_count (which on OpenAI-compatible reasoning models includes reasoning tokens) is 1454–5708:
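| # | Input tokens | Output tokens (visible + reasoning) | Total tokens | Duration |
|---|---:|---:|---:|---:|
| 1 | 242 | 2441 | 2683 | 17.8 s |
| 2 | 296 | 2067 | 2363 | 15.8 s |
| 3 | 332 | 5708 | 6040 | 41.8 s |
| 4 | 337 | 2886 | 3223 | 22.0 s |
| 5 | 323 | 4085 | 4408 | 29.0 s |
| 6 | 328 | 5472 | 5800 | 38.5 s |
| 7 | 351 | 1454 | 1805 | 11.5 s |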
For a cosmetic 30-char label whose design budget is ~1 s (per the JSDoc on toolUseSummary.ts:13), durations of 11.5–41.8 s are 11–42× over budget. The visible output complies with maxOutputTokens: 60; the entire overshoot is reasoning tokens that should have been disabled at the model level.
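The two overshoot figures can be rechecked against the table. The sketch below assumes, as the text implies, that the token ratio is measured against the 60-token maxOutputTokens budget and the latency ratio against the ~1 s design budget:

```typescript
// Rows from the production table: output tokens (visible + reasoning) and duration in seconds.
const rows: Array<{ output: number; seconds: number }> = [
  { output: 2441, seconds: 17.8 },
  { output: 2067, seconds: 15.8 },
  { output: 5708, seconds: 41.8 },
  { output: 2886, seconds: 22.0 },
  { output: 4085, seconds: 29.0 },
  { output: 5472, seconds: 38.5 },
  { output: 1454, seconds: 11.5 },
];

const OUTPUT_BUDGET_TOKENS = 60; // maxOutputTokens for tool-use-summary
const LATENCY_BUDGET_SECONDS = 1; // "~1 s" design budget from the toolUseSummary.ts JSDoc

// Overshoot of each call relative to its budget.
const tokenRatios = rows.map((r) => r.output / OUTPUT_BUDGET_TOKENS);
const latencyRatios = rows.map((r) => r.seconds / LATENCY_BUDGET_SECONDS);

// Format a min–max range with one decimal place.
const fmt = (xs: number[]): string =>
  `${Math.min(...xs).toFixed(1)}–${Math.max(...xs).toFixed(1)}×`;

console.log(`token overshoot:   ${fmt(tokenRatios)}`);
console.log(`latency overshoot: ${fmt(latencyRatios)}`);
```

This yields roughly 24–95× for tokens and 11.5–41.8× for latency, matching the ranges quoted in the text.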
Why the disable doesn't reach qwen3
'enable_thinking' in typed checks whether the wire body already has the field. It never will, because:
enable_thinking is a qwen3 non-standard extension, not part of OpenAI Chat Completions
buildBaseRequest constructs a vanilla OpenAI-compatible payload that never includes this field
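DashScopeOpenAICompatibleProvider.buildRequest (dashscope.ts:165-208) only injects extra_body from user config; it never auto-injects enable_thinking: false based on includeThoughts
So:
sideQuery.ts:applyThinkingDefault sets thinkingConfig.includeThoughts = false (correct intent)
pipeline.ts:358-360 correctly detects reasoningDisabled = true (correct)
pipeline.ts:362-364 checks 'enable_thinking' in typed → always false → nothing happens
The wire body goes out without any thinking-disable signal
qwen3.5-flash keeps thinking by default
Reasoning tokens get burned on every cosmetic side-query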
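The chain above can be reproduced in isolation. Everything in this sketch is hypothetical and illustrative: buildBaseRequestSketch, applyGuarded and applyUnconditional are not the real pipeline.ts code, and whether the real branch writes enable_thinking at the top level or under extra_body is not modeled. It only shows why an in-guarded rewrite is a no-op when the base body never contains the key:

```typescript
// Hypothetical stand-in for the OpenAI-compatible wire body.
type RequestBody = Record<string, unknown>;

// Like buildBaseRequest, this never includes the qwen3-only enable_thinking field.
function buildBaseRequestSketch(model: string): RequestBody {
  return { model, messages: [{ role: 'user', content: 'summarize' }], max_tokens: 60 };
}

// Buggy pattern: only rewrite the field if it is already present on the body.
function applyGuarded(typed: RequestBody, reasoningDisabled: boolean): void {
  if (reasoningDisabled && 'enable_thinking' in typed) {
    typed['enable_thinking'] = false;
  }
}

// DeepSeek-style pattern: once gated, set the field unconditionally.
function applyUnconditional(typed: RequestBody, reasoningDisabled: boolean): void {
  if (reasoningDisabled) {
    typed['enable_thinking'] = false;
  }
}

const guarded = buildBaseRequestSketch('qwen3.5-flash');
applyGuarded(guarded, true); // reasoningDisabled = true, as for every side-query

const unconditional = buildBaseRequestSketch('qwen3.5-flash');
applyUnconditional(unconditional, true);

console.log('guarded has enable_thinking:', 'enable_thinking' in guarded);
console.log('unconditional has enable_thinking:', 'enable_thinking' in unconditional);
```

Running it reports false for the guarded body and true for the unconditional one, which is the qwen3 vs DeepSeek difference described above.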
The DeepSeek path immediately after (pipeline.ts:384-386) does the equivalent unconditionally:
```typescript
if (isDeepSeekHostname(this.contentGeneratorConfig)) {
  typed['thinking'] = { type: 'disabled' }; // direct set, no `in` check
}
```
There is no isDashScopeHostname / isQwen3Series parallel branch.
pipeline.ts:471-473 documents the right wire shape but no code emits it:
- qwen3 series — model-dependent; can be manually disabled via `extra_body.enable_thinking`
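For a qwen3 branch to mirror the DeepSeek one, it needs a gate analogous to isDeepSeekHostname. The two helpers below are hypothetical (the text notes that neither exists), and both the DashScope hostnames and the qwen3 model-name pattern are assumptions to verify against the endpoints and models the provider actually targets:

```typescript
// ASSUMPTION: DashScope OpenAI-compatible hostnames; verify before relying on this list.
const DASHSCOPE_HOSTS = ['dashscope.aliyuncs.com', 'dashscope-intl.aliyuncs.com'];

// HYPOTHETICAL helper: true when baseUrl points at one of the assumed DashScope hosts.
function isDashScopeHostname(baseUrl: string | undefined): boolean {
  if (!baseUrl) return false;
  try {
    const host = new URL(baseUrl).hostname.toLowerCase();
    return DASHSCOPE_HOSTS.some((h) => host === h || host.endsWith(`.${h}`));
  } catch {
    return false; // not a parseable URL
  }
}

// HYPOTHETICAL helper: assumes qwen3 models are named "qwen3", "qwen3.5-flash", "qwen3-...".
// The negative lookahead rejects names like "qwen30" that merely share the prefix.
function isQwen3Series(model: string): boolean {
  return /^qwen3(?!\d)/i.test(model);
}

console.log(isDashScopeHostname('https://dashscope.aliyuncs.com/compatible-mode/v1'));
console.log(isQwen3Series('qwen3.5-flash'));
```

A model-name gate would also cover qwen3 served through non-DashScope OpenAI-compatible gateways, at the cost of relying on naming conventions; whether every qwen3-prefixed model accepts enable_thinking is not established here.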
Affected paths (every side-query on qwen3)
sideQuery.ts:applyThinkingDefault enforces includeThoughts: false for all side-queries, so the bug fires on:
tool-use-summary (this report's evidence — fires once per tool batch)
session-title (every turn)
prompt-suggestion (every turn)
auto-memory-recall (every prompt)
chat_compression (when fallback hits a qwen3 fast model)
next-speaker-check
subagentGenerator (planning side-queries)
relevanceSelector, forget, sessionRecap, ArenaManager (each a call site of runSideQuery)
A typical session sees 10–30 side-queries per user prompt. Each one currently burns 1.5–6K reasoning tokens it shouldn't and takes 11–42 s of wall time where the design budgeted ~1 s.
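To put the per-prompt cost in one number, a back-of-envelope bound multiplies the ranges above. This assumes every side-query behaves like the tool-use-summary calls in the table (which the data does not establish) and counts the full output_token_count, of which only ~3–6 tokens per call are visible:

```typescript
// ASSUMED ranges taken from the text; all side-queries treated like tool-use-summary.
const sideQueriesPerPrompt = { low: 10, high: 30 };
const tokensPerQuery = { low: 1454, high: 5708 }; // observed output_token_count range

// Lower bound pairs the low ends; upper bound pairs the high ends.
const tokensPerPrompt = {
  low: sideQueriesPerPrompt.low * tokensPerQuery.low,
  high: sideQueriesPerPrompt.high * tokensPerQuery.high,
};

console.log(
  `~${tokensPerPrompt.low.toLocaleString('en-US')} to ~${tokensPerPrompt.high.toLocaleString('en-US')} output tokens per prompt`,
);
```

Under these assumptions that is roughly 14.5K–171K output tokens per user prompt, almost all of it reasoning.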
Impact
Cost: 24–95× token bloat on every cosmetic side-query (multiply by ≥10 side-queries per turn × N users)
Latency: 11–42 s for what should be ~1 s, which defeats the "hidden behind 5–30 s main-model streaming" design assumption documented in toolUseSummary.ts:13
Cascading congestion: heavy in-flight side-queries on the fast-model gateway path correlate with main-conversation call hangs (see the related observation in #TODO, another issue covering the network gateway side, which has a separate root cause)
No user-visible benefit: cleanSummary() already takes only the first line and caps at 100 chars (MAX_SUMMARY_LENGTH), so all the reasoning output is discarded client-side
Suggested fix
Two equivalent routes — pick whichever fits the codebase style:
Option A (smaller diff): in pipeline.ts:362-365, drop the 'enable_thinking' in typed guard and inject enable_thinking: false via extra_body with model/hostname gating, mirroring the DeepSeek branch.
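A minimal sketch of Option A follows. It is hypothetical: it models the request body as a plain object, uses an illustrative qwen3 model-name gate (a hostname gate would work the same way), and follows the issue's suggested extra_body: { enable_thinking: false } shape. Whether extra_body is flattened into the top-level JSON before it reaches DashScope depends on how the provider serializes it, which this sketch does not verify:

```typescript
// HYPOTHETICAL wire-body shape: a plain JSON object with an optional extra_body bag.
type RequestBody = Record<string, unknown> & { extra_body?: Record<string, unknown> };

// HYPOTHETICAL model gate; assumes qwen3 models are named "qwen3..." but not "qwen30...".
const isQwen3Series = (model: string): boolean => /^qwen3(?!\d)/i.test(model);

// Sketch of the pipeline.ts branch with the `'enable_thinking' in typed` guard removed.
function applyQwen3ThinkingDisable(
  typed: RequestBody,
  model: string,
  reasoningDisabled: boolean,
): void {
  if (!reasoningDisabled || !isQwen3Series(model)) return;
  // Direct set, like the DeepSeek branch; the spread keeps any other extra_body keys.
  typed.extra_body = { ...(typed.extra_body ?? {}), enable_thinking: false };
}

// Usage: a qwen3.5-flash side-query body that already carries an illustrative user key.
const body: RequestBody = { model: 'qwen3.5-flash', messages: [], extra_body: { user_key: 'kept' } };
applyQwen3ThinkingDisable(body, 'qwen3.5-flash', true);
console.log(JSON.stringify(body.extra_body));
```

Merging into an existing extra_body, rather than replacing it, keeps whatever else a user configured there while still forcing thinking off.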
Option B (cleaner separation): lift the responsibility to DashScopeOpenAICompatibleProvider.buildRequest (dashscope.ts:165), where it can inspect request.config?.thinkingConfig?.includeThoughts === false and inject extra_body.enable_thinking: false. The pipeline-level branch then no longer needs a qwen3 case.
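A sketch of Option B under similar assumptions: GenerateRequestLike, WireBody and buildDashScopeBodySketch are hypothetical stand-ins for the real request type and DashScopeOpenAICompatibleProvider.buildRequest, and the extra_body shape follows the issue's suggestion rather than a verified wire format:

```typescript
// HYPOTHETICAL shapes; the real types live in the codebase and may differ.
interface ThinkingConfigLike {
  includeThoughts?: boolean;
}
interface GenerateRequestLike {
  model: string;
  config?: { thinkingConfig?: ThinkingConfigLike };
}
type WireBody = Record<string, unknown> & { extra_body?: Record<string, unknown> };

// Provider-level step: start from the user-configured extra_body, then add
// enable_thinking: false when the caller explicitly asked for no thoughts.
function buildDashScopeBodySketch(
  request: GenerateRequestLike,
  base: WireBody,
  userExtraBody: Record<string, unknown> | undefined,
): WireBody {
  const extraBody: Record<string, unknown> = { ...(userExtraBody ?? {}) };
  if (request.config?.thinkingConfig?.includeThoughts === false) {
    extraBody.enable_thinking = false;
  }
  // Only attach extra_body when there is something to send.
  return Object.keys(extraBody).length > 0 ? { ...base, extra_body: extraBody } : { ...base };
}

const sideQuery: GenerateRequestLike = {
  model: 'qwen3.5-flash',
  config: { thinkingConfig: { includeThoughts: false } },
};
console.log(JSON.stringify(buildDashScopeBodySketch(sideQuery, { model: sideQuery.model }, undefined)));
```

The strict === false check leaves the model default alone when the caller never set includeThoughts. The sketch also lets a per-request includeThoughts: false override a user-configured extra_body.enable_thinking; the opposite precedence is a defensible alternative.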
Test: add a regression assertion that with thinkingConfig.includeThoughts: false on a qwen3 model, the wire request body contains extra_body: { enable_thinking: false }.
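The regression check can be written independently of the test runner. buildRequestUnderTest below is a hypothetical stand-in for whichever builder (pipeline or provider) ends up owning the fix; in the real suite it would be replaced by the actual request-building call and the check moved into the project's test framework. The sketch also confirms the assertion fails for a body produced by the old guarded pattern:

```typescript
type WireBody = Record<string, unknown> & { extra_body?: Record<string, unknown> };

interface SideQueryRequest {
  model: string;
  config: { thinkingConfig: { includeThoughts: boolean } };
}

// HYPOTHETICAL builder under test (stand-in for the fixed pipeline/provider code).
function buildRequestUnderTest(req: SideQueryRequest): WireBody {
  const body: WireBody = { model: req.model, messages: [] };
  if (req.config.thinkingConfig.includeThoughts === false && /^qwen3(?!\d)/i.test(req.model)) {
    body.extra_body = { enable_thinking: false };
  }
  return body;
}

// HYPOTHETICAL reproduction of the old behavior: the `in` guard never fires.
function buildRequestBuggy(req: SideQueryRequest): WireBody {
  const body: WireBody = { model: req.model, messages: [] };
  if (req.config.thinkingConfig.includeThoughts === false && 'enable_thinking' in body) {
    body['enable_thinking'] = false;
  }
  return body;
}

// The regression assertion from the issue: extra_body.enable_thinking must be false.
function assertThinkingDisabledOnWire(body: WireBody): void {
  if (body.extra_body?.enable_thinking !== false) {
    throw new Error(`expected extra_body.enable_thinking === false, got ${JSON.stringify(body.extra_body)}`);
  }
}

const sideQuery: SideQueryRequest = {
  model: 'qwen3.5-flash',
  config: { thinkingConfig: { includeThoughts: false } },
};

assertThinkingDisabledOnWire(buildRequestUnderTest(sideQuery)); // fixed builder passes

let buggyRejected = false;
try {
  assertThinkingDisabledOnWire(buildRequestBuggy(sideQuery));
} catch {
  buggyRejected = true; // the assertion catches the original bug
}
console.log('buggy build rejected:', buggyRejected);
```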
Affected version
Reproduced on origin/main @ 84f408017. The buggy 'enable_thinking' in typed check has been there since the reasoning-disable consolidation — see git blame on pipeline.ts:362-365.
Related
packages/core/src/utils/sideQuery.ts:applyThinkingDefault (sets includeThoughts: false for all side-queries)
packages/core/src/core/openaiContentGenerator/pipeline.ts:362-365 (bug location)
packages/core/src/core/openaiContentGenerator/pipeline.ts:384-386 (DeepSeek's working pattern)
packages/core/src/core/openaiContentGenerator/provider/dashscope.ts:165-208 (provider buildRequest, where the Option B fix would live)
packages/core/src/services/toolUseSummary.ts (the most visible victim)