bug(provider/dashscope): side-query thinking disable doesn't reach qwen3 series — 'enable_thinking' in typed check never fires #4501

@doudouOUC

Description

TL;DR

pipeline.ts:362-365 only re-writes enable_thinking when the field already exists on the request body. The default OpenAI-compatible request body never pre-populates this qwen3-specific extension, so the check never fires, and thinkingConfig.includeThoughts === false (set by every side-query via sideQuery.ts:applyThinkingDefault) silently fails to disable thinking on qwen3 series models.

The DeepSeek path next to it (pipeline.ts:384-386) does this correctly via a hostname-gated unconditional typed['thinking'] = { type: 'disabled' }. There's no equivalent for qwen3 / DashScope.

Repro / evidence (production data)

tool-use-summary side-queries on qwen3.5-flash (a hybrid thinking model with thinking enabled by default) show 24–95× output bloat relative to the maxOutputTokens: 60 budget. Visible output is ~3–6 tokens (a 30-char git-commit-subject label, exactly what maxOutputTokens: 60 budgeted), but output_token_count (which on OpenAI-compatible reasoning models includes reasoning tokens) is 1454–5708:

#   Input   Output (visible + reasoning)   Total   Duration
1   242     2441                           2683    17.8 s
2   296     2067                           2363    15.8 s
3   332     5708                           6040    41.8 s
4   337     2886                           3223    22.0 s
5   323     4085                           4408    29.0 s
6   328     5472                           5800    38.5 s
7   351     1454                           1805    11.5 s

For a cosmetic 30-char label whose design budget is ~1 s (per the JSDoc on toolUseSummary.ts:13), this is 11–42× over budget. The visible output complies with maxOutputTokens: 60 — the entire overshoot is reasoning tokens that should have been disabled at the model level.
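
The multipliers quoted here can be re-derived from the table. A minimal sketch, assuming the 60-token maxOutputTokens budget and the ~1 s design budget are the baselines; the names `sideQueryRows`, `round1`, `bloat`, and `latency` are illustrative and not part of the codebase:

```typescript
// Rows copied from the table above: [input, output (visible + reasoning), total, seconds].
const sideQueryRows: Array<[number, number, number, number]> = [
  [242, 2441, 2683, 17.8],
  [296, 2067, 2363, 15.8],
  [332, 5708, 6040, 41.8],
  [337, 2886, 3223, 22.0],
  [323, 4085, 4408, 29.0],
  [328, 5472, 5800, 38.5],
  [351, 1454, 1805, 11.5],
];

const MAX_OUTPUT_TOKENS = 60; // budget requested by the side-query
const LATENCY_BUDGET_S = 1; // approximate design budget, per the issue text

// Round to one decimal place for display.
const round1 = (n: number): number => Math.round(n * 10) / 10;

const outputs = sideQueryRows.map(([, output]) => output);
const durations = sideQueryRows.map(([, , , seconds]) => seconds);

// Output tokens relative to the 60-token budget.
const bloat = {
  min: round1(Math.min(...outputs) / MAX_OUTPUT_TOKENS),
  max: round1(Math.max(...outputs) / MAX_OUTPUT_TOKENS),
};

// Wall time relative to the ~1 s budget.
const latency = {
  min: round1(Math.min(...durations) / LATENCY_BUDGET_S),
  max: round1(Math.max(...durations) / LATENCY_BUDGET_S),
};

// Sanity check: input + output should equal the reported total in every row.
const totalsConsistent = sideQueryRows.every(([i, o, t]) => i + o === t);

console.log(bloat, latency, totalsConsistent);
```

The token ratio works out to roughly 24× at the low end and 95× at the high end, and every row's input and output sum to its reported total.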

Why the disable doesn't reach qwen3

// pipeline.ts:358-365
const reasoningDisabled =
  request.config?.thinkingConfig?.includeThoughts === false ||
  this.contentGeneratorConfig.reasoning === false;
if (reasoningDisabled) {
  const typed = providerRequest as unknown as Record<string, unknown>;
  if ('enable_thinking' in typed) {     // ← BUG
    typed['enable_thinking'] = false;
  }
  ...
}

'enable_thinking' in typed checks whether the wire body already has the field. It never will, because:

  • enable_thinking is a qwen3 non-standard extension, not part of OpenAI Chat Completions
  • buildBaseRequest constructs a vanilla OpenAI-compatible payload that never includes this field
  • DashScopeOpenAICompatibleProvider.buildRequest (dashscope.ts:165-208) only injects extra_body from user config — it never auto-injects an enable_thinking: false based on includeThoughts

So:

  1. sideQuery.ts:applyThinkingDefault sets thinkingConfig.includeThoughts = false (correct intent)
  2. pipeline.ts:358-360 correctly detects reasoningDisabled = true (correct)
  3. pipeline.ts:362-364 checks 'enable_thinking' in typed → always false → nothing happens
  4. Wire body goes out without any thinking-disable signal
  5. qwen3.5-flash keeps thinking-by-default
  6. Reasoning tokens get burned on every cosmetic side-query
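
The chain above can be reproduced in isolation. This is a minimal sketch, not the real pipeline: `buildBaseRequestSketch` is a hypothetical stand-in for buildBaseRequest, and `disableThinkingGuarded` only mirrors the guarded write quoted from pipeline.ts.

```typescript
type GuardDemoBody = Record<string, unknown>;

// Hypothetical stand-in for buildBaseRequest: a vanilla OpenAI-compatible body
// with no qwen3-specific fields.
function buildBaseRequestSketch(): GuardDemoBody {
  return {
    model: 'qwen3.5-flash',
    messages: [{ role: 'user', content: 'Summarize the tool batch.' }],
    max_tokens: 60,
  };
}

// Mirrors the guarded write: the field is only overwritten if it already exists.
function disableThinkingGuarded(body: GuardDemoBody): GuardDemoBody {
  const typed: GuardDemoBody = { ...body };
  if ('enable_thinking' in typed) {
    typed['enable_thinking'] = false;
  }
  return typed;
}

const vanilla = disableThinkingGuarded(buildBaseRequestSketch());
const prepopulated = disableThinkingGuarded({
  ...buildBaseRequestSketch(),
  enable_thinking: true,
});

console.log('enable_thinking' in vanilla); // false: no disable signal on the wire body
console.log(prepopulated['enable_thinking']); // false: the guard only acts when the key is already present
```

Only the second case, which the default request path never produces, ends up with thinking disabled.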

The DeepSeek path immediately after (pipeline.ts:384-386) does the equivalent unconditionally:

if (isDeepSeekHostname(this.contentGeneratorConfig)) {
  typed['thinking'] = { type: 'disabled' };   // direct set, no `in` check
}

There is no isDashScopeHostname / isQwen3Series parallel branch.

pipeline.ts:471-473 documents the right wire shape but no code emits it:

- qwen3 series        — model-dependent; can be manually disabled via `extra_body.enable_thinking`

Affected paths (every side-query on qwen3)

sideQuery.ts:applyThinkingDefault enforces includeThoughts: false for all side-queries, so the bug fires on:

  • tool-use-summary (this report's evidence — fires once per tool batch)
  • session-title (every turn)
  • prompt-suggestion (every turn)
  • auto-memory-recall (every prompt)
  • chat_compression (when fallback hits a qwen3 fast model)
  • next-speaker-check
  • subagentGenerator (planning side-queries)
  • relevanceSelector, forget, sessionRecap, ArenaManager (each a call site of runSideQuery)

A typical session sees 10–30 side-queries per user prompt. Each currently burns 1.5–6 K reasoning tokens it shouldn't and takes 11–42 s of wall time where the design budgeted ~1 s.

Impact

  1. Cost: 24–95× token bloat on every cosmetic side-query (multiply by ≥10 side-queries per turn × N users)
  2. Latency: 11–42 s for what should be ~1 s — completely defeats the "hidden behind 5–30 s main-model streaming" design assumption documented in toolUseSummary.ts:13
  3. Cascading congestion: heavy in-flight side-queries on the fast-model gateway path correlate with main-conversation call hangs (see related observation in #TODO another issue covering the network gateway side, separate root cause)
  4. No user-visible benefit: cleanSummary() already takes only the first line and caps at 100 chars (MAX_SUMMARY_LENGTH), so all the reasoning output is discarded client-side

Suggested fix

Two equivalent routes — pick whichever fits the codebase style:

Option A (smaller diff): in pipeline.ts:362-365, drop the in typed guard and inject via extra_body with model/hostname gating, mirroring the DeepSeek branch:

if (isQwen3Series(this.contentGeneratorConfig.model) ||
    DashScopeOpenAICompatibleProvider.isDashScopeProvider(this.contentGeneratorConfig)) {
  const eb = (typed['extra_body'] as Record<string, unknown> | undefined) ?? {};
  typed['extra_body'] = { ...eb, enable_thinking: false };
}

Option B (cleaner separation): lift the responsibility to DashScopeOpenAICompatibleProvider.buildRequest (dashscope.ts:165), where it can inspect request.config?.thinkingConfig?.includeThoughts === false and inject extra_body.enable_thinking: false. The pipeline-level branch then no longer needs a qwen3 case.
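
A rough sketch of the step Option B would add, under stated assumptions: `ThinkingRequestSketch`, `ProviderBodySketch`, and `injectThinkingDisable` are hypothetical names, the real request and provider types are richer, and the wire shape (extra_body.enable_thinking) follows the proposal in this issue rather than anything verified against the provider.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ThinkingRequestSketch {
  config?: { thinkingConfig?: { includeThoughts?: boolean } };
}
type ProviderBodySketch = Record<string, unknown>;

// Sketch of what the provider's buildRequest could do after assembling the body.
function injectThinkingDisable(
  request: ThinkingRequestSketch,
  body: ProviderBodySketch,
): ProviderBodySketch {
  if (request.config?.thinkingConfig?.includeThoughts !== false) {
    return body; // thinking not explicitly disabled: leave the body untouched
  }
  // Preserve any user-configured extra_body keys and add the disable flag.
  const existing =
    (body['extra_body'] as Record<string, unknown> | undefined) ?? {};
  return { ...body, extra_body: { ...existing, enable_thinking: false } };
}

const sideQueryBody = injectThinkingDisable(
  { config: { thinkingConfig: { includeThoughts: false } } },
  { model: 'qwen3.5-flash', extra_body: { user_flag: 'kept' } },
);
const mainQueryBody = injectThinkingDisable({}, { model: 'qwen3.5-flash' });

console.log(JSON.stringify(sideQueryBody['extra_body']));
// {"user_flag":"kept","enable_thinking":false}
console.log('extra_body' in mainQueryBody); // false
```

Because the flag is spread last, it would also override a user-configured enable_thinking: true in extra_body. Whether the side-query default should win over explicit user config is a design decision this sketch does not settle.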

Test: add a regression assertion that with thinkingConfig.includeThoughts: false on a qwen3 model, the wire request body contains extra_body: { enable_thinking: false }.
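
One possible shape for that assertion is a framework-agnostic check that takes the request builder as a parameter, so it can be pointed at the real pipeline or provider. Everything named here is an assumption made for illustration: `assertThinkingDisabledOnWire`, the two stub builders, and the `fails` helper. In the repository it would be written against whatever test runner the codebase already uses.

```typescript
type RegressionRequestSketch = {
  model: string;
  config?: { thinkingConfig?: { includeThoughts?: boolean } };
};
type RegressionBodySketch = Record<string, unknown>;
type BuildWireBody = (request: RegressionRequestSketch) => RegressionBodySketch;

// Throws unless the builder emits extra_body.enable_thinking === false
// for a qwen3 request with includeThoughts: false.
function assertThinkingDisabledOnWire(build: BuildWireBody): void {
  const body = build({
    model: 'qwen3.5-flash',
    config: { thinkingConfig: { includeThoughts: false } },
  });
  const extra = body['extra_body'] as Record<string, unknown> | undefined;
  if (extra?.['enable_thinking'] !== false) {
    throw new Error('expected extra_body.enable_thinking === false on the wire body');
  }
}

// Stub modelling the current behavior: the guarded write never fires.
const currentBehavior: BuildWireBody = (request) => {
  const body: RegressionBodySketch = { model: request.model };
  if ('enable_thinking' in body) {
    body['enable_thinking'] = false;
  }
  return body;
};

// Stub modelling a fixed builder that injects the flag unconditionally.
const fixedBehavior: BuildWireBody = (request) => ({
  model: request.model,
  ...(request.config?.thinkingConfig?.includeThoughts === false
    ? { extra_body: { enable_thinking: false } }
    : {}),
});

// Small helper: reports whether a check throws.
function fails(check: () => void): boolean {
  try {
    check();
    return false;
  } catch {
    return true;
  }
}

console.log(fails(() => assertThinkingDisabledOnWire(currentBehavior))); // true
console.log(fails(() => assertThinkingDisabledOnWire(fixedBehavior))); // false
```

The check fails against the stub that models today's guarded write and passes against the stub that injects the flag, which is the behavior a regression test should pin down.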

Affected version

Reproduced on origin/main @ 84f408017. The buggy 'enable_thinking' in typed check has been there since the reasoning-disable consolidation — see git blame on pipeline.ts:362-365.

Related

  • packages/core/src/utils/sideQuery.ts:applyThinkingDefault — sets includeThoughts: false for all side-queries
  • packages/core/src/core/openaiContentGenerator/pipeline.ts:362-365 — bug location
  • packages/core/src/core/openaiContentGenerator/pipeline.ts:384-386 — DeepSeek's working pattern
  • packages/core/src/core/openaiContentGenerator/provider/dashscope.ts:165-208 — provider buildRequest where Option B fix would live
  • packages/core/src/services/toolUseSummary.ts — the most visible victim
  • Issue #4486, bug(telemetry): qwen-code.interaction span has wrong trace id (escapes session root context); a separate issue, found together while debugging stuck sessions

Metadata

Labels

  • category/telemetry: Telemetry and analytics
  • priority/P2: Medium - Moderately impactful, noticeable problem
  • status/needs-triage: Issue needs to be triaged and labeled
  • type/bug: Something isn't working as expected
