bug(provider/dashscope): side-query thinking disable doesn't reach qwen3 series — 'enable_thinking' in typed check never fires

## TL;DR

`pipeline.ts:362-365` only re-writes `enable_thinking` **when the field already exists on the request body**. The default OpenAI-compatible request body never pre-populates this qwen3-specific extension, so the check never fires, and `thinkingConfig.includeThoughts === false` (set by every side-query via `sideQuery.ts:applyThinkingDefault`) silently fails to disable thinking on qwen3 series models.

The DeepSeek path next to it (`pipeline.ts:384-386`) does this correctly via a hostname-gated unconditional `typed['thinking'] = { type: 'disabled' }`. There's no equivalent for qwen3 / DashScope.

## Repro / evidence (production data)

`tool-use-summary` side-queries on `qwen3.5-flash` (a hybrid thinking model with thinking enabled by default) show 24–95× output bloat. Visible output is ~3–6 tokens (a 30-char git-commit-subject label, exactly what `maxOutputTokens: 60` budgeted), but `output_token_count` (which on OpenAI-compatible reasoning models includes reasoning tokens) is **1454–5708**:

| # | Input | Output (visible + reasoning) | Total | Duration |
|---|-------|------------------------------|-------|----------|
| 1 | 242   | 2 441                        | 2 683 | 17.8 s   |
| 2 | 296   | 2 067                        | 2 363 | 15.8 s   |
| 3 | 332   | **5 708**                    | 6 040 | **41.8 s** |
| 4 | 337   | 2 886                        | 3 223 | 22.0 s   |
| 5 | 323   | 4 085                        | 4 408 | 29.0 s   |
| 6 | 328   | 5 472                        | 5 800 | 38.5 s   |
| 7 | 351   | 1 454                        | 1 805 | 11.5 s   |

For a cosmetic 30-char label whose design budget is **~1 s** (per the JSDoc on `toolUseSummary.ts:13`), this is 11–42× over budget. The visible output complies with `maxOutputTokens: 60` — the entire overshoot is reasoning tokens that should have been disabled at the model level.
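
The ratios quoted above can be rechecked from the table. The sketch below assumes the 24–95× figure is measured against `maxOutputTokens: 60` and the 11–42× figure against the ~1 s budget; the variable and function names are illustrative only and do not exist in the codebase.

```typescript
// Output token counts and durations copied from the table above.
const outputTokens: number[] = [2441, 2067, 5708, 2886, 4085, 5472, 1454];
const durationsSec: number[] = [17.8, 15.8, 41.8, 22.0, 29.0, 38.5, 11.5];

const MAX_OUTPUT_TOKENS = 60; // assumption: baseline for the token ratio
const LATENCY_BUDGET_SEC = 1; // assumption: the ~1 s design budget

// Divide every value by the baseline and return the smallest and largest ratio.
function ratioRange(
  values: number[],
  baseline: number,
): { min: number; max: number } {
  const ratios = values.map((v) => v / baseline);
  return { min: Math.min(...ratios), max: Math.max(...ratios) };
}

const tokenBloat = ratioRange(outputTokens, MAX_OUTPUT_TOKENS);
const latencyOverrun = ratioRange(durationsSec, LATENCY_BUDGET_SEC);

console.log(tokenBloat.min.toFixed(1), tokenBloat.max.toFixed(1)); // 24.2 95.1
console.log(latencyOverrun.min.toFixed(1), latencyOverrun.max.toFixed(1)); // 11.5 41.8
```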

## Why the disable doesn't reach qwen3

```ts
// pipeline.ts:358-365
const reasoningDisabled =
  request.config?.thinkingConfig?.includeThoughts === false ||
  this.contentGeneratorConfig.reasoning === false;
if (reasoningDisabled) {
  const typed = providerRequest as unknown as Record<string, unknown>;
  if ('enable_thinking' in typed) {     // ← BUG
    typed['enable_thinking'] = false;
  }
  ...
}
```

`'enable_thinking' in typed` checks whether the wire body **already** has the field. It never will, because:

- `enable_thinking` is a qwen3 non-standard extension, not part of OpenAI Chat Completions
- `buildBaseRequest` constructs a vanilla OpenAI-compatible payload that never includes this field
- `DashScopeOpenAICompatibleProvider.buildRequest` (`dashscope.ts:165-208`) only injects `extra_body` from user config; it never auto-injects `enable_thinking: false` based on `includeThoughts`

So:
1. `sideQuery.ts:applyThinkingDefault` sets `thinkingConfig.includeThoughts = false` (correct intent)
2. `pipeline.ts:358-360` correctly detects `reasoningDisabled = true` (correct)
3. `pipeline.ts:362-364` checks `'enable_thinking' in typed` → **always false** → nothing happens
4. Wire body goes out without any thinking-disable signal
5. qwen3.5-flash keeps thinking-by-default
6. Reasoning tokens get burned on every cosmetic side-query
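
Step 3 can be reproduced in isolation. The following is a minimal sketch, not the pipeline code itself: `ChatRequest` and the sample payload are hypothetical stand-ins for whatever `buildBaseRequest` produces, assuming only that the payload never contains an `enable_thinking` key.

```typescript
// Hypothetical stand-in for the vanilla OpenAI-compatible payload.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  max_tokens?: number;
}

const providerRequest: ChatRequest = {
  model: "qwen3.5-flash",
  messages: [{ role: "user", content: "Summarize the tool batch." }],
  max_tokens: 60,
};

const typed = providerRequest as unknown as Record<string, unknown>;

// Mirrors the guarded write: it only runs when the key is already present.
if ("enable_thinking" in typed) {
  typed["enable_thinking"] = false;
}

// The key was never there, so the guard did not fire and nothing was added.
const hasDisableSignal = "enable_thinking" in typed;
console.log(hasDisableSignal); // false
console.log(Object.keys(typed)); // [ 'model', 'messages', 'max_tokens' ]
```

The `in` operator tests for key presence, not for a truthy value, so a guarded write of this shape can only overwrite a field that some earlier step already set.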

The DeepSeek path immediately after (`pipeline.ts:384-386`) does the equivalent unconditionally:

```ts
if (isDeepSeekHostname(this.contentGeneratorConfig)) {
  typed['thinking'] = { type: 'disabled' };   // direct set, no `in` check
}
```

There is no `isDashScopeHostname` / `isQwen3Series` parallel branch.

`pipeline.ts:471-473` documents the right wire shape but no code emits it:

```
- qwen3 series        — model-dependent; can be manually disabled via `extra_body.enable_thinking`
```

## Affected paths (every side-query on qwen3)

`sideQuery.ts:applyThinkingDefault` enforces `includeThoughts: false` for **all** side-queries, so the bug fires on:

- `tool-use-summary` (this report's evidence — fires once per tool batch)
- `session-title` (every turn)
- `prompt-suggestion` (every turn)
- `auto-memory-recall` (every prompt)
- `chat_compression` (when fallback hits a qwen3 fast model)
- `next-speaker-check`
- `subagentGenerator` (planning side-queries)
- `relevanceSelector`, `forget`, `sessionRecap`, `ArenaManager` (each a call site of `runSideQuery`)

A typical session sees 10–30 side-queries per user prompt. Each currently burns 1.5–6 K reasoning tokens it shouldn't and takes 11–42 s of wall time against a design budget of ~1 s.

## Impact

1. **Cost**: 24–95× token bloat on every cosmetic side-query (multiply by ≥10 side-queries per turn × N users)
2. **Latency**: 11–42 s for what should be ~1 s — completely defeats the "hidden behind 5–30 s main-model streaming" design assumption documented in `toolUseSummary.ts:13`
3. **Cascading congestion**: heavy in-flight side-queries on the fast-model gateway path correlate with main-conversation call hangs (see the related observation in #TODO, another issue covering the network gateway side, which has a separate root cause)
4. **No user-visible benefit**: `cleanSummary()` already takes only the first line and caps at 100 chars (`MAX_SUMMARY_LENGTH`), so all the reasoning output is discarded client-side

## Suggested fix

Two equivalent routes — pick whichever fits the codebase style:

**Option A (smaller diff):** in `pipeline.ts:362-365`, drop the `in typed` guard and inject via `extra_body` with model/hostname gating, mirroring the DeepSeek branch:

```ts
if (isQwen3Series(this.contentGeneratorConfig.model) ||
    DashScopeOpenAICompatibleProvider.isDashScopeProvider(this.contentGeneratorConfig)) {
  const eb = (typed['extra_body'] as Record<string, unknown> | undefined) ?? {};
  typed['extra_body'] = { ...eb, enable_thinking: false };
}
```

**Option B (cleaner separation):** lift the responsibility to `DashScopeOpenAICompatibleProvider.buildRequest` (dashscope.ts:165), where it can inspect `request.config?.thinkingConfig?.includeThoughts === false` and inject `extra_body.enable_thinking: false`. The pipeline-level branch then no longer needs a qwen3 case.

**Test**: add a regression assertion that with `thinkingConfig.includeThoughts: false` on a qwen3 model, the wire request body contains `extra_body: { enable_thinking: false }`.
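
One possible shape for that regression check is sketched below. It models the Option A behavior in a standalone function rather than calling the real pipeline, so every name here (`WireBody`, `looksLikeQwen3`, `applyReasoningDisable`) is hypothetical, and the prefix-based model gate is an assumption, not the project's actual predicate.

```typescript
type WireBody = Record<string, unknown>;

// Hypothetical model gate; the real isQwen3Series predicate is not shown in this report.
function looksLikeQwen3(model: string): boolean {
  return model.toLowerCase().startsWith("qwen3");
}

// Hypothetical model of the fix: set extra_body.enable_thinking unconditionally
// when thinking is disabled for a qwen3 model, preserving any user-supplied extra_body.
function applyReasoningDisable(
  body: WireBody,
  model: string,
  includeThoughts: boolean | undefined,
): WireBody {
  if (includeThoughts !== false || !looksLikeQwen3(model)) {
    return body;
  }
  const existing =
    (body["extra_body"] as Record<string, unknown> | undefined) ?? {};
  return { ...body, extra_body: { ...existing, enable_thinking: false } };
}

const wire = applyReasoningDisable(
  { model: "qwen3.5-flash", messages: [], extra_body: { custom_flag: true } },
  "qwen3.5-flash",
  false,
);
const extra = wire["extra_body"] as Record<string, unknown>;

console.log(JSON.stringify(extra)); // {"custom_flag":true,"enable_thinking":false}
```

A real test would assert the same thing against the request body captured from the provider or pipeline, plus the negative cases: a non-qwen3 model and an unset `includeThoughts` should leave the body untouched.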

## Affected version

Reproduced on `origin/main @ 84f408017`. The buggy `'enable_thinking' in typed` check has been there since the reasoning-disable consolidation — see git blame on `pipeline.ts:362-365`.

## Related

- `packages/core/src/utils/sideQuery.ts:applyThinkingDefault` — sets `includeThoughts: false` for all side-queries
- `packages/core/src/core/openaiContentGenerator/pipeline.ts:362-365` — bug location
- `packages/core/src/core/openaiContentGenerator/pipeline.ts:384-386` — DeepSeek's working pattern
- `packages/core/src/core/openaiContentGenerator/provider/dashscope.ts:165-208` — provider buildRequest where Option B fix would live
- `packages/core/src/services/toolUseSummary.ts` — the most visible victim
- Issue #4486 (telemetry trace id bug) — separate issue, found together while debugging stuck sessions

