fix(core): emit enable_thinking on DashScope when reasoning is disabled#4505
Conversation
The previous `'enable_thinking' in typed` guard only flipped the field when pre-populated, but provider buildRequest never auto-injects this qwen3-specific extension. So QWEN_OAUTH defaults and any qwen3 model without `extra_body.enable_thinking` configured (e.g. qwen3.5-flash) silently burned reasoning tokens on every side-query despite includeThoughts: false. Hostname-gated unconditional set mirrors the existing DeepSeek branch. Closes QwenLM#4501 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
There was a problem hiding this comment.
Pull request overview
This PR fixes a regression where “thinking disabled” signals were not reliably emitted for Qwen3 hybrid-thinking models when calling DashScope OpenAI-compatible endpoints, causing side-queries to unintentionally consume large reasoning token budgets. It updates the OpenAI pipeline to always emit enable_thinking: false when reasoning is disabled on DashScope endpoints, and adds targeted regression tests to prevent reintroducing the issue.
Changes:
- Inject
enable_thinking: falseinto the outgoing wire request when reasoning is disabled and the endpoint is identified as DashScope-compatible. - Update inline reasoning-shape documentation to reflect the DashScope/Qwen3 disable behavior.
- Add/adjust unit tests to cover DashScope hostname gating (including QWEN_OAUTH and internal
*.alibaba-inc.com) and negative cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/core/src/core/openaiContentGenerator/pipeline.ts | Adds DashScope-gated unconditional emission of enable_thinking: false when reasoning is disabled; updates related reasoning-shape commentary. |
| packages/core/src/core/openaiContentGenerator/pipeline.test.ts | Adds regression tests for DashScope/Qwen3 thinking-disable propagation and tightens an existing override test to exercise hostname gating. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
📋 Review SummaryThis PR fixes a critical bug where Qwen3 hybrid-thinking models (e.g., 🔍 General Feedback
🎯 Specific Feedback🟢 Medium
🔵 Low
✅ Highlights
|
The previous comment block restated mechanics already visible in the code (overrides, mirrors DeepSeek, set-by-side-query) and pinned "includeThoughts: false" as the trigger when reasoning: false also qualifies. Keep only the two non-obvious whys (provider doesn't auto-inject; gate prevents leaking the qwen-specific field) plus a one-line scope note. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Qwen Code Review (LIGHT)What this PR does: Replaces the Findings: No issues found in the visible diff at this tier. The logic change is correct and mirrors the existing Qwen Code |
wenshao
left a comment
There was a problem hiding this comment.
No review findings. LGTM! ✅ — qwen3.7-max via Qwen Code /review
🧪 本地构建验证报告 (维护者)在隔离 worktree 中对 PR 合并后状态做了完整验证,结论是可以合并。 验证环境
静态检查
单元测试 (与 PR 描述对齐)
🔬 回归证据 (核心)为了验证新增测试是否真的能捕获原 bug,做了一组对比实验:把 PR 的 5 个新测试保留,仅把 4 个失败测试与 PR 修复点完全对应:
把
🌐 端到端 wire-body 验证 (跳过 mock, 抓真实 HTTP)为了越过单元测试的 mock 层, 启了一个本地 HTTP server 作为 mock 后端, 用
这是发到线上的真实 HTTP 请求体, 不是任何 mock 自证. QWEN_OAUTH 默认路径——也就是 issue #4501 在生产环境烧 token 的那条路径——现在确实带上了 范围检查 (符合 PR 自述)
结论修复点位准确, 与 issue #4501 根因匹配 (qwen3 建议:可以合并。 在我的环境里复现的回归与 issue 描述完全一致, 修复后所有 wire body 检查通过. |
LaZzyMan
left a comment
There was a problem hiding this comment.
Review
Targeted fix for the qwen3 default case where ${enable_thinking} in typed was a no-op because extra_body.enable_thinking is absent from default configs — side-queries on qwen3-hybrid models burned reasoning tokens despite includeThoughts: false. The new hostname-gated unconditional set mirrors the existing DeepSeek branch idiom, keeps the override semantics inside the reasoningDisabled block intact, and adds strong test coverage including a api.openai.com negative case that pins the gate.
One scope-widening note worth tightening before merge. The new gate is isDashScopeProvider-only, so when a user routes a non-Qwen model (GLM, DeepSeek) through DashScope's compatible endpoint and reasoning is disabled, the patch now sends enable_thinking: false regardless of model id. Practical impact depends on DashScope's leniency toward unknown extras: if it silently drops them, harmless; if it forwards them to the upstream model serving, the side-query may 400 where it previously succeeded. The PR description and inline comments already acknowledge that GLM uses extra_body.thinking.enabled and DeepSeek-on-DashScope uses thinking: { type: 'disabled' } — different shapes the qwen3 disable doesn't satisfy. ANDing the host check with a model-id regex (e.g. ^qwen[23]) restores the pre-PR conservative posture for non-Qwen routing without breaking the qwen3 fix.
Verdict
COMMENT — solves #4501 correctly with strong tests; the host-only gate widens the trigger surface beyond what the fix needs, worth restricting to qwen-prefixed models before the next provider lands the same shape.
DashScope's compatible-mode endpoint routes multiple model families
(qwen3, GLM, DeepSeek). The hostname-only gate would send
`enable_thinking: false` to all of them when reasoning is disabled —
fine for qwen, but GLM uses `extra_body.thinking.enabled` and
DeepSeek-on-DashScope uses `thinking: { type: 'disabled' }`. Sending
the qwen-specific field to those is at best a no-op and at worst
forwarded upstream and rejected. AND the gate with a `qwen` model-name
prefix to restore conservative posture for non-qwen routings.
Per review feedback from @LaZzyMan and @wenshao.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
R4: both R3 suggestions addressed in 7a7da7c — scope widening closed via startsWith('qwen') model-name filter, test gap filled with non-qwen negative test (glm-5 on DashScope). Multi-agent review (9 agents + reverse audit) across correctness, security, quality, performance, and test coverage found no high-confidence issues. Build clean, 48/48 tests pass, CI green 9/9. LGTM. ✅ — qwen3.7-max via Qwen Code /review
|
This PR shares modified code with PR #4518 (fix(core): stabilize DeepSeek tool cache prefix). Shared functions:
Both PRs also modify files in the
Recommendation: These PRs both modify the OpenAI content generation pipeline. PR #4505 adds DashScope reasoning config while PR #4518 stabilizes the response converter. Test together before merging. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
|
This PR shares modified code with PR #4518 (fix(core): stabilize DeepSeek tool cache prefix). Shared functions:
Both PRs also modify files in the
Recommendation: These PRs both modify the OpenAI content generation pipeline. PR #4505 adds DashScope reasoning config while PR #4518 stabilizes the response converter. Test together before merging. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
`coder-model` is `DEFAULT_QWEN_MODEL` (config/models.ts:7) — the
QWEN_OAUTH default model, aliased to Qwen 3.6 Plus hybrid. The previous
`startsWith('qwen')` gate didn't match it (string starts with `c`), so
QWEN_OAUTH default-flow users — the exact scenario QwenLM#4501 targets —
continued burning reasoning tokens on side-queries. The original
QWEN_OAUTH test masked this by using `qwen3-coder-flash`; switch it to
the real default `coder-model` so the regression is pinned.
Also add a `!baseUrl` regression test to cover the implicit-DashScope
default branch in `isDashScopeProvider` (dashscope.ts:49) — all other
positive tests explicitly set baseUrl.
Per @wenshao's [Critical] review.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
本地真实测试报告(merge 参考)按 PR body 的 Test plan 完成静态检查 + 全套相关 vitest + 一段直接命中 测试环境
1. 静态检查与单测(PR body Test plan 三项)
2. 真磁盘 smoke——直接命中生产
|
| PR body 声称 | 验证 |
|---|---|
DashScope hostname + includeThoughts: false → 发出 enable_thinking: false |
✅ S1(dashscope.aliyuncs.com) |
DashScope + reasoning: false 配置侧 opt-out → 发出 |
✅ S2 |
| QWEN_OAUTH(无论 baseUrl)→ 发出 | ✅ S3(无 baseUrl 也 OK,因 isDashScopeProvider 第一行就是 if (authType === QWEN_OAUTH) return true) |
内部 *.alibaba-inc.com → 发出 |
✅ S4(这恰好是大量内部用户的实际 baseUrl 形态) |
coder-model(QWEN_OAUTH 默认 DEFAULT_QWEN_MODEL,名字不以 qwen 开头)→ 发出 |
✅ S5(review 期间补的 case,PR fix(core): cover coder-model in DashScope enable_thinking gate) |
| 非 DashScope hostname → 不发出 | ✅ S6(openrouter) |
| DashScope + 非 qwen 模型(GLM/DeepSeek-on-DashScope)→ 不发出 | ✅ S7(GLM 用 extra_body.thinking.enabled,DeepSeek-on-DashScope 用 thinking: { type: 'disabled' },把 qwen-specific 的 enable_thinking 转给它们至少是 noop、上游可能 reject) |
reasoningDisabled === false 时 → 不发出(不能误关 thinking) |
✅ S8 |
#4501 production-shape repro(脚本最后一段)
Wire body for #4501-shape repro (qwen3.5-flash on dashscope.aliyuncs.com,
thinkingConfig.includeThoughts=false):
enable_thinking=false ← pre-fix 是 undefined → 1.5–6K reasoning tokens
修复前 'enable_thinking' in typed 永远为 false,所以 wire body 不带这个 field,qwen3.5-flash 默认 thinking-on,side-query 就烧 reasoning token;现在 wire 上就有了 enable_thinking: false,qwen3 hybrid 模型据此关掉 thinking。
3. 我尝试但暂未跑通的额外 smoke:真 CLI HTTP-proxy 抓包
本想再加一条"启 fake DashScope-compatible proxy + 让 qwen-code 真的把请求发过来"的端到端,但卡在了与本 PR 无关的配置层:
- 维护者的
~/.qwen/settings.json里有modelProviders.openai(14 个命名 provider,每个带自己的baseUrl+envKey) - 即便用
QWEN_HOME=/tmp/...-isolated把全局 settings 重定向到一个干净的目录、并把 worktree 内.qwen/settings.json也清掉,dev 进程的请求仍然打到idealab.alibaba-inc.com(响应是中文 "无效的api key",明显非我的 proxy) - 估计是
modelProviders.openai/env块的合并语义之外还有一个我没追到的 baseUrl 来源(dev 模式.env解析、或别的 lazy resolver) - 这是我本机环境的 setup 复杂度,不是本 PR 的问题——PR 改的是
pipeline.ts里provider.buildRequest()之后的 in-place mutation,那一段已经在 §2 的 8 个 smoke 里全 case 覆盖。proxy 抓包只是 §2 的 superset,提供不出新信号
为不让这条挂着,把它在这里如实记一笔。
4. 结论
- ✅ PR body Test plan(typecheck / eslint / 47 单测 / 640 邻近单测)全部本地复现通过;review 期间补的 2 个 case + coder-model 覆盖也跟着绿
- ✅ 真磁盘 smoke 8/8 把 fix 的 5 个正向 + 3 个负向不变量在生产代码路径上跑了一遍——包含维护者实际使用的
*.alibaba-inc.com形态(S4) - ✅ bug(provider/dashscope): side-query thinking disable doesn't reach qwen3 series — 'enable_thinking' in typed check never fires #4501 production-shape repro:post-fix wire body 携带
enable_thinking: false,对 qwen3.5-flash / qwen3.7-max 等 hybrid 模型有效
从本地真实测试角度,该 PR 可以合并。 8 个 smoke + 49 个 pipeline 单测在 hostname × model × reasoning-state 三轴上把 fix 的全部分支覆住;生产侧 24–95× token bloat 的 root cause('enable_thinking' in typed 永远为 false)已被替换为 DashScopeOpenAICompatibleProvider.isDashScopeProvider + qwen-model 双闸口下的 unconditional set,与下方 DeepSeek 分支的写法对仗。
Review round summaryAll 8 unresolved threads addressed:
Pushed back on 4 threads (2 bot, 2 wenshao style suggestions). The Critical and 3 Suggestions were already addressed in earlier fix commits. |
The gate keyed off `contentGeneratorConfig.model`, but buildRequest ships `context.model` (= `request.model || contentGeneratorConfig.model`, the value baseRequest.model is built from). A request-level model override desynced the gate from what actually goes on the wire: - qwen config + non-qwen request model still emitted `enable_thinking`, leaking the qwen-only field to the non-qwen routing on the wire - non-qwen config + qwen request model missed the disable signal, regressing QwenLM#4501 Gate on `context.model` instead. Also swap the hardcoded `'coder-model'` for the `DEFAULT_QWEN_MODEL` constant so a rename can't silently break the match. Add regression tests for both override directions; the existing positive tests set `request.model: 'test-model'`, which — now that the gate reads the wire model — was overriding the qwen config they meant to exercise, so point them at the intended wire model. Per @yiliang114's review. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Reverts the `DEFAULT_QWEN_MODEL` import added in 420a131. Importing the constant was already raised in review (@yiliang114) and declined: `coder-model` is a public DashScope routing alias, effectively frozen — not an internal constant that might drift silently. The inline comment already documents the relationship, and the import adds a cross-module dependency not worth it for this bugfix. The wire-model gate fix (gating on `context.model` rather than `contentGeneratorConfig.model`) is unaffected and stays. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
|
Addressed @yiliang114's wire-model review:
|
wenshao
left a comment
There was a problem hiding this comment.
No review findings. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Summary
pipeline.ts:362previously gated the qwen3 thinking-disable on'enable_thinking' in typed— but providerbuildRequestnever auto-injects this qwen3-specific extension, so the check failed on QWEN_OAUTH defaults and any qwen3 model withoutextra_body.enable_thinkingexplicitly configured (e.g.qwen3.5-flashon the fast-model path). Side-queries silently burned 1.5–6 K reasoning tokens (24–95× over budget per bug(provider/dashscope): side-query thinking disable doesn't reach qwen3 series — 'enable_thinking' in typed check never fires #4501) despitesideQuery.applyThinkingDefaultsettingincludeThoughts: false.DashScopeOpenAICompatibleProvider.isDashScopeProvider-gated unconditional set, mirroring the existingisDeepSeekHostnamebranch immediately below.extra_body.thinking.enabled) and DeepSeek-on-DashScope (thinking: { type: 'disabled' }) need different disable shapes — pre-existing gap, called out in the new in-code comments but not closed in this PR.Test plan
includeThoughts: false; DashScope +reasoning: falseconfig opt-out;QWEN_OAUTHregardless of baseUrl; internal*.alibaba-inc.com; non-DashScope hostname (negative — must NOT emitenable_thinking)pipeline.test.ts:465) with explicit DashScope baseUrl so the hostname gate is exercised by intent rather than the!baseUrldefaultpipeline.test.ts— 47/47 passingtsc --noEmitcleaneslintclean on changed files人工验证
setting不配置enable_thinking

之前:
之后:
Closes #4501
🤖 Generated with Qwen Code