💄 style: add preserve thinking feature for Qwen3.6 Plus model #13494
Conversation
|
@sxjeru is attempting to deploy a commit to the LobeHub OSS Team on Vercel. A member of the Team first needs to authorize it. |
|
@ONLY-yours @canisminor1990 - This PR adds preserve thinking support for Qwen3.6 Plus and updates model providers (Qwen, Zhipu). It also modifies ModelSwitchPanel UI and chat config types. Please take a look. |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## canary #13494 +/- ##
==========================================
- Coverage 70.50% 70.41% -0.09%
==========================================
Files 3312 3312
Lines 327060 326158 -902
Branches 34721 35735 +1014
==========================================
- Hits 230582 229654 -928
- Misses 96296 96320 +24
- Partials 182 184 +2
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26db4ac105
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR introduces a preserveThinking extended parameter intended to let Qwen3.6 Plus reuse prior assistant “thinking/reasoning” by mapping it to Qwen’s preserve_thinking request option, with corresponding UI/config/type support across the app.
Changes:
- Add `preserveThinking` to chat config + model extend-param resolution, and expose a toggle in the model controls UI.
- Extend the Qwen runtime adapter to (optionally) map assistant `reasoning` into `reasoning_content` and send `preserve_thinking`.
- Update builtin model catalogs/locales (including adding Qwen3.6 Plus, and other model list edits).
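The mapping described in the list above can be sketched roughly as follows. This is an illustration only, not the adapter code from the PR: the `Sketch*` and `Upstream*` types, the `SUPPORTED_MODELS` allow-list, and `toUpstreamRequest` are hypothetical names, and the choice to attach `reasoning_content` regardless of the flag value is an assumption based on the strategy discussed later in this thread.

```typescript
// Hypothetical shapes; the real types live in packages/model-runtime/src/types/chat.ts.
interface SketchMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  // Reasoning text captured from an earlier assistant turn.
  reasoning?: string;
}

interface SketchPayload {
  model: string;
  messages: SketchMessage[];
  preserveThinking?: boolean;
}

interface UpstreamMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

interface UpstreamRequest {
  model: string;
  messages: UpstreamMessage[];
  preserve_thinking?: boolean;
}

// Assumption: an allow-list decides which models accept preserve_thinking.
const SUPPORTED_MODELS = new Set(['qwen3.6-plus']);

function toUpstreamRequest(payload: SketchPayload): UpstreamRequest {
  const supported = SUPPORTED_MODELS.has(payload.model);

  const messages = payload.messages.map((m): UpstreamMessage => {
    const mapped: UpstreamMessage = { role: m.role, content: m.content };
    // Only assistant turns carry reasoning, and only for supported models.
    if (supported && m.role === 'assistant' && m.reasoning) {
      mapped.reasoning_content = m.reasoning;
    }
    return mapped;
  });

  const request: UpstreamRequest = { model: payload.model, messages };
  // Forward the flag only when the caller set it and the model supports it.
  if (supported && payload.preserveThinking !== undefined) {
    request.preserve_thinking = payload.preserveThinking;
  }
  return request;
}
```

For an unsupported model the sketch emits neither `preserve_thinking` nor `reasoning_content`, so the request body stays unchanged for every other provider path.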
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/services/chat/mecha/modelParamsResolver.ts | Adds preserveThinking into resolved runtime extend params |
| src/services/chat/mecha/modelParamsResolver.test.ts | Adds unit tests for preserveThinking resolution |
| src/server/modules/AgentRuntime/RuntimeExecutors.ts | Persists assistant reasoning into runtime state messages |
| src/routes/(main)/settings/provider/features/ModelList/CreateNewModelModal/ExtendParamsSelect.tsx | Adds preserveThinking to selectable extend params + preview metadata |
| src/locales/default/modelProvider.ts | Adds hint text for preserveThinking |
| src/locales/default/chat.ts | Adds title/description strings for preserveThinking |
| src/features/ModelSwitchPanel/components/ControlsForm/ControlsForm.tsx | Adds preserveThinking switch in controls form |
| packages/types/src/openai/chat.ts | Extends shared message/payload types (reasoning + preserveThinking) |
| packages/types/src/agent/chatConfig.ts | Adds preserveThinking to agent chat config + zod schema |
| packages/model-runtime/src/types/chat.ts | Extends runtime payload/message types (reasoning_content + preserveThinking) |
| packages/model-runtime/src/providers/qwen/index.ts | Maps preserveThinking → preserve_thinking and reasoning → reasoning_content for supported Qwen models |
| packages/model-runtime/src/providers/qwen/index.test.ts | Adds tests for Qwen preserve-thinking payload mapping |
| packages/model-bank/src/types/aiModel.ts | Adds preserveThinking to extend param union + zod enum |
| packages/model-bank/src/aiModels/zhipu.ts | Adds new Zhipu model cards (but currently with issues) |
| packages/model-bank/src/aiModels/qwen.ts | Adds Qwen3.6 Plus model card; modifies Qwen3.5 Plus entry |
| packages/model-bank/src/aiModels/google.ts | Removes a Gemini preview model from builtin list |
| locales/zh-CN/modelProvider.json | Adds zh-CN hint for preserveThinking |
| locales/zh-CN/chat.json | Adds zh-CN title/desc for preserveThinking |
| locales/en-US/modelProvider.json | Adds en-US hint for preserveThinking |
| locales/en-US/chat.json | Adds en-US title/desc for preserveThinking |
Comments suppressed due to low confidence (1)
packages/model-bank/src/aiModels/google.ts:452
- This PR removes the Google model `gemini-2.5-flash-lite-preview-09-2025` from the builtin list. The PR description focuses on Qwen `preserve_thinking`, so please confirm this removal is intentional and, if so, consider mentioning it in the PR description to avoid accidental scope creep.
{
abilities: {
functionCall: true,
search: true,
vision: true,
},
contextWindowTokens: 1_048_576 + 8192,
💡 Codex Review
Reviewed commit: 590e104070
|
I think GLM4.7 and above can also use this feature, enabled with
https://docs.bigmodel.cn/cn/guide/capabilities/thinking-mode#轮级思考 |
💡 Codex Review
Reviewed commit: fc9c0bca6b
💡 Codex Review
Reviewed commit: 7b3d406a24
💡 Codex Review
Reviewed commit: c27df820a8
|
After testing: with the historical reasoning content toggle turned off for qwen3.6-plus, input tokens dropped from 5,643 to 4,962, a reduction of 681, which roughly matches the 676 deep-thinking tokens in the conversation history. In a different conversation, turning off the toggle for glm-4.7 dropped input tokens from 4,983 to 4,352, a reduction of 631.
I also noticed that Qwen removes the thinking content from all historical messages, while GLM appears to keep the thinking content of the last historical message.
The current strategy is for the backend to pass the reasoning content of all historical messages upstream, together with the parameter, and let the upstream decide which reasoning content to use.
Separately, the build memory is currently set to around 6500; the build may succeed after several retries. |
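The token arithmetic in the comment above can be checked with a few lines. The numbers are the ones reported in the comment; the variable names are made up for this sketch, and the comment itself does not explain the small gap between the Qwen saving and the historical reasoning token count.

```typescript
// Figures reported in the test comment above.
const qwenRun = { before: 5643, after: 4962, historicalReasoningTokens: 676 };
const glmRun = { before: 4983, after: 4352 };

// Input-token reduction after turning the toggle off.
const qwenSaved = qwenRun.before - qwenRun.after; // 681
const glmSaved = glmRun.before - glmRun.after; // 631

// Difference between the observed saving and the reasoning tokens in history.
const qwenGap = qwenSaved - qwenRun.historicalReasoningTokens; // 5

console.log({ qwenSaved, glmSaved, qwenGap });
```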
💡 Codex Review
Reviewed commit: 15819e6017
provider === 'qwen' || provider === 'zhipu';
const modelSupportsPreserveThinking =
  modelSupportsPreserveThinkingFromCard ||
  (!modelCard && providerSupportsPreserveThinkingFallback);
Do not assume unknown Qwen/Zhipu models support preserveThinking
The provider fallback marks any unmatched Qwen/Zhipu model as preserveThinking-capable, which is too broad. If a built-in unsupported model is used via a custom deployment name (so modelCard lookup misses), this branch treats it as supported and reuses agentConfig.chatConfig.preserveThinking, causing call_llm to persist/replay assistant reasoning and pass preserve-thinking flags even though that model never declared the capability. This can silently increase prompt tokens and alter behavior for renamed deployments that are not actually preserve-thinking models.
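One way to read this suggestion is to treat a missing model card as "capability unknown, therefore off". The sketch below assumes a simplified `ModelCardSketch` shape with a `settings.extendParams` array; both helper names are hypothetical and this is not the fix that landed in the PR.

```typescript
// Simplified, assumed card shape; the real type is defined in packages/model-bank.
interface ModelCardSketch {
  id: string;
  settings?: { extendParams?: string[] };
}

// Support must be declared on the card; an unknown model (no card) is never assumed capable.
function supportsPreserveThinking(modelCard: ModelCardSketch | undefined): boolean {
  return modelCard?.settings?.extendParams?.includes('preserveThinking') ?? false;
}

// The user's toggle is only honored when the model declared the capability.
function resolvePreserveThinking(
  modelCard: ModelCardSketch | undefined,
  userSetting: boolean | undefined,
): boolean | undefined {
  return supportsPreserveThinking(modelCard) ? userSetting : undefined;
}
```

With this shape, a renamed deployment whose card lookup misses resolves to `undefined`, so no reasoning is replayed and no preserve-thinking flag is sent on its behalf.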
💡 Codex Review
Reviewed commit: c117c89abe
💡 Codex Review
Reviewed commit: 0a8ff55e0c
if (agentConfig) {
  const { LOBE_DEFAULT_MODEL_LIST } = await import('model-bank');

  const preserveThinkingFromPayload = (llmPayload as { preserveThinking?: boolean })
    .preserveThinking;
Honor payload preserveThinking without agentConfig
call_llm supports running without ctx.agentConfig (there is an explicit fallback branch), but preserveThinking is only read inside if (agentConfig). In no-agent-config executions, a caller-provided payload.preserveThinking is silently dropped, so chatPayload never receives it and assistant reasoning is never persisted/replayed even when explicitly requested. This breaks preserve-thinking behavior for instruction-driven/legacy contexts that invoke call_llm without agent config.
This is also still reproducible on the current head (62c191e). preserveThinkingFromPayload is read only inside if (agentConfig), while the no-agent-config branch is valid and falls back to processedMessages = llmPayload.messages. In that path an explicit payload.preserveThinking is dropped before chatPayload is built, so instruction-driven callers cannot opt in to preservation even when they pass the flag directly.
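A minimal sketch of the fix this thread asks for: read the payload flag before any `agentConfig` check, so the no-agent-config path keeps it. The interfaces and the function name are hypothetical, and letting the payload value win over the agent setting is an assumption rather than something the review states.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
interface AgentConfigSketch {
  chatConfig?: { preserveThinking?: boolean };
}

interface LlmPayloadSketch {
  messages: unknown[];
  preserveThinking?: boolean;
}

function resolvePreserveThinkingFlag(
  llmPayload: LlmPayloadSketch,
  agentConfig?: AgentConfigSketch,
): boolean | undefined {
  // Read the payload flag first, outside any agentConfig branch,
  // so callers without an agent config can still opt in.
  if (llmPayload.preserveThinking !== undefined) return llmPayload.preserveThinking;

  // Otherwise fall back to the agent-level setting, if an agent config exists.
  return agentConfig?.chatConfig?.preserveThinking;
}
```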
557e17b to 501e267
|
I don't think this PR is ready to merge yet. The main remaining issue is the

That means these cases still replay historical reasoning:
- `preserveThinking === false`
- `preserveThinking === undefined`

For Qwen,

If the intended product behavior is "turn this off to avoid sending historical thinking", the adapter should only derive/send

A separate smaller point: the |
|
@tjx666 On the first point: I had also considered not passing reasoning content when the toggle is off. On reflection, though, I think passing all historical reasoning content and letting the upstream API decide, based on the parameter, whether to use it stays closer to the upstream's recommended behavior. LobeHub does not need to trim it in advance. The second point has been fixed; thanks for raising it. |
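The two positions in this exchange can be contrasted in a short sketch. It is based only on what survives of the comments above: `pretrimHistory` approximates the reviewer's suggestion as far as the truncated text can be read, and `passThroughHistory` approximates the author's strategy. All names and types are hypothetical.

```typescript
interface HistoryMessage {
  role: 'user' | 'assistant';
  content: string;
  reasoning?: string;
}

interface UpstreamCallSketch {
  messages: HistoryMessage[];
  preserveThinking?: boolean;
}

// Strategy A (pre-trim): strip reasoning locally unless the toggle is explicitly true.
function pretrimHistory(history: HistoryMessage[], preserveThinking?: boolean): HistoryMessage[] {
  if (preserveThinking === true) return history;
  // Drop the reasoning field and keep everything else.
  return history.map(({ reasoning: _dropped, ...rest }) => rest);
}

// Strategy B (pass-through): send the history untouched plus the flag,
// and leave the decision about historical reasoning to the upstream API.
function passThroughHistory(history: HistoryMessage[], preserveThinking?: boolean): UpstreamCallSketch {
  return { messages: history, preserveThinking };
}
```

Strategy A makes the toggle's effect independent of upstream behavior, while Strategy B relies on the upstream honoring the parameter, which the token measurements reported earlier in this thread suggest it does for qwen3.6-plus and glm-4.7.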
|
❤️ Great PR @sxjeru ❤️ The growth of the project is inseparable from user feedback and contributions. Thanks for your contribution! If you are interested in the LobeHub developer community, please join our Discord and then DM @arvinxx or @canisminor1990. They will invite you to our private developer channel, where we discuss lobe-chat development and share AI news from around the world. |
# 🚀 LobeHub Release (20260610)

**Release Date:** June 10, 2026
**Since v2.2.2:** 131 merged PRs · 13 contributors

> This weekly release strengthens agent collaboration across cloud, desktop, CLI, and workspace flows, with steadier runtime behavior and a broader foundation for workspace-scoped data.

---

## ✨ Highlights

- **Agent execution across devices**: Unifies per-device working directories, project skill discovery, and sub-agent suspend/resume behavior across server, QStash, and device RPC flows. (#15543, #15566, #15481, #15620, #15591)
- **Connector and sandbox platform**: Expands connector permissions, custom OAuth MCP connector onboarding, sandbox provider support, and user-uploaded file sync into cloud sandbox runs. (#15463, #15546, #15184, #15550)
- **Desktop and CLI reliability**: Fixes desktop cold-start, auto-update, Windows build, CLI skill discovery, and `lh connect` agent dispatch paths. (#15547, #15525, #15527, #15562, #15632, #15634)
- **Pages and sharing**: Refreshes topic sharing, improves Page Editor layout behavior, and routes Page Agent tool execution through the server-side editor path. (#15581, #15556, #15588, #15023, #15610)
- **Model availability and provider updates**: Adds user-scoped LobeHub model availability, Claude Fable 5, Qwen thinking preservation, and MiniMax M3 updates. (#15590, #15639, #13494, #15376)

---

## 🏗️ Core Product & Architecture

### Agent Runtime & Heterogeneous Agents

- Improves sub-agent lifecycle handling, including async suspend/resume, queue-mode QStash resume delivery, and blocking nested sub-agent calls. (#15481, #15620, #15575)
- Stabilizes heterogeneous agent ingestion and streaming with raw stream dumps, per-turn usage, image forwarding on regenerate, and duplicate-text fixes. (#15602, #15577, #15592, #15585)
- Adds execution-device and working-directory controls across device RPC, legacy defaults, and remote-spawned Claude Code sessions. (#15543, #15566, #15591, #15572)
- Improves runtime diagnostics and compatibility, including Gemini multimodal output capture, abort stream semantics, and trace quality analysis. (#15535, #13677, #15508)

---

## 📱 Platforms, Integrations & UX

### Connectors, Sandbox & Tools

- Ships API-level connector tool permissions, custom OAuth MCP connector onboarding, and connector-first runtime execution. (#15463, #15546)
- Adds sandbox provider support, cloud sandbox file sync, and safer external URL file input handling with SSRF validation. (#15184, #15550, #12657)
- Improves tool visibility and execution with pinned app-fixed tools, ANSI output rendering, gateway-tunneled MCP calls, and automatic headless tool runs. (#15509, #15516, #15469, #15492)

### Desktop, CLI & Web UX

- Restores desktop startup and reload behavior, preserves IPC error causes, and keeps the tab bar new-tab action visible across routes. (#15547, #15597, #15638)
- Fixes desktop update and build stability for browser quit guards, macOS update signing, and Windows Visual Studio detection. (#15525, #15527, #15562)
- Shows the plan-limit upgrade UI on desktop builds. (#15628)
- Adds the Agent Run delivery checker and fixes CLI device dispatch plus skill list/search output. (#15489, #15634, #15632)
- Refreshes onboarding, auth source preservation, topic UI states, referral/Fable campaign copy, and chat-input control bar behavior. (#15629, #15544, #15573, #15614, #15616, #15617, #15622, #15643)

---

## 🔒 Security, Reliability & Rollout Notes

- External URL file input now includes SSRF validation for safer Google file handling. (#12657)
- Database workspace-scope migrations are part of this release; self-hosted operators should run the normal migration path before serving the updated app. (#15446, #15465, #15468, #15472)
- The release branch was re-cut from `canary` and includes the latest `main` release-version commit so `v2.2.2` is the verified compare base.

---

## 👥 Contributors

@ONLY-yours, @sxjeru, @hardy-one, @xujingli, @hezhijie0327, @Coooolfan, @arvinxx, @tjx666, @Innei, @rivertwilight, @rdmclin2, @cy948, @AmAzing129

**Full Changelog**: v2.2.2...release/weekly-20260610-recut-3
* 🐛 fix(agent-runtime): always persist assistant reasoning to DB

  PR #13494 gated message reasoning persistence behind preserveThinking (agent chatConfig + model extendParams / qwen|zhipu fallback). That gate is only meant to control whether reasoning is replayed into the next LLM payload; applying it to the DB write dropped thinking content for every non-qwen/zhipu reasoning model in server-side agent mode: reasoning streamed live via stream_end but vanished after refresh.

  Restore unconditional reasoning persistence in messageModel.update and keep the preserveThinking gate only for state.messages payload replay.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(i18n): localize callSubAgent tool labels

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
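The separation this follow-up commit describes, always persisting reasoning while gating only its replay, might look like the sketch below. The types and function names are hypothetical stand-ins; the actual change touches `messageModel.update` and `state.messages` in the agent runtime, which are not reproduced here.

```typescript
// Hypothetical shapes for illustration.
interface AssistantResult {
  content: string;
  reasoning?: string;
}

interface StoredMessage {
  content: string;
  reasoning?: string;
}

interface ReplayMessage {
  role: 'assistant';
  content: string;
  reasoning?: string;
}

// DB write: reasoning is stored unconditionally so it survives a page refresh.
function toStoredMessage(result: AssistantResult): StoredMessage {
  return { content: result.content, reasoning: result.reasoning };
}

// Payload replay: reasoning re-enters the next LLM request only when the gate is on.
function toReplayMessage(result: AssistantResult, preserveThinking: boolean): ReplayMessage {
  const message: ReplayMessage = { role: 'assistant', content: result.content };
  if (preserveThinking && result.reasoning) {
    message.reasoning = result.reasoning;
  }
  return message;
}
```

Keeping the two paths in separate functions makes it harder for a replay-only gate to leak into the persistence path again.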

💻 Change Type
🔗 Related Issue
🔀 Description of Change
https://qwen.ai/blog?id=qwen3.6
Confirmed working: when the toggle is off, historical reasoning content is no longer sent, and the input token count drops.
🧪 How to Test
📸 Screenshots / Videos
📝 Additional Information