feat(core): Parse Minimax <thinking> tags as thought parts in streaming and non-streaming responses#1
Draft
Copilot wants to merge 2 commits into
Draft
feat(core): Parse Minimax <thinking> tags as thought parts in streaming and non-streaming responses#1Copilot wants to merge 2 commits into
<thinking> tags as thought parts in streaming and non-streaming responses#1Copilot wants to merge 2 commits into
Conversation
Copilot
AI
changed the title
[WIP] Update parsing for <thinking> tags in OpenAI response
feat(core): Parse Minimax Apr 14, 2026
<thinking> tags as thought parts in streaming and non-streaming responses
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Minimax models (m2.7/m2.5) return reasoning content as
<thinking>...</thinking>inline incontentrather than via the standardreasoning_contentfield. Without handling these tags, they appear as raw text to users instead of being collapsed into the existing "Thinking" UI block.TLDR
Adds
<thinking>tag detection and parsing toconverter.tsfor both streaming and non-streaming paths, mapping extracted content to{ text, thought: true }parts so the existing frontend collapse/render logic works unchanged.Screenshots / Video Demo
N/A — no user-facing change beyond Minimax reasoning content now rendering in the collapsible "Thinking" block instead of as raw
<thinking>...</thinking>text.Dive Deeper
Non-streaming (
convertOpenAIResponseToGemini): Regex loop over the fullcontentstring, extracting interleaved thinking blocks and plain text:Streaming (
convertOpenAIChunkToGemini): State machine using two class-level fields (isInsideThinkingBlock,thinkingTagBuffer). On each chunk, the buffer is prepended to the incoming content before processing, which handles tags split across chunk boundaries (e.g. one chunk ending with<think, next starting withing>). Content inside the block emits as{ thought: true }parts; the tags themselves are stripped.resetStreamingThinkingState()is added alongside the existingresetStreamingToolCalls()for cleanup between streams.Reviewer Test Plan
Configure a Minimax-compatible endpoint (m2.7 or m2.5) and send a prompt that triggers reasoning. Previously the raw
<thinking>…</thinking>text appeared in the response; after this change it should render as a collapsed "Thinking" block identical to DeepSeek/Qwen3 behavior.Unit tests cover:
Testing Matrix
Linked issues / bugs
Resolves QwenLM#3228
Original prompt
The Minimax model (m2.7/m2.5) returns its reasoning content enclosed in
<thinking>...</thinking>tags directly within the maincontentfield, rather than using the standardreasoning_contentfield. Currently, these tags are exposed as raw text to the user.We need to update
packages/core/src/core/openaiContentGenerator/converter.tsto detect and parse these<thinking>tags in both streaming (convertOpenAIChunkToGemini) and non-streaming (convertOpenAIResponseToGemini) responses.Requirements:
convertOpenAIResponseToGemini(non-streaming): Parse thecontentstring, extract any text inside<thinking>...</thinking>, and push it as a part with{ text: extractedText, thought: true }. Push the rest of the text as normal text parts.convertOpenAIChunkToGemini(streaming): Implement a state machine or buffer to handle<thinking>tags that might be split across multiple chunks. When inside a thinking block, emit the chunks with{ text: chunkText, thought: true }. Ensure the<thinking>and</thinking>tags themselves are stripped out and not displayed to the user.This will allow the existing frontend logic to properly collapse and render the thinking process.
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: QwenLM#3228 难度评估:比较好解决
问题本质
minimax 模型(m2.7 / m2.5)的思考内容跟主流模型的处理方式不同。主流模型(DeepSeek、Qwen3 等)会把思考过程放在 API 响应的独立字段 reasoning_content 中,qwen-code 已经能正确识别并折叠显示。但 minimax 把 ... 标签直接内嵌在 content 正文里返回,qwen-code 没有做特殊处理,所以这些标签就原样暴露给了用户。
现有架构已经很完善
通读源码后发现,qwen-code 对 thinking 内容已经有完整的处理链路:
converter.ts(第 940-944 行):从 reasoning_content / reasoning 字段提取思考内容,标记为 { text: ..., thought: true }
thoughtUtils.ts:解析和提取 thought 部分
turn.ts:发射 Thought 事件
ThinkingMessage.tsx:前端展示为可折叠的 "Thinking" 区块
也就是说,只要你把 minimax 在 content 里的 内容提取出来、标记上 thought: true,后续的折叠渲染是现成的。
需要改动的地方
核心改动只需要修改 1 个文件,大约 10-20 行代码:
packages/core/src/core/openaiContentGenerator/converter.ts
在流式处理(convertOpenAIChunkToGemini,约第 948 行)和非流式处理(convertOpenAIResponseToGemini,约第 839 行)中,对 content 字段做一次 标签检测和拆分:
// 现有代码(第 948 行附近):
if (choice.delta?.content) {
if (typeof choice.delta.content === 'string') {
parts.push({ text: choice.delta.content });
}
}
// 改动思路:在 push 之前,检测并剥离 标签
// 将 ... 部分提取为 thought part
// 将剩余内容作为普通 text part
不过流式场景有个小麻烦: 标签可能被拆分在多个 chunk 里(比如一个 chunk 拿到 <think,下一个拿到 ing>),需要做一个小型的状态机来累积和判断。
Assistant: 这个评估非常准确,由于
<thinking>标签在流式输出中会被切分到不同的 chunk 中,确实需要引入一个状态机(或缓冲区)来拼接和剥离思考内容。以下是具体的改动建议,主要在
converter.ts中引入一个缓冲区来跟踪<thinking>标签的状态。1. 新增状态变量
在
OpenAIContentConverter类中添加流式处理需要的状态:2. 修改流式处理 (
convertOpenAIChunkToGemini)在处理
choice.delta?.content时,接管 content 并进行标签匹配:(注:为防止
<thinking>标签被拦腰截断(如<think和ing>),可以利用现有的streamingTextBuffer等机制先对 chunk 做短时间的 buffering,或者使用正则/状态机逐字符处理。)3. 修改非流式处理 (
convertOpenAIResponseToGemini)非流式可以直接使用正则替换: