Skip to content

feat(core): Parse Minimax <thinking> tags as thought parts in streaming and non-streaming responses#1

Draft
Copilot wants to merge 2 commits into
mainfrom
copilot/update-thinking-tag-parsing
Draft

feat(core): Parse Minimax <thinking> tags as thought parts in streaming and non-streaming responses#1
Copilot wants to merge 2 commits into
mainfrom
copilot/update-thinking-tag-parsing

Conversation

Copilot AI commented Apr 14, 2026

Copy link
Copy Markdown

Minimax models (m2.7/m2.5) return reasoning content as <thinking>...</thinking> inline in content rather than via the standard reasoning_content field. Without handling these tags, they appear as raw text to users instead of being collapsed into the existing "Thinking" UI block.

TLDR

Adds <thinking> tag detection and parsing to converter.ts for both streaming and non-streaming paths, mapping extracted content to { text, thought: true } parts so the existing frontend collapse/render logic works unchanged.

Screenshots / Video Demo

N/A — no user-facing change beyond Minimax reasoning content now rendering in the collapsible "Thinking" block instead of as raw <thinking>...</thinking> text.

Dive Deeper

Non-streaming (convertOpenAIResponseToGemini): Regex loop over the full content string, extracting interleaved thinking blocks and plain text:

THINKING_TAG_REGEX.lastIndex = 0;
while ((match = THINKING_TAG_REGEX.exec(content)) !== null) {
  if (match.index > lastIndex)
    parts.push({ text: content.substring(lastIndex, match.index) });
  if (match[1])
    parts.push({ text: match[1], thought: true });
  lastIndex = THINKING_TAG_REGEX.lastIndex;
}
if (lastIndex < content.length)
  parts.push({ text: content.substring(lastIndex) });

Streaming (convertOpenAIChunkToGemini): State machine using two class-level fields (isInsideThinkingBlock, thinkingTagBuffer). On each chunk, the buffer is prepended to the incoming content before processing, which handles tags split across chunk boundaries (e.g. one chunk ending with <think, next starting with ing>). Content inside the block emits as { thought: true } parts; the tags themselves are stripped.

resetStreamingThinkingState() is added alongside the existing resetStreamingToolCalls() for cleanup between streams.

Reviewer Test Plan

Configure a Minimax-compatible endpoint (m2.7 or m2.5) and send a prompt that triggers reasoning. Previously the raw <thinking>…</thinking> text appeared in the response; after this change it should render as a collapsed "Thinking" block identical to DeepSeek/Qwen3 behavior.

Unit tests cover:

  • Non-streaming: single block, multiple interleaved blocks, no tags
  • Streaming: full tag in one chunk, tag split across multiple chunks

Testing Matrix

🍏 🪟 🐧
npm run
npx
Docker
Podman - -
Seatbelt - -

Linked issues / bugs

Resolves QwenLM#3228

Original prompt

The Minimax model (m2.7/m2.5) returns its reasoning content enclosed in <thinking>...</thinking> tags directly within the main content field, rather than using the standard reasoning_content field. Currently, these tags are exposed as raw text to the user.

We need to update packages/core/src/core/openaiContentGenerator/converter.ts to detect and parse these <thinking> tags in both streaming (convertOpenAIChunkToGemini) and non-streaming (convertOpenAIResponseToGemini) responses.

Requirements:

  1. In convertOpenAIResponseToGemini (non-streaming): Parse the content string, extract any text inside <thinking>...</thinking>, and push it as a part with { text: extractedText, thought: true }. Push the rest of the text as normal text parts.
  2. In convertOpenAIChunkToGemini (streaming): Implement a state machine or buffer to handle <thinking> tags that might be split across multiple chunks. When inside a thinking block, emit the chunks with { text: chunkText, thought: true }. Ensure the <thinking> and </thinking> tags themselves are stripped out and not displayed to the user.

This will allow the existing frontend logic to properly collapse and render the thinking process.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: QwenLM#3228 难度评估:比较好解决
问题本质
minimax 模型(m2.7 / m2.5)的思考内容跟主流模型的处理方式不同。主流模型(DeepSeek、Qwen3 等)会把思考过程放在 API 响应的独立字段 reasoning_content 中,qwen-code 已经能正确识别并折叠显示。但 minimax 把 ... 标签直接内嵌在 content 正文里返回,qwen-code 没有做特殊处理,所以这些标签就原样暴露给了用户。

现有架构已经很完善
通读源码后发现,qwen-code 对 thinking 内容已经有完整的处理链路:

converter.ts(第 940-944 行):从 reasoning_content / reasoning 字段提取思考内容,标记为 { text: ..., thought: true }
thoughtUtils.ts:解析和提取 thought 部分
turn.ts:发射 Thought 事件
ThinkingMessage.tsx:前端展示为可折叠的 "Thinking" 区块
也就是说,只要你把 minimax 在 content 里的 内容提取出来、标记上 thought: true,后续的折叠渲染是现成的。

需要改动的地方
核心改动只需要修改 1 个文件,大约 10-20 行代码:

packages/core/src/core/openaiContentGenerator/converter.ts

在流式处理(convertOpenAIChunkToGemini,约第 948 行)和非流式处理(convertOpenAIResponseToGemini,约第 839 行)中,对 content 字段做一次 标签检测和拆分:

// 现有代码(第 948 行附近):
if (choice.delta?.content) {
if (typeof choice.delta.content === 'string') {
parts.push({ text: choice.delta.content });
}
}

// 改动思路:在 push 之前,检测并剥离 标签
// 将 ... 部分提取为 thought part
// 将剩余内容作为普通 text part
不过流式场景有个小麻烦: 标签可能被拆分在多个 chunk 里(比如一个 chunk 拿到 <think,下一个拿到 ing>),需要做一个小型的状态机来累积和判断。

Assistant: 这个评估非常准确,由于 <thinking> 标签在流式输出中会被切分到不同的 chunk 中,确实需要引入一个状态机(或缓冲区)来拼接和剥离思考内容。

以下是具体的改动建议,主要在 converter.ts 中引入一个缓冲区来跟踪 <thinking> 标签的状态。

1. 新增状态变量

OpenAIContentConverter 类中添加流式处理需要的状态:

private thinkingBuffer = '';
private isInsideThinking = false;

2. 修改流式处理 (convertOpenAIChunkToGemini)

在处理 choice.delta?.content 时,接管 content 并进行标签匹配:

if (choice.delta?.content) {
  if (typeof choice.delta.content === 'string') {
    let content = choice.delta.content;
    
    // 如果不在思考状态中,检测是否包含 <thinking>
    if (!this.isInsideThinking) {
      const startIdx = content.indexOf('<thinking>');
      if (startIdx !== -1) {
        this.isInsideThinking = true;
        // 把 <thinking> 之前的内容作为普通文本推入
        if (startIdx > 0) {
          parts.push({ text: content.substring(0, startIdx) });
        }
        content = content.substring(startIdx + '<thinking>'.length);
      }
    }

    // 如果当前处于思考状态,检测是否包含 </thinking>
    if (this.isInsideThinking) {
      const endIdx = content.indexOf('</thinking>');
      if (endIdx !== -1) {
        this.isInsideThinking = false;
        const thoughtContent = content.substring(0, endIdx);
        this.thinkingBuffer += thoughtContent;
        
        // 发送累积的 thought 
        if (this.thinkingBuffer) {
          parts.push({ text: this.thinkingBuffer, thought: true });
          this.thinkingBuffer = ''; // 清空 buffer
        }
        
        // 标签之后的内容作为普通文本推入
        const remainingText = content.substring(endIdx + '</thinking>'.length);
        if (remainingText) {
          parts.push({ text: remainingText });
        }
      } else {
        // 还没结束,继续累积思考内容,为了能够实时流式显示,也可以直接 push
        // parts.push({ text: content, thought: true }); 
        // 但如果要在内部去标签,直接 push 会更平滑:
        parts.push({ text: content, thought: true });
      }
    } else {
      // 纯普通文本
      parts.push({ text: content });
    }
  }
}

(注:为防止 <thinking> 标签被拦腰截断(如 <thinking>),可以利用现有的 streamingTextBuffer 等机制先对 chunk 做短时间的 buffering,或者使用正则/状态机逐字符处理。)

3. 修改非流式处理 (convertOpenAIResponseToGemini)

非流式可以直接使用正则替换:

if (choice.message?.content) {
  const content = choice.message.content;
  const thinkingRegex = /<thinking>([\s\S]*?)<\/thinking>/g;
  let lastIndex = 0;
  let match;

  while ((match = thinkingRegex.exec(content)) !== null) {
    if (match.index > lastIndex) {
      parts.push({ text: content.substring(lastIndex, match.index) });
    }
    parts.push({ text: matc...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*
>

Copilot AI changed the title [WIP] Update parsing for <thinking> tags in OpenAI response feat(core): Parse Minimax <thinking> tags as thought parts in streaming and non-streaming responses Apr 14, 2026
Copilot AI requested a review from chendongai April 14, 2026 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

minimaxi.com的minimax-m2.7和m2.5思考过程会添加<thinking></thinking>标签包裹思考内容

2 participants