fix(vlm): strip <think> reasoning tags from model responses #690
Conversation
…orage Reasoning-capable LLMs (MiniMax-M2.5, DeepSeek-R1, QwQ) served via vLLM embed <think>...</think> blocks in message.content. These were stored verbatim in file summaries and directory overviews, polluting semantic search and wasting token budget during context loading. Add _clean_response() in VLMBase that strips <think> tags, and call it in all three backends (OpenAI, VolcEngine, LiteLLM) at every return point. Closes #685 Co-Authored-By: Claude Opus 4.6
Description
When reasoning-capable LLMs (e.g. MiniMax-M2.5, DeepSeek-R1, QwQ) are used as the VLM backend via vLLM, `<think>...</think>` reasoning blocks in `message.content` are stored verbatim into file summaries, directory overviews, and abstracts. This pollutes semantic search results and wastes token budget during L0/L1 context loading.

This PR adds a centralized `_clean_response()` method in `VLMBase` that strips `<think>` tags from all VLM responses before they are returned to callers.

Related Issue
Closes #685
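A minimal sketch of the stripping logic described above. The function name mirrors `_clean_response()` from the PR, but the exact regex pattern and the standalone helper shown here are illustrative assumptions, not the actual implementation in `VLMBase`:

```python
import re

# Assumed pattern: non-greedy match across newlines (re.DOTALL) so
# multiline reasoning blocks are removed; trailing whitespace after a
# block is swallowed too. The real pattern in base.py may differ.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def clean_response(text: str) -> str:
    """Strip <think>...</think> reasoning blocks from a model response."""
    if not text:
        return text
    return _THINK_RE.sub("", text).strip()

print(clean_response("<think>chain of thought</think>\nFinal summary."))
# -> Final summary.
```

The non-greedy `.*?` matters: a greedy `.*` would delete everything between the first `<think>` and the last `</think>`, swallowing legitimate content when a response contains multiple reasoning blocks.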
Type of Change
Changes Made

- Added a `_clean_response()` method in `VLMBase` (`base.py`) with a compiled regex to strip `<think>...</think>` blocks
- Called `_clean_response()` at all return points in the OpenAI, VolcEngine, and LiteLLM backends (12 call sites total)
- New test file (`tests/models/test_vlm_strip_think_tags.py`) covering: single/multiple/multiline think blocks, empty input, JSON with a think prefix, and ensuring non-think HTML tags are preserved

Testing
Checklist
Additional Notes
The fix is applied at the VLM backend layer so all downstream consumers (semantic processor, media summary generator, JSON parser, etc.) receive clean content without needing individual fixes. For responses that don't contain `<think>` tags, the overhead is a single no-match regex scan, which is negligible.

🤖 Generated with Claude Code