
fix(vlm): strip <think> reasoning tags from model responses#690

Merged
qin-ctx merged 1 commit into main from fix/strip-think-tags-from-vlm-responses on Mar 17, 2026

Conversation

Collaborator

qin-ctx commented Mar 17, 2026

Description

When reasoning-capable LLMs (e.g. MiniMax-M2.5, DeepSeek-R1, QwQ) are used as the VLM backend via vLLM, <think>...</think> reasoning blocks in message.content are stored verbatim in file summaries, directory overviews, and abstracts. This pollutes semantic search results and wastes token budget during L0/L1 context loading.

This PR adds a centralized _clean_response() method in VLMBase that strips <think> tags from all VLM responses before they are returned to callers.

Related Issue

Closes #685

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Added _clean_response() method in VLMBase (base.py) with a compiled regex to strip <think>...</think> blocks
  • Applied _clean_response() at all return points in OpenAI, VolcEngine, and LiteLLM backends (12 call sites total)
  • Added regression test suite (tests/models/test_vlm_strip_think_tags.py) covering: single/multiple/multiline think blocks, empty input, JSON with think prefix, and ensuring non-think HTML tags are preserved

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

The fix is applied at the VLM backend layer, so all downstream consumers (semantic processor, media summary generator, JSON parser, etc.) receive clean content without needing individual fixes. For responses that don't contain <think> tags, the overhead is a single no-match regex scan, which is negligible.
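The wrap-at-every-return-point shape can be sketched as follows; the class and method names here (OpenAIVLM, chat, _call_api) are hypothetical stand-ins for the real backends:

```python
import re

_THINK_TAG_RE = re.compile(r"<think>[\s\S]*?</think>")

class VLMBase:
    """Sketch of the shared base class; the real one lives in base.py."""

    def _clean_response(self, text: str) -> str:
        # Single regex scan; a no-match scan on tag-free content is cheap.
        return _THINK_TAG_RE.sub("", text).strip()

class OpenAIVLM(VLMBase):
    """Hypothetical backend illustrating the pattern."""

    def chat(self, prompt: str) -> str:
        raw = self._call_api(prompt)  # hypothetical transport call
        # Every return point wraps content, so downstream consumers
        # (semantic processor, JSON parser, ...) always see clean text.
        return self._clean_response(raw)

    def _call_api(self, prompt: str) -> str:
        # Stubbed response from a reasoning-capable model
        return "<think>reasoning...</think>summary of " + prompt

print(OpenAIVLM().chat("file.py"))
```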

🤖 Generated with Claude Code

…orage

Reasoning-capable LLMs (MiniMax-M2.5, DeepSeek-R1, QwQ) served via vLLM
embed <think>...</think> blocks in message.content. These were stored
verbatim in file summaries and directory overviews, polluting semantic
search and wasting token budget during context loading.

Add _clean_response() in VLMBase that strips <think> tags, and call it
in all three backends (OpenAI, VolcEngine, LiteLLM) at every return point.

Closes #685

Co-Authored-By: Claude Opus 4.6

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

#685 - Fully compliant

Compliant requirements:

  • Strip <think>...</think> reasoning blocks from VLM model responses before storage
  • Ensure stored summaries and abstracts contain only final output content
  • Prevent pollution of semantic search results and waste of token budget
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Regex Scope

Verify whether the regex should handle case variations (e.g., <THINK>) or tags with whitespace or attributes, based on actual model outputs from supported reasoning parsers.

_THINK_TAG_RE = re.compile(r"<think>[\s\S]*?</think>")
Thinking Parameter

Confirm that stripping <think> tags unconditionally (even when thinking=True) aligns with the intended behavior for the VolcEngine backend.

return self._clean_response(response.choices[0].message.content or "")

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Suggestion: Respect thinking parameter for tag stripping

Only strip tags when the thinking parameter is False. When thinking is True,
preserve the reasoning tags as intended by the parameter. Apply this change to all
completion methods.

openviking/models/vlm/backends/litellm_vlm.py [227]

-return self._clean_response(response.choices[0].message.content or "")
+content = response.choices[0].message.content or ""
+return self._clean_response(content) if not thinking else content
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that the thinking parameter should control whether <think> tags are stripped. The current PR strips tags unconditionally, which contradicts the purpose of the thinking parameter when set to True, so this change improves correctness by respecting the parameter's intent.

Impact: Medium

@qin-ctx qin-ctx merged commit 7ea7f35 into main Mar 17, 2026
5 of 6 checks passed
@qin-ctx qin-ctx deleted the fix/strip-think-tags-from-vlm-responses branch March 17, 2026 07:25
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 17, 2026

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: VLM responses containing <think> reasoning blocks are stored verbatim in summaries and abstracts

3 participants