fix(vlm): strip <think> reasoning tags from model responses #690
Conversation
…orage Reasoning-capable LLMs (MiniMax-M2.5, DeepSeek-R1, QwQ) served via vLLM embed <think>...</think> blocks in message.content. These were stored verbatim in file summaries and directory overviews, polluting semantic search and wasting token budget during context loading. Add _clean_response() in VLMBase that strips <think> tags, and call it in all three backends (OpenAI, VolcEngine, LiteLLM) at every return point. Closes #685 Co-Authored-By: Claude Opus 4.6
Description
When reasoning-capable LLMs (e.g. MiniMax-M2.5, DeepSeek-R1, QwQ) are used as the VLM backend via vLLM, `<think>...</think>` reasoning blocks in `message.content` are stored verbatim into file summaries, directory overviews, and abstracts. This pollutes semantic search results and wastes token budget during L0/L1 context loading.

This PR adds a centralized `_clean_response()` method in `VLMBase` that strips `<think>` tags from all VLM responses before they are returned to callers.

Related Issue
Closes #685
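A minimal sketch of the stripping logic described above. The function name mirrors `_clean_response()` from the PR, but the exact regex pattern and the standalone helper shown here are illustrative assumptions, not the actual implementation in `VLMBase`:

```python
import re

# Assumed pattern: non-greedy match across newlines (re.DOTALL) so
# multiline reasoning blocks are removed; trailing whitespace after a
# block is swallowed too. The real pattern in base.py may differ.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def clean_response(text: str) -> str:
    """Strip <think>...</think> reasoning blocks from a model response."""
    if not text:
        return text
    return _THINK_RE.sub("", text).strip()

print(clean_response("<think>chain of thought</think>\nFinal summary."))
# -> Final summary.
```

The non-greedy `.*?` matters: a greedy `.*` would delete everything between the first `<think>` and the last `</think>`, swallowing legitimate content when a response contains multiple reasoning blocks.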
Type of Change
Changes Made

- Added a `_clean_response()` method in `VLMBase` (`base.py`) with a compiled regex to strip `<think>...</think>` blocks
- Called `_clean_response()` at all return points in the OpenAI, VolcEngine, and LiteLLM backends (12 call sites total)
- New test file (`tests/models/test_vlm_strip_think_tags.py`) covering: single/multiple/multiline think blocks, empty input, JSON with a think prefix, and ensuring non-think HTML tags are preserved

Testing
Checklist
Additional Notes
The fix is applied at the VLM backend layer so all downstream consumers (semantic processor, media summary generator, JSON parser, etc.) receive clean content without needing individual fixes. For responses that don't contain `<think>` tags, the overhead is a single no-match regex scan, which is negligible.

🤖 Generated with Claude Code