Bug Description
When using reasoning-capable LLMs (e.g. MiniMax-M2.5, DeepSeek-R1, QwQ) as the VLM backend,
<think>...</think> reasoning blocks in model responses are stored verbatim into file summaries
and directory overviews/abstracts. This pollutes semantic search results and wastes token budget
during L0/L1 context loading.
Root cause: openviking/models/vlm/backends/openai_vlm.py returns message.content directly
without any post-processing. Many reasoning parsers (e.g. vLLM's minimax_m2_append_think,
deepseek_r1) keep <think> blocks inside message.content.
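A minimal sketch of the missing post-processing step (the function name and regex are illustrative assumptions, not OpenViking's actual code):

```python
import re

# Matches a reasoning block at the start of a response. The second
# alternative covers outputs where the opening <think> tag is absent and
# only a closing </think> appears (common when the chat template itself
# injects the opening tag, as with some DeepSeek-R1 setups).
_THINK_RE = re.compile(r"^\s*(?:<think>.*?</think>|.*?</think>)\s*", re.DOTALL)

def strip_reasoning(content: str) -> str:
    """Drop a leading <think>...</think> block before the content is stored."""
    return _THINK_RE.sub("", content, count=1)

print(strip_reasoning("<think>plan the summary</think>A concise file summary."))
# -> A concise file summary.
```

Applying something like this to message.content in openai_vlm.py before it is returned would keep reasoning text out of stored summaries and semantic search results.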
Steps to Reproduce
- Deploy a reasoning LLM with vLLM (e.g. MiniMax-M2.5-AWQ with reasoning_parser: minimax_m2_append_think)
- Configure it as the VLM backend in ov.conf
- Run ov add-resource <directory> to import documents
- Inspect the generated summaries/abstracts via ov search or a direct storage query
Environment
- vLLM: 0.17.0
- LLM: MiniMax-M2.5
- vLLM reasoning_parser: minimax_m2_append_think
- vLLM max_model_len: 65536
Expected Behavior
Stored summaries and abstracts contain only the final output content.
<think>...</think> reasoning blocks should be stripped before storage.
Actual Behavior
Summaries and abstracts contain the full <think>...</think> reasoning block verbatim.
Minimal Reproducible Example
No response
Error Logs
No response
OpenViking Version
0.2.6
Python Version
3.11
Operating System
macOS
Model Backend
OpenAI
Additional Context
No response