[Bug]: VLM responses containing <think> reasoning blocks are stored verbatim in summaries and abstracts #685

@sycoral

Description

Bug Description

When using reasoning-capable LLMs (e.g. MiniMax-M2.5, DeepSeek-R1, QwQ) as the VLM backend,
<think>...</think> reasoning blocks in model responses are stored verbatim into file summaries
and directory overviews/abstracts. This pollutes semantic search results and wastes token budget
during L0/L1 context loading.

Root cause: openviking/models/vlm/backends/openai_vlm.py returns message.content directly
without any post-processing. Many reasoning parsers (e.g. vLLM's minimax_m2_append_think,
deepseek_r1) keep <think> blocks inside message.content.
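A minimal sketch of the missing post-processing step. The helper name `strip_think` is hypothetical; the actual fix would live wherever the OpenAI VLM backend currently returns message.content verbatim:

```python
import re

# Non-greedy match of a closed <think>...</think> block plus trailing
# whitespace; DOTALL lets the reasoning span multiple lines.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(content: str) -> str:
    """Remove reasoning blocks from a model response before storage.

    Hypothetical helper: the real integration point would be the
    response handling in openai_vlm.py.
    """
    return _THINK_RE.sub("", content).strip()
```

For example, `strip_think("<think>plan the summary</think>Final summary text.")` returns only `"Final summary text."`, and input without a reasoning block passes through unchanged.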

Steps to Reproduce

  1. Deploy a reasoning LLM with vLLM (e.g. MiniMax-M2.5-AWQ with
    reasoning_parser: minimax_m2_append_think)
  2. Configure it as the VLM backend in ov.conf
  3. Run ov add-resource <directory> to import documents
  4. Inspect generated summaries/abstracts via ov search or direct storage query

Environment

  • vLLM: 0.17.0
  • LLM: MiniMax-M2.5
  • vLLM reasoning_parser: minimax_m2_append_think
  • vLLM max_model_len: 65536

Expected Behavior

Stored summaries and abstracts contain only the final output content.
<think>...</think> reasoning blocks should be stripped before storage.

Actual Behavior

Summaries and abstracts contain the full <think>...</think> reasoning block verbatim.

Minimal Reproducible Example

No response

Error Logs

No response

OpenViking Version

0.2.6

Python Version

3.11

Operating System

macOS

Model Backend

OpenAI

Additional Context

No response

Metadata

Assignees

Labels

bug (Something isn't working)

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
