
fix(vlm): wire thinking parameter to OpenAI and LiteLLM backends#949

Closed
deepakdevp wants to merge 1 commit into volcengine:main from deepakdevp:fix/vlm-thinking-param

Conversation

@deepakdevp
Contributor

Summary

  • Wire the unused thinking parameter to the OpenAI and LiteLLM VLM backends by passing enable_thinking: False via extra_body when thinking=False
  • Models like qwen3.5-plus (DashScope) default to thinking mode, causing 60+ second timeouts on every memory extraction call because thinking was never explicitly disabled
  • The VolcEngine backend already handles this correctly — this fix brings OpenAI and LiteLLM backends to parity
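The wiring described above can be sketched as follows. This is an illustrative helper, not the actual memos code: the function name and call shape are assumptions, and `enable_thinking` is a vendor extension honored by DashScope-compatible endpoints rather than a standard OpenAI parameter.

```python
from typing import Any


def build_completion_kwargs(
    model: str,
    messages: list[dict[str, Any]],
    thinking: bool = False,
) -> dict[str, Any]:
    """Assemble kwargs for an OpenAI-compatible chat.completions call."""
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    if not thinking:
        # Explicitly disable thinking mode for models (e.g. qwen3.5-plus)
        # that default to chain-of-thought reasoning, so each call does
        # not spend 60+ seconds reasoning before answering.
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs
```

The resulting dict is then splatted into the client call, e.g. `client.chat.completions.create(**kwargs)`.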

Changes

  • openai_vlm.py: All 4 completion methods now pass extra_body={"enable_thinking": False} when thinking=False
  • litellm_vlm.py: _build_kwargs() now accepts thinking param; all 4 callers pass it through
  • 5 new tests covering both backends

Fixes #923.

Test plan

  • 5 new tests pass (pytest tests/unit/test_vlm_thinking_param.py)
  • Ruff check and format clean
  • VolcEngine backend unchanged (already correct)
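The new tests might look roughly like this self-contained sketch; the real suite would import and patch the actual backends from openai_vlm.py / litellm_vlm.py, which are stubbed here with a stand-in helper.

```python
# Stand-in for the backend kwargs builder under test; in the real suite
# the backend classes themselves would be exercised instead.
def build_completion_kwargs(model, messages, thinking=False):
    kwargs = {"model": model, "messages": messages}
    if not thinking:
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs


def test_thinking_false_sends_enable_thinking():
    kwargs = build_completion_kwargs("qwen3.5-plus", [], thinking=False)
    assert kwargs["extra_body"] == {"enable_thinking": False}


def test_thinking_true_omits_extra_body():
    kwargs = build_completion_kwargs("qwen3.5-plus", [], thinking=True)
    assert "extra_body" not in kwargs
```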

The thinking parameter was accepted but never passed to the API in
OpenAI and LiteLLM VLM backends. Models like qwen3.5-plus that default
to thinking mode would execute expensive chain-of-thought reasoning on
every call, causing 60+ second timeouts.

When thinking=False, the backends now pass enable_thinking=False via
extra_body to explicitly disable thinking mode. The VolcEngine backend
already handled this correctly.

Fixes volcengine#923.
@github-actions

Failed to generate code suggestions for PR

@qin-ctx
Collaborator

qin-ctx commented Mar 25, 2026

Thanks for the PR. This addresses the same root issue as #939 / #923, but I don't think we should merge it as-is.

A few concerns:

  1. openai_vlm.py now overlaps with #939 (fix(vlm): pass thinking flag to dashscope openai backend), which was merged on 2026-03-25; we should avoid landing the same fix twice.
  2. enable_thinking is not a standard OpenAI parameter. It is a vendor-specific extension used by DashScope-compatible endpoints, so sending it unconditionally to OpenAI / Azure / arbitrary LiteLLM providers expands the risk surface.
  3. The current tests prove the parameter is wired, but they do not yet prove we only send it to providers that actually support it.

Suggested next step:

  • drop the openai_vlm.py part from this PR
  • keep only the LiteLLM delta
  • scope extra_body.enable_thinking = false to DashScope / Qwen-compatible LiteLLM routes only
  • add negative tests showing official OpenAI / Azure / non-DashScope LiteLLM paths do not receive this field

If you'd like, please update this PR to be a narrow follow-up to #939 focused on LiteLLM + DashScope compatibility, and we can review that version.



Development

Successfully merging this pull request may close these issues.

[Bug]: VLM backend thinking parameter defined but never passed to API (causes auto-capture timeout with thinking-enabled models)
