
fix(vlm): wire thinking parameter to OpenAI and LiteLLM backends#949

Closed
deepakdevp wants to merge 1 commit into volcengine:main from deepakdevp:fix/vlm-thinking-param

Conversation

@deepakdevp
Contributor

Summary

  • Wire the unused thinking parameter to the OpenAI and LiteLLM VLM backends by passing enable_thinking: False via extra_body when thinking=False
  • Models like qwen3.5-plus (DashScope) default to thinking mode, causing 60+ second timeouts on every memory extraction call because thinking was never explicitly disabled
  • The VolcEngine backend already handles this correctly — this fix brings OpenAI and LiteLLM backends to parity
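The wiring described above can be sketched as follows. This is an illustrative helper, not the actual memos code: the function name and call shape are assumptions, and `enable_thinking` is a vendor extension honored by DashScope-compatible endpoints rather than a standard OpenAI parameter.

```python
from typing import Any


def build_completion_kwargs(
    model: str,
    messages: list[dict[str, Any]],
    thinking: bool = False,
) -> dict[str, Any]:
    """Assemble kwargs for an OpenAI-compatible chat.completions call."""
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    if not thinking:
        # Explicitly disable thinking mode for models (e.g. qwen3.5-plus)
        # that default to chain-of-thought reasoning, so each call does
        # not spend 60+ seconds reasoning before answering.
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs
```

The resulting dict is then splatted into the client call, e.g. `client.chat.completions.create(**kwargs)`.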

Changes

  • openai_vlm.py: All 4 completion methods now pass extra_body={"enable_thinking": False} when thinking=False
  • litellm_vlm.py: _build_kwargs() now accepts thinking param; all 4 callers pass it through
  • 5 new tests covering both backends

Fixes #923.

Test plan

  • 5 new tests pass (pytest tests/unit/test_vlm_thinking_param.py)
  • Ruff check and format clean
  • VolcEngine backend unchanged (already correct)
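The new tests might look roughly like this self-contained sketch; the real suite would import and patch the actual backends from openai_vlm.py / litellm_vlm.py, which are stubbed here with a stand-in helper.

```python
# Stand-in for the backend kwargs builder under test; in the real suite
# the backend classes themselves would be exercised instead.
def build_completion_kwargs(model, messages, thinking=False):
    kwargs = {"model": model, "messages": messages}
    if not thinking:
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs


def test_thinking_false_sends_enable_thinking():
    kwargs = build_completion_kwargs("qwen3.5-plus", [], thinking=False)
    assert kwargs["extra_body"] == {"enable_thinking": False}


def test_thinking_true_omits_extra_body():
    kwargs = build_completion_kwargs("qwen3.5-plus", [], thinking=True)
    assert "extra_body" not in kwargs
```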

The thinking parameter was accepted but never passed to the API in
OpenAI and LiteLLM VLM backends. Models like qwen3.5-plus that default
to thinking mode would execute expensive chain-of-thought reasoning on
every call, causing 60+ second timeouts.

When thinking=False, the backends now pass enable_thinking=False via
extra_body to explicitly disable thinking mode. The VolcEngine backend
already handled this correctly.

Fixes volcengine#923.
@github-actions

Failed to generate code suggestions for PR

@qin-ctx
Collaborator

qin-ctx commented Mar 25, 2026

Thanks for the PR. This addresses the same root issue as #939 / #923, but I don't think we should merge it as-is.

A few concerns:

  1. openai_vlm.py now overlaps with #939 (fix(vlm): pass thinking flag to dashscope openai backend), which was merged on 2026-03-25; we should avoid landing the same fix twice.
  2. enable_thinking is not a standard OpenAI parameter. It is a vendor-specific extension used by DashScope-compatible endpoints, so sending it unconditionally to OpenAI / Azure / arbitrary LiteLLM providers expands the risk surface.
  3. The current tests prove the parameter is wired, but they do not yet prove we only send it to providers that actually support it.

Suggested next step:

  • drop the openai_vlm.py part from this PR
  • keep only the LiteLLM delta
  • scope extra_body.enable_thinking = false to DashScope / Qwen-compatible LiteLLM routes only
  • add negative tests showing official OpenAI / Azure / non-DashScope LiteLLM paths do not receive this field

If you'd like, please update this PR to be a narrow follow-up to #939 focused on LiteLLM + DashScope compatibility, and we can review that version.



Development

Successfully merging this pull request may close these issues.

[Bug]: VLM backend thinking parameter defined but never passed to API (causes auto-capture timeout with thinking-enabled models)
