fix(vlm): wire thinking parameter to OpenAI and LiteLLM backends #949
Closed
deepakdevp wants to merge 1 commit into volcengine:main from
Conversation
The thinking parameter was accepted but never passed to the API in OpenAI and LiteLLM VLM backends. Models like qwen3.5-plus that default to thinking mode would execute expensive chain-of-thought reasoning on every call, causing 60+ second timeouts. When thinking=False, the backends now pass enable_thinking=False via extra_body to explicitly disable thinking mode. The VolcEngine backend already handled this correctly. Fixes volcengine#923.
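A minimal sketch of the shape of the fix, assuming illustrative method and parameter names (the repo's actual backend methods may differ); `extra_body` is the openai-python client's pass-through for provider-specific request body fields:

```python
# Sketch of the OpenAI-backend fix; `chat_completion` and its signature
# are illustrative, not the repo's actual code.
from openai import OpenAI

client = OpenAI()

def chat_completion(messages, model: str, thinking: bool = True, **kwargs):
    extra_body = None
    if not thinking:
        # Previously the flag was accepted but silently dropped before
        # reaching the API; now it is forwarded so providers that default
        # to thinking mode skip chain-of-thought reasoning.
        extra_body = {"enable_thinking": False}
    return client.chat.completions.create(
        model=model,
        messages=messages,
        extra_body=extra_body,
        **kwargs,
    )
```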
Collaborator
Thanks for the PR. This is addressing the same root issue as #939 / #923, but in its current form I don't think we should merge it as-is. A few concerns:
Suggested next step:
If you'd like, please update this PR to be a narrow follow-up to #939 focused on LiteLLM + DashScope compatibility, and we can review that version.
Summary
- Wires the `thinking` parameter to the OpenAI and LiteLLM VLM backends by passing `enable_thinking: False` via `extra_body` when `thinking=False`
- Models like `qwen3.5-plus` (DashScope) default to thinking mode, causing 60+ second timeouts on every memory extraction call because thinking was never explicitly disabled

Changes
- `openai_vlm.py`: All 4 completion methods now pass `extra_body={"enable_thinking": False}` when `thinking=False`
- `litellm_vlm.py`: `_build_kwargs()` now accepts a `thinking` param; all 4 callers pass it through (see the sketch below)

Fixes #923.
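A minimal sketch of the `litellm_vlm.py` change; the real `_build_kwargs()` signature is not shown on this page, so the shape below is an assumption. LiteLLM forwards `extra_body` to OpenAI-compatible providers:

```python
# Illustrative shape of the _build_kwargs() change; everything except the
# `thinking` param and the `enable_thinking` field is assumed.
import litellm

def _build_kwargs(model: str, thinking: bool = True, **overrides) -> dict:
    kwargs = {"model": model, **overrides}
    if not thinking:
        # extra_body is forwarded into the provider's request body, where
        # DashScope-style models read enable_thinking.
        kwargs["extra_body"] = {"enable_thinking": False}
    return kwargs

# Each of the 4 completion call sites then threads the flag through, e.g.:
# litellm.completion(messages=messages, **_build_kwargs("qwen3.5-plus", thinking=False))
```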
Test plan
- Unit tests: `pytest tests/unit/test_vlm_thinking_param.py`
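For context, a self-contained sketch of the kind of assertion such a test might make, using a mocked OpenAI client rather than the repo's actual backend class (all names below are illustrative):

```python
# Hypothetical test in the spirit of tests/unit/test_vlm_thinking_param.py;
# chat_completion mirrors the fix rather than importing the repo's code.
from unittest.mock import MagicMock

def chat_completion(client, messages, model, thinking=True):
    # Translate thinking=False into the provider-specific extra_body field.
    extra_body = None if thinking else {"enable_thinking": False}
    return client.chat.completions.create(
        model=model, messages=messages, extra_body=extra_body
    )

def test_thinking_false_passes_enable_thinking():
    client = MagicMock()
    chat_completion(client, [{"role": "user", "content": "hi"}],
                    "qwen3.5-plus", thinking=False)
    _, kwargs = client.chat.completions.create.call_args
    assert kwargs["extra_body"] == {"enable_thinking": False}
```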