Disable qwen3 thinking on local Ollama so tool-calling works (fixes #6152) #42201
Adelagric wants to merge 2 commits into
On Ollama's OpenAI-compatible /v1 endpoint, qwen3-family models default to thinking ON. With tools present the reasoning swallows the tool call and the response comes back with no tool_calls (ollama/ollama #10976, #11381), so the agent narrates an intent but never executes. The only control Ollama honors on /v1 is reasoning_effort:"none" -> Think=false; chat_template_kwargs.enable_thinking is ignored (NousResearch#10809). The chat_completions transport emitted reasoning_effort only for Kimi/TokenHub/LM Studio, never for local Ollama, so `reasoning_effort: none` in config was a silent no-op. Emit it when reasoning is explicitly disabled and the endpoint is local, which restores tool-calling and leaves every other provider untouched. Fixes NousResearch#6152.
The fix is well scoped and the test coverage with four clear cases is a nice addition for a first contribution. One edge case: Could the condition be broadened to cover any endpoint where the host is identifiable as an Ollama instance (maybe via a |
The genuine gap is a public Ollama — a routable public IP or a public domain like For that case, rather than a per-request |
…ing too
The previous commit only auto-emitted reasoning_effort:"none" for local endpoints (is_local_endpoint), which does not match a public-IP or public-domain Ollama — the qwen3 thinking bug applies there too.
Forward a user-configured `model.extra_body` verbatim into the chat request (via the existing extra_body_additions channel, mirroring auxiliary models), so `model.extra_body: {reasoning_effort: none}` disables thinking on ANY Ollama host. An explicit extra_body reasoning_effort takes precedence over the local auto-path so no conflicting top-level value is sent.
Refs NousResearch#6152.
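The precedence rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function name `apply_reasoning_controls` and its parameters are hypothetical, and it assumes the request kwargs are a plain dict with an optional `extra_body` entry.

```python
from typing import Any, Dict, Optional


def apply_reasoning_controls(
    kwargs: Dict[str, Any],
    *,
    reasoning_disabled: bool,
    endpoint_is_local: bool,
    user_extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge a user-configured extra_body and apply the local auto-path."""
    out = dict(kwargs)  # do not mutate the caller's dict
    extra_body = dict(out.get("extra_body") or {})

    # Forward the user-configured extra_body verbatim (assumed to come from model.extra_body).
    if user_extra_body:
        extra_body.update(user_extra_body)

    # An explicit extra_body reasoning_effort wins, so no conflicting
    # top-level value is sent alongside it.
    explicit = "reasoning_effort" in extra_body
    if reasoning_disabled and endpoint_is_local and not explicit:
        out["reasoning_effort"] = "none"

    if extra_body:
        out["extra_body"] = extra_body
    return out
```

With this shape, a remote Ollama host only gets `reasoning_effort` when the user opts in through `extra_body`, while a local host with reasoning disabled gets the top-level value unless the user has already set one.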
Done — pushed The local auto-path stays as zero-config convenience for the common case; an explicit |
mohamedorigami-jpg left a comment
Solid approach — extra_body opt-in for remote Ollama plus local auto-path, with the precedence guard so they never conflict. The 6 test cases cover the edge states well.
Problem
On Ollama's OpenAI-compatible `/v1/chat/completions`, qwen3-family models default to thinking ON. With tools present, the reasoning swallows the tool call: the response comes back with no `tool_calls` and the agent narrates "I'll call X" without ever executing it (matches ollama/ollama #10976, #11381).
`agent/transports/chat_completions.py::build_kwargs` emits `reasoning_effort` only for Kimi / TokenHub / LM Studio, never for a local Ollama `custom` provider. So Hermes sends no reasoning control, Ollama keeps thinking on, and setting `reasoning_effort: none` in config is a silent no-op.
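To make the failure mode concrete, here is a small sketch of a request body and a check for the "narrated but never called" symptom. Both helper names are hypothetical and are not part of the PR. The response layout assumed is the usual OpenAI-style chat completion (`choices[0].message` with optional `tool_calls` and `content`).

```python
from typing import Any, Dict, List


def build_chat_payload(
    model: str,
    messages: List[Dict[str, Any]],
    tools: List[Dict[str, Any]],
    disable_thinking: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style chat request body."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "tools": tools,
        "temperature": 0,
    }
    if disable_thinking:
        # The one control the PR reports Ollama honoring on /v1.
        payload["reasoning_effort"] = "none"
    return payload


def narrated_without_calling(response: Dict[str, Any]) -> bool:
    """True when the reply contains text but no tool_calls (the symptom described above)."""
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    has_calls = bool(message.get("tool_calls"))
    has_text = bool((message.get("content") or "").strip())
    return has_text and not has_calls
```

A check like `narrated_without_calling` is only a heuristic: a model may legitimately answer in plain text when no tool is needed, so it is meaningful only for prompts that require the tool.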
What Ollama actually honors
Measured against `huihui_ai/Qwen3.6-abliterated:35b`, Ollama 0.30.6, `temperature: 0`, one tool defined, prompt that requires it, via `/v1`:
`chat_template_kwargs: {enable_thinking: false}`
`reasoning_effort: "medium"`
`reasoning_effort: "none"`
Ollama maps `reasoning_effort: "none"` → `Think=false` (`openai/openai.go::FromChatRequest`).
Fix
Emit `reasoning_effort: "none"` from `build_kwargs` when reasoning is explicitly disabled (`reasoning_config["enabled"] is False`) and the endpoint is local (`is_local_endpoint`, the same heuristic already used for `ollama_num_ctx`). It is deliberately conservative: it only adds behaviour for the explicit-disable + local case; the reasoning-enabled path and every non-local provider are untouched.
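The condition above might look something like the sketch below. It is an assumption-laden illustration: `is_local_endpoint_sketch` is a stand-in written for this example (the real `is_local_endpoint` heuristic may differ), and `reasoning_effort_for` is a hypothetical name. The point it shows is the `is False` identity check, which separates "explicitly disabled" from "not configured".

```python
import ipaddress
from typing import Any, Mapping, Optional
from urllib.parse import urlparse


def is_local_endpoint_sketch(base_url: str) -> bool:
    """Stand-in heuristic (assumption): localhost, loopback, or private-range IP."""
    host = urlparse(base_url).hostname
    if not host:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A domain name other than localhost is treated as non-local here.
        return False
    return addr.is_loopback or addr.is_private


def reasoning_effort_for(
    reasoning_config: Optional[Mapping[str, Any]], base_url: str
) -> Optional[str]:
    """Return "none" only for explicit-disable + local; otherwise add nothing."""
    if reasoning_config is None:
        return None
    # `is False` deliberately ignores a missing or None "enabled" key.
    if reasoning_config.get("enabled") is False and is_local_endpoint_sketch(base_url):
        return "none"
    return None
```

Returning `None` for every other combination mirrors the conservative intent: the caller adds a key only when this returns a value, so existing request shapes stay unchanged.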
Users then disable thinking with the existing knob:
Test
`tests/agent/transports/test_chat_completions_ollama_thinking.py` (4 cases): disabled+local → `"none"`, enabled+local → unchanged, disabled+remote → unchanged, no config → unchanged. All pass.
Fixes #6152.