Skip to content

Disable qwen3 thinking on local Ollama so tool-calling works (fixes #6152)#42201

Open
Adelagric wants to merge 2 commits into
NousResearch:mainfrom
Adelagric:fix/ollama-qwen3-thinking-toolcalls
Open

Disable qwen3 thinking on local Ollama so tool-calling works (fixes #6152)#42201
Adelagric wants to merge 2 commits into
NousResearch:mainfrom
Adelagric:fix/ollama-qwen3-thinking-toolcalls

Conversation

@Adelagric

Copy link
Copy Markdown

Problem

On Ollama's OpenAI-compatible /v1/chat/completions, qwen3-family models default to
thinking ON. With tools present, the reasoning swallows the tool call — the response
comes back with no tool_calls and the agent narrates "I'll call X" without ever
executing it (matches ollama/ollama #10976, #11381).

agent/transports/chat_completions.py::build_kwargs emits reasoning_effort only for
Kimi / TokenHub / LM Studio — never for a local Ollama custom provider. So Hermes sends
no reasoning control, Ollama keeps thinking on, and setting reasoning_effort: none in
config is a silent no-op.

What Ollama actually honors

Measured against huihui_ai/Qwen3.6-abliterated:35b, Ollama 0.30.6, temperature: 0,
one tool defined, prompt that requires it, via /v1:

request tool_calls finish completion_tokens
(no reasoning field — current Hermes behaviour for Ollama) 0 stop 900 (cap)
chat_template_kwargs: {enable_thinking: false} 0 stop 900 (ignored, #10809)
reasoning_effort: "medium" 0 stop ~5500
reasoning_effort: "none" 1 tool_calls 28

Ollama maps reasoning_effort: "none"Think=false (openai/openai.go::FromChatRequest).

Fix

Emit reasoning_effort: "none" from build_kwargs when reasoning is explicitly disabled
(reasoning_config["enabled"] is False) and the endpoint is local (is_local_endpoint,
the same heuristic already used for ollama_num_ctx). It is deliberately conservative — it
only adds behaviour for the explicit-disable + local case; the reasoning-enabled path and
every non-local provider are untouched.

Users then disable thinking with the existing knob:

agent:
  reasoning_effort: none

Test

tests/agent/transports/test_chat_completions_ollama_thinking.py — 4 cases: disabled+local
"none", enabled+local → unchanged, disabled+remote → unchanged, no config → unchanged.
All pass.

Fixes #6152.

On Ollama's OpenAI-compatible /v1 endpoint, qwen3-family models default to thinking ON. With tools present the reasoning swallows the tool call and the response comes back with no tool_calls (ollama/ollama NousResearch#10976, NousResearch#11381) — the agent narrates an intent but never executes. The only control Ollama honors on /v1 is reasoning_effort:"none" -> Think=false; chat_template_kwargs.enable_thinking is ignored (NousResearch#10809).

The chat_completions transport emitted reasoning_effort only for Kimi/TokenHub/LM Studio, never for local Ollama, so `reasoning_effort: none` in config was a silent no-op. Emit it when reasoning is explicitly disabled and the endpoint is local — restoring tool-calling and leaving every other provider untouched.

Fixes NousResearch#6152.
@alt-glitch alt-glitch added type/bug Something isn't working provider/ollama Ollama / local models P2 Medium — degraded but workaround exists labels Jun 8, 2026
@mohamedorigami-jpg

Copy link
Copy Markdown
Contributor

The fix is well scoped and the test coverage with four clear cases is a nice addition for a first contribution.

One edge case: is_local_endpoint() typically checks for 127.0.0.1, localhost, or 0.0.0.0. Users running Ollama on a LAN server (like http://192.168.1.50:11434/v1) or via a Tailscale IP would not match the local check, so the workaround would not apply. The qwen3 thinking bug is an Ollama-side issue regardless of whether the instance runs on localhost or elsewhere on the network.

Could the condition be broadened to cover any endpoint where the host is identifiable as an Ollama instance (maybe via a /api/tags probe, or a configurable flag) rather than just local IPs? Or if that is too invasive for this fix, a comment noting the LAN gap would help future readers understand the limitation.

@Adelagric

Copy link
Copy Markdown
Author

is_local_endpoint() already matches more than loopback: it returns True for RFC-1918 private ranges (10/8, 172.16/12, 192.168/16) via addr.is_private, plus Tailscale CGNAT (100.64.0.0/10) — see agent/model_metadata.py L448-472. So http://192.168.1.50:11434/v1 and Tailscale-mesh Ollama hosts are covered by the condition as written.

The genuine gap is a public Ollama — a routable public IP or a public domain like https://ollama.example.com/v1 — which is_local_endpoint() intentionally treats as non-local. The thinking bug does apply there too.

For that case, rather than a per-request /api/tags probe (detect_local_server_type() right below already does endpoint probing if we wanted to go that route, but it's a network call per request), the lighter path is an explicit opt-in: forward a user-set reasoning_effort via an extra_body-style model config, so remote-Ollama users can disable thinking regardless of host. I can add that here, or at minimum a comment documenting the public-host limitation. Which would you prefer?

…ing too

The previous commit only auto-emitted reasoning_effort:"none" for local endpoints (is_local_endpoint), which does not match a public-IP or public-domain Ollama — the qwen3 thinking bug applies there too.

Forward a user-configured `model.extra_body` verbatim into the chat request (via the existing extra_body_additions channel, mirroring auxiliary models), so `model.extra_body: {reasoning_effort: none}` disables thinking on ANY Ollama host. An explicit extra_body reasoning_effort takes precedence over the local auto-path so no conflicting top-level value is sent.

Refs NousResearch#6152.
@Adelagric

Copy link
Copy Markdown
Author

Done — pushed 72c3fba. Added a model.extra_body opt-in, forwarded verbatim into the request via the existing extra_body_additions path (the same mechanism the auxiliary models use), so model.extra_body: {reasoning_effort: none} disables thinking on any Ollama host — public IP or public domain included, not just is_local_endpoint() matches.

The local auto-path stays as zero-config convenience for the common case; an explicit extra_body reasoning_effort takes precedence over it, so a conflicting top-level value is never sent. Tests now cover local auto-emit, remote-via-extra_body, and precedence (6 cases).

@mohamedorigami-jpg mohamedorigami-jpg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid approach — extra_body opt-in for remote Ollama plus local auto-path, with the precedence guard so they never conflict. The 6 test cases cover the edge states well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists provider/ollama Ollama / local models type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Pass think: false to Ollama for non-reasoning models

3 participants