Disable qwen3 thinking on local Ollama so tool-calling works (fixes #6152) by Adelagric · Pull Request #42201 · NousResearch/hermes-agent

Adelagric · 2026-06-08T15:34:43Z

Problem

On Ollama's OpenAI-compatible /v1/chat/completions, qwen3-family models default to
thinking ON. With tools present, the reasoning swallows the tool call — the response
comes back with no tool_calls and the agent narrates "I'll call X" without ever
executing it (matches ollama/ollama #10976, #11381).

agent/transports/chat_completions.py::build_kwargs emits reasoning_effort only for
Kimi / TokenHub / LM Studio — never for a local Ollama custom provider. So Hermes sends
no reasoning control, Ollama keeps thinking on, and setting reasoning_effort: none in
config is a silent no-op.

What Ollama actually honors

Measured against huihui_ai/Qwen3.6-abliterated:35b, Ollama 0.30.6, temperature: 0,
one tool defined, prompt that requires it, via /v1:

request	tool_calls	finish	completion_tokens
(no reasoning field — current Hermes behaviour for Ollama)	0	stop	900 (cap)
`chat_template_kwargs: {enable_thinking: false}`	0	stop	900 (ignored, #10809)
`reasoning_effort: "medium"`	0	stop	~5500
`reasoning_effort: "none"`	1	tool_calls	28

Ollama maps reasoning_effort: "none" → Think=false (openai/openai.go::FromChatRequest).

Fix

Emit reasoning_effort: "none" from build_kwargs when reasoning is explicitly disabled
(reasoning_config["enabled"] is False) and the endpoint is local (is_local_endpoint,
the same heuristic already used for ollama_num_ctx). It is deliberately conservative — it
only adds behaviour for the explicit-disable + local case; the reasoning-enabled path and
every non-local provider are untouched.

Users then disable thinking with the existing knob:

agent:
  reasoning_effort: none

Test

tests/agent/transports/test_chat_completions_ollama_thinking.py — 4 cases: disabled+local
→ "none", enabled+local → unchanged, disabled+remote → unchanged, no config → unchanged.
All pass.

Fixes #6152.

On Ollama's OpenAI-compatible /v1 endpoint, qwen3-family models default to thinking ON. With tools present the reasoning swallows the tool call and the response comes back with no tool_calls (ollama/ollama NousResearch#10976, NousResearch#11381) — the agent narrates an intent but never executes. The only control Ollama honors on /v1 is reasoning_effort:"none" -> Think=false; chat_template_kwargs.enable_thinking is ignored (NousResearch#10809). The chat_completions transport emitted reasoning_effort only for Kimi/TokenHub/LM Studio, never for local Ollama, so `reasoning_effort: none` in config was a silent no-op. Emit it when reasoning is explicitly disabled and the endpoint is local — restoring tool-calling and leaving every other provider untouched. Fixes NousResearch#6152.

mohamedorigami-jpg · 2026-06-08T16:06:14Z

The fix is well scoped and the test coverage with four clear cases is a nice addition for a first contribution.

One edge case: is_local_endpoint() typically checks for 127.0.0.1, localhost, or 0.0.0.0. Users running Ollama on a LAN server (like http://192.168.1.50:11434/v1) or via a Tailscale IP would not match the local check, so the workaround would not apply. The qwen3 thinking bug is an Ollama-side issue regardless of whether the instance runs on localhost or elsewhere on the network.

Could the condition be broadened to cover any endpoint where the host is identifiable as an Ollama instance (maybe via a /api/tags probe, or a configurable flag) rather than just local IPs? Or if that is too invasive for this fix, a comment noting the LAN gap would help future readers understand the limitation.

Adelagric · 2026-06-08T16:13:40Z

is_local_endpoint() already matches more than loopback: it returns True for RFC-1918 private ranges (10/8, 172.16/12, 192.168/16) via addr.is_private, plus Tailscale CGNAT (100.64.0.0/10) — see agent/model_metadata.py L448-472. So http://192.168.1.50:11434/v1 and Tailscale-mesh Ollama hosts are covered by the condition as written.

The genuine gap is a public Ollama — a routable public IP or a public domain like https://ollama.example.com/v1 — which is_local_endpoint() intentionally treats as non-local. The thinking bug does apply there too.

For that case, rather than a per-request /api/tags probe (detect_local_server_type() right below already does endpoint probing if we wanted to go that route, but it's a network call per request), the lighter path is an explicit opt-in: forward a user-set reasoning_effort via an extra_body-style model config, so remote-Ollama users can disable thinking regardless of host. I can add that here, or at minimum a comment documenting the public-host limitation. Which would you prefer?

…ing too The previous commit only auto-emitted reasoning_effort:"none" for local endpoints (is_local_endpoint), which does not match a public-IP or public-domain Ollama — the qwen3 thinking bug applies there too. Forward a user-configured `model.extra_body` verbatim into the chat request (via the existing extra_body_additions channel, mirroring auxiliary models), so `model.extra_body: {reasoning_effort: none}` disables thinking on ANY Ollama host. An explicit extra_body reasoning_effort takes precedence over the local auto-path so no conflicting top-level value is sent. Refs NousResearch#6152.

Adelagric · 2026-06-08T16:54:30Z

Done — pushed 72c3fba. Added a model.extra_body opt-in, forwarded verbatim into the request via the existing extra_body_additions path (the same mechanism the auxiliary models use), so model.extra_body: {reasoning_effort: none} disables thinking on any Ollama host — public IP or public domain included, not just is_local_endpoint() matches.

The local auto-path stays as zero-config convenience for the common case; an explicit extra_body reasoning_effort takes precedence over it, so a conflicting top-level value is never sent. Tests now cover local auto-emit, remote-via-extra_body, and precedence (6 cases).

mohamedorigami-jpg

Solid approach — extra_body opt-in for remote Ollama plus local auto-path, with the precedence guard so they never conflict. The 6 test cases cover the edge states well.

alt-glitch added type/bug Something isn't working provider/ollama Ollama / local models P2 Medium — degraded but workaround exists labels Jun 8, 2026

mohamedorigami-jpg approved these changes Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable qwen3 thinking on local Ollama so tool-calling works (fixes #6152)#42201

Disable qwen3 thinking on local Ollama so tool-calling works (fixes #6152)#42201
Adelagric wants to merge 2 commits into
NousResearch:mainfrom
Adelagric:fix/ollama-qwen3-thinking-toolcalls

Adelagric commented Jun 8, 2026

Uh oh!

mohamedorigami-jpg commented Jun 8, 2026

Uh oh!

Adelagric commented Jun 8, 2026

Uh oh!

Adelagric commented Jun 8, 2026

Uh oh!

mohamedorigami-jpg left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Adelagric commented Jun 8, 2026

Problem

What Ollama actually honors

Fix

Test

Uh oh!

mohamedorigami-jpg commented Jun 8, 2026

Uh oh!

Adelagric commented Jun 8, 2026

Uh oh!

Adelagric commented Jun 8, 2026

Uh oh!

mohamedorigami-jpg left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants