Skip to content

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/complet…#29820

Open
Epoxidex wants to merge 1 commit into
NousResearch:mainfrom
Epoxidex:fix/ollama-reasoning-effort-none
Open

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/complet…#29820
Epoxidex wants to merge 1 commit into
NousResearch:mainfrom
Epoxidex:fix/ollama-reasoning-effort-none

Conversation

@Epoxidex

Copy link
Copy Markdown

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/completions + propagate reasoning_config to bg-review fork

Fixes #6152
Fixes #25758

What does this PR do?

When reasoning_effort: none is set (via /reasoning none or config.yaml), Hermes should suppress thinking on Ollama. It doesn't — the model always runs its full reasoning chain regardless of the user's config.

Two root causes:

Defect 1 (main agent): The custom provider plugin emits extra_body["think"] = False to disable thinking. This works on Ollama's native /api/chat endpoint but is silently ignored on /v1/chat/completions — the OpenAI-compat path Hermes actually uses. The field that /v1/chat/completions honours is a top-level reasoning_effort = "none", which was never emitted.

Defect 2 (bg-review fork): _spawn_background_review() creates a new AIAgent without passing reasoning_config. Even when Defect 1 is fixed, the review fork always defaults to reasoning_effort=medium and can produce 200k+ token reasoning traces taking up to 28 minutes per turn.

Related Issue

Fixes #6152
Fixes #25758

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • plugins/model-providers/custom/__init__.py — emit top_level["reasoning_effort"] = "none" alongside the existing extra_body["think"] = False when reasoning is disabled. The top-level field is what Ollama's /v1/chat/completions actually processes (confirmed in openai/openai.go); think=False is kept for /api/chat backward-compat and proxies.

  • agent/background_review.py — pass reasoning_config=getattr(agent, "reasoning_config", None) when constructing the forked AIAgent, so the review fork inherits the parent's reasoning settings.

  • tests/plugins/model_providers/test_custom_profile.py — new test file covering the CustomProfile.build_api_kwargs_extras() wire shape, following the pattern in test_deepseek_profile.py.

How to Test

  1. Configure Hermes with a local Ollama endpoint running a thinking-capable model (e.g. qwen3.6:35b)
  2. Run hermes and set /reasoning none
  3. Send any message — observe that the response arrives in ~2-5s with out= token count matching only the answer (no reasoning chain)
  4. Compare with /reasoning medium — response takes 30-60s+ with significantly higher out= token count
  5. Run pytest tests/plugins/model_providers/test_custom_profile.py -v — all 15 tests pass

Checklist

Code

Documentation & Housekeeping

  • I've updated relevant documentation — N/A
  • I've updated cli-config.yaml.example — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md — N/A
  • I've considered cross-platform impact — fix is provider-level, platform-agnostic
  • I've updated tool descriptions/schemas — N/A

Screenshots / Logs

Without fix (reasoning_effort: none silently ignored — model thinks):

API call #1: model=qwen3.6:35b in=15964 out=26 latency=68.9s

With fix (thinking correctly disabled):

API call #1: model=qwen3.6:35b in=19018 out=8 latency=47.3s

Direct Ollama API verification (test_ollama_thinking.py):

Default (no reasoning_effort) — Time: 17.3s, Thinking: YES (589 chars)
reasoning_effort="none"       — Time:  1.6s, Thinking: NO
think=False (extra_body)      — Time: 21.7s, Thinking: YES (ignored as expected)
reasoning_effort="low"        — Time: 14.3s, Thinking: YES (low effort, not disabled)

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder comp/plugins Plugin system and bundled plugins provider/ollama Ollama / local models labels May 21, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #25866 — same two defects: (1) custom provider emits extra_body.think=False which Ollama ignores on /v1/chat/completions instead of top-level reasoning_effort="none", (2) _spawn_background_review() doesn't propagate reasoning_config, fork defaults to medium. See also #25758 and #6152.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/plugins Plugin system and bundled plugins P2 Medium — degraded but workaround exists provider/ollama Ollama / local models type/bug Something isn't working

Projects

None yet

2 participants