fix(ollama): emit top-level reasoning_effort=none on /v1/chat/complet… by Epoxidex · Pull Request #29820 · NousResearch/hermes-agent

Epoxidex · 2026-05-21T12:52:13Z

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/completions + propagate reasoning_config to bg-review fork

Fixes #6152
Fixes #25758

What does this PR do?

When reasoning_effort: none is set (via /reasoning none or config.yaml), Hermes should suppress thinking on Ollama. It doesn't — the model always runs its full reasoning chain regardless of the user's config.

Two root causes:

Defect 1 (main agent): The custom provider plugin emits extra_body["think"] = False to disable thinking. This works on Ollama's native /api/chat endpoint but is silently ignored on /v1/chat/completions — the OpenAI-compat path Hermes actually uses. The field that /v1/chat/completions honours is a top-level reasoning_effort = "none", which was never emitted.

Defect 2 (bg-review fork): _spawn_background_review() creates a new AIAgent without passing reasoning_config. Even when Defect 1 is fixed, the review fork always defaults to reasoning_effort=medium and can produce 200k+ token reasoning traces taking up to 28 minutes per turn.

Related Issue

Fixes #6152
Fixes #25758

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✅ Tests (adding or improving test coverage)

Changes Made

plugins/model-providers/custom/__init__.py — emit top_level["reasoning_effort"] = "none" alongside the existing extra_body["think"] = False when reasoning is disabled. The top-level field is what Ollama's /v1/chat/completions actually processes (confirmed in openai/openai.go); think=False is kept for /api/chat backward-compat and proxies.
agent/background_review.py — pass reasoning_config=getattr(agent, "reasoning_config", None) when constructing the forked AIAgent, so the review fork inherits the parent's reasoning settings.
tests/plugins/model_providers/test_custom_profile.py — new test file covering the CustomProfile.build_api_kwargs_extras() wire shape, following the pattern in test_deepseek_profile.py.

How to Test

Configure Hermes with a local Ollama endpoint running a thinking-capable model (e.g. qwen3.6:35b)
Run hermes and set /reasoning none
Send any message — observe that the response arrives in ~2-5s with out= token count matching only the answer (no reasoning chain)
Compare with /reasoning medium — response takes 30-60s+ with significantly higher out= token count
Run pytest tests/plugins/model_providers/test_custom_profile.py -v — all 15 tests pass

Checklist

Code

I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/)
I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run pytest tests/ -q and all tests pass
I've added tests for my changes
I've tested on my platform: Ubuntu 24.04 (WSL2)

Documentation & Housekeeping

I've updated relevant documentation — N/A
I've updated cli-config.yaml.example — N/A
I've updated CONTRIBUTING.md or AGENTS.md — N/A
I've considered cross-platform impact — fix is provider-level, platform-agnostic
I've updated tool descriptions/schemas — N/A

Screenshots / Logs

Without fix (reasoning_effort: none silently ignored — model thinks):

API call #1: model=qwen3.6:35b in=15964 out=26 latency=68.9s

With fix (thinking correctly disabled):

API call #1: model=qwen3.6:35b in=19018 out=8 latency=47.3s

Direct Ollama API verification (test_ollama_thinking.py):

Default (no reasoning_effort) — Time: 17.3s, Thinking: YES (589 chars)
reasoning_effort="none"       — Time:  1.6s, Thinking: NO
think=False (extra_body)      — Time: 21.7s, Thinking: YES (ignored as expected)
reasoning_effort="low"        — Time: 14.3s, Thinking: YES (low effort, not disabled)

…ions + propagate reasoning_config to bg-review fork Fixes NousResearch#6152 Fixes NousResearch#25758

alt-glitch · 2026-05-21T13:15:06Z

Duplicate of #25866 — same two defects: (1) custom provider emits extra_body.think=False which Ollama ignores on /v1/chat/completions instead of top-level reasoning_effort="none", (2) _spawn_background_review() doesn't propagate reasoning_config, fork defaults to medium. See also #25758 and #6152.

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/complet…

a1ae6de

…ions + propagate reasoning_config to bg-review fork Fixes NousResearch#6152 Fixes NousResearch#25758

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder comp/plugins Plugin system and bundled plugins provider/ollama Ollama / local models labels May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/complet…#29820

fix(ollama): emit top-level reasoning_effort=none on /v1/chat/complet…#29820
Epoxidex wants to merge 1 commit into
NousResearch:mainfrom
Epoxidex:fix/ollama-reasoning-effort-none

Epoxidex commented May 21, 2026

Uh oh!

alt-glitch commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Epoxidex commented May 21, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Screenshots / Logs

Uh oh!

alt-glitch commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants