[Bug]: agent.reasoning_effort: none silently ignored on Ollama — main agent stuck in medium mode, bg-review fork can spiral (up to 65k tokens / 28 min) #25758
When using Hermès with a `custom` provider pointing at a local Ollama instance and a thinking-capable model (Qwen3.x, DeepSeek-style), `agent.reasoning_effort: none` is silently ignored, both for the main agent and for the background-review fork. The model thinks anyway, sometimes catastrophically: we observed up to 209,538 chars of `reasoning_content` and 65,056 output tokens in a single turn, blocking the GPU for 28 minutes.
The root cause lies in two distinct places in `run_agent.py`, and both need to be addressed:
1. For the main agent on Ollama: Hermès emits `extra_body.think=False` via the `custom` provider plugin (the fix for #6152, "[Feature]: Pass `think: false` to Ollama for non-reasoning models"), but Ollama's `/v1/chat/completions` endpoint silently ignores `think:false`. The top-level `reasoning_effort=none` field, which Ollama does respect, is never emitted.
2. For the background-review fork: `_spawn_background_review()` creates a new `AIAgent` without propagating `self.reasoning_config`. Even when (1) is fixed, the fork still runs with the default `reasoning_effort=medium`, because `extra_body.think=False` is also never produced for it.
Defect (2) is structurally similar to #15543 (the fork was missing `api_key`/`base_url`/`api_mode` until that was fixed): in both cases the fork loses inherited state.
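To make defect (1) concrete, the sketch below contrasts the request body that is effectively sent today with the one that carries the top-level field. It assumes an OpenAI-compatible client that merges `extra_body` keys into the top level of the JSON body; `build_body` and the model name `qwen3-example` are illustrative placeholders, not Hermès code.

```python
import json


def build_body(model, messages, *, extra_body=None, **top_level):
    """Assemble a chat-completions JSON body the way an OpenAI-style client might.

    Assumption: keys in extra_body are merged into the top level of the request.
    """
    body = {"model": model, "messages": messages, **top_level}
    body.update(extra_body or {})
    return body


messages = [{"role": "user", "content": "hello"}]

# Shape described in this report for the current behavior: only `think: false`.
current = build_body("qwen3-example", messages, extra_body={"think": False})

# Shape the report says Ollama honors on /v1/chat/completions.
wanted = build_body(
    "qwen3-example",
    messages,
    extra_body={"think": False},
    reasoning_effort="none",
)

print(json.dumps(current, indent=2))
print(json.dumps(wanted, indent=2))
```

The only difference between the two bodies is the top-level `reasoning_effort` key, which is the field this report says never gets emitted.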
Steps to Reproduce
1. Configure Hermès with a `custom` provider pointing at a local Ollama instance running a thinking-capable model.
2. Set `agent.reasoning_effort: none` in `config.yaml`.
3. Send any prompt to the main agent and observe in the session JSON (`/opt/data/sessions/session_*.json`) that assistant messages still carry non-empty `reasoning_content`.
4. Run a non-trivial session (5+ tool calls, file edits) so that `_should_review_memory` or `_should_review_skills` triggers a bg-review at the end.
5. Quit the session.
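The check in step 3 can be scripted. The following is a minimal sketch, assuming each session file is either a JSON object with a top-level `messages` list or a bare list of messages; that layout is an assumption made for illustration, and only the `role` and `reasoning_content` keys come from this report.

```python
import glob
import json


def reasoning_chars(messages):
    """Return the reasoning_content length of each assistant message."""
    return [
        len(m.get("reasoning_content") or "")
        for m in messages
        if m.get("role") == "assistant"
    ]


def scan_sessions(pattern="/opt/data/sessions/session_*.json"):
    """Yield (path, lengths) for every session file matching the pattern.

    Assumption: a file holds {"messages": [...]} or a bare list of messages.
    """
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        messages = data.get("messages", []) if isinstance(data, dict) else data
        yield path, reasoning_chars(messages)


if __name__ == "__main__":
    for path, lengths in scan_sessions():
        print(path, "max reasoning chars:", max(lengths, default=0))
```

With `agent.reasoning_effort: none` working as intended, every reported maximum should be 0.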
On some workloads — especially when the skill content contains internal contradictions the model tries to resolve — the bg-review enters a reasoning loop:
Affected Component
CLI (interactive chat)
Messaging Platform (if gateway-related)
No response
Debug Report
## Root Cause

### Defect 1: main agent on Ollama

The `custom` provider plugin sets `extra_body["think"] = False` when `reasoning_config.effort == "none"` or `enabled is False`. But Ollama silently ignores `extra_body.think` on `/v1/chat/completions` (it only honors it on `/api/chat`). The top-level `reasoning_effort` field that Ollama **does** support is never emitted from `_build_api_kwargs()` for `custom` providers.

### Defect 2: bg-review fork

In `run_agent.py`, `_spawn_background_review()` (around line 4117):

```python
review_agent = AIAgent(
    model=self.model,
    max_iterations=16,
    quiet_mode=True,
    platform=self.platform,
    provider=self.provider,
    api_mode=_parent_runtime.get("api_mode") or None,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(self, "_credential_pool", None),
    parent_session_id=self.session_id,
    enabled_toolsets=["memory", "skills"],
)
review_agent._memory_write_origin = "background_review"
review_agent._memory_write_context = "background_review"
review_agent._memory_store = self._memory_store
review_agent._memory_enabled = self._memory_enabled
review_agent._user_profile_enabled = self._user_profile_enabled
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
```

`self.reasoning_config` is never propagated. Since `AIAgent.__init__` defaults `reasoning_config=None` (= medium), the fork runs in medium mode regardless of the parent's effective config, even after Defect 1 is fixed for the parent.

## Proposed Fix

### Fix 1: emit top-level `reasoning_effort` for Ollama

In `_build_api_kwargs()` in `run_agent.py`, after the existing `extra_body["think"]=False` block, mirror the value at the top level when the target is Ollama-style:

```python
if isinstance(api_kwargs, dict):
    _eb = api_kwargs.get("extra_body")
    if isinstance(_eb, dict) and _eb.get("think") is False:
        api_kwargs["reasoning_effort"] = "none"
```

Rationale: Ollama's `/v1/chat/completions` accepts `reasoning_effort` at the top level (it's a standard OpenAI-style field for some upstream models) and uses it to suppress thinking. Other Ollama-compatible servers that don't recognize the field will simply ignore it. This was reported separately at [ollama#14820](https://github.com/ollama/ollama/issues/14820).

### Fix 2: propagate `reasoning_config` to the fork

In `_spawn_background_review()`, after the existing `review_agent.X = self.X` assignments:

```python
review_agent.reasoning_config = self.reasoning_config or {"enabled": False, "effort": "none"}
```

The `or {...}` fallback handles the case where the parent itself has `reasoning_config=None` (default). For non-Ollama setups, this is effectively a no-op for models that don't expose reasoning toggling, because the provider plugins ignore the field.

## Validation

After applying both fixes in our deployment, on the same workload that previously spiraled:

| Metric | Before fixes | After fixes |
|---|---|---|
| `reasoning_content` per main-agent message | non-empty | **0 chars** |
| `reasoning_content` per bg-review message | up to 209,538 chars | **0 chars** |
| Max `out=` tokens on bg-review turn | 65,056 | **3,894** |
| Max latency on bg-review turn | 1,724s (28 min) | **88s** |
| `_empty_recovery_synthetic` triggered | yes | no |
| Bg-review still produces useful `tool_calls` | yes (eclipsed by reasoning) | **yes (clean)** |

The bg-review continues to do real work (`skill_manage` patches, `execute_code` blocks of 10–14k chars), so the self-improvement loop stays fully functional. The main agent also stops paying the medium-reasoning tax.

## References

- #6152: initial Ollama `think:false` support (resolved, but the emitted field is silently dropped by `/v1/chat/completions`)
- #15543: earlier instance of bg-review fork losing inherited state (auth credentials)
- [ollama/ollama#14820](https://github.com/ollama/ollama/issues/14820): upstream Ollama bug, `think:false` ignored on `/v1/chat/completions`
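As a rough way to regression-test the two proposed fixes in isolation, the sketch below restates Fix 1 as a pure function and models Fix 2 with a toy class. `mirror_think_flag`, `ToyAgent`, and `spawn_review` are hypothetical names introduced only for this illustration; they are not part of `run_agent.py`.

```python
def mirror_think_flag(api_kwargs):
    """Fix 1 as a pure function: mirror think=False as top-level reasoning_effort."""
    if isinstance(api_kwargs, dict):
        eb = api_kwargs.get("extra_body")
        if isinstance(eb, dict) and eb.get("think") is False:
            api_kwargs["reasoning_effort"] = "none"
    return api_kwargs


class ToyAgent:
    """Hypothetical stand-in for AIAgent that only models reasoning_config."""

    def __init__(self, reasoning_config=None):
        # None mirrors the default described in this report (medium effort).
        self.reasoning_config = reasoning_config

    def spawn_review(self, propagate):
        """Create a child agent, optionally applying the Fix 2 assignment."""
        child = ToyAgent()
        if propagate:
            child.reasoning_config = self.reasoning_config or {
                "enabled": False,
                "effort": "none",
            }
        return child


parent = ToyAgent({"enabled": False, "effort": "none"})
unfixed_child = parent.spawn_review(propagate=False)
fixed_child = parent.spawn_review(propagate=True)

print("without Fix 2:", unfixed_child.reasoning_config)
print("with Fix 2:   ", fixed_child.reasoning_config)
print("Fix 1 output: ", mirror_think_flag({"extra_body": {"think": False}}))
```

Note that the `or {...}` fallback also turns reasoning off in the fork when the parent runs with the default `None` config, which is the behavior the proposed one-liner specifies.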
Operating System
MacOS 26.4.1
Python Version
3.13.5
Hermes Version
v0.13.0
Additional Logs / Traceback (optional)
Environment
- `nousresearch/hermes-agent:latest` (Docker)
- `custom` (Ollama 0.19, local, MLX backend)
- `qwen3.6:35b-a3b-coding-nvfp4` (MoE, MLX-accelerated, thinking-capable)
- `agent.reasoning_effort: none`
- Ollama upstream: ignores `think:false` on `/v1/chat/completions` (ollama#14820)
Expected Behavior
No reasoning loop.
Actual Behavior
Evidence
From a real bg-review session log:
The offending assistant message in the session JSON:

```json
{
  "role": "assistant",
  "content": "(empty)",
  "reasoning_content": "...209538 chars of WAIT / OH WAIT cascade...",
  "finish_reason": "stop",
  "_empty_recovery_synthetic": true
}
```

Hermès' `_empty_recovery_synthetic` mechanism correctly catches the empty response and nudges the model, but 28 minutes of GPU decode are already gone.
Happy to submit a PR with both fixes if the maintainers want.
Are you willing to submit a PR for this?