Bug: Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)
Summary
context_compressor.py reads only response.choices[0].message.content (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.
Root Cause
Ollama 0.22.x changed how thinking models return responses. The reasoning field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without think: true. With limited max_tokens, the model's reasoning tokens consume the entire budget, and content comes back empty.
Hermes already has extract_content_or_reasoning() in auxiliary_client.py (line 3561) that handles this exact case — it checks content first, then falls back to reasoning, reasoning_content, and reasoning_details. But context_compressor.py doesn't use it.
Affected Code
agent/context_compressor.py line 871:
content = response.choices[0].message.content # BUG: ignores reasoning field
Should be:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)
Reproduction
- Configure Hermes with Ollama 0.22.1+ and a thinking model (e.g.,
deepseek-v4-flash:cloud or glm-5.1:cloud) as the compression auxiliary
- Have a conversation long enough to trigger context compression
- Observe that the compression summary is empty — the model's output went entirely to
reasoning while content is ""
- Compressor falls back to a static context marker: "Summary generation failed — inserting static fallback context marker"
Evidence
In our deployment, 3 of 25 compression events produced fallback markers (12% failure rate). After upgrading to Ollama 0.22.1, testing with deepseek-v4-flash:cloud and max_tokens: 100 returns:
content: "" (empty)
reasoning: "..." (928 chars of actual summary content)
With max_tokens: 300, content returns 75 chars while reasoning takes 929 chars — the reasoning field holds the actual compressed content the compressor needs.
Impact
- All thinking models used for compression produce empty summaries when
max_tokens is insufficient to cover both reasoning and content
- The Caveman Compressor plugin also inherits this bug via
super()._generate_summary()
- No recovery path — the fallback marker is static text, not a summary
Fix
One-line change + one import:
# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# Line 871:
content = extract_content_or_reasoning(response)
The extract_content_or_reasoning() function already exists and handles content, reasoning, reasoning_content, and reasoning_details with appropriate fallback logic and inline think-tag stripping.
Environment
Machine 1 (Hazel):
- Hermes Agent:
0b76d23 (2026-04-30), 39 commits behind upstream main
- OS: Ubuntu 25.10 (Questing Quokka), kernel 6.17.0-23-generic, x86_64
- CPU: 12th Gen Intel i7-12700H (14 cores: 6P+8E)
- GPU: NVIDIA GeForce RTX 3060 Laptop 6GB, driver 595.58.03
- RAM: 32 GB (2x16 GB DDR5, 30.5 GiB usable after kernel reservation)
- Python: 3.13.7
- Model:
glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
- Ollama: 0.22.1
- Models affected in testing:
deepseek-v4-flash:cloud, glm-5.1:cloud, qwen3.5-397b-cn-think:latest
Machine 2 (Ember/Nova):
- Hermes Agent:
0b76d23 (2026-04-30), 39 commits behind upstream main
- OS: Ubuntu 26.04 LTS, kernel 7.0.0-15-generic, x86_64
- CPU: AMD Ryzen 9 5900XT 16-Core
- GPU: NVIDIA RTX PRO 4000 Blackwell 24GB, driver 595.58.03
- RAM: 64 GB (60.7 GiB usable after kernel reservation)
- Python: 3.14.4
- Model:
glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
- Ollama: 0.22.1
- Same models affected
Both machines have the local fix applied and verified. The bug is reproducible on stock Hermes Agent without the patch.
Related Issues
Bug: Context compressor ignores
reasoningfield — empty summaries with thinking models (Ollama 0.22+)Summary
context_compressor.pyreads onlyresponse.choices[0].message.content(line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in thereasoningfield withcontentas an empty string — especially whenmax_tokensis constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.Root Cause
Ollama 0.22.x changed how thinking models return responses. The
reasoningfield is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even withoutthink: true. With limitedmax_tokens, the model's reasoning tokens consume the entire budget, andcontentcomes back empty.Hermes already has
extract_content_or_reasoning()inauxiliary_client.py(line 3561) that handles this exact case — it checkscontentfirst, then falls back toreasoning,reasoning_content, andreasoning_details. Butcontext_compressor.pydoesn't use it.Affected Code
agent/context_compressor.pyline 871:Should be:
Reproduction
deepseek-v4-flash:cloudorglm-5.1:cloud) as the compression auxiliaryreasoningwhilecontentis""Evidence
In our deployment, 3 of 25 compression events produced fallback markers (12% failure rate). After upgrading to Ollama 0.22.1, testing with
deepseek-v4-flash:cloudandmax_tokens: 100returns:content: ""(empty)reasoning: "..."(928 chars of actual summary content)With
max_tokens: 300, content returns 75 chars while reasoning takes 929 chars — the reasoning field holds the actual compressed content the compressor needs.Impact
max_tokensis insufficient to cover both reasoning and contentsuper()._generate_summary()Fix
One-line change + one import:
The
extract_content_or_reasoning()function already exists and handlescontent,reasoning,reasoning_content, andreasoning_detailswith appropriate fallback logic and inline think-tag stripping.Environment
Machine 1 (Hazel):
0b76d23(2026-04-30), 39 commits behind upstreammainglm-5.1:cloudvia custom provider (Ollama,http://127.0.0.1:11434/v1)deepseek-v4-flash:cloud,glm-5.1:cloud,qwen3.5-397b-cn-think:latestMachine 2 (Ember/Nova):
0b76d23(2026-04-30), 39 commits behind upstreammainglm-5.1:cloudvia custom provider (Ollama,http://127.0.0.1:11434/v1)Both machines have the local fix applied and verified. The bug is reproducible on stock Hermes Agent without the patch.
Related Issues