Skip to content

[Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) #19003

@mikronn2

Description

@mikronn2

Bug: Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)

Summary

context_compressor.py reads only response.choices[0].message.content (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Root Cause

Ollama 0.22.x changed how thinking models return responses. The reasoning field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without think: true. With limited max_tokens, the model's reasoning tokens consume the entire budget, and content comes back empty.

Hermes already has extract_content_or_reasoning() in auxiliary_client.py (line 3561) that handles this exact case — it checks content first, then falls back to reasoning, reasoning_content, and reasoning_details. But context_compressor.py doesn't use it.

Affected Code

agent/context_compressor.py line 871:

content = response.choices[0].message.content  # BUG: ignores reasoning field

Should be:

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

Reproduction

  1. Configure Hermes with Ollama 0.22.1+ and a thinking model (e.g., deepseek-v4-flash:cloud or glm-5.1:cloud) as the compression auxiliary
  2. Have a conversation long enough to trigger context compression
  3. Observe that the compression summary is empty — the model's output went entirely to reasoning while content is ""
  4. Compressor falls back to a static context marker: "Summary generation failed — inserting static fallback context marker"

Evidence

In our deployment, 3 of 25 compression events produced fallback markers (12% failure rate). After upgrading to Ollama 0.22.1, testing with deepseek-v4-flash:cloud and max_tokens: 100 returns:

  • content: "" (empty)
  • reasoning: "..." (928 chars of actual summary content)

With max_tokens: 300, content returns 75 chars while reasoning takes 929 chars — the reasoning field holds the actual compressed content the compressor needs.

Impact

  • All thinking models used for compression produce empty summaries when max_tokens is insufficient to cover both reasoning and content
  • The Caveman Compressor plugin also inherits this bug via super()._generate_summary()
  • No recovery path — the fallback marker is static text, not a summary

Fix

One-line change + one import:

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and handles content, reasoning, reasoning_content, and reasoning_details with appropriate fallback logic and inline think-tag stripping.

Environment

Machine 1 (Hazel):

  • Hermes Agent: 0b76d23 (2026-04-30), 39 commits behind upstream main
  • OS: Ubuntu 25.10 (Questing Quokka), kernel 6.17.0-23-generic, x86_64
  • CPU: 12th Gen Intel i7-12700H (14 cores: 6P+8E)
  • GPU: NVIDIA GeForce RTX 3060 Laptop 6GB, driver 595.58.03
  • RAM: 32 GB (2x16 GB DDR5, 30.5 GiB usable after kernel reservation)
  • Python: 3.13.7
  • Model: glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
  • Ollama: 0.22.1
  • Models affected in testing: deepseek-v4-flash:cloud, glm-5.1:cloud, qwen3.5-397b-cn-think:latest

Machine 2 (Ember/Nova):

  • Hermes Agent: 0b76d23 (2026-04-30), 39 commits behind upstream main
  • OS: Ubuntu 26.04 LTS, kernel 7.0.0-15-generic, x86_64
  • CPU: AMD Ryzen 9 5900XT 16-Core
  • GPU: NVIDIA RTX PRO 4000 Blackwell 24GB, driver 595.58.03
  • RAM: 64 GB (60.7 GiB usable after kernel reservation)
  • Python: 3.14.4
  • Model: glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
  • Ollama: 0.22.1
  • Same models affected

Both machines have the local fix applied and verified. The bug is reproducible on stock Hermes Agent without the patch.

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt builderprovider/ollamaOllama / local modelstype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions