[Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)

## Bug: Context compressor ignores `reasoning` field — empty summaries with thinking models (Ollama 0.22+)

### Summary

`context_compressor.py` reads only `response.choices[0].message.content` (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the `reasoning` field with `content` as an empty string — especially when `max_tokens` is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

### Root Cause

Ollama 0.22.x changed how thinking models return responses. The `reasoning` field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without `think: true`. With limited `max_tokens`, the model's reasoning tokens consume the entire budget, and `content` comes back empty.

Hermes already has `extract_content_or_reasoning()` in `auxiliary_client.py` (line 3561) that handles this exact case — it checks `content` first, then falls back to `reasoning`, `reasoning_content`, and `reasoning_details`. But `context_compressor.py` doesn't use it.

### Affected Code

`agent/context_compressor.py` line 871:
```python
content = response.choices[0].message.content  # BUG: ignores reasoning field
```

Should be:
```python
from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)
```

### Reproduction

1. Configure Hermes with Ollama 0.22.1+ and a thinking model (e.g., `deepseek-v4-flash:cloud` or `glm-5.1:cloud`) as the compression auxiliary
2. Have a conversation long enough to trigger context compression
3. Observe that the compression summary is empty — the model's output went entirely to `reasoning` while `content` is `""`
4. Compressor falls back to a static context marker: "Summary generation failed — inserting static fallback context marker"

### Evidence

In our deployment, 3 of 25 compression events produced fallback markers (12% failure rate). After upgrading to Ollama 0.22.1, testing with `deepseek-v4-flash:cloud` and `max_tokens: 100` returns:
- `content: ""` (empty)
- `reasoning: "..."` (928 chars of actual summary content)

With `max_tokens: 300`, content returns 75 chars while reasoning takes 929 chars — the reasoning field holds the actual compressed content the compressor needs.

### Impact

- **All thinking models used for compression** produce empty summaries when `max_tokens` is insufficient to cover both reasoning and content
- The Caveman Compressor plugin also inherits this bug via `super()._generate_summary()`
- No recovery path — the fallback marker is static text, not a summary

### Fix

One-line change + one import:

```python
# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)
```

The `extract_content_or_reasoning()` function already exists and handles `content`, `reasoning`, `reasoning_content`, and `reasoning_details` with appropriate fallback logic and inline think-tag stripping.

### Environment

**Machine 1 (Hazel):**
- Hermes Agent: `0b76d23` (2026-04-30), 39 commits behind upstream `main`
- OS: Ubuntu 25.10 (Questing Quokka), kernel 6.17.0-23-generic, x86_64
- CPU: 12th Gen Intel i7-12700H (14 cores: 6P+8E)
- GPU: NVIDIA GeForce RTX 3060 Laptop 6GB, driver 595.58.03
- RAM: 32 GB (2x16 GB DDR5, 30.5 GiB usable after kernel reservation)
- Python: 3.13.7
- Model: `glm-5.1:cloud` via custom provider (Ollama, `http://127.0.0.1:11434/v1`)
- Ollama: 0.22.1
- Models affected in testing: `deepseek-v4-flash:cloud`, `glm-5.1:cloud`, `qwen3.5-397b-cn-think:latest`

**Machine 2 (Ember/Nova):**
- Hermes Agent: `0b76d23` (2026-04-30), 39 commits behind upstream `main`
- OS: Ubuntu 26.04 LTS, kernel 7.0.0-15-generic, x86_64
- CPU: AMD Ryzen 9 5900XT 16-Core
- GPU: NVIDIA RTX PRO 4000 Blackwell 24GB, driver 595.58.03
- RAM: 64 GB (60.7 GiB usable after kernel reservation)
- Python: 3.14.4
- Model: `glm-5.1:cloud` via custom provider (Ollama, `http://127.0.0.1:11434/v1`)
- Ollama: 0.22.1
- Same models affected

Both machines have the local fix applied and verified. The bug is reproducible on stock Hermes Agent without the patch.

### Related Issues

- #9344 — Broader "thinking model reasoning tokens exhaust output budget" bug in the main agent loop. This issue (#19003) is a specific instance of that pattern in the compressor subsystem.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) #19003

Bug: Context compressor ignores `reasoning` field — empty summaries with thinking models (Ollama 0.22+)

Summary

Root Cause

Affected Code

Reproduction

Evidence

Impact

Fix

Environment

Related Issues

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) #19003

Description

Bug: Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)

Summary

Root Cause

Affected Code

Reproduction

Evidence

Impact

Fix

Environment

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Bug: Context compressor ignores `reasoning` field — empty summaries with thinking models (Ollama 0.22+)