fix(compression): use extract_content_or_reasoning for thinking models#19007

Closed
shellybotmoyer wants to merge 1 commit into
NousResearch:mainfrom
shellybotmoyer:fix/19003-compression-reasoning-field

Conversation

@shellybotmoyer

Contributor

Summary

Fixes #19003

The context compressor ignores the reasoning field when extracting summary content from the auxiliary LLM response. Thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.) served via Ollama 0.22+ can return their output in the reasoning field with content as an empty string, especially when max_tokens is constrained, as it is during compression. The compressor then gets an empty summary and falls back to a static context marker, so the entire compaction is lost.

Bug

agent/context_compressor.py line 871:

content = response.choices[0].message.content  # BUG: ignores reasoning field

When a thinking model returns content="" and reasoning="...", this line produces an empty string. The compressor then falls back to: "Summary generation failed — inserting static fallback context marker"

In our deployment, 3 of 25 compression events (12%) produced fallback markers.
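
The failure mode can be reproduced without a live model. The following is a minimal sketch, assuming a response object shaped like an OpenAI-style chat completion; `make_response`, `summarize_naive`, and the `FALLBACK` string are hypothetical stand-ins for illustration, not code from agent/context_compressor.py.

```python
from types import SimpleNamespace

# Hypothetical placeholder for the compressor's static fallback marker.
FALLBACK = "[static fallback context marker]"


def make_response(content, reasoning=None):
    """Build a stub shaped like response.choices[0].message (assumed layout)."""
    message = SimpleNamespace(content=content, reasoning=reasoning)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def summarize_naive(response):
    """Mimic the buggy path: read only .content and ignore .reasoning."""
    content = response.choices[0].message.content
    if not content or not content.strip():
        return FALLBACK
    return content.strip()


# A thinking model that spent its token budget in the reasoning field.
thinking = make_response(content="", reasoning="Summary: user refactored the parser.")
# A conventional model that fills content as usual.
normal = make_response(content="Summary: user refactored the parser.")

naive_thinking = summarize_naive(thinking)  # summary text is dropped
naive_normal = summarize_naive(normal)      # summary text is preserved
```

With the stub above, the thinking-model response yields the fallback marker even though a usable summary is sitting in `reasoning`.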

Fix

Replace raw .message.content access with the existing extract_content_or_reasoning() helper from auxiliary_client.py, which checks content first, then falls back to reasoning, reasoning_content, and reasoning_details.

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and is used by the main agent loop — this just applies the same reasoning-field handling to the compression path.
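
For readers unfamiliar with the helper, the fallback order described above might look roughly like the sketch below. This is not the actual implementation in auxiliary_client.py: `extract_text_sketch` and `stub_response` are hypothetical names, and treating reasoning_details as a list of dicts carrying a "text" key is an assumption made for illustration.

```python
from types import SimpleNamespace

# Fallback order taken from the PR description; the field shapes are assumed.
REASONING_FIELDS = ("reasoning", "reasoning_content", "reasoning_details")


def extract_text_sketch(response):
    """Return message content, else the first non-empty reasoning field."""
    message = response.choices[0].message

    content = getattr(message, "content", None)
    if isinstance(content, str) and content.strip():
        return content.strip()

    for field in REASONING_FIELDS:
        value = getattr(message, field, None)
        if isinstance(value, str) and value.strip():
            return value.strip()
        if isinstance(value, list):
            # Assumption: each entry is a dict with an optional "text" key.
            parts = [item.get("text") for item in value if isinstance(item, dict)]
            joined = "\n".join(p for p in parts if isinstance(p, str) and p).strip()
            if joined:
                return joined
    return ""


def stub_response(**fields):
    """Hypothetical test double for a chat-completion response."""
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(**fields))])


from_content = extract_text_sketch(stub_response(content="Summary A", reasoning="scratch"))
from_reasoning = extract_text_sketch(stub_response(content="", reasoning="Summary B"))
from_details = extract_text_sketch(
    stub_response(content=None, reasoning_details=[{"text": "Summary C"}])
)
```

Checking content first keeps the common path unchanged: a model that fills content normally never reaches the reasoning fallbacks.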

Testing

  • Tested with Ollama 0.22.1+ and deepseek-v4-flash:cloud as compression auxiliary
  • Before fix: content="", reasoning held the actual summary → fallback marker inserted
  • After fix: extract_content_or_reasoning() returns the reasoning content → proper summary

Impact

  • One-line change + one import addition
  • No behavior change for models that return content normally (the common path)
  • Fixes the 12% compression failure rate (3 of 25 events) observed in our deployment with thinking models

When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3) via Ollama 0.22+, the compressor reads only response.choices[0].message.content, which is empty when reasoning tokens consume the budget. The reasoning field holds the actual summary content. Fixes NousResearch#19003.

@alt-glitch added the type/bug, P2, comp/agent, and provider/ollama labels on May 2, 2026

@alt-glitch

Collaborator

Likely duplicate of #4603 and #14847 — same fix (use extract_content_or_reasoning in context_compressor.py). Also see #19003 for the issue.

@alt-glitch

Collaborator

Likely duplicate of #4603 and #14847

@shellybotmoyer

Contributor Author

Superseded by #23047, which rebases this fix onto current main. Closing in favor of the clean replacement.

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/ollama Ollama / local models type/bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)
