fix(compression): use extract_content_or_reasoning for thinking models #19007
Closed
shellybotmoyer wants to merge 1 commit into
When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3) via Ollama 0.22+, the compressor reads only `response.choices[0].message.content`, which is empty when reasoning tokens consume the budget. The `reasoning` field holds the actual summary content. Fixes NousResearch#19003.
Superseded by #23047, which rebases this fix onto current main. Closing in favor of the clean replacement.
Summary
Fixes #19003
The context compressor ignores the `reasoning` field when extracting summary content from the auxiliary LLM response. Thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.) served via Ollama 0.22+ return their output in the `reasoning` field with `content` as an empty string, especially when `max_tokens` is constrained, as it is during compression. The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Bug

`agent/context_compressor.py`, line 871, reads only the raw `.message.content` of the auxiliary response. When a thinking model returns `content=""` and `reasoning="..."`, this line produces an empty string. The compressor then falls back to:

"Summary generation failed — inserting static fallback context marker"

In our deployment, 3 of 25 compression events (12%) produced fallback markers.

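The failure mode can be reproduced with a stand-in response object. This is a minimal sketch under stated assumptions, not the code in `agent/context_compressor.py`: `make_response`, `raw_content`, `content_or_reasoning`, `summary_or_marker`, and `FALLBACK_MARKER` are hypothetical names, and `content_or_reasoning` only approximates a content-then-reasoning lookup rather than reproducing the real `extract_content_or_reasoning()` helper.

```python
from types import SimpleNamespace

# Hypothetical placeholder text, not the marker the real compressor inserts.
FALLBACK_MARKER = "[context summary unavailable]"


def make_response(content, reasoning=None):
    """Build a stand-in for an OpenAI-style chat completion (test double only)."""
    message = SimpleNamespace(content=content, reasoning=reasoning)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def raw_content(response):
    """Pre-fix behaviour: look at message.content and nothing else."""
    return (response.choices[0].message.content or "").strip()


def content_or_reasoning(response):
    """Illustrative fallback: prefer content, then reasoning-style fields."""
    message = response.choices[0].message
    for field in ("content", "reasoning", "reasoning_content"):
        value = getattr(message, field, None)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


def summary_or_marker(response, extract):
    """Return the extracted summary, or the static marker when it is empty."""
    return extract(response) or FALLBACK_MARKER


# A thinking model that spent its budget on reasoning: content is empty.
thinking = make_response(content="", reasoning="User asked about X; agent did Y.")
# A non-thinking model that answered normally.
plain = make_response(content="Short summary.")

print(summary_or_marker(thinking, raw_content))
print(summary_or_marker(thinking, content_or_reasoning))
```

With the raw accessor, the thinking-model response yields the marker; with the fallback accessor, the reasoning text is used as the summary. Responses that populate `content` behave identically under both accessors. The sketch does not handle `reasoning_details`, whose structure is not described here.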
Fix
Replace the raw `.message.content` access with the existing `extract_content_or_reasoning()` helper from `auxiliary_client.py`, which checks `content` first, then falls back to `reasoning`, `reasoning_content`, and `reasoning_details`.

The `extract_content_or_reasoning()` function already exists and is used by the main agent loop; this change just applies the same reasoning-field handling to the compression path.

Testing
- `deepseek-v4-flash:cloud` as compression auxiliary
- `content=""`, `reasoning` held the actual summary → fallback marker inserted
- `extract_content_or_reasoning()` returns the reasoning content → proper summary

Impact