fix(compression): use extract_content_or_reasoning for reasoning model summaries#4603
Open
airudotsh wants to merge 1 commit into
Open
fix(compression): use extract_content_or_reasoning for reasoning model summaries#4603airudotsh wants to merge 1 commit into
airudotsh wants to merge 1 commit into
Conversation
…l summaries Reasoning models (DeepSeek-R1, Qwen-QwQ, glm-5-turbo) sometimes put all output inside think/reasoning blocks with an empty content field. The compressor was reading raw response.choices[0].message.content directly, getting an empty string, and silently dropping middle turns without a meaningful summary. Use the existing extract_content_or_reasoning() helper (from auxiliary_client) which already handles: - Empty content + structured reasoning field fallback - XML-style think/thinking/reasoning tag stripping Also normalize dict content (llama.cpp) before extraction to prevent type errors. Tests: 3 new cases covering reasoning-only, think-tag, and normal content extraction paths.
19 tasks
This was referenced Apr 24, 2026
Collaborator
|
Likely duplicate of #14847 — same fix: use extract_content_or_reasoning() in context compressor _generate_summary() for reasoning-only model responses. |
1 similar comment
Collaborator
|
Likely duplicate of #14847 — same fix: use extract_content_or_reasoning() in context compressor _generate_summary() for reasoning-only model responses. |
This was referenced May 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changed
Use
extract_content_or_reasoning()(fromauxiliary_client) instead of rawresponse.choices[0].message.contentin the context compressor's_generate_summary().Why
Reasoning models (DeepSeek-R1, Qwen-QwQ, glm-5-turbo) sometimes put all output inside think/reasoning blocks with an empty
contentfield. The compressor was reading raw content directly, getting an empty string, and silently dropping middle conversation turns without a meaningful summary — causing context continuity loss.How it fixes it
extract_content_or_reasoning()already handles:message.reasoning/message.reasoning_contentwhen content is empty<think/>,<thinking/>,<reasoning/>blocks from contentAlso normalizes dict content (llama.cpp style responses) before extraction to prevent type errors.
Related
Complements #4243 (summary fallback on provider failure) — these fix different bugs and do not conflict. This PR fixes the case where the summary call succeeds but the model puts output in reasoning-only mode.
Tests
3 new test cases in
TestReasoningOnlyExtraction:All 37 tests pass.