fix: exclude completion_tokens from compression trigger for reasoning models by nightq · Pull Request #12071 · NousResearch/hermes-agent

nightq · 2026-04-18T08:44:59Z

Summary

Fixes premature context compression for reasoning models (GLM-5.1, QwQ, etc.) by excluding completion_tokens from the compression trigger calculation.

Root Cause

The compression trigger was summing prompt_tokens + completion_tokens to determine when to compress context. For reasoning models, completion_tokens includes internal thinking/reasoning tokens that are ephemeral output and don't consume context window space for the next API call.

This caused compression to fire when the model had only used ~42% of its actual context window, because reasoning tokens inflated the calculated token count past the 50% threshold.

Example:

Actual prompt: 85,000 tokens (42% of 202K context)
Completion: 20,000 tokens (15K reasoning + 5K visible)
Old calculation: 85K + 20K = 105K → exceeds 101K threshold → premature compression!
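
The arithmetic in this example can be checked with a short sketch. The figures come from the example above; the variable names are illustrative and are not taken from the hermes-agent code.

```python
# Figures from the example above; names are illustrative, not hermes-agent identifiers.
context_length = 202_000
threshold = int(context_length * 0.50)  # 50% threshold -> 101_000

prompt_tokens = 85_000
completion_tokens = 20_000  # 15K reasoning + 5K visible

# Old behaviour: prompt and completion were summed before comparing.
old_total = prompt_tokens + completion_tokens
old_triggers = old_total > threshold

# Share of the context window the prompt actually occupies.
prompt_share = prompt_tokens / context_length

print(old_total, threshold, old_triggers, round(prompt_share * 100))
```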

Fix

Use only prompt_tokens for the compression trigger. The prompt already represents actual context window consumption: it's what the provider charges for and what determines whether the next request will fit.
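
A minimal sketch of a prompt-only trigger, assuming a usage dictionary with prompt_tokens and completion_tokens keys in the style of OpenAI-compatible APIs. The function name should_compress and its parameters are hypothetical and do not reflect the actual hermes-agent implementation.

```python
def should_compress(usage: dict, context_length: int, threshold_ratio: float = 0.5) -> bool:
    """Return True when the prompt alone exceeds the compression threshold.

    Hypothetical helper: completion_tokens is deliberately ignored, since for
    reasoning models it can include thinking tokens that inflate the count.
    """
    prompt_tokens = usage.get("prompt_tokens", 0)
    return prompt_tokens > context_length * threshold_ratio


# Same figures as the example in the Root Cause section.
usage = {"prompt_tokens": 85_000, "completion_tokens": 20_000}
print(should_compress(usage, context_length=202_000))
```

With the figures from the example, the prompt (85K) stays under the 101K threshold, so this sketch does not trigger compression, while a prompt above 101K still would.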

Impact

Prevents cascading premature session splits for reasoning models
Preserves conversation continuity
Reduces wasted tokens from unnecessary compression/replay cycles

Test Plan

Relevant unit tests pass
Verified compression still triggers correctly based on prompt size

Closes #12026

…emature compression for reasoning models Fixes NousResearch#12026 Root cause: Compression trigger was summing prompt_tokens + completion_tokens, but completion_tokens for reasoning models includes internal thinking tokens that don't consume context window space. This caused premature compression when models like GLM-5.1, QwQ, etc. used ~42% of actual context. Fix: Use only prompt_tokens for compression trigger calculation.

teknium1 · 2026-04-20T12:12:25Z

Closed in favor of PR #13006, which fixes the same issue with tests. Thanks @nightq!

teknium1 mentioned this pull request Apr 20, 2026

fix(compression): exclude completion tokens from compression trigger (#12026) #13006

Merged

teknium1 closed this Apr 20, 2026

nightq wants to merge 1 commit into NousResearch:main from nightq:fix/issue-12026-compression-reasoning-tokens
