fix: exclude reasoning tokens from compression trigger threshold #12481

Closed
Sanjays2402 wants to merge 1 commit into
NousResearch:mainfrom
Sanjays2402:fix/compression-reasoning-tokens-12026

@Sanjays2402

Contributor

Problem

Fixes #12026

The compression trigger at run_agent.py:11274 sums last_prompt_tokens + last_completion_tokens to determine whether to compress context. However, thinking models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with reasoning/thinking tokens that do not consume context window space.

This causes premature compression — e.g. 40k prompt + 80k reasoning = 120k exceeds a 100k threshold, even though the actual context window usage is only 40k.

Fix

Use only last_prompt_tokens for the compression threshold check. Completion tokens are the model's output and don't contribute to context window pressure. The existing fallback path (estimate when prompt_tokens is 0) is preserved.
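
To make the change concrete, here is a minimal sketch of the before and after check, using the numbers from the example above. The function names, the `estimated_tokens` parameter, and the `>=` comparison are assumptions for illustration; they are not copied from run_agent.py.

```python
def uses_too_much_context_old(prompt_tokens: int, completion_tokens: int, threshold: int) -> bool:
    # Previous behaviour as described in this PR: prompt and completion
    # tokens are summed, so reasoning tokens count against the threshold.
    return prompt_tokens + completion_tokens >= threshold


def uses_too_much_context_new(prompt_tokens: int, threshold: int, estimated_tokens: int = 0) -> bool:
    # New behaviour: only the prompt size counts. When the provider reported
    # no prompt tokens (0), fall back to an estimate (hypothetical parameter).
    used = prompt_tokens if prompt_tokens > 0 else estimated_tokens
    return used >= threshold


# 40k prompt + 80k reasoning against a 100k threshold.
old_result = uses_too_much_context_old(40_000, 80_000, 100_000)
new_result = uses_too_much_context_new(40_000, 100_000)
print(old_result, new_result)  # prints: True False
```

With the old check the 80k reasoning tokens push the total past the threshold; with the new check the same turn stays well below it.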

Tests

Added tests/run_agent/test_compression_trigger_excludes_reasoning.py with 3 tests:

  • High reasoning tokens should NOT trigger compression
  • High prompt tokens SHOULD trigger compression
  • Zero prompt tokens falls back correctly
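
The three cases can be sketched roughly as follows. This is a self-contained illustration, not the contents of the test file: `should_compress` and its parameters are hypothetical stand-ins for the real trigger logic, and the token counts are made up.

```python
def should_compress(last_prompt_tokens: int,
                    last_completion_tokens: int,
                    threshold_tokens: int,
                    estimated_tokens: int = 0) -> bool:
    # Hypothetical stand-in for the trigger: completion tokens are accepted
    # but deliberately ignored; an estimate is used when prompt tokens are 0.
    used = last_prompt_tokens if last_prompt_tokens > 0 else estimated_tokens
    return used >= threshold_tokens


def test_high_reasoning_tokens_do_not_trigger():
    # 40k prompt + 80k reasoning must stay under a 100k threshold.
    assert not should_compress(40_000, 80_000, 100_000)


def test_high_prompt_tokens_trigger():
    # A prompt above the threshold triggers regardless of completion size.
    assert should_compress(110_000, 500, 100_000)


def test_zero_prompt_tokens_falls_back_to_estimate():
    # With no reported prompt tokens, the estimate decides.
    assert should_compress(0, 80_000, 100_000, estimated_tokens=105_000)
    assert not should_compress(0, 80_000, 100_000, estimated_tokens=30_000)
```

Saved to a file, these would run under pytest; they only exercise the stand-in function defined in the same block.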

teknium1 pushed a commit that referenced this pull request Apr 20, 2026
…12026)

Cherry-picked from PR #12481 by @Sanjays2402.

Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens
with internal thinking tokens. The compression trigger summed
prompt_tokens + completion_tokens, causing premature compression at ~42%
actual context usage instead of the configured 50% threshold.

Now uses only prompt_tokens — completion tokens don't consume context
window space for the next API call.

- 3 new regression tests
- Added AUTHOR_MAP entry for @Sanjays2402

Closes #12026
@teknium1

Contributor

Merged via PR #13006. Your commit was cherry-picked onto current main with your authorship preserved. Cleanest fix of the four submissions. Thanks @Sanjays2402!

@teknium1 teknium1 closed this Apr 20, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
Luminet2023 pushed a commit to Luminet2023/hermes-agent that referenced this pull request May 1, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: Compression trigger includes reasoning tokens, causing premature session splits for thinking models (GLM-5.1, QwQ, etc.)
