fix: exclude reasoning tokens from compression trigger threshold by Sanjays2402 · Pull Request #12481 · NousResearch/hermes-agent

Sanjays2402 · 2026-04-19T09:13:09Z

Problem

Fixes #12026

The compression trigger at run_agent.py:11274 sums last_prompt_tokens + last_completion_tokens to determine whether to compress context. However, thinking models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with reasoning/thinking tokens that do not consume context window space.

This causes premature compression — e.g. 40k prompt + 80k reasoning = 120k exceeds a 100k threshold, even though the actual context window usage is only 40k.
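The arithmetic in this example can be checked with a short sketch of the old behaviour. The function name `legacy_should_compress`, its parameters, and the `>=` comparison are assumptions for illustration, not the code at run_agent.py:11274. Only the summing of prompt and completion tokens comes from the description above.

```python
def legacy_should_compress(prompt_tokens: int, completion_tokens: int, threshold: int) -> bool:
    """Hypothetical stand-in for the old check: prompt and completion tokens are summed."""
    return prompt_tokens + completion_tokens >= threshold


# 40k prompt plus an 80k reasoning-heavy completion, against a 100k threshold.
triggered = legacy_should_compress(40_000, 80_000, 100_000)
print(triggered)  # True: the 120k sum crosses the threshold although the prompt is only 40k
```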

Fix

Use only last_prompt_tokens for the compression threshold check. Completion tokens are the model's output and don't contribute to context window pressure. The existing fallback path (estimate when prompt_tokens is 0) is preserved.
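A minimal sketch of the fixed check, assuming the fallback is some token estimator that is called when no prompt count was reported. `should_compress`, `threshold_tokens`, and `estimate_tokens` are hypothetical names, and the real code in run_agent.py may be structured differently.

```python
from typing import Callable


def should_compress(
    last_prompt_tokens: int,
    threshold_tokens: int,
    estimate_tokens: Callable[[], int],
) -> bool:
    """Decide on compression from prompt tokens only (illustrative sketch)."""
    if last_prompt_tokens > 0:
        # Completion (and therefore reasoning) tokens are deliberately not added.
        used = last_prompt_tokens
    else:
        # Fallback path: no prompt token count available, so use an estimate.
        used = estimate_tokens()
    return used >= threshold_tokens


# The 40k-prompt case from the problem description no longer triggers.
print(should_compress(40_000, 100_000, lambda: 0))  # False
```

Passing the estimator as a callable keeps the estimate lazy, so it is only computed when the reported prompt count is zero.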

Tests

Added tests/run_agent/test_compression_trigger_excludes_reasoning.py with 3 tests:

- High reasoning tokens should NOT trigger compression
- High prompt tokens SHOULD trigger compression
- Zero prompt tokens falls back correctly

…12026)

Cherry-picked from PR #12481 by @Sanjays2402.

Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with internal thinking tokens. The compression trigger summed prompt_tokens + completion_tokens, causing premature compression at ~42% actual context usage instead of the configured 50% threshold.

Now uses only prompt_tokens, since completion tokens don't consume context window space for the next API call.

- 3 new regression tests
- Added AUTHOR_MAP entry for @Sanjays2402

Closes #12026

teknium1 · 2026-04-20T12:12:21Z

Merged via PR #13006. Your commit was cherry-picked onto current main with your authorship preserved. Cleanest fix of the four submissions. Thanks @Sanjays2402!

Commit 2ad5ed9: fix: exclude reasoning tokens from compression trigger threshold (NousResearch#12026)

teknium1 mentioned this pull request Apr 20, 2026

fix(compression): exclude completion tokens from compression trigger (#12026) #13006

Merged

teknium1 closed this Apr 20, 2026

Sanjays2402 wants to merge 1 commit into NousResearch:main from Sanjays2402:fix/compression-reasoning-tokens-12026
