fix(compression): exclude completion_tokens from compression trigger to prevent premature splits by Linux2010 · Pull Request #12783 · NousResearch/hermes-agent

Linux2010 · 2026-04-20T02:36:51Z

What broke

Compression triggered at ~42% actual context usage for reasoning models (GLM-5.1, QwQ, DeepSeek R1), causing cascading session splits that destroyed conversation continuity and wasted tokens replaying compressed context.

Observed in production: 6 consecutive compression-triggered session splits in a single workflow:

Session	Messages	Tools	Input Tokens	End Reason
TD Promo	510	223	9,588,636	(none)
TD Promo #2	200	96	1,130,897	compression
TD Promo #3	137	64	752,245	compression
TD Promo #4	157	76	1,286,818	compression
TD Promo #5	189	92	565,148	compression
TD Promo #6	161	77	582,556	compression

Root cause

The compression trigger summed prompt_tokens + completion_tokens. For reasoning models:

completion_tokens includes ephemeral reasoning/thinking tokens
These tokens are not fed back into later requests, so they do NOT consume the context window
Adding them inflated _real_tokens, triggering compression at ~42% actual usage

Example:

Actual prompt: 85,000 tokens (42% of 202K GLM-5.1 context)
Completion: 20,000 tokens (15K reasoning + 5K visible output)
_real_tokens = 85,000 + 20,000 = 105,000 → exceeds the 50% threshold (101,000 of 202K) → premature compression!
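The arithmetic above can be sketched as a before/after trigger check. This is a minimal illustration, not hermes-agent's code: the functions `should_compress_old` and `should_compress_new` and their signatures are hypothetical. Only `_real_tokens`, the 50% threshold, and the example numbers come from this PR.

```python
# Hypothetical sketch: helper names and signatures are illustrative only.

def should_compress_old(prompt_tokens: int, completion_tokens: int,
                        context_length: int, threshold: float = 0.5) -> bool:
    """Pre-fix trigger: also counts reasoning tokens that never occupy the window."""
    _real_tokens = prompt_tokens + completion_tokens
    return _real_tokens > threshold * context_length


def should_compress_new(prompt_tokens: int, completion_tokens: int,
                        context_length: int, threshold: float = 0.5) -> bool:
    """Post-fix trigger: only the prompt size reflects context-window usage."""
    _real_tokens = prompt_tokens  # completion_tokens is deliberately ignored
    return _real_tokens > threshold * context_length


# Numbers from the example (GLM-5.1, 202K context, 50% threshold = 101,000).
CONTEXT = 202_000
print(should_compress_old(85_000, 20_000, CONTEXT))  # True:  105,000 > 101,000
print(should_compress_new(85_000, 20_000, CONTEXT))  # False:  85,000 < 101,000
```

Keeping `completion_tokens` in the post-fix signature only makes the two versions easy to compare side by side; the new check never reads it.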

Why this fix is minimal

Changed one line: _real_tokens now uses only prompt_tokens.

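This represents the actual context window consumption carried into the next request. False negatives (a compression check that should have fired but did not) are self-correcting: the next API call reports the real prompt size, and the check runs again against it.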
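To illustrate why a missed trigger is self-correcting, here is a small simulation over a sequence of per-call usage reports. The `Usage` dataclass, the `first_trigger` helper, and the token counts are all hypothetical. The sketch assumes that each call's visible output and tool results reappear in the next call's `prompt_tokens`, while its reasoning tokens do not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Usage:
    """Hypothetical per-call usage report (field names mirror the PR's terms)."""
    prompt_tokens: int
    completion_tokens: int


def first_trigger(reports: List[Usage], context_length: int,
                  threshold: float = 0.5,
                  include_completion: bool = False) -> Optional[int]:
    """Return the index of the first call that would trigger compression, or None."""
    limit = threshold * context_length
    for i, usage in enumerate(reports):
        used = usage.prompt_tokens
        if include_completion:  # pre-fix behaviour: reasoning tokens inflate the count
            used += usage.completion_tokens
        if used > limit:
            return i
    return None


# Made-up usage reports for three consecutive calls on a 202K-context model.
reports = [Usage(85_000, 20_000), Usage(95_000, 18_000), Usage(103_000, 12_000)]
print(first_trigger(reports, 202_000, include_completion=True))  # pre-fix: fires on call 0
print(first_trigger(reports, 202_000))  # post-fix: fires on call 2, once the prompt passes 101K
```

Under the post-fix rule nothing is lost by waiting: as soon as the retained context actually crosses the threshold, that call's own `prompt_tokens` reports it.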

What I tested

Python syntax check: ✓ valid
Code review: logic matches the intended behavior described in #12026
Existing fallback for stale token data preserved

What I intentionally did not change

Fallback estimate logic (unchanged for disconnects)
50% threshold or compression configuration
Actual compression logic itself

Fixes #12026

Use a SHA-256 hash of the connection parameters (user@host:port) instead of embedding them literally in the socket filename. This keeps the socket path under macOS's 104-character limit even with IPv6 addresses and long temp directory paths.

Fixes NousResearch#11840

Co-authored-by: theerror <4508328@github>
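The socket-path commit above can be sketched as follows. This is a minimal illustration, not the commit's code: the `control_socket_path` helper, the `hermes-ssh-` prefix, and truncating the digest to 16 hex characters are all hypothetical choices. Hashing `user@host:port` gives a fixed-length filename, so a long IPv6 host no longer grows the path. Only the temp directory contributes variable length.

```python
import hashlib
import os
import tempfile
from typing import Optional

# sun_path in macOS's struct sockaddr_un is 104 bytes, including the trailing NUL.
MACOS_SUN_PATH_MAX = 104


def control_socket_path(user: str, host: str, port: int,
                        tmpdir: Optional[str] = None,
                        digest_chars: int = 16) -> str:
    """Build a fixed-length socket path from a hash of user@host:port.

    The "hermes-ssh-" prefix and the 16-hex-char truncation are illustrative
    assumptions, not necessarily what the commit does.
    """
    base = tmpdir if tmpdir is not None else tempfile.gettempdir()
    key = f"{user}@{host}:{port}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:digest_chars]
    path = os.path.join(base, f"hermes-ssh-{digest}.sock")
    # Reject paths that would leave no room for the NUL terminator.
    if len(path.encode("utf-8")) >= MACOS_SUN_PATH_MAX:
        raise ValueError(f"socket path too long ({len(path)} chars): {path}")
    return path


# The filename length is the same for an IPv6 host as for a short hostname.
print(control_socket_path("deploy", "2001:db8:85a3::8a2e:370:7334", 22, tmpdir="/tmp"))
```

Truncating the digest trades some collision resistance for headroom. With a short temp directory, the full 64-character digest would also fit.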


teknium1 · 2026-04-20T12:12:27Z

Closed in favor of PR #13006, which fixes the same issue. The SSH socket path fix bundled in your PR is a separate concern; consider submitting it as its own PR. Thanks @Linux2010!

Linux2010 and others added 2 commits April 19, 2026 22:35

teknium1 mentioned this pull request Apr 20, 2026 in #13006, fix(compression): exclude completion tokens from compression trigger (#12026) (Merged)

teknium1 closed this Apr 20, 2026

Linux2010 wants to merge 2 commits into NousResearch:main from Linux2010:fix-issue-12026-premature-compression
