fix(gateway): fix compression threshold direction and add hard message limit by ygd58 · Pull Request #2157 · NousResearch/hermes-agent

ygd58 · 2026-03-20T11:56:47Z

Root Cause

Two bugs in gateway session hygiene (gateway/run.py):

Inverted threshold direction: When last_prompt_tokens = 0 (the API disconnected without returning usage), the fallback rough-estimate path multiplied the threshold by 1.4, making compression harder to trigger. That is the exact opposite of what is needed when accurate token data is unavailable.
No hard safety valve: Sessions could grow to 700+ messages with no forced compression trigger, causing a death spiral: API disconnects prevent token data collection, which prevents compression, which causes more disconnects.

Fix

Fixed threshold direction: Changed * 1.4 to * 0.7 for rough estimate fallback. When we don't have accurate token data, be more aggressive about compression, not less.
Hard message count limit: Added _HARD_MSG_LIMIT = 400 — if a session exceeds 400 messages, force compression regardless of token estimates. This breaks the death spiral even when all token-based checks fail.
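
For illustration, the two gateway changes described above might look roughly like the sketch below. Only `_HARD_MSG_LIMIT = 400` and the 0.7 factor come from this description; the function name, its parameters, and `_ROUGH_ESTIMATE_FACTOR` are hypothetical and are not taken from gateway/run.py.

```python
# Hypothetical sketch of the hygiene check; not the actual gateway/run.py code.
_HARD_MSG_LIMIT = 400          # value from the PR description
_ROUGH_ESTIMATE_FACTOR = 0.7   # value from the PR description (previously 1.4)


def needs_hygiene_compression(message_count, last_prompt_tokens,
                              rough_token_estimate, compress_threshold):
    """Return True if the session should be compressed before the next turn."""
    # Hard safety valve: applies regardless of any token data.
    if message_count > _HARD_MSG_LIMIT:
        return True
    if last_prompt_tokens > 0:
        # Accurate usage reported by the API: compare against the threshold as-is.
        return last_prompt_tokens >= compress_threshold
    # No usage data: lower the bar so the rough estimate triggers earlier.
    return rough_token_estimate >= compress_threshold * _ROUGH_ESTIMATE_FACTOR
```

With an assumed threshold of 100,000 tokens, the old 1.4 multiplier would have required a rough estimate of 140,000 before compressing; with 0.7 the fallback path fires at 70,000.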

Update: All 3 Layers Fixed

Layer 1 fix (run_agent.py in-loop compression): When last_prompt_tokens = 0 (stale after API disconnect), now falls back to rough token estimate instead of using the stale value.
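
A minimal sketch of the Layer 1 fallback, under stated assumptions: the helper names are hypothetical, and the 4-characters-per-token estimator is only a stand-in, not the project's own rough estimator.

```python
def estimate_tokens_rough(messages):
    """Stand-in estimator (assumption): roughly 4 characters per token."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def tokens_for_compression_check(last_prompt_tokens, messages):
    """Pick the token count to feed into the in-loop compression check."""
    if last_prompt_tokens > 0:
        # The API reported usage on the last call; trust it.
        return last_prompt_tokens
    # 0 means no usage data arrived (e.g. after a disconnect), so estimate
    # from the message contents instead of passing 0 along.
    return estimate_tokens_rough(messages)
```

The point of the design is that a count of 0 can never exceed a compression threshold, so passing it through unchanged silently disables the check.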

Layer 2 fix (run_agent.py error handler): Server disconnect errors (ReadError, RemoteProtocolError) on large sessions (>60% context or >200 messages) are now treated as context-length errors and trigger compression before retry.
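
The Layer 2 heuristic could be sketched as follows. The 60% and 200-message cutoffs are from the description above; the function name, its parameters, matching exceptions by class name, and the local `ReadError` class are assumptions made to keep the example self-contained.

```python
_DISCONNECT_ERROR_NAMES = {"ReadError", "RemoteProtocolError"}
_CONTEXT_FRACTION = 0.60   # from the PR description
_MSG_COUNT = 200           # from the PR description


class ReadError(Exception):
    """Local stand-in for the client library's exception (assumption)."""


def treat_disconnect_as_context_overflow(exc, estimated_tokens,
                                         context_length, message_count):
    """Return True when a disconnect on a large session should trigger
    compression before the retry."""
    # Only server-disconnect style errors qualify.
    if type(exc).__name__ not in _DISCONNECT_ERROR_NAMES:
        return False
    large_by_tokens = estimated_tokens > context_length * _CONTEXT_FRACTION
    large_by_count = message_count > _MSG_COUNT
    return large_by_tokens or large_by_count
```

For example, `treat_disconnect_as_context_overflow(ReadError(), 10_000, 128_000, 250)` returns True because the message count exceeds 200, while the same error on a 10-message session with 10,000 estimated tokens is left to the normal retry path.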

Layer 3 fix (gateway/run.py hygiene): Fixed inverted threshold direction (1.4x → 0.7x) and added 400-message hard limit.

ygd58 · 2026-03-20T12:00:06Z

The failing tests (test_cli_new_session.py) are pre-existing failures unrelated to this PR. They fail on the reset_session_state attribute in _FakeAgent, a test infrastructure issue not caused by the gateway/run.py changes.

Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects:

Layer 1: Stale token counter fallback (run_agent.py in-loop). When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire.

Layer 2: Server disconnect heuristic (run_agent.py error handler). When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way.

Layer 3: Hard message count limit (gateway/run.py hygiene). Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data.

Based on the analysis from PR #2157 by ygd58. The gateway threshold direction fix (1.4x multiplier) was already resolved on main.

teknium1 · 2026-04-03T09:18:02Z

The remaining fixes from this PR were reimplemented fresh on current main in PR #4750 (merged). The gateway threshold direction fix (Layer 3 — the * 1.4 multiplier) had already been resolved independently on main. Credit to @ygd58 for the original analysis of the death spiral and the three-layer fix approach. Thanks!

ygd58 mentioned this pull request Mar 20, 2026

Bug: Context compression fails to trigger on API disconnect, causing death spiral in gateway sessions #2153

Closed

ygd58 added 2 commits March 21, 2026 00:15

fix(gateway): fix threshold direction and add hard message limit to prevent compression death spiral (6d88422)

fix(agent): fix Layer 1 and Layer 2 compression failures for server disconnect death spiral (50e6f1f)

ygd58 force-pushed the fix/gateway-context-compression-death-spiral branch from 541894c to 50e6f1f Compare March 20, 2026 23:19

teknium1 mentioned this pull request Apr 3, 2026

fix: prevent compression death spiral from API disconnects (#2153) #4750

Merged

teknium1 closed this Apr 3, 2026

ygd58 wants to merge 2 commits into NousResearch:main from ygd58:fix/gateway-context-compression-death-spiral
