fix(gateway): fix compression threshold direction and add hard message limit #2157

Closed
ygd58 wants to merge 2 commits into NousResearch:main from ygd58:fix/gateway-context-compression-death-spiral

@ygd58 ygd58 commented Mar 20, 2026

Contributor

Fixes #2153

Root Cause

Two bugs in gateway session hygiene (gateway/run.py):

  1. Inverted threshold direction: When last_prompt_tokens = 0 (the API disconnected without returning usage), the fallback rough-estimate path multiplied the threshold by 1.4, making compression harder to trigger. That is the exact opposite of what is needed when accurate token data is unavailable.

  2. No hard safety valve: Sessions could grow to 700+ messages with no forced compression trigger, causing a death spiral where API disconnects prevent token data collection, which prevents compression, which causes more disconnects.

Fix

  1. Fixed threshold direction: Changed * 1.4 to * 0.7 for rough estimate fallback. When we don't have accurate token data, be more aggressive about compression, not less.

  2. Hard message count limit: Added _HARD_MSG_LIMIT = 400 — if a session exceeds 400 messages, force compression regardless of token estimates. This breaks the death spiral even when all token-based checks fail.
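The two gateway changes above can be sketched as a single decision function. This is a minimal illustration, not the code in gateway/run.py: the function name, its parameters, and the ordering of the checks are assumptions. Only the 0.7 factor, the 400-message limit, and the name _HARD_MSG_LIMIT come from the description above.

```python
_HARD_MSG_LIMIT = 400          # value and name taken from the PR description
_ROUGH_ESTIMATE_FACTOR = 0.7   # value from the PR description (previously 1.4)


def needs_hygiene_compression(
    message_count: int,
    last_prompt_tokens: int,
    rough_token_estimate: int,
    compress_threshold_tokens: int,
) -> bool:
    """Hypothetical helper: decide whether a session should be compressed."""
    # Safety valve: message count alone forces compression, so the check
    # still fires when every token-based signal is missing.
    if message_count > _HARD_MSG_LIMIT:
        return True

    if last_prompt_tokens > 0:
        # Accurate usage data from the API: compare against the plain threshold.
        return last_prompt_tokens >= compress_threshold_tokens

    # No usage data: scale the threshold down so the rough estimate
    # triggers compression earlier rather than later.
    return rough_token_estimate >= compress_threshold_tokens * _ROUGH_ESTIMATE_FACTOR
```

With a 100,000-token threshold and no usage data, a rough estimate of 80,000 tokens triggers compression under the 0.7 factor. Under the old 1.4 factor the estimate would have had to reach about 140,000 tokens.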

Update: All 3 Layers Fixed

Layer 1 fix (run_agent.py in-loop compression): When last_prompt_tokens = 0 (stale after an API disconnect), the loop now falls back to a rough token estimate instead of using the stale value.
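A minimal sketch of the Layer 1 fallback, under stated assumptions: both function names below are hypothetical, and the four-characters-per-token heuristic is a stand-in for whatever rough estimator the project actually uses.

```python
from typing import Callable, Mapping, Sequence

Message = Mapping[str, object]


def rough_estimate(messages: Sequence[Message]) -> int:
    """Hypothetical estimator: roughly four characters per token."""
    return sum(len(str(m.get("content", ""))) for m in messages) // 4


def effective_prompt_tokens(
    last_prompt_tokens: int,
    messages: Sequence[Message],
    estimate: Callable[[Sequence[Message]], int] = rough_estimate,
) -> int:
    """Return the token count the compression check should use."""
    # A positive value means the API reported usage on the last call.
    if last_prompt_tokens > 0:
        return last_prompt_tokens
    # Zero means the counter is stale or missing, so estimate from the
    # messages themselves instead of passing 0 downstream.
    return estimate(messages)
```

The point of the design is that a zero counter is treated as "unknown" rather than as "empty context", so the compression check always receives a non-trivial number for a non-trivial session.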

Layer 2 fix (run_agent.py error handler): Server disconnect errors (ReadError, RemoteProtocolError) on large sessions (>60% context or >200 messages) are now treated as context-length errors and trigger compression before retry.
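The Layer 2 heuristic might look roughly like the following. The function name and parameters are hypothetical. Matching exceptions by class name is also an assumption, made here so the sketch does not depend on a particular HTTP client library; the real error handler may check exception types directly.

```python
_DISCONNECT_ERROR_NAMES = frozenset({"ReadError", "RemoteProtocolError"})

_CONTEXT_FRACTION = 0.6   # ">60% context" from the PR description
_MESSAGE_COUNT = 200      # ">200 messages" from the PR description


def should_compress_after_disconnect(
    error: BaseException,
    estimated_tokens: int,
    context_limit: int,
    message_count: int,
) -> bool:
    """Hypothetical check: treat a disconnect on a large session as a
    context-length error, so the caller compresses before retrying."""
    # Only server-disconnect style errors qualify.
    if type(error).__name__ not in _DISCONNECT_ERROR_NAMES:
        return False

    # Either condition marks the session as "large".
    over_context = estimated_tokens > _CONTEXT_FRACTION * context_limit
    over_messages = message_count > _MESSAGE_COUNT
    return over_context or over_messages
```

A caller would run this inside its retry handler and, when it returns True, compress the session before the next attempt instead of resending the same oversized request.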

Layer 3 fix (gateway/run.py hygiene): Fixed inverted threshold direction (1.4x → 0.7x) and added 400-message hard limit.

ygd58 commented Mar 20, 2026

Contributor Author

The failing tests (test_cli_new_session.py) are pre-existing failures unrelated to this PR. They fail on the reset_session_state attribute of _FakeAgent, a test infrastructure issue not caused by the gateway/run.py changes.

@ygd58 ygd58 force-pushed the fix/gateway-context-compression-death-spiral branch from 541894c to 50e6f1f Compare March 20, 2026 23:19
teknium1 added a commit that referenced this pull request Apr 3, 2026
Three fixes for long-running gateway sessions that enter a death spiral
when API disconnects prevent token data collection, which prevents
compression, which causes more disconnects:

Layer 1 — Stale token counter fallback (run_agent.py in-loop):
When last_prompt_tokens is 0 (stale after API disconnect or provider
returned no usage data), fall back to estimate_messages_tokens_rough()
instead of passing 0 to should_compress(), which would never fire.

Layer 2 — Server disconnect heuristic (run_agent.py error handler):
When ReadError/RemoteProtocolError hits a large session (>60% context
or >200 messages), treat it as a context-length error and trigger
compression rather than burning through retries that all fail the
same way.

Layer 3 — Hard message count limit (gateway/run.py hygiene):
Force compression when a session exceeds 400 messages, regardless of
token estimates. This catches runaway growth even when all token-based
checks fail due to missing API data.

Based on the analysis from PR #2157 by ygd58 — the gateway threshold
direction fix (1.4x multiplier) was already resolved on main.
teknium1 added a commit that referenced this pull request Apr 3, 2026
…4750)


teknium1 commented Apr 3, 2026

Contributor

The remaining fixes from this PR were reimplemented fresh on current main in PR #4750 (merged). The gateway threshold direction fix (Layer 3 — the * 1.4 multiplier) had already been resolved independently on main. Credit to @ygd58 for the original analysis of the death spiral and the three-layer fix approach. Thanks!

@teknium1 teknium1 closed this Apr 3, 2026
saxster pushed a commit to saxster/hermes-agent that referenced this pull request Apr 8, 2026
…rch#2153) (NousResearch#4750)

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…rch#2153) (NousResearch#4750)

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
…rch#2153)

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…rch#2153) (NousResearch#4750)

olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…rch#2153) (NousResearch#4750)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…rch#2153) (NousResearch#4750)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…rch#2153) (NousResearch#4750)


Development

Successfully merging this pull request may close these issues.

Bug: Context compression fails to trigger on API disconnect, causing death spiral in gateway sessions
