fix(gateway): fix compression threshold direction and add hard message limit#2157
Closed
ygd58 wants to merge 2 commits into
Closed
fix(gateway): fix compression threshold direction and add hard message limit#2157ygd58 wants to merge 2 commits into
ygd58 wants to merge 2 commits into
Conversation
Contributor
Author
|
The failing tests (test_cli_new_session.py) are pre-existing failures unrelated to this PR. They fail on reset_session_state attribute in _FakeAgent — a test infrastructure issue not caused by gateway/run.py changes. |
…revent compression death spiral
…isconnect death spiral
541894c to
50e6f1f
Compare
teknium1
added a commit
that referenced
this pull request
Apr 3, 2026
Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR #2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
teknium1
added a commit
that referenced
this pull request
Apr 3, 2026
…4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR #2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
Contributor
|
The remaining fixes from this PR were reimplemented fresh on current main in PR #4750 (merged). The gateway threshold direction fix (Layer 3 — the |
saxster
pushed a commit
to saxster/hermes-agent
that referenced
this pull request
Apr 8, 2026
…rch#2153) (NousResearch#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…rch#2153) (NousResearch#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
…rch#2153) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…rch#2153) (NousResearch#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…rch#2153) (NousResearch#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…rch#2153) (NousResearch#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…rch#2153) (NousResearch#4750) Three fixes for long-running gateway sessions that enter a death spiral when API disconnects prevent token data collection, which prevents compression, which causes more disconnects: Layer 1 — Stale token counter fallback (run_agent.py in-loop): When last_prompt_tokens is 0 (stale after API disconnect or provider returned no usage data), fall back to estimate_messages_tokens_rough() instead of passing 0 to should_compress(), which would never fire. Layer 2 — Server disconnect heuristic (run_agent.py error handler): When ReadError/RemoteProtocolError hits a large session (>60% context or >200 messages), treat it as a context-length error and trigger compression rather than burning through retries that all fail the same way. Layer 3 — Hard message count limit (gateway/run.py hygiene): Force compression when a session exceeds 400 messages, regardless of token estimates. This catches runaway growth even when all token-based checks fail due to missing API data. Based on the analysis from PR NousResearch#2157 by ygd58 — the gateway threshold direction fix (1.4x multiplier) was already resolved on main.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #2153
Root Cause
Two bugs in gateway session hygiene (
gateway/run.py):Inverted threshold direction: When
last_prompt_tokens = 0(API disconnected without returning usage), the fallback rough estimate path multiplied the threshold by1.4x, making compression harder to trigger — the exact opposite of what's needed when we don't have accurate token data.No hard safety valve: Sessions could grow to 700+ messages with no forced compression trigger, causing a death spiral where API disconnects prevent token data collection, which prevents compression, which causes more disconnects.
Fix
Fixed threshold direction: Changed
* 1.4to* 0.7for rough estimate fallback. When we don't have accurate token data, be more aggressive about compression, not less.Hard message count limit: Added
_HARD_MSG_LIMIT = 400— if a session exceeds 400 messages, force compression regardless of token estimates. This breaks the death spiral even when all token-based checks fail.Update: All 3 Layers Fixed
Layer 1 fix (run_agent.py in-loop compression): When
last_prompt_tokens = 0(stale after API disconnect), now falls back to rough token estimate instead of using the stale value.Layer 2 fix (run_agent.py error handler): Server disconnect errors (
ReadError,RemoteProtocolError) on large sessions (>60% context or >200 messages) are now treated as context-length errors and trigger compression before retry.Layer 3 fix (gateway/run.py hygiene): Fixed inverted threshold direction (1.4x → 0.7x) and added 400-message hard limit.