fix(context_compressor): treat httpcore streaming premature-close as transient error by wesleysimplicio · Pull Request #22653 · NousResearch/hermes-agent

wesleysimplicio · 2026-05-09T15:33:48Z

Problem

When a provider or reverse proxy drops a streaming response mid-flight, httpcore raises RemoteProtocolError with messages like:

incomplete chunked read
peer closed connection without sending complete message body
response ended prematurely
unexpected eof

These surface in _generate_summary as plain Exceptions. Because _is_connection_error didn't recognise these substrings, and _generate_summary never called _is_connection_error at all, they fell through to the generic 60-second cooldown instead of retrying on the main model. As a result, the context kept growing unbounded until the cooldown expired.

Closes #18458.

Root Cause

Two gaps:

auxiliary_client._is_connection_error — no keyword coverage for httpcore's streaming premature-close error messages.
context_compressor._generate_summary except block — never called _is_connection_error, so streaming-close errors were invisible to the fallback logic.

Fix

agent/auxiliary_client.py

Extended _is_connection_error keyword list with "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror".
Guarded the from openai import APIConnectionError, APITimeoutError statement with try/except ImportError so the function still works in environments without the openai package installed (e.g. test venvs).
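The classifier change above can be sketched as follows. This is a minimal sketch that assumes the function works by lowercase substring matching over the exception text. It lists only the six keywords this PR adds. The real keyword list, the exact string it matches against, and the name is_connection_error_sketch are assumptions, not the contents of agent/auxiliary_client.py. The sketch folds the class name into the text, which is one way a keyword like "remoteprotocolerror" could match when the message itself does not mention it.

```python
# Hypothetical sketch, not the real agent/auxiliary_client._is_connection_error.
# Guarded import: environments without the openai package (e.g. test venvs)
# fall back to an empty tuple, and isinstance(exc, ()) is simply False.
try:
    from openai import APIConnectionError, APITimeoutError
    _OPENAI_CONN_ERRORS: tuple = (APIConnectionError, APITimeoutError)
except ImportError:
    _OPENAI_CONN_ERRORS = ()

# Only the keywords added by this PR; the real list presumably has more.
_STREAM_CLOSE_KEYWORDS = (
    "incomplete chunked read",
    "peer closed connection",
    "response ended prematurely",
    "unexpected eof",
    "remoteprotocolerror",
    "localprotocolerror",
)


def is_connection_error_sketch(exc: BaseException) -> bool:
    """Return True if exc looks like a dropped or prematurely closed connection."""
    if isinstance(exc, _OPENAI_CONN_ERRORS):
        return True
    # Assumption: include the class name so "RemoteProtocolError" matches
    # even when the message text does not mention it.
    text = f"{type(exc).__name__}: {exc}".lower()
    return any(keyword in text for keyword in _STREAM_CLOSE_KEYWORDS)
```

Substring matching still works when the original httpcore exception reaches _generate_summary only as a plain Exception with its message intact. The trade-off is breadth: a message such as "unexpected eof" could in principle come from a non-network error too.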

agent/context_compressor.py

Import _is_connection_error from auxiliary_client.
In _generate_summary's except block, evaluate _is_streaming_closed = _is_connection_error(e).
Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode), with reason string "closed stream prematurely".
Use the shorter 30s transient cooldown for streaming-closed errors (same as JSON decode) instead of the default 60s.
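The fallback and cooldown rules above can be restated as two small pure functions. This is a hypothetical restatement, not the code in agent/context_compressor.py. The SummaryFailure dataclass, the function names, and the parameter names are invented for illustration. The branch order follows the _generate_summary diff hunk. That hunk does not show the condition for the first branch ("returned invalid JSON"), so the sketch assumes it is the JSON-decode flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummaryFailure:
    # Hypothetical flags mirroring the locals named in the PR description.
    is_model_not_found: bool = False
    is_timeout: bool = False
    is_json_decode: bool = False
    is_streaming_closed: bool = False


def fallback_reason(
    f: SummaryFailure,
    summary_model: Optional[str],
    main_model: str,
    already_fell_back: bool,
) -> Optional[str]:
    """Return the fallback reason string, or None if no fallback to main applies."""
    eligible = (
        f.is_model_not_found or f.is_timeout or f.is_json_decode or f.is_streaming_closed
    )
    if not (eligible and summary_model and summary_model != main_model and not already_fell_back):
        return None
    if f.is_json_decode:  # assumed condition for the first branch
        return "returned invalid JSON"
    if f.is_model_not_found:
        return "unavailable"
    if f.is_streaming_closed:
        return "closed stream prematurely"
    return "timed out"


def transient_cooldown(f: SummaryFailure) -> int:
    """Seconds to wait before the next summary attempt after a transient failure."""
    return 30 if (f.is_json_decode or f.is_streaming_closed) else 60
```

Keeping the decision separate from the side effects makes each rule testable without a live provider. In the real method, the reason is passed to _fallback_to_main_for_compression(e, _reason) and the summary is retried on the main model.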

Tests

4 new regression tests in TestStreamingClosedFallback:

Test	What it verifies
`test_incomplete_chunked_read_falls_back_to_main`	`_is_connection_error=True` → retry on main, result returned
`test_peer_closed_connection_falls_back_to_main`	same path, different error message
`test_streaming_closed_on_main_uses_short_cooldown`	30s cooldown when already on main model (stash-verified: fails without fix)
`test_non_streaming_unknown_error_still_uses_long_cooldown`	unclassified errors retain 60s cooldown

All 76 tests in test_context_compressor.py pass.

… error

Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue NousResearch#18458.

Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts.

Fix:
1. auxiliary_client.py: extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package.
2. context_compressor.py: import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors.

Tests: 4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Improves context compression resilience when streaming responses are dropped mid-flight (e.g., httpcore/httpx “incomplete chunked read”), by classifying these as transient connection-like failures and adjusting fallback/cooldown behavior during _generate_summary.

Changes:

Extend agent/auxiliary_client._is_connection_error() to recognize common streaming premature-close substrings and to work when openai isn’t installed.
Wire connection-error classification into ContextCompressor._generate_summary() to influence fallback-to-main and cooldown duration.
Add regression tests covering fallback and cooldown behavior for streaming-close vs unknown errors.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`agent/auxiliary_client.py`	Broadens connection error detection (incl. streaming premature-close) and guards `openai` imports.
`agent/context_compressor.py`	Uses connection-error classification to trigger fallback-to-main and shorter cooldown in some cases.
`tests/agent/test_context_compressor.py`	Adds regression tests for streaming-close fallback and cooldown behavior.


@@ -1012,7 +1020,7 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
                    e,
                )
            if (
-                (_is_model_not_found or _is_timeout or _is_json_decode)
+                (_is_model_not_found or _is_timeout or _is_json_decode or _is_streaming_closed)
                and self.summary_model
                and self.summary_model != self.model
                and not getattr(self, "_summary_model_fallen_back", False)
@@ -1021,6 +1029,8 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
                    _reason = "returned invalid JSON"
                elif _is_model_not_found:
                    _reason = "unavailable"
+                elif _is_streaming_closed:
+                    _reason = "closed stream prematurely"
                else:
                    _reason = "timed out"
                self._fallback_to_main_for_compression(e, _reason)
@@ -1043,10 +1053,10 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
                self._fallback_to_main_for_compression(e, "failed")
                return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)

-            # Transient errors (timeout, rate limit, network, JSON decode) —
-            # shorter cooldown for JSON decode since the body shape can flip
-            # back to valid quickly when an upstream proxy recovers.
-            _transient_cooldown = 30 if _is_json_decode else 60
+            # Transient errors (timeout, rate limit, network, JSON decode,
+            # streaming premature-close) — shorter cooldown for JSON decode and
+            # streaming-closed since those conditions can self-resolve quickly.
+            _transient_cooldown = 30 if (_is_json_decode or _is_streaming_closed) else 60
            self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown


+    ``_is_connection_error`` is patched here because the test venv may not
+    have ``openai`` installed (the real function does ``from openai import ...``
+    inside its body).  We test the *wiring* — that `_generate_summary` calls
+    ``_is_connection_error`` and acts on its result — not the classifier itself
+    (that's covered in ``test_auxiliary_client.py::TestIsConnectionError``).
+    """
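The docstring says the tests patch _is_connection_error to exercise the wiring. A general unittest.mock point applies here; it is standard Python behaviour, not something this PR states. If context_compressor binds the name with a module-level from-import, the patch has to target the name in agent.context_compressor. Patching agent.auxiliary_client afterwards does not change the reference that is already bound. The sketch below demonstrates this with two throwaway modules, aux_demo and comp_demo, which are hypothetical stand-ins for the real ones.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for agent.auxiliary_client.
aux = types.ModuleType("aux_demo")


def _is_connection_error(exc: BaseException) -> bool:
    return "incomplete chunked read" in str(exc).lower()


aux._is_connection_error = _is_connection_error
sys.modules["aux_demo"] = aux

# Hypothetical stand-in for agent.context_compressor. Assigning the attribute
# mirrors what `from aux_demo import _is_connection_error` does: it binds the
# current function object into this module's own namespace.
comp = types.ModuleType("comp_demo")
comp._is_connection_error = aux._is_connection_error


def _cooldown_for(exc: BaseException) -> int:
    # Looks the classifier up in comp_demo's namespace, as module code would.
    return 30 if comp._is_connection_error(exc) else 60


comp.cooldown_for = _cooldown_for
sys.modules["comp_demo"] = comp

err = Exception("some unclassified failure")

# Patching the defining module leaves comp_demo's binding untouched.
with mock.patch("aux_demo._is_connection_error", return_value=True):
    assert comp.cooldown_for(err) == 60

# Patching the name where it is looked up changes the behaviour.
with mock.patch("comp_demo._is_connection_error", return_value=True):
    assert comp.cooldown_for(err) == 30
```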


teknium1 · 2026-05-10T01:09:52Z

Merged via salvage PR #22846. The salvage PR cherry-picked your commit, so authorship is preserved. Thanks for the contribution!

Copilot AI review requested due to automatic review settings May 9, 2026 15:33

Copilot started reviewing on behalf of wesleysimplicio May 9, 2026 15:34

Copilot AI reviewed May 9, 2026


teknium1 mentioned this pull request May 9, 2026

fix(context_compressor): treat httpcore streaming premature-close as transient (salvage #22653) #22846

Merged

teknium1 closed this May 10, 2026

wesleysimplicio wants to merge 1 commit into NousResearch:main from wesleysimplicio:fix/ag02-compressor-streaming-errors

3 participants
