fix(context_compressor): treat httpcore streaming premature-close as transient error #22653

Closed
wesleysimplicio wants to merge 1 commit into NousResearch:main from wesleysimplicio:fix/ag02-compressor-streaming-errors

Conversation

@wesleysimplicio

Contributor

Problem

When a provider or reverse proxy drops a streaming response mid-flight, httpcore raises RemoteProtocolError with messages like:

  • incomplete chunked read
  • peer closed connection without sending complete message body
  • response ended prematurely
  • unexpected eof

These surface in _generate_summary as plain Exceptions. Because _is_connection_error didn't recognise these substrings, and _generate_summary never called _is_connection_error at all, they fell through to the generic 60-second cooldown instead of retrying on the main model. Context kept growing unbounded until the cooldown expired.

Closes #18458.

Root Cause

Two gaps:

  1. auxiliary_client._is_connection_error — no keyword coverage for httpcore's streaming premature-close error messages.
  2. context_compressor._generate_summary except block — never called _is_connection_error, so streaming-close errors were invisible to the fallback logic.

Fix

agent/auxiliary_client.py

  • Extended _is_connection_error keyword list with "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror".
  • Wrapped the `from openai import APIConnectionError, APITimeoutError` statement in try/except ImportError so the function works in environments without the openai package installed (e.g. test venvs).
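
A minimal sketch of what the extended classifier could look like, for readers who do not have the diff open. The function name `is_connection_error_sketch` and the choice to match against both the exception class name and its message are assumptions for illustration; only the six substrings and the guarded import come from this PR, and the keywords the real function already matched are not shown.

```python
# Hypothetical sketch; the real agent/auxiliary_client._is_connection_error may differ.

# Substrings added by this PR (pre-existing keywords are omitted here).
STREAMING_CLOSE_KEYWORDS = (
    "incomplete chunked read",
    "peer closed connection",
    "response ended prematurely",
    "unexpected eof",
    "remoteprotocolerror",
    "localprotocolerror",
)


def is_connection_error_sketch(exc: BaseException) -> bool:
    """Return True if exc looks like a transient connection failure."""
    try:
        # Guarded import: the openai package may be absent (e.g. a test venv).
        from openai import APIConnectionError, APITimeoutError

        if isinstance(exc, (APIConnectionError, APITimeoutError)):
            return True
    except ImportError:
        pass  # Fall through to substring matching only.

    # Assumption: match on the class name as well as the message, lowercased,
    # so "remoteprotocolerror" can hit even when the message itself is terse.
    haystack = f"{type(exc).__name__}: {exc}".lower()
    return any(keyword in haystack for keyword in STREAMING_CLOSE_KEYWORDS)
```

With this shape, a plain `Exception("incomplete chunked read")` is classified as a connection error, while an unrelated `ValueError` is not.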

agent/context_compressor.py

  • Import _is_connection_error from auxiliary_client.
  • In _generate_summary's except block, evaluate _is_streaming_closed = _is_connection_error(e).
  • Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode), with reason string "closed stream prematurely".
  • Use the shorter 30s transient cooldown for streaming-closed errors (same as JSON decode) instead of the default 60s.
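
The decision logic described in these bullets can be sketched as three small pure functions. Everything here (`ErrorFlags`, the function names, the parameters) is hypothetical scaffolding; the real code keeps these as local `_is_*` booleans inside `_generate_summary`. The reason strings and the 30s/60s values are taken from the diff, and the assumption that the first branch tests the JSON decode flag is inferred from the order of the reason strings.

```python
from dataclasses import dataclass


@dataclass
class ErrorFlags:
    """Hypothetical container for the _is_* booleans computed in the except block."""
    model_not_found: bool = False
    timeout: bool = False
    json_decode: bool = False
    streaming_closed: bool = False


def should_fall_back_to_main(flags, summary_model, main_model, already_fallen_back):
    """Retry on the main model only once, and only if a distinct summary model is set."""
    retryable = (
        flags.model_not_found
        or flags.timeout
        or flags.json_decode
        or flags.streaming_closed
    )
    return bool(
        retryable
        and summary_model
        and summary_model != main_model
        and not already_fallen_back
    )


def fallback_reason(flags):
    """Pick the reason string in the same order as the branches shown in the diff."""
    if flags.json_decode:
        return "returned invalid JSON"
    if flags.model_not_found:
        return "unavailable"
    if flags.streaming_closed:
        return "closed stream prematurely"
    return "timed out"


def transient_cooldown_seconds(flags):
    """30s for conditions that tend to self-resolve quickly, 60s otherwise."""
    return 30 if (flags.json_decode or flags.streaming_closed) else 60
```

When the compressor is already running on the main model, `should_fall_back_to_main` returns False and only the cooldown applies, which is the case the short-cooldown test targets.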

Tests

4 new regression tests in TestStreamingClosedFallback:

| Test | What it verifies |
| --- | --- |
| test_incomplete_chunked_read_falls_back_to_main | _is_connection_error=True → retry on main, result returned |
| test_peer_closed_connection_falls_back_to_main | same path, different error message |
| test_streaming_closed_on_main_uses_short_cooldown | 30s cooldown when already on main model (stash-verified: fails without fix) |
| test_non_streaming_unknown_error_still_uses_long_cooldown | unclassified errors retain 60s cooldown |

All 76 tests in test_context_compressor.py pass.
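
To illustrate the style of these tests (checking the wiring rather than the classifier), here is a self-contained sketch against a toy stand-in. `ToyCompressor` and `run_wiring_check` are hypothetical and far simpler than the real `ContextCompressor`; in particular, the retry-on-main path is omitted and only the cooldown selection is exercised.

```python
import time
from unittest import mock


class ToyCompressor:
    """Hypothetical stand-in; not the real ContextCompressor."""

    def __init__(self, call_model, is_connection_error):
        self._call_model = call_model
        self._is_connection_error = is_connection_error
        self.cooldown_until = 0.0

    def generate_summary(self):
        try:
            return self._call_model()
        except Exception as exc:
            # Streaming-closed errors get the short cooldown, others the long one.
            cooldown = 30 if self._is_connection_error(exc) else 60
            self.cooldown_until = time.monotonic() + cooldown
            return None


def run_wiring_check(classifier_result):
    """Make the model call fail, stub the classifier, and report the cooldown chosen."""
    failing_call = mock.Mock(side_effect=RuntimeError("incomplete chunked read"))
    classifier = mock.Mock(return_value=classifier_result)
    compressor = ToyCompressor(failing_call, classifier)
    # Freeze the clock so the cooldown can be compared exactly.
    with mock.patch("time.monotonic", return_value=1000.0):
        result = compressor.generate_summary()
    classifier.assert_called_once()  # the except block consulted the classifier
    return result, compressor.cooldown_until - 1000.0
```

Stubbing the classifier keeps the check independent of whether openai is installed, which is the same motivation the test docstring in this PR gives for patching `_is_connection_error`.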

… error

Problem:
When a provider or proxy drops a streaming response mid-flight (httpcore
raises RemoteProtocolError: "incomplete chunked read", "peer closed
connection", "response ended prematurely", etc.), _generate_summary
would not classify it as a transient error.  Instead of retrying on the
main model, it entered the generic 60-second cooldown, leaving context
growing unbounded until the cooldown expired.  Issue NousResearch#18458.

Root cause:
_is_connection_error in auxiliary_client.py did not match httpcore's
streaming premature-close error substrings.  context_compressor.py's
_generate_summary except block never called _is_connection_error, so
those errors fell through to the 60-second generic cooldown rather than
triggering the retry-on-main fallback path used for timeouts.

Fix:
1. auxiliary_client.py — extend _is_connection_error keyword list with:
   "incomplete chunked read", "peer closed connection",
   "response ended prematurely", "unexpected eof",
   "remoteprotocolerror", "localprotocolerror".
   Also guard the `from openai import ...` with try/except ImportError
   so the function works in environments without the openai package.
2. context_compressor.py — import _is_connection_error and call it in
   _generate_summary's except block as _is_streaming_closed.  Include
   _is_streaming_closed in the fallback-to-main condition (alongside
   _is_model_not_found, _is_timeout, _is_json_decode) and use the
   shorter 30s transient cooldown for streaming-closed errors.

Tests:
4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown  (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 9, 2026 15:33

Copilot AI left a comment

Contributor

Pull request overview

Improves context compression resilience when streaming responses are dropped mid-flight (e.g., httpcore/httpx “incomplete chunked read”), by classifying these as transient connection-like failures and adjusting fallback/cooldown behavior during _generate_summary.

Changes:

  • Extend agent/auxiliary_client._is_connection_error() to recognize common streaming premature-close substrings and to work when openai isn’t installed.
  • Wire connection-error classification into ContextCompressor._generate_summary() to influence fallback-to-main and cooldown duration.
  • Add regression tests covering fallback and cooldown behavior for streaming-close vs unknown errors.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| agent/auxiliary_client.py | Broadens connection error detection (incl. streaming premature-close) and guards openai imports. |
| agent/context_compressor.py | Uses connection-error classification to trigger fallback-to-main and shorter cooldown in some cases. |
| tests/agent/test_context_compressor.py | Adds regression tests for streaming-close fallback and cooldown behavior. |

Comment on lines 1003 to 1060
@@ -1012,7 +1020,7 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
                 e,
             )
             if (
-                (_is_model_not_found or _is_timeout or _is_json_decode)
+                (_is_model_not_found or _is_timeout or _is_json_decode or _is_streaming_closed)
                 and self.summary_model
                 and self.summary_model != self.model
                 and not getattr(self, "_summary_model_fallen_back", False)
@@ -1021,6 +1029,8 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
                     _reason = "returned invalid JSON"
                 elif _is_model_not_found:
                     _reason = "unavailable"
+                elif _is_streaming_closed:
+                    _reason = "closed stream prematurely"
                 else:
                     _reason = "timed out"
                 self._fallback_to_main_for_compression(e, _reason)
@@ -1043,10 +1053,10 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
                 self._fallback_to_main_for_compression(e, "failed")
                 return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
 
-            # Transient errors (timeout, rate limit, network, JSON decode) —
-            # shorter cooldown for JSON decode since the body shape can flip
-            # back to valid quickly when an upstream proxy recovers.
-            _transient_cooldown = 30 if _is_json_decode else 60
+            # Transient errors (timeout, rate limit, network, JSON decode,
+            # streaming premature-close) — shorter cooldown for JSON decode and
+            # streaming-closed since those conditions can self-resolve quickly.
+            _transient_cooldown = 30 if (_is_json_decode or _is_streaming_closed) else 60
             self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
Comment on lines +507 to +512
``_is_connection_error`` is patched here because the test venv may not
have ``openai`` installed (the real function does ``from openai import ...``
inside its body). We test the *wiring* — that `_generate_summary` calls
``_is_connection_error`` and acts on its result — not the classifier itself
(that's covered in ``test_auxiliary_client.py::TestIsConnectionError``).
"""
@teknium1

Contributor

Merged via salvage PR #22846. The salvage cherry-picked your commit; authorship preserved. Thanks for the contribution!

@teknium1 teknium1 closed this May 10, 2026

Development

Successfully merging this pull request may close these issues.

Improve context compression retry/fallback for incomplete chunked reads

3 participants