fix(context_compressor): treat httpcore streaming premature-close as transient error #22653
Closed
wesleysimplicio wants to merge 1 commit into
… error

Problem:
When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue NousResearch#18458.

Root cause:
_is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. In addition, the except block of _generate_summary in context_compressor.py never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts.

Fix:
1. auxiliary_client.py: extend the _is_connection_error keyword list with "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package.
2. context_compressor.py: import _is_connection_error and call it in _generate_summary's except block, storing the result as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors.

Tests:
4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Improves context compression resilience when streaming responses are dropped mid-flight (e.g., httpcore/httpx “incomplete chunked read”), by classifying these as transient connection-like failures and adjusting fallback/cooldown behavior during _generate_summary.
Changes:
- Extend `agent/auxiliary_client._is_connection_error()` to recognize common streaming premature-close substrings and to work when `openai` isn't installed.
- Wire connection-error classification into `ContextCompressor._generate_summary()` to influence fallback-to-main and cooldown duration.
- Add regression tests covering fallback and cooldown behavior for streaming-close vs unknown errors.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `agent/auxiliary_client.py` | Broadens connection error detection (incl. streaming premature-close) and guards openai imports. |
| `agent/context_compressor.py` | Uses connection-error classification to trigger fallback-to-main and shorter cooldown in some cases. |
| `tests/agent/test_context_compressor.py` | Adds regression tests for streaming-close fallback and cooldown behavior. |
Comment on lines 1003 to 1060
    @@ -1012,7 +1020,7 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
         e,
     )
     if (
    -    (_is_model_not_found or _is_timeout or _is_json_decode)
    +    (_is_model_not_found or _is_timeout or _is_json_decode or _is_streaming_closed)
         and self.summary_model
         and self.summary_model != self.model
         and not getattr(self, "_summary_model_fallen_back", False)
    @@ -1021,6 +1029,8 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
             _reason = "returned invalid JSON"
         elif _is_model_not_found:
             _reason = "unavailable"
    +    elif _is_streaming_closed:
    +        _reason = "closed stream prematurely"
         else:
             _reason = "timed out"
         self._fallback_to_main_for_compression(e, _reason)
    @@ -1043,10 +1053,10 @@ def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topi
         self._fallback_to_main_for_compression(e, "failed")
         return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
     
    -# Transient errors (timeout, rate limit, network, JSON decode) —
    -# shorter cooldown for JSON decode since the body shape can flip
    -# back to valid quickly when an upstream proxy recovers.
    -_transient_cooldown = 30 if _is_json_decode else 60
    +# Transient errors (timeout, rate limit, network, JSON decode,
    +# streaming premature-close) — shorter cooldown for JSON decode and
    +# streaming-closed since those conditions can self-resolve quickly.
    +_transient_cooldown = 30 if (_is_json_decode or _is_streaming_closed) else 60
     self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
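
The decision logic touched by this diff can be read in isolation as three small pure functions. The following is a minimal sketch, not the code from `context_compressor.py`: `FailureFlags` and all three function names are hypothetical, and the assumption that the `"returned invalid JSON"` branch is guarded by the JSON-decode flag is inferred from the diff context rather than shown in it.

```python
from dataclasses import dataclass


@dataclass
class FailureFlags:
    """Hypothetical container for the booleans computed in the except block."""
    model_not_found: bool = False
    timeout: bool = False
    json_decode: bool = False
    streaming_closed: bool = False


def should_fall_back_to_main(flags, summary_model, main_model, already_fallen_back):
    """Mirror the diff's condition: fall back at most once, and only when a
    summary model distinct from the main model is configured."""
    recoverable = (
        flags.model_not_found
        or flags.timeout
        or flags.json_decode
        or flags.streaming_closed
    )
    return bool(
        recoverable
        and summary_model
        and summary_model != main_model
        and not already_fallen_back
    )


def fallback_reason(flags):
    """Pick the reason string in the same order as the if/elif chain in the diff.
    The first branch's condition is an assumption (not visible in the hunk)."""
    if flags.json_decode:
        return "returned invalid JSON"
    if flags.model_not_found:
        return "unavailable"
    if flags.streaming_closed:
        return "closed stream prematurely"
    return "timed out"


def transient_cooldown_seconds(flags):
    """30 s for JSON-decode and streaming-closed failures, 60 s otherwise."""
    return 30 if (flags.json_decode or flags.streaming_closed) else 60
```

With `FailureFlags(streaming_closed=True)`, the sketch falls back when the summary model differs from the main model and selects the 30-second cooldown once no further fallback is possible.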
Comment on lines +507 to +512
    ``_is_connection_error`` is patched here because the test venv may not
    have ``openai`` installed (the real function does ``from openai import ...``
    inside its body). We test the *wiring* — that `_generate_summary` calls
    ``_is_connection_error`` and acts on its result — not the classifier itself
    (that's covered in ``test_auxiliary_client.py::TestIsConnectionError``).
    """
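
The "test the wiring, not the classifier" approach described in this docstring can be illustrated with `unittest.mock.patch.object`. This is a sketch under stated assumptions: `aux` is a stand-in namespace for the module that owns the classifier, and `TinyCompressor`, `flaky_call`, and `run_wiring_check` are hypothetical names that do not appear in the PR.

```python
import types
from unittest.mock import patch

# Stand-in for the module that owns the classifier (hypothetical; the real
# code imports `_is_connection_error` from the auxiliary client module).
aux = types.SimpleNamespace(_is_connection_error=lambda exc: False)


class TinyCompressor:
    """Hypothetical, heavily reduced stand-in for ContextCompressor."""

    def __init__(self, call_model):
        self.call_model = call_model  # callable(model_name) -> str
        self.model = "main"
        self.summary_model = "aux"
        self.fell_back = False

    def generate_summary(self):
        try:
            return self.call_model(self.summary_model)
        except Exception as exc:
            # Wiring under test: consult the classifier, then retry once on main.
            if aux._is_connection_error(exc) and not self.fell_back:
                self.fell_back = True
                self.summary_model = self.model
                return self.generate_summary()
            return None


def flaky_call(model_name):
    """Fail on the auxiliary model the way a dropped stream would."""
    if model_name == "aux":
        raise Exception("incomplete chunked read")
    return "summary from main"


def run_wiring_check():
    """Patch the classifier to True and report what the compressor did."""
    with patch.object(aux, "_is_connection_error", return_value=True) as classifier:
        result = TinyCompressor(flaky_call).generate_summary()
    return result, classifier.call_count
```

Patching the classifier keeps the test independent of whether `openai` is importable, at the cost of not exercising the keyword matching itself.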
Contributor
Merged via salvage PR #22846. The salvage PR cherry-picked your commit; authorship preserved. Thanks for the contribution!
Problem
When a provider or reverse proxy drops a streaming response mid-flight, `httpcore` raises `RemoteProtocolError` with messages like:
- `incomplete chunked read`
- `peer closed connection without sending complete message body`
- `response ended prematurely`
- `unexpected eof`

These surface in `_generate_summary` as plain `Exception`s. Because `_is_connection_error` didn't recognise these substrings, and `_generate_summary` never called `_is_connection_error` at all, they fell through to the generic 60-second cooldown instead of retrying on the main model. Context kept growing unbounded until the cooldown expired.

Closes #18458.
Root Cause
Two gaps:
- `auxiliary_client._is_connection_error`: no keyword coverage for httpcore's streaming premature-close error messages.
- `context_compressor._generate_summary` except block: never called `_is_connection_error`, so streaming-close errors were invisible to the fallback logic.

Fix
`agent/auxiliary_client.py`
- Extend the `_is_connection_error` keyword list with `"incomplete chunked read"`, `"peer closed connection"`, `"response ended prematurely"`, `"unexpected eof"`, `"remoteprotocolerror"`, `"localprotocolerror"`.
- Guard `from openai import APIConnectionError, APITimeoutError` with `try/except ImportError` so the function works in environments without the `openai` package installed (e.g. test venvs).

`agent/context_compressor.py`
- Import `_is_connection_error` from `auxiliary_client`.
- In `_generate_summary`'s except block, evaluate `_is_streaming_closed = _is_connection_error(e)`.
- Include `_is_streaming_closed` in the fallback-to-main condition (alongside `_is_model_not_found`, `_is_timeout`, `_is_json_decode`), with reason string `"closed stream prematurely"`.

Tests
4 new regression tests in `TestStreamingClosedFallback`:
- `test_incomplete_chunked_read_falls_back_to_main`: `_is_connection_error=True` → retry on main, result returned
- `test_peer_closed_connection_falls_back_to_main`
- `test_streaming_closed_on_main_uses_short_cooldown`
- `test_non_streaming_unknown_error_still_uses_long_cooldown`

All 76 tests in `test_context_compressor.py` pass.
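
The classifier change described under Fix can be sketched as a substring check with a guarded import. This is an illustration only, not the function from `agent/auxiliary_client.py`: the name `is_connection_error_sketch` is hypothetical, only the six keywords added by this PR are listed (the real function presumably has others), and matching against the exception's class name as well as its message is an assumption made so that `"remoteprotocolerror"` has something to match.

```python
# Keywords added by this PR; the real list in auxiliary_client.py is assumed
# to contain further pre-existing entries that are not shown here.
_STREAMING_CLOSED_KEYWORDS = (
    "incomplete chunked read",
    "peer closed connection",
    "response ended prematurely",
    "unexpected eof",
    "remoteprotocolerror",
    "localprotocolerror",
)


def is_connection_error_sketch(exc):
    """Return True when `exc` looks like a transient connection failure."""
    try:
        # Guarded so the function still works where `openai` is not installed.
        from openai import APIConnectionError, APITimeoutError
    except ImportError:
        APIConnectionError = APITimeoutError = None

    if APIConnectionError is not None and isinstance(
        exc, (APIConnectionError, APITimeoutError)
    ):
        return True

    # Assumption: compare against "ClassName: message", lowercased.
    text = f"{type(exc).__name__}: {exc}".lower()
    return any(keyword in text for keyword in _STREAMING_CLOSED_KEYWORDS)
```

Substring matching is deliberately loose: it catches errors that arrive wrapped in a plain `Exception`, which is how the Problem section says they surface in `_generate_summary`.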