Bug Description
During long-running Hermes Agent sessions, automatic context compression can fail when the auxiliary summary request is interrupted by a transient streaming/network error such as:
peer closed connection without sending complete message body (incomplete chunked read)
When this happens, agent/context_compressor.py inserts a static fallback context marker and removes the middle conversation turns without a real summary. The session survives, but the compaction is lossy and the next assistant may need to recover context from files, logs, or session search.
Observed Behavior
In a real long-running gateway session, logs showed repeated compression attempts and intermittent failures:
Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read). Further summary attempts paused for 60 seconds.
Summary generation failed — inserting static fallback context marker
Auxiliary compression: using auto (...) at ...
This is more likely in very long, tool-heavy tasks, where the compression prompt is large and the auxiliary summary call stays open for a long time.
Expected Behavior
Transient compression-summary failures should be retried and/or routed through fallback providers before Hermes drops the compressed middle turns without a real summary.
Suggested behavior:
- Classify common premature-response/streaming close errors as connection errors, including strings such as:
  - incomplete chunked read
  - peer closed connection
  - unexpected eof
  - response ended prematurely
  - connection was closed
- Add a short retry policy around compression summary generation, e.g. 1-2 retries with small exponential backoff for transient network/timeout/read errors.
- If auxiliary.compression.provider is auto, allow the existing auxiliary provider fallback chain to run for those transient errors.
- Only insert the static fallback marker after retry/fallback attempts are exhausted.
- Log enough detail to distinguish:
  - summary retry succeeded
  - fallback provider succeeded
  - final static marker fallback was used
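A minimal sketch of the classification and retry steps above, to make the proposal concrete. It is not the existing Hermes code: looks_transient, summarize_with_retry, the outcome strings, and the default of 2 retries with a 1-second base delay are all hypothetical choices for illustration. Only the error substrings come from the list above.

```python
import logging
import time

logger = logging.getLogger("compression_sketch")

# Substrings from the suggested list; matched case-insensitively.
TRANSIENT_MARKERS = (
    "incomplete chunked read",
    "peer closed connection",
    "unexpected eof",
    "response ended prematurely",
    "connection was closed",
)


def looks_transient(exc: BaseException) -> bool:
    """Hypothetical classifier: built-in connection/timeout errors or a known message."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)


def summarize_with_retry(generate, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call generate() up to 1 + max_retries times with exponential backoff.

    Returns (summary, outcome). outcome is "ok", "retry_succeeded", or
    "static_fallback"; in the last case summary is None and the caller
    would insert the static marker.
    """
    for attempt in range(max_retries + 1):
        try:
            summary = generate()
        except Exception as exc:
            # Assumption: non-transient errors go straight to the static marker.
            if not looks_transient(exc) or attempt == max_retries:
                logger.warning(
                    "summary failed after %d attempt(s): %s; using static marker",
                    attempt + 1, exc,
                )
                return None, "static_fallback"
            delay = base_delay * (2 ** attempt)
            logger.info(
                "transient summary failure on attempt %d: %s; retrying in %.1fs",
                attempt + 1, exc, delay,
            )
            sleep(delay)
        else:
            if attempt:
                logger.info("summary retry succeeded on attempt %d", attempt + 1)
            return summary, ("retry_succeeded" if attempt else "ok")
    # Reached only if max_retries is negative.
    return None, "static_fallback"
```

The sleep parameter is injectable so tests can record the delays instead of waiting. The fallback-provider step is deliberately left out here; it would sit between the last retry and the static marker.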
Why This Matters
The current static marker is better than crashing, but it is still lossy. In long tasks that run for hours, losing the middle handoff summary can make the agent repeat work, miss decisions, or require manual recovery.
Relevant Code Areas
- agent/context_compressor.py
  - _generate_summary(...)
  - compress(...) static fallback marker path
- agent/auxiliary_client.py
  - _is_connection_error(...)
  - call_llm(...) retry/fallback path
Possible Tests
- Unit test that _is_connection_error() returns true for incomplete chunked read / peer closed connection / premature EOF strings.
- Compression test where the first summary call raises a transient incomplete-chunked-read error and the second call succeeds; assert no static fallback marker is inserted.
- Compression test where auto provider A raises a transient connection error and provider B succeeds; assert the real summary is used.
- Compression test where all retry/fallback attempts fail; assert the existing static fallback marker is still inserted.
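The provider-fallback cases could be shaped roughly like the unittest sketch below. Everything in it is a stand-in so the example runs on its own: compress_summary, is_transient, and STATIC_MARKER are hypothetical and do not mirror the real compress(...) or call_llm(...) signatures, and the marker text is a placeholder. Stopping the chain on a non-transient error is also an assumption, not confirmed Hermes behavior.

```python
import unittest

# Placeholder text, not the marker Hermes actually inserts.
STATIC_MARKER = "[context compressed; summary unavailable]"


def is_transient(exc):
    """Stand-in classifier for this sketch only."""
    text = str(exc).lower()
    return "incomplete chunked read" in text or "peer closed connection" in text


def compress_summary(providers):
    """Try each provider callable in order; return the static marker if none succeed."""
    for provider in providers:
        try:
            return provider()
        except Exception as exc:
            if not is_transient(exc):
                break  # assumption: non-transient errors skip the rest of the chain
    return STATIC_MARKER


def failing():
    raise RuntimeError("peer closed connection (incomplete chunked read)")


class CompressionFallbackTests(unittest.TestCase):
    def test_second_provider_supplies_real_summary(self):
        result = compress_summary([failing, lambda: "real summary"])
        self.assertEqual(result, "real summary")

    def test_static_marker_when_all_providers_fail(self):
        self.assertEqual(compress_summary([failing, failing]), STATIC_MARKER)

    def test_non_transient_error_stops_chain(self):
        called = []

        def rejects():
            raise ValueError("invalid request")

        def second():
            called.append(True)
            return "real summary"

        self.assertEqual(compress_summary([rejects, second]), STATIC_MARKER)
        self.assertEqual(called, [])
```

Saved to a file, these run with python -m unittest. The real tests would patch the auxiliary client rather than pass callables, but the assertions would stay the same: real summary when any provider succeeds, static marker only when all attempts are exhausted.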
Environment Notes
This was observed in a gateway/Feishu long-running workflow with compression.enabled: true and auxiliary.compression.provider: auto. The underlying cause is likely transient upstream/proxy/network interruption, but Hermes can make this path much more robust.