Improve context compression retry/fallback for incomplete chunked reads #18458

@de1tydev

Bug Description

During long-running Hermes Agent sessions, automatic context compression can fail when the auxiliary summary request is interrupted by a transient streaming/network error such as:

peer closed connection without sending complete message body (incomplete chunked read)

When this happens, agent/context_compressor.py inserts a static fallback context marker and removes the middle conversation turns without a real summary. The session survives, but the compaction is lossy, and the next assistant turn may need to recover context from files, logs, or session search.

Observed Behavior

In a real long-running gateway session, logs showed repeated compression attempts and intermittent failures:

Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read). Further summary attempts paused for 60 seconds.
Summary generation failed — inserting static fallback context marker
Auxiliary compression: using auto (...) at ...

This is more likely in very long, tool-heavy tasks because the compression prompt can be large and the auxiliary summary request stays open for a long time.

Expected Behavior

Transient compression-summary failures should be retried and/or routed through fallback providers before Hermes drops the compressed middle turns without a real summary.

Suggested behavior:

  1. Classify common premature-response/streaming close errors as connection errors, including strings such as:
    • incomplete chunked read
    • peer closed connection
    • unexpected eof
    • response ended prematurely
    • connection was closed
  2. Add a short retry policy around compression summary generation, e.g. 1-2 retries with small exponential backoff for transient network/timeout/read errors.
  3. If auxiliary.compression.provider is auto, allow the existing auxiliary provider fallback chain to run for those transient errors.
  4. Only insert the static fallback marker after retry/fallback attempts are exhausted.
  5. Log enough detail to distinguish:
    • summary retry succeeded
    • fallback provider succeeded
    • final static marker fallback was used
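The suggested behavior above could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual Hermes code: `is_transient_error`, `summarize_with_retry`, `STATIC_MARKER`, the outcome labels, and the log messages are all hypothetical names chosen for the example, and each provider is modeled as a plain callable that takes a prompt and returns a summary. It also assumes that non-transient errors skip retry and fallback and go straight to the static marker; the real policy for those errors is an open design choice.

```python
import logging
import time
from typing import Callable, Sequence, Tuple

logger = logging.getLogger("compression_sketch")

# Substrings taken from the list in this issue; matched case-insensitively.
TRANSIENT_MARKERS = (
    "incomplete chunked read",
    "peer closed connection",
    "unexpected eof",
    "response ended prematurely",
    "connection was closed",
)

# Placeholder text for illustration only, not the marker Hermes actually inserts.
STATIC_MARKER = "[context summary unavailable]"


def is_transient_error(exc: BaseException) -> bool:
    """Return True for built-in network/timeout errors or known premature-close messages."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    message = str(exc).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)


def summarize_with_retry(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try each provider in order, retrying transient failures with exponential backoff.

    Returns (summary, outcome), where outcome is one of
    "primary", "retry", "fallback_provider", or "static_marker".
    """
    for index, provider in enumerate(providers):
        for attempt in range(retries + 1):
            try:
                summary = provider(prompt)
            except Exception as exc:
                if not is_transient_error(exc):
                    # Assumption: non-transient errors are not worth retrying.
                    logger.warning("Non-transient summary error: %s", exc)
                    return STATIC_MARKER, "static_marker"
                if attempt == retries:
                    logger.warning("Provider %d exhausted retries: %s", index, exc)
                    break  # move on to the next provider in the chain
                delay = base_delay * (2 ** attempt)
                logger.info("Transient error, retrying in %.1fs: %s", delay, exc)
                sleep(delay)
                continue
            # Success: record which path produced the summary.
            if index > 0:
                outcome = "fallback_provider"
            elif attempt > 0:
                outcome = "retry"
            else:
                outcome = "primary"
            logger.info("Summary generated (%s)", outcome)
            return summary, outcome
    logger.warning("All summary attempts failed; using static fallback marker")
    return STATIC_MARKER, "static_marker"
```

Returning the outcome alongside the summary keeps the three cases from item 5 distinguishable in logs and in tests. The `sleep` parameter is injectable only so that tests can skip the real backoff delay.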

Why This Matters

The current static marker is better than crashing, but it is still lossy. In long tasks that run for hours, losing the middle handoff summary can make the agent repeat work, miss decisions, or require manual recovery.

Relevant Code Areas

  • agent/context_compressor.py
    • _generate_summary(...)
    • compress(...) static fallback marker path
  • agent/auxiliary_client.py
    • _is_connection_error(...)
    • call_llm(...) retry/fallback path

Possible Tests

  • Unit test that _is_connection_error() returns True for incomplete chunked read / peer closed connection / premature EOF strings.
  • Compression test where the first summary call raises a transient incomplete-chunked-read error and the second call succeeds; assert no static fallback marker is inserted.
  • Compression test where auto provider A raises a transient connection error and provider B succeeds; assert the real summary is used.
  • Compression test where all retry/fallback attempts fail; assert the existing static fallback marker is still inserted.
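As a rough sketch of what such tests might look like, the example below uses `unittest` and `unittest.mock.Mock` against small stand-in functions. The stand-ins `is_connection_error` and `summarize`, and the `STATIC_MARKER` text, are hypothetical and exist only to keep the example self-contained; real tests would target `_is_connection_error(...)` and the compression path in `agent/context_compressor.py`, whose exact signatures are not shown in this issue.

```python
import unittest
from unittest.mock import Mock

# Placeholder text for illustration only, not the real marker.
STATIC_MARKER = "[static fallback marker]"


def is_connection_error(exc):
    """Stand-in classifier: match known premature-close substrings."""
    text = str(exc).lower()
    needles = ("incomplete chunked read", "peer closed connection", "unexpected eof")
    return any(needle in text for needle in needles)


def summarize(call, prompt, attempts=2):
    """Stand-in compression path: retry transient errors, then fall back."""
    for _ in range(attempts):
        try:
            return call(prompt)
        except Exception as exc:
            if not is_connection_error(exc):
                break
    return STATIC_MARKER


class CompressionRetryTests(unittest.TestCase):
    def test_classifier_matches_premature_close_strings(self):
        for message in ("Incomplete chunked read", "peer closed connection", "unexpected EOF"):
            self.assertTrue(is_connection_error(RuntimeError(message)))
        self.assertFalse(is_connection_error(RuntimeError("invalid api key")))

    def test_first_call_fails_second_succeeds(self):
        # Mock raises the exception on the first call and returns the string on the second.
        call = Mock(side_effect=[RuntimeError("incomplete chunked read"), "real summary"])
        self.assertEqual(summarize(call, "prompt"), "real summary")
        self.assertEqual(call.call_count, 2)

    def test_all_attempts_fail_uses_static_marker(self):
        call = Mock(side_effect=RuntimeError("peer closed connection"))
        self.assertEqual(summarize(call, "prompt"), STATIC_MARKER)
        self.assertEqual(call.call_count, 2)
```

Saved as a module, these can be run with `python -m unittest`. The provider A / provider B case would follow the same pattern, with one `Mock` per provider.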

Environment Notes

This was observed in a gateway/Feishu long-running workflow with compression.enabled: true and auxiliary.compression.provider: auto. The underlying cause is likely transient upstream/proxy/network interruption, but Hermes can make this path much more robust.

Metadata

Assignees

No one assigned

    Labels

      • P2: Medium, degraded but workaround exists
      • comp/agent: Core agent loop, run_agent.py, prompt builder
      • duplicate: This issue or pull request already exists
      • type/bug: Something isn't working
