Compression fallback marker after incomplete chunked read loses useful context in long sessions #16670

@yuzilongleif-collab

Summary

Context compression can fail when the auxiliary compression API call is interrupted by an incomplete chunked read. When that happens, Hermes inserts a fallback context marker instead of a real summary:

⚠️ Compression summary failed: peer closed connection without sending complete message body (incomplete chunked read). Inserted a fallback context marker.

This is especially visible in long Telegram sessions, where context compaction runs frequently.

Observed log evidence

Local logs show repeated failures from auxiliary compression:

agent.auxiliary_client: Auxiliary compression: using auto (gpt-5.5) at https://chatgpt.com/backend-api/codex/
WARNING root: Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read). Further summary attempts paused for 60 seconds.

The failure recurred four times within about half an hour in one long-running Telegram workflow:

2026-04-28 01:52:55 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:14:57 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:20:45 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:23:20 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).

User impact

This is not usually data-destructive, but it is operationally serious for long sessions:

  • context is compacted without a useful generated summary;
  • the fallback marker preserves that something happened, but useful prior-turn details can be lost;
  • long Telegram sessions become less reliable exactly when compaction is needed most.

Local mitigation tried

I applied local config mitigations to reduce frequency/severity:

auxiliary:
  compression:
    timeout: 360

compression:
  threshold: 0.55

The longer timeout should give the compression call more time to complete, and the lower threshold should trigger compaction earlier, on smaller context chunks. Neither setting addresses the underlying bug.
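As a rough illustration of why a lower threshold compacts earlier, here is a minimal sketch. It assumes `compression.threshold` is a fraction of the model's context window, which this issue does not confirm. The function name `should_compress`, the 200,000-token window, and the 0.85 comparison value are hypothetical, not Hermes defaults.

```python
def should_compress(tokens_used: int, context_window: int, threshold: float) -> bool:
    """Return True once usage reaches the given fraction of the context window.

    Assumption: the threshold is a fraction of the window (not confirmed here).
    """
    return tokens_used / context_window >= threshold


# Illustrative numbers only: a hypothetical 200_000-token context window.
WINDOW = 200_000

for used in (100_000, 120_000, 180_000):
    print(
        used,
        should_compress(used, WINDOW, 0.55),  # threshold from the mitigation
        should_compress(used, WINDOW, 0.85),  # hypothetical higher threshold
    )
```

Under that assumption, the 0.55 setting starts compaction while the conversation is still much smaller, so each auxiliary call has less text to summarize.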

Suggested fix direction

Compression should handle incomplete chunked read/peer-closed transport failures more robustly:

  1. Treat an incomplete chunked read as retryable for auxiliary compression, rather than finalizing the fallback marker immediately.
  2. Retry with backoff before inserting the fallback marker.
  3. If the primary auxiliary provider fails, try the configured fallback provider/model, if available.
  4. Consider a smaller emergency compression prompt or a chunked summarization fallback before giving up.
  5. Improve the fallback marker to include a minimal deterministic local summary, such as message count, timestamp range, and the last N user/assistant snippets, so continuity loss is less severe.
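Items 1, 2, 3, and 5 could be combined roughly as follows. This is a sketch under stated assumptions, not Hermes code: the function names, the message dict shape (`role`, `content`, `ts`), the provider callables, and the substring-based error check are all hypothetical. Item 4 (a smaller emergency prompt) is not shown.

```python
import random
import time
from typing import Callable, Sequence

# Hypothetical substrings for recognising the transport failure seen in the logs.
RETRYABLE_MARKERS = ("incomplete chunked read", "peer closed connection")


def is_retryable(exc: Exception) -> bool:
    """Return True when the error text looks like a dropped connection."""
    text = str(exc).lower()
    return any(marker in text for marker in RETRYABLE_MARKERS)


def local_marker(messages: Sequence[dict], last_n: int = 2) -> str:
    """Build a deterministic fallback marker without calling any model."""
    if not messages:
        return "[context compacted: 0 messages]"
    first, last = messages[0]["ts"], messages[-1]["ts"]
    lines = [f"[context compacted: {len(messages)} messages, {first} to {last}]"]
    for msg in messages[-last_n:]:
        lines.append(f"{msg['role']}: {msg['content'][:80]}")
    return "\n".join(lines)


def summarize_with_retry(
    messages: Sequence[dict],
    providers: Sequence[Callable[[Sequence[dict]], str]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying transport failures with backoff."""
    for summarize in providers:
        for attempt in range(attempts):
            try:
                return summarize(messages)
            except Exception as exc:
                if not is_retryable(exc):
                    break  # not a transport error: move on to the next provider
                if attempt < attempts - 1:
                    # Exponential backoff with a little jitter.
                    sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    # Every provider failed: fall back to the local deterministic marker.
    return local_marker(messages)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. Matching on error text is only a stand-in; a real fix would more likely check the exception type raised by the HTTP client in use.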

Environment notes

  • Gateway platform: Telegram
  • Auxiliary compression provider: auto, resolving to the main openai-codex provider against https://chatgpt.com/backend-api/codex/
  • Model observed: gpt-5.5

Labels

  • P2: Medium, degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • type/bug: Something isn't working