Summary
Context compression can fail when the auxiliary compression API call is interrupted with an incomplete chunked read. Hermes inserts a fallback context marker instead of a real summary:
⚠️ Compression summary failed: peer closed connection without sending complete message body (incomplete chunked read). Inserted a fallback context marker.
This is especially visible in long Telegram sessions because context compaction is frequent.
Observed log evidence
Local logs show repeated failures from auxiliary compression:
agent.auxiliary_client: Auxiliary compression: using auto (gpt-5.5) at https://chatgpt.com/backend-api/codex/
WARNING root: Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read). Further summary attempts paused for 60 seconds.
Recent examples occurred repeatedly in one long-running Telegram workflow, e.g.:
2026-04-28 01:52:55 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:14:57 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:20:45 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:23:20 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
User impact
The failure is not usually data-destructive, but it is operationally serious for long sessions:
- context is compacted without a useful generated summary;
- the fallback marker preserves that something happened, but useful prior-turn details can be lost;
- long Telegram sessions become less reliable exactly when compaction is needed most.
Local mitigation tried
I applied local config mitigations to reduce frequency/severity:
auxiliary:
  compression:
    timeout: 360
compression:
  threshold: 0.55
This should give the compression call more time and trigger compaction earlier with smaller context chunks. It does not address the underlying bug.
Suggested fix direction
Compression should handle incomplete chunked read/peer-closed transport failures more robustly:
- Treat incomplete chunked read as retryable for auxiliary compression, not as immediate fallback-marker finalization.
- Retry with backoff before inserting fallback marker.
- If the primary auxiliary provider fails, try configured fallback provider/model if available.
- Consider a smaller emergency compression prompt/chunked summarization fallback before giving up.
- Improve the fallback marker to include a minimal deterministic local summary such as message count, timestamp range, and last N user/assistant snippets, so continuity loss is less severe.
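The first, second, third, and last suggestions above could be combined roughly as follows. This is a minimal sketch, not Hermes code: `summarize_with_retry`, `is_retryable`, `local_marker`, the `providers` list of callables, and the message dict shape (`role`, `content`) are all hypothetical names chosen for illustration. It also assumes the failure can be recognised from the error text seen in the logs; a real implementation would more likely match on the transport exception type. The timestamp range is omitted because the message schema is not known here.

```python
import time
from typing import Callable, Sequence

# Substrings taken from the logged error text. Matching on text is an
# assumption made for this sketch only.
RETRYABLE_MARKERS = ("incomplete chunked read", "peer closed connection")


def is_retryable(exc: Exception) -> bool:
    """Return True for transport failures that look transient."""
    text = str(exc).lower()
    return any(marker in text for marker in RETRYABLE_MARKERS)


def local_marker(messages: Sequence[dict], last_n: int = 2) -> str:
    """Deterministic fallback: message count plus the last N snippets."""
    tail = messages[-last_n:] if last_n > 0 else []
    snippets = "; ".join(f"{m['role']}: {m['content'][:80]}" for m in tail)
    return (
        f"[fallback marker] {len(messages)} messages compacted. "
        f"Last turns: {snippets}"
    )


def summarize_with_retry(
    messages: Sequence[dict],
    providers: Sequence[Callable[[Sequence[dict]], str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying transient failures with backoff."""
    for call in providers:
        for attempt in range(max_attempts):
            try:
                return call(messages)
            except Exception as exc:
                if not is_retryable(exc):
                    break  # non-transient: move on to the next provider
                if attempt + 1 < max_attempts:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** attempt)
    # Every provider failed: degrade to the deterministic local marker.
    return local_marker(messages)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. The order of `providers` stands in for "primary auxiliary provider, then configured fallback".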
Environment notes
- Gateway platform: Telegram
- Auxiliary compression provider: auto, resolving to the main openai-codex provider against https://chatgpt.com/backend-api/codex/
- Model observed: gpt-5.5