Bug Description
Context compression can fail with `Codex auxiliary Responses stream interrupted` when a new gateway message or process watch-pattern notification arrives while the auxiliary compression summary is running.
The active conversation then continues with a fallback context marker instead of a useful compression summary, so the middle of the session history is effectively lost from the model context even though raw logs remain on disk.
Observed Logs
From a Telegram gateway session using `provider: openai-codex`, main model `gpt-5.5`, and auxiliary compression `openai-codex/gpt-5.4-mini`:
```text
2026-05-11 21:18:13,000 INFO gateway.run: inbound message: platform=telegram user=... msg='[IMPORTANT: Background process proc_11508d9d1e67 matched watch pattern "DevTools'
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: Preflight compression: ~136,264 tokens >= 136,000 threshold (model gpt-5.5, ctx 272,000)
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: context compression started: session=20260511_205207_8a7dc8 messages=169 tokens=~136,264 model=gpt-5.5 focus=None
2026-05-11 21:18:13,099 INFO [20260511_205207_8a7dc8] agent.auxiliary_client: Auxiliary compression: using openai-codex (gpt-5.4-mini) at https://chatgpt.com/backend-api/codex/
2026-05-11 21:18:43,092 WARNING [20260511_205207_8a7dc8] root: Failed to generate context summary: Codex auxiliary Responses stream interrupted. Further summary attempts paused for 60 seconds.
2026-05-11 21:18:43,131 INFO [20260511_205207_8a7dc8] run_agent: context compression done: session=20260511_211843_103fc8 messages=169->8 tokens=~22,523
2026-05-11 21:18:43,139 INFO [20260511_205207_8a7dc8] run_agent: Turn ended: reason=interrupted_by_user model=gpt-5.5 api_calls=0/90 budget=0/90 tool_turns=2 last_msg_role=user response_len=0 session=20260511_211843_103fc8
2026-05-11 21:18:43,246 INFO [20260511_211843_103fc8] run_agent: conversation turn: session=20260511_211843_103fc8 model=gpt-5.5 provider=openai-codex platform=telegram history=8 msg='...next user message...'
```
The user-facing marker was:
```text
⚠ Compression summary failed: Codex auxiliary Responses stream interrupted. Inserted a fallback context marker.
```
Root Cause Hypothesis
`agent/auxiliary_client.py` checks the global/per-thread interrupt flag while streaming Codex auxiliary responses:
```python
from tools.interrupt import is_interrupted

# Checked while consuming the Codex auxiliary Responses stream (excerpt)
if is_interrupted():
    raise InterruptedError("Codex auxiliary Responses stream interrupted")
```
For normal model/tool turns this makes sense. For context compression it is brittle: compression is infrastructure needed to preserve continuity. If Telegram receives another user message or an injected watch-pattern notification while the summarizer is running, the interrupt aborts the summary and Hermes falls back to a generic context marker.
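In this case the compression timeout was already set to 360 s, and the failure happened about 30 s after the auxiliary call started (21:18:13 → 21:18:43), so this was not a timeout. Auth was also healthy. It was an interrupt: the turn ended with `reason=interrupted_by_user`.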
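A quick check of the timing claim, using the auxiliary-client start and failure timestamps from the logs above and the `timeout: 360` value from the Environment section:

```python
from datetime import datetime

# Timestamps copied from the "Auxiliary compression: using ..." and
# "Failed to generate context summary" log lines.
FMT = "%Y-%m-%d %H:%M:%S,%f"
aux_started = datetime.strptime("2026-05-11 21:18:13,099", FMT)
summary_failed = datetime.strptime("2026-05-11 21:18:43,092", FMT)

COMPRESSION_TIMEOUT_S = 360  # auxiliary.compression.timeout

elapsed = (summary_failed - aux_started).total_seconds()
print(f"elapsed={elapsed:.3f}s timeout={COMPRESSION_TIMEOUT_S}s")
# prints elapsed=29.993s timeout=360s
```

The call died at roughly a twelfth of its time budget, which rules out the timeout path.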
Expected Behavior
Context compression should be robust against user/gateway interrupts:
- Once preflight compression starts, the summary generation should complete atomically, or
- incoming gateway messages should be queued/deferred until compression finishes, or
- compression auxiliary calls should ignore/defer interrupt checks specifically for the compression task.
The next user message should be processed after the compressed session has a real summary, not after a fallback marker.
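A minimal sketch of the queue/defer behavior, assuming a single-threaded session runner. `SessionRunner`, `compression_critical_section`, `on_inbound`, and `handle` are hypothetical names, not Hermes APIs. It shows a message that arrives mid-compression being handled only after a summary exists.

```python
from collections import deque
from contextlib import contextmanager
from typing import Deque, Iterator, List, Optional


class SessionRunner:
    """Hypothetical session runner: inbound messages wait while compression runs."""

    def __init__(self) -> None:
        self._compressing = False
        self._pending: Deque[str] = deque()
        self.processed: List[str] = []
        self.summary: Optional[str] = None

    @contextmanager
    def compression_critical_section(self) -> Iterator[None]:
        self._compressing = True
        try:
            yield
        finally:
            # Compression has returned (successfully or not): release queued messages.
            self._compressing = False
            while self._pending:
                self.handle(self._pending.popleft())

    def on_inbound(self, msg: str) -> None:
        if self._compressing:
            self._pending.append(msg)  # defer; never raise the interrupt flag here
        else:
            self.handle(msg)

    def handle(self, msg: str) -> None:
        # Placeholder for a normal turn: records the message and whether a summary existed.
        self.processed.append(f"{msg} (summary={'yes' if self.summary else 'no'})")


runner = SessionRunner()
with runner.compression_critical_section():
    runner.on_inbound("watch pattern matched")
    runner.summary = "real compression summary"  # summarizer finishes uninterrupted
print(runner.processed)  # ['watch pattern matched (summary=yes)']
```

A real gateway would also need a lock (or its event loop) guarding the queue; this sketch only illustrates ordering. The cost is latency: the queued message waits for the summary, bounded by the auxiliary timeout.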
Actual Behavior
A message/watch notification arriving during compression interrupts the auxiliary Codex Responses stream. Hermes inserts a fallback context marker and proceeds with only a generic compaction reference.
Proposed Fix Direction
A few possible approaches:
1. Treat compression as a critical section in the gateway/session runner: queue new messages until compression returns.
2. Add an auxiliary-client option such as `allow_interrupt=False` for `task="compression"`, and keep the interrupt behavior for other auxiliary tasks.
3. Special-case watch-pattern/process notifications so they don't interrupt a preflight compression turn.
4. If compression is interrupted, retry once after clearing or deferring the interrupt before falling back to the marker.
I lean toward (1) or (2): compression is not optional UX output; it protects conversation continuity.
Environment
- Platform: Telegram gateway
- Provider: `openai-codex`
- Main model: `gpt-5.5`
- Auxiliary compression provider/model: `openai-codex` / `gpt-5.4-mini`
- Compression config at the time:
```yaml
compression:
  enabled: true
  threshold: 0.5
  target_ratio: 0.2
  protect_last_n: 20
auxiliary:
  compression:
    provider: openai-codex
    model: gpt-5.4-mini
    timeout: 360
```
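The 136,000-token threshold in the preflight log line is consistent with this config if `threshold` is read as a fraction of the main model's context window. That reading is an inference from the logs, not something confirmed in the code:

```python
# Values from the config above and the "Preflight compression" log line.
# Assumption: compression.threshold is a fraction of the main model's context window.
CONTEXT_WINDOW = 272_000    # "ctx 272,000" for gpt-5.5
THRESHOLD_FRACTION = 0.5    # compression.threshold
estimated_tokens = 136_264  # "~136,264 tokens"

threshold_tokens = int(CONTEXT_WINDOW * THRESHOLD_FRACTION)
should_compress = estimated_tokens >= threshold_tokens
print(threshold_tokens, should_compress)  # 136000 True
```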