Context compression can be interrupted by gateway messages, causing fallback summary marker #23975

## Bug Description
Context compression can fail with `Codex auxiliary Responses stream interrupted` when a new gateway message / process watch-pattern notification arrives while the auxiliary compression summary is running.

The active conversation then continues with a fallback context marker instead of a useful compression summary, so the middle of the session history is effectively lost from the model context even though raw logs remain on disk.

## Observed Logs
From a Telegram gateway session using `provider: openai-codex`, main model `gpt-5.5`, auxiliary compression `openai-codex/gpt-5.4-mini`:

```text
2026-05-11 21:18:13,000 INFO gateway.run: inbound message: platform=telegram user=... msg='[IMPORTANT: Background process proc_11508d9d1e67 matched watch pattern "DevTools'
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: Preflight compression: ~136,264 tokens >= 136,000 threshold (model gpt-5.5, ctx 272,000)
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: context compression started: session=20260511_205207_8a7dc8 messages=169 tokens=~136,264 model=gpt-5.5 focus=None
2026-05-11 21:18:13,099 INFO [20260511_205207_8a7dc8] agent.auxiliary_client: Auxiliary compression: using openai-codex (gpt-5.4-mini) at https://chatgpt.com/backend-api/codex/
2026-05-11 21:18:43,092 WARNING [20260511_205207_8a7dc8] root: Failed to generate context summary: Codex auxiliary Responses stream interrupted. Further summary attempts paused for 60 seconds.
2026-05-11 21:18:43,131 INFO [20260511_205207_8a7dc8] run_agent: context compression done: session=20260511_211843_103fc8 messages=169->8 tokens=~22,523
2026-05-11 21:18:43,139 INFO [20260511_205207_8a7dc8] run_agent: Turn ended: reason=interrupted_by_user model=gpt-5.5 api_calls=0/90 budget=0/90 tool_turns=2 last_msg_role=user response_len=0 session=20260511_211843_103fc8
2026-05-11 21:18:43,246 INFO [20260511_211843_103fc8] run_agent: conversation turn: session=20260511_211843_103fc8 model=gpt-5.5 provider=openai-codex platform=telegram history=8 msg='...next user message...'
```

The user-facing marker was:

```text
⚠ Compression summary failed: Codex auxiliary Responses stream interrupted. Inserted a fallback context marker.
```

## Root Cause Hypothesis
`agent/auxiliary_client.py` checks the global/per-thread interrupt flag while streaming Codex auxiliary responses:

```py
from tools.interrupt import is_interrupted
if is_interrupted():
    raise InterruptedError("Codex auxiliary Responses stream interrupted")
```

For normal model/tool turns this makes sense. For context compression it is brittle: compression is infrastructure needed to preserve continuity. If Telegram receives another user message or an injected watch-pattern notification while the summarizer is running, the interrupt aborts the summary and Hermes falls back to a generic context marker.

In this case the compression timeout was already set to 360s and the failure happened after ~30s, so this was not a timeout. Auth was also healthy. The turn ended with `reason=interrupted_by_user`, which points to an interrupt.
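
The suspected failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual Hermes code: `_interrupt`, `stream_summary`, `chunks_with_inbound_message`, and `compress` are hypothetical stand-ins, and the real `tools.interrupt` flag may be per-thread rather than a single `threading.Event`.

```python
import threading

# Hypothetical stand-in for tools.interrupt: a single process-wide flag.
_interrupt = threading.Event()


def is_interrupted() -> bool:
    return _interrupt.is_set()


def stream_summary(chunks):
    """Consume summary chunks, aborting as soon as the interrupt flag is set."""
    parts = []
    for chunk in chunks:
        if is_interrupted():
            raise InterruptedError("Codex auxiliary Responses stream interrupted")
        parts.append(chunk)
    return "".join(parts)


def chunks_with_inbound_message():
    """Yield summary chunks; an inbound gateway message arrives mid-stream."""
    yield "User asked about X. "
    _interrupt.set()  # simulates the gateway flagging a new message
    yield "Agent ran tool Y."


def compress():
    """Return the summary, or a fallback marker if the stream was interrupted."""
    try:
        return stream_summary(chunks_with_inbound_message())
    except InterruptedError as exc:
        return f"[fallback context marker: {exc}]"


result = compress()
print(result)
```

The partially streamed summary is discarded and only the marker survives, which matches the `messages=169->8` compaction seen in the logs.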

## Expected Behavior
Context compression should be robust against user/gateway interrupts:

- Once preflight compression starts, the summary generation should complete atomically, or
- incoming gateway messages should be queued/deferred until compression finishes, or
- compression auxiliary calls should ignore/defer interrupt checks specifically for the compression task.

The next user message should be processed after the compressed session has a real summary, not after a fallback marker.
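
As an illustration of the queue/defer behavior, here is a rough sketch assuming a single-threaded session runner. `SessionRunner`, `compression_section`, and `on_inbound` are hypothetical names, not existing Hermes APIs; a real gateway would also need locking or an async queue.

```python
from collections import deque
from contextlib import contextmanager


class SessionRunner:
    """Hypothetical runner that defers inbound messages while compression runs."""

    def __init__(self):
        self.compressing = False
        self.deferred = deque()
        self.processed = []

    @contextmanager
    def compression_section(self):
        """Mark compression as in progress; flush deferred messages on exit."""
        self.compressing = True
        try:
            yield
        finally:
            self.compressing = False
            while self.deferred:
                self.processed.append(self.deferred.popleft())

    def on_inbound(self, msg):
        """Queue the message during compression, otherwise process it right away."""
        if self.compressing:
            self.deferred.append(msg)
        else:
            self.processed.append(msg)


runner = SessionRunner()
with runner.compression_section():
    runner.on_inbound("watch-pattern notification")  # arrives mid-compression
    held_back = list(runner.deferred)
runner.on_inbound("next user message")
print(runner.processed)
```

Messages keep their arrival order, and nothing reaches the agent until the compressed session is in place.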

## Actual Behavior
A message/watch notification arriving during compression interrupts the auxiliary Codex Responses stream. Hermes inserts a fallback context marker and proceeds with only a generic compaction reference.

## Proposed Fix Direction
A few possible approaches:

1. Treat compression as a critical section in the gateway/session runner: queue new messages until compression returns.
2. Add an auxiliary-client option like `allow_interrupt=False` for `task="compression"` and keep interrupt behavior for other auxiliary tasks.
3. Special-case watch-pattern/process notifications so they don't interrupt a preflight compression turn.
4. If compression is interrupted, retry once after clearing/deferring the interrupt before falling back to the marker.
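
Option 4 could look roughly like the sketch below. `summarize_with_retry`, `generate_summary`, and `clear_interrupt` are hypothetical names, and the summarizer here is simulated. The helper reports whether an interrupt occurred so the caller can still deliver the deferred message afterwards.

```python
def summarize_with_retry(generate_summary, clear_interrupt, max_retries=1):
    """Return (summary, was_interrupted); summary is None if every attempt failed."""
    was_interrupted = False
    for attempt in range(max_retries + 1):
        try:
            return generate_summary(), was_interrupted
        except InterruptedError:
            was_interrupted = True
            if attempt < max_retries:
                clear_interrupt()  # hypothetical hook: park the interrupt for later
    return None, was_interrupted


# Simulated summarizer: interrupted on the first call, succeeds on the second.
calls = {"n": 0}


def flaky_summary():
    calls["n"] += 1
    if calls["n"] == 1:
        raise InterruptedError("Codex auxiliary Responses stream interrupted")
    return "summary of the middle of the session"


cleared = []
summary, was_interrupted = summarize_with_retry(
    flaky_summary, lambda: cleared.append(True)
)
print(summary, was_interrupted)
```

A `None` summary would still lead to the fallback marker, so this narrows the failure window without closing it.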

I lean toward (1) or (2): compression is not optional UX output; it protects conversation continuity.

## Environment
- Platform: Telegram gateway
- Provider: `openai-codex`
- Main model: `gpt-5.5`
- Auxiliary compression provider/model: `openai-codex` / `gpt-5.4-mini`
- Compression config at the time:

```yaml
compression:
  enabled: true
  threshold: 0.5
  target_ratio: 0.2
  protect_last_n: 20

auxiliary:
  compression:
    provider: openai-codex
    model: gpt-5.4-mini
    timeout: 360
```
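
This config is consistent with the logged preflight trigger, assuming `threshold` is a fraction of the model context window. That reading is inferred from the log line, not confirmed against the code.

```python
ctx_window = 272_000      # "ctx 272,000" from the preflight log line
threshold_ratio = 0.5     # compression.threshold from the config above
observed_tokens = 136_264  # "~136,264 tokens" from the preflight log line

# Assumed rule: compress once the estimated tokens reach ratio * context window.
trigger = int(ctx_window * threshold_ratio)
should_compress = observed_tokens >= trigger

print(trigger, should_compress)
```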

