Gateway auto-continue note can be persisted and amplified by interrupt-triggered preflight compression

## Summary

The combination of a gateway interrupt, tool-tail auto-continue, and preflight compression can turn a one-time recovery hint into durable session poison.

When a gateway turn is interrupted after tool output has been appended, the next user turn sees a trailing `role="tool"` message, and the gateway prepends the following note to the new user message:

```text
[System note: Your previous turn was interrupted before you could process the last tool result(s)...]
```

If that next turn immediately triggers preflight compression, the synthetic note and stale tool-tail content can be serialized/compacted into the child session. Later turns then keep seeing the old task/tool output as if it were fresh context.

Related but not identical: #23975 covers compression being interrupted and falling back to a weak marker. This issue covers **auto-continue note persistence/replay plus missing consumed-state for inferred tool-tail recovery**.

## Affected code paths

- `gateway/run.py`
  - `_has_fresh_tool_tail` is inferred from transcript shape:

    ```python
    _has_fresh_tool_tail = bool(
        agent_history
        and agent_history[-1].get("role") == "tool"
        and _interruption_is_fresh
    )
    ```

  - The recovery note is prepended directly into `message`.
  - `agent.run_conversation(...)` is called without `persist_user_message=...`, so the synthetic note can become persisted user content.
  - Session split handling happens after final-response handling, so a compression split on an interrupted turn that produces no final response may not be propagated back to the gateway.

- `run_agent.py`
  - preflight compression runs before the main tool loop checks `_interrupt_requested`.
  - `_compress_context()` can rotate `self.session_id` even if the turn later exits `interrupted_by_user` with `api_calls=0`.

- `gateway/session.py`
  - `resume_pending` has durable state and cleanup.
  - inferred tool-tail auto-continue has no equivalent "this exact tool tail was already delivered" ack.
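
The first `gateway/run.py` point can be illustrated with a minimal sketch. It assumes a hypothetical helper, `build_turn_inputs`, that is not part of the codebase; the idea is only that the recovery note is attached to the string sent to the model, while the clean user text is kept separately so it can be handed to `persist_user_message`.

```python
from typing import Tuple

# Note text copied from this report, where it is already abbreviated with "...".
RECOVERY_NOTE = (
    "[System note: Your previous turn was interrupted before you could "
    "process the last tool result(s)...]"
)


def build_turn_inputs(user_message: str, has_fresh_tool_tail: bool) -> Tuple[str, str]:
    """Return (api_message, persisted_message) for one gateway turn.

    Hypothetical helper: only api_message carries the synthetic note.
    """
    if not has_fresh_tool_tail:
        return user_message, user_message
    api_message = f"{RECOVERY_NOTE}\n\n{user_message}"
    return api_message, user_message


api_message, persisted_message = build_turn_inputs("what's the status?", True)
```

With this split, whatever is written to the transcript or later compacted would only ever contain `persisted_message`.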

## Reproduction steps

No personal data is required. This can be reproduced with any gateway platform that supports interrupting an active turn.

1. Configure gateway input handling to interrupt active work:

   ```yaml
   display:
     busy_input_mode: interrupt
   compression:
     enabled: true
   ```

2. Use a long-running gateway session whose transcript is near the preflight compression threshold.

   For a deterministic test, lower the compression threshold or use a fixture history large enough that adding one recovery turn crosses the threshold.

3. Start a turn that calls at least one tool and then requires a follow-up model call to summarize/process the tool result.

   The important transcript tail shape is:

   ```python
   [
       {"role": "assistant", "tool_calls": [{"id": "call_1", ...}]},
       {"role": "tool", "tool_call_id": "call_1", "content": "...tool output..."},
   ]
   ```

4. While the agent is in the next model/API call after that tool output, send a new gateway message in the same session.

5. Observe that the first turn exits with an interrupted tool tail. An anonymized log line has this shape:

   ```text
   T+00.000 Turn ended: reason=interrupted_during_api_call ... last_msg_role=tool ... response_len=65
   ```

6. The pending/new user message is processed immediately. Because history ends in `role="tool"`, gateway prepends the auto-continue note:

   ```text
   T+00.075 conversation turn: history=N msg='[System note: Your previous turn was interrupted before you could proc...'
   ```

7. Because the transcript is near threshold, preflight compression runs before the turn reaches the normal interrupt check:

   ```text
   T+00.082 Preflight compression: ~218,403 tokens >= 217,600 threshold
   T+00.082 context compression started: messages=173
   T+178.130 context compression done: messages=173->10
   T+178.200 Turn ended: reason=interrupted_by_user api_calls=0 last_msg_role=user response_len=0
   ```

8. Continue sending messages in the same session.
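
For the deterministic variant of step 2, a fixture history could be built roughly as follows. This is a sketch under stated assumptions: `estimate_tokens` is a hypothetical 4-characters-per-token estimator, not the estimator the real compressor uses, and `build_fixture_history` is an illustrative name. The fixture sits a few tokens below the threshold and ends in the tool-tail shape from step 3, so that one additional recovery turn crosses the threshold.

```python
from typing import Dict, List


def estimate_tokens(messages: List[Dict]) -> int:
    # Hypothetical rough estimator (assumption): about 4 characters per token.
    return sum(len(m.get("content") or "") for m in messages) // 4


def build_fixture_history(threshold_tokens: int, headroom_tokens: int = 8) -> List[Dict]:
    """Filler turns sized to sit just under the threshold, ending in a tool tail."""
    tail = [
        {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
        {"role": "tool", "tool_call_id": "call_1", "content": "...tool output..."},
    ]
    tail_chars = sum(len(m.get("content") or "") for m in tail)
    budget_chars = (threshold_tokens - headroom_tokens) * 4 - tail_chars
    if budget_chars < 1:
        raise ValueError("threshold too small for this fixture")

    chunk = 400  # characters per filler message
    pairs = (budget_chars - 1) // (2 * chunk)
    remainder = budget_chars - pairs * 2 * chunk  # always between 1 and 800

    history: List[Dict] = []
    for _ in range(pairs):
        history.append({"role": "user", "content": "x" * chunk})
        history.append({"role": "assistant", "content": "x" * chunk})
    # A final user message absorbs the remainder so the tail follows a user turn.
    history.append({"role": "user", "content": "x" * remainder})
    return history + tail


fixture = build_fixture_history(threshold_tokens=1000)
```

The headroom should be smaller than the estimated size of the recovery turn; otherwise the preflight check would not trigger on the next message.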

## Actual behavior

- The synthetic auto-continue note appears in persisted/compacted user-message history.
- The stale tool output remains semantically active and can be repeatedly summarized or obeyed.
- The same old task/tool result can reappear across later session splits, even after the model has already responded to it.
- Since tool-tail recovery is inferred only from transcript shape, there is no durable marker saying "this tool tail was already handed back to the model".

## Expected behavior

- A trailing tool result may trigger at most one recovery attempt.
- The recovery instruction should be API-only context, not user-authored transcript text.
- Preflight compression should not commit a session split while a turn is already interrupted, unless the split is safely propagated and the synthetic recovery note is not serialized.
- Once a specific trailing tool batch is delivered to the model, subsequent user turns should not receive the same auto-continue note for that same batch.
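
The compression expectation above can be sketched with a toy model. `ToyAgent` is a hypothetical stand-in and does not mirror the real agent class; only the attribute names `_interrupt_requested` and `session_id` are taken from this report. The sketch checks for an interrupt before starting compression and again before committing the session split.

```python
import uuid
from typing import Callable, Dict, List


class ToyAgent:
    """Hypothetical stand-in for the agent, for illustration only."""

    def __init__(self, session_id: str, messages: List[Dict]) -> None:
        self.session_id = session_id
        self.messages = messages
        self._interrupt_requested = False

    def preflight_compress(
        self, summarize: Callable[[List[Dict]], List[Dict]]
    ) -> bool:
        """Compress and rotate the session only if no interrupt is pending."""
        if self._interrupt_requested:  # check before starting compression
            return False
        compacted = summarize(self.messages)  # potentially slow call
        if self._interrupt_requested:  # re-check before committing the split
            return False
        self.messages = compacted
        self.session_id = uuid.uuid4().hex  # the split is committed only here
        return True


agent = ToyAgent("parent", [{"role": "user", "content": "hi"} for _ in range(3)])


def interrupted_summarize(messages: List[Dict]) -> List[Dict]:
    # Simulates an interrupt arriving while compression is in progress.
    agent._interrupt_requested = True
    return messages[:1]


committed = agent.preflight_compress(interrupted_summarize)
```

Here `committed` is `False`, and both `agent.session_id` and `agent.messages` are left unchanged, so no split needs to be propagated for the interrupted turn.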

## Proposed fix direction

1. Use the existing `AIAgent.run_conversation(..., persist_user_message=...)` support for the gateway auto-continue prefix:

   ```python
   agent.run_conversation(
       run_message_with_recovery_note,
       conversation_history=agent_history,
       task_id=session_id,
       persist_user_message=clean_user_message,
   )
   ```

2. Add durable consumed state for inferred tool-tail recovery, similar in spirit to `resume_pending`:

   ```python
   auto_continue_tool_tail_key: Optional[str]
   auto_continue_tool_tail_ack_at: Optional[datetime]
   ```

   Compute the key from the trailing consecutive `tool` messages, e.g. `tool_call_id`, `tool_name`, and a content hash.

3. Gate `_has_fresh_tool_tail` on that key not already being acknowledged.

4. Mark the key acknowledged once the recovery turn has actually reached the model.

5. Make preflight compression interrupt-aware:
   - check `_interrupt_requested` before starting compression;
   - re-check before committing a session split;
   - or defer gateway interrupts while compression is in a critical section.

6. Ensure compression-induced `agent.session_id` changes are propagated back to gateway/session store even for interrupted/no-final-response turns, or avoid committing the split in that path.
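
Steps 2 to 4 could look roughly like the sketch below. The two field names come from the proposal above; `SessionState`, `tool_tail_key`, `should_auto_continue`, and `ack_tool_tail` are hypothetical names, and the choice of SHA-256 over a JSON-encoded list is an assumption, not an existing convention in the codebase.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


def tool_tail_key(history: List[Dict]) -> Optional[str]:
    """Hash the trailing run of consecutive tool messages, or None if there is none."""
    tail: List[Dict] = []
    for msg in reversed(history):
        if msg.get("role") != "tool":
            break
        tail.append(msg)
    if not tail:
        return None
    tail.reverse()  # restore transcript order so the key is order-sensitive
    parts = [
        [
            m.get("tool_call_id"),
            m.get("tool_name"),
            hashlib.sha256(str(m.get("content", "")).encode("utf-8")).hexdigest(),
        ]
        for m in tail
    ]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


@dataclass
class SessionState:
    # Hypothetical container for the proposed durable fields.
    auto_continue_tool_tail_key: Optional[str] = None
    auto_continue_tool_tail_ack_at: Optional[datetime] = None


def should_auto_continue(
    state: SessionState, history: List[Dict], interruption_is_fresh: bool
) -> bool:
    """Gate the recovery note on the tail not having been acknowledged already."""
    key = tool_tail_key(history)
    return bool(
        key and interruption_is_fresh and key != state.auto_continue_tool_tail_key
    )


def ack_tool_tail(state: SessionState, key: str) -> None:
    """Record the key once the recovery turn has actually reached the model."""
    state.auto_continue_tool_tail_key = key
    state.auto_continue_tool_tail_ack_at = datetime.now(timezone.utc)


history = [
    {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "...tool output..."},
]
state = SessionState()
first_attempt = should_auto_continue(state, history, interruption_is_fresh=True)
ack_tool_tail(state, tool_tail_key(history))
second_attempt = should_auto_continue(state, history, interruption_is_fresh=True)
```

In this sketch `first_attempt` is `True` and `second_attempt` is `False`: the same tool batch triggers recovery once, while a different trailing batch would produce a new key and be eligible again.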

## Notes

Switching to `busy_input_mode: queue` does not fix this; interrupt mode is a valid mode that some setups require. The bug is that interrupt recovery has no one-shot acknowledgement for inferred tool tails, and its synthetic instruction can become durable context during compression.


Gateway auto-continue note can be persisted and amplified by interrupt-triggered preflight compression #25242
