
Gateway auto-continue note can be persisted and amplified by interrupt-triggered preflight compression #25242

@bizyumov


Summary

Gateway interrupt + tool-tail auto-continue + preflight compression can turn a one-time recovery hint into durable session poison.

When a gateway turn is interrupted after tool output has been appended, the history seen by the next user turn ends in a role="tool" message, and the gateway prepends the following note to that user message:

[System note: Your previous turn was interrupted before you could process the last tool result(s)...]

If that next turn immediately triggers preflight compression, the synthetic note and stale tool-tail content can be serialized/compacted into the child session. Later turns then keep seeing the old task/tool output as if it were fresh context.
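A toy model of this leak path may make it easier to follow. It is a sketch under stated assumptions, not the gateway's code: `gateway_build_message`, `agent_persist_turn`, and `toy_compact` are hypothetical stand-ins, and compaction is modeled as plain concatenation rather than a model-written summary. The only behavior it assumes from the issue is that, without a persist override, the API-bound message (note included) is stored as user content and then fed to compression.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]

# Text of the note quoted in this issue (truncated exactly as in the original).
RECOVERY_NOTE = (
    "[System note: Your previous turn was interrupted before you could "
    "process the last tool result(s)...]"
)


def gateway_build_message(history: List[Message], user_text: str) -> str:
    # Shape-based inference: a trailing tool message triggers the recovery note.
    if history and history[-1].get("role") == "tool":
        return RECOVERY_NOTE + "\n\n" + user_text
    return user_text


def agent_persist_turn(history: List[Message], message: str,
                       persist_user_message: Optional[str] = None) -> List[Message]:
    # Hypothetical persistence: without an override, the API-bound text is stored verbatim.
    stored = message if persist_user_message is None else persist_user_message
    return history + [{"role": "user", "content": stored}]


def toy_compact(history: List[Message]) -> List[Message]:
    # Stand-in for preflight compression: fold everything into one summary message.
    # A real compressor asks a model to summarize; concatenation just makes leaks visible.
    summary = " | ".join(str(m.get("content", "")) for m in history)
    return [{"role": "user", "content": "[Summary] " + summary}]


history: List[Message] = [
    {"role": "user", "content": "run the nightly report"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "old tool output"},
]
message = gateway_build_message(history, "unrelated new question")
child_session = toy_compact(agent_persist_turn(history, message))
print("[System note:" in child_session[0]["content"])    # True
print("old tool output" in child_session[0]["content"])  # True
```

Both the synthetic note and the stale tool output end up inside the child session's only message, which is the "durable session poison" described above.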

Related but not identical: #23975 covers compression being interrupted and falling back to a weak marker. This issue covers auto-continue note persistence/replay plus missing consumed-state for inferred tool-tail recovery.

Affected code paths

  • gateway/run.py

    • _has_fresh_tool_tail is inferred from transcript shape:

      _has_fresh_tool_tail = bool(
          agent_history
          and agent_history[-1].get("role") == "tool"
          and _interruption_is_fresh
      )
    • The recovery note is prepended directly to message, the text that is passed to the agent.

    • agent.run_conversation(...) is called without persist_user_message=..., so the synthetic note can become persisted user content.

    • Session split handling happens only after final-response handling, so a compression-induced session split in a turn that is interrupted, or that produces no final response, may never be propagated.

  • run_agent.py

    • Preflight compression runs before the main tool loop checks _interrupt_requested.
    • _compress_context() can rotate self.session_id even if the turn later exits interrupted_by_user with api_calls=0.
  • gateway/session.py

    • resume_pending has durable state and cleanup.
    • inferred tool-tail auto-continue has no equivalent "this exact tool tail was already delivered" ack.
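The ordering problem in run_agent.py and the late split handling in gateway/run.py can be reduced to a few lines. This is a hypothetical model: `ToyAgent`, `gateway_turn`, and the dict-based session store are illustrative names, not the project's API, and the only thing taken from the issue is the order of operations (compression rotates the session id, the interrupt check comes afterwards, and the gateway records the new id only on the final-response path). The token numbers reuse the anonymized log values from the reproduction below.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List

Message = Dict[str, Any]


@dataclass
class ToyAgent:
    """Hypothetical stand-in for AIAgent that models only the reported ordering."""

    session_id: str
    compression_threshold: int
    interrupt_requested: bool = False

    def _compress_context(self, history: List[Message]) -> List[Message]:
        # As reported: compression rotates session_id as a side effect.
        self.session_id = "child-" + uuid.uuid4().hex[:8]
        return history[-2:]

    def run_conversation(self, message: str, history: List[Message],
                         token_estimate: int) -> Dict[str, Any]:
        api_calls = 0
        # 1. Preflight compression runs first ...
        if token_estimate >= self.compression_threshold:
            history = self._compress_context(history)
        # 2. ... and only afterwards is the interrupt flag consulted.
        if self.interrupt_requested:
            return {"final_response": None, "api_calls": api_calls, "interrupted": True}
        api_calls += 1  # a real model call would happen here
        return {"final_response": "ok", "api_calls": api_calls, "interrupted": False}


def gateway_turn(store: Dict[str, str], chat_key: str, agent: ToyAgent,
                 message: str, history: List[Message], token_estimate: int) -> Dict[str, Any]:
    result = agent.run_conversation(message, history, token_estimate)
    # Split handling lives on the final-response path only.
    if result["final_response"] is not None:
        store[chat_key] = agent.session_id
    return result


store = {"chat": "parent"}
agent = ToyAgent(session_id="parent", compression_threshold=217_600, interrupt_requested=True)
result = gateway_turn(store, "chat", agent, "note + user text", [], 218_403)
print(result["api_calls"], result["interrupted"])  # 0 True
print(agent.session_id != store["chat"])           # True
```

The agent has moved to a child session that the gateway never recorded, while the turn reports zero API calls.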

Reproduction steps

No personal data is required. This can be reproduced with any gateway platform that supports interrupting an active turn.

  1. Configure gateway input handling to interrupt active work:

    display:
      busy_input_mode: interrupt
    compression:
      enabled: true
  2. Use a long-running gateway session whose transcript is near the preflight compression threshold.

    For a deterministic test, lower the compression threshold or use a fixture history large enough that adding one recovery turn crosses the threshold.

  3. Start a turn that calls at least one tool and then requires a follow-up model call to summarize/process the tool result.

    The important transcript tail shape is:

    [
        {"role": "assistant", "tool_calls": [{"id": "call_1", ...}]},
        {"role": "tool", "tool_call_id": "call_1", "content": "...tool output..."},
    ]
  4. While the agent is in the next model/API call after that tool output, send a new gateway message in the same session.

  5. Observe that the first turn exits with an interrupted tool tail. Anonymized log shape:

    T+00.000 Turn ended: reason=interrupted_during_api_call ... last_msg_role=tool ... response_len=65
    
  6. The pending/new user message is processed immediately. Because history ends in role="tool", gateway prepends the auto-continue note:

    T+00.075 conversation turn: history=N msg='[System note: Your previous turn was interrupted before you could proc...'
    
  7. Because the transcript is near threshold, preflight compression runs before the turn reaches the normal interrupt check:

    T+00.082 Preflight compression: ~218,403 tokens >= 217,600 threshold
    T+00.082 context compression started: messages=173
    T+178.130 context compression done: messages=173->10
    T+178.200 Turn ended: reason=interrupted_by_user api_calls=0 last_msg_role=user response_len=0
    
  8. Continue sending messages in the same session.
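For the deterministic variant mentioned in step 2, a fixture could be built along these lines. It is a sketch: `estimate_tokens` uses a rough four-characters-per-token heuristic rather than the project's real estimator, and `build_fixture`, `demo_tool`, and the threshold choice are hypothetical. The idea is to end the history in the assistant tool call plus tool result tail from step 3, then place the threshold so that only the extra recovery turn crosses it.

```python
import json
from typing import Any, Dict, List

Message = Dict[str, Any]

RECOVERY_NOTE = (
    "[System note: Your previous turn was interrupted before you could "
    "process the last tool result(s)...]"
)


def estimate_tokens(messages: List[Message]) -> int:
    # Assumption: ~4 characters per token; the project's own estimator may differ.
    return len(json.dumps(messages)) // 4


def build_fixture(filler_turns: int = 80) -> List[Message]:
    """Long synthetic history ending in the assistant tool call + tool result tail."""
    history: List[Message] = []
    for i in range(filler_turns):
        history.append({"role": "user", "content": f"filler question {i} " + "x" * 200})
        history.append({"role": "assistant", "content": f"filler answer {i} " + "y" * 200})
    history.append({
        "role": "assistant",
        "content": "",
        "tool_calls": [{"id": "call_1", "type": "function",
                        "function": {"name": "demo_tool", "arguments": "{}"}}],
    })
    history.append({"role": "tool", "tool_call_id": "call_1", "content": "...tool output..."})
    return history


history = build_fixture()
recovery_turn: Message = {"role": "user", "content": RECOVERY_NOTE + "\n\nfollow-up message"}

before = estimate_tokens(history)
after = estimate_tokens(history + [recovery_turn])
# Threshold sits just above the current size, so only the recovery turn crosses it.
threshold = before + 1
print(history[-1]["role"], before < threshold <= after)  # tool True
```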

Actual behavior

  • The synthetic auto-continue note appears in persisted/compacted user-message history.
  • The stale tool output remains semantically active and can be repeatedly summarized or obeyed.
  • The same old task/tool result can reappear across later session splits, even after the model has already responded to it.
  • Since tool-tail recovery is inferred only from transcript shape, there is no durable marker saying "this tool tail was already handed back to the model".

Expected behavior

  • A trailing tool result may trigger at most one recovery attempt.
  • The recovery instruction should be API-only context, not user-authored transcript text.
  • Preflight compression should not commit a session split while a turn is already interrupted, unless the split is safely propagated and the synthetic recovery note is not serialized.
  • Once a specific trailing tool batch is delivered to the model, subsequent user turns should not receive the same auto-continue note for that same batch.
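The third expectation amounts to a two-phase commit around compression: build the compacted history as a candidate, and rotate the session only if no interrupt arrived in the meantime. The sketch below is hypothetical (`preflight_compress`, `CompressionResult`, and the callback parameters are illustrative names) and uses a `threading.Event` in place of `_interrupt_requested`; given that the logged compression took about 178 seconds, the re-check before commit matters as much as the check before starting.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


@dataclass
class CompressionResult:
    committed: bool
    session_id: str
    history: List[Message]


def preflight_compress(
    session_id: str,
    history: List[Message],
    interrupt: threading.Event,
    summarize: Callable[[List[Message]], List[Message]],
    new_session_id: Callable[[], str],
) -> CompressionResult:
    """Hypothetical interrupt-aware preflight compression.

    The compacted history is only a candidate until the final check passes,
    so an interrupted turn leaves the parent session untouched.
    """
    if interrupt.is_set():  # check before starting the (slow) compression
        return CompressionResult(False, session_id, history)
    candidate = summarize(history)
    if interrupt.is_set():  # re-check before committing the split
        return CompressionResult(False, session_id, history)
    return CompressionResult(True, new_session_id(), candidate)


interrupt = threading.Event()
history: List[Message] = [{"role": "user", "content": str(i)} for i in range(10)]


def slow_summary(msgs: List[Message]) -> List[Message]:
    interrupt.set()  # simulate a new gateway message arriving mid-compression
    return msgs[-2:]


result = preflight_compress("parent", history, interrupt, slow_summary, lambda: "child-1")
print(result.committed, result.session_id)  # False parent
```

Deferring gateway interrupts during the commit itself, as the proposal below also suggests, would close the remaining window between the last check and the session rotation.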

Proposed fix direction

  1. Use the existing AIAgent.run_conversation(..., persist_user_message=...) support for the gateway auto-continue prefix:

    agent.run_conversation(
        run_message_with_recovery_note,
        conversation_history=agent_history,
        task_id=session_id,
        persist_user_message=clean_user_message,
    )
  2. Add durable consumed state for inferred tool-tail recovery, similar in spirit to resume_pending:

    auto_continue_tool_tail_key: Optional[str]
    auto_continue_tool_tail_ack_at: Optional[datetime]

    Compute the key from the trailing consecutive tool messages, e.g. tool_call_id, tool_name, and a content hash.

  3. Gate _has_fresh_tool_tail on that key not already being acknowledged.

  4. Mark the key acknowledged once the recovery turn has actually reached the model.

  5. Make preflight compression interrupt-aware:

    • check _interrupt_requested before starting compression;
    • re-check before committing a session split;
    • or defer gateway interrupts while compression is in a critical section.
  6. Ensure compression-induced agent.session_id changes are propagated back to the gateway and its session store even for interrupted turns and turns with no final response, or avoid committing the split in that path.
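Steps 2 to 4 could look roughly like the sketch below. The two session fields are the ones named in step 2; everything else (`trailing_tool_batch`, `tool_tail_key`, `has_fresh_tool_tail`, `ack_tool_tail`, `SessionEntry`) is a hypothetical illustration, and reading a "name" field from tool messages is an assumption about the message format. The ack is only recorded when the recovery turn made at least one API call, so a turn interrupted before reaching the model (api_calls=0, as in the reproduction log) does not consume the recovery attempt.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def trailing_tool_batch(history: List[Message]) -> List[Message]:
    # Collect the consecutive role="tool" messages at the end of the transcript.
    batch: List[Message] = []
    for msg in reversed(history):
        if msg.get("role") != "tool":
            break
        batch.append(msg)
    batch.reverse()
    return batch


def tool_tail_key(history: List[Message]) -> Optional[str]:
    # Key = hash of (tool_call_id, tool name, content hash) for each trailing tool message.
    batch = trailing_tool_batch(history)
    if not batch:
        return None
    parts = [
        {
            "tool_call_id": m.get("tool_call_id"),
            "name": m.get("name"),  # assumption: tool messages may carry a "name" field
            "content_sha256": hashlib.sha256(str(m.get("content", "")).encode("utf-8")).hexdigest(),
        }
        for m in batch
    ]
    return hashlib.sha256(json.dumps(parts, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class SessionEntry:
    # Hypothetical subset of the gateway session record; field names follow step 2.
    auto_continue_tool_tail_key: Optional[str] = None
    auto_continue_tool_tail_ack_at: Optional[datetime] = None


def has_fresh_tool_tail(entry: SessionEntry, history: List[Message],
                        interruption_is_fresh: bool) -> bool:
    # Same shape check as today, plus: this exact batch must not be acknowledged yet.
    key = tool_tail_key(history)
    return bool(key and interruption_is_fresh and key != entry.auto_continue_tool_tail_key)


def ack_tool_tail(entry: SessionEntry, history: List[Message], api_calls: int) -> None:
    # Acknowledge only once the recovery turn has actually reached the model.
    if api_calls > 0:
        entry.auto_continue_tool_tail_key = tool_tail_key(history)
        entry.auto_continue_tool_tail_ack_at = datetime.now(timezone.utc)


entry = SessionEntry()
history: List[Message] = [
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "name": "demo_tool", "content": "output"},
]
print(has_fresh_tool_tail(entry, history, True))  # True: first recovery attempt
ack_tool_tail(entry, history, api_calls=0)        # interrupted before the model: no ack
print(has_fresh_tool_tail(entry, history, True))  # True
ack_tool_tail(entry, history, api_calls=1)
print(has_fresh_tool_tail(entry, history, True))  # False: this batch was already delivered
```

Hashing the content as well as the tool_call_id means a genuinely new tool result, even one reusing an id, still gets its own single recovery attempt.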

Notes

This is not fixed by switching to busy_input_mode: queue; interrupt mode is a valid required mode. The bug is that interrupt recovery has no one-shot acknowledgement for inferred tool tails, and its synthetic instruction can become durable context during compression.

Labels

  • P1 (High: major feature broken, no workaround)
  • comp/agent (core agent loop, run_agent.py, prompt builder)
  • comp/gateway (gateway runner, session dispatch, delivery)
  • type/bug (something isn't working)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions