Skip to content

Context compaction can misread preserved todo/tool state as current user intent and leak MEDIA directives #14665

@zqchris

Description

@zqchris

Bug description

context compression can preserve cross-session/tool state in a way that looks like a fresh user request in the new session.

In the failure mode I hit, three things stack together:

  1. the compaction summary carries forward an old ## Active Task
  2. the preserved todo list is injected as a normal user message
  3. tool outputs such as memory / session_search are serialized verbatim into the summarizer input, including strings like MEDIA:

That can cause the resumed assistant to follow an old task instead of the latest real user message, and can also make MEDIA: directives leak back into normal assistant text.

Why this matters

There are two separate bad outcomes here:

1) Wrong task resumption after compaction

The post-compaction todo injection currently looks like ordinary conversation text, so the model can treat it as the current user ask.

2) MEDIA: directive contamination

If memory / session_search / other tool results contain text like MEDIA:/tmp/foo.png, that text can be preserved in the compaction chain and later echoed by the model as plain content.

On gateway integrations that parse MEDIA: tags for file delivery, this can lead to bogus attachment attempts (for example trying to send a non-existent file path extracted from quoted prose or preference text).

Minimal repro shape

A deterministic repro can be built with a compressed conversation containing:

  • a compaction summary with an old ## Active Task
  • a preserved active todo snapshot
  • a memory or session_search tool result containing MEDIA: text
  • a latest real user message that should be the only active request

Observed behavior:

  • the assistant may resume the old task / preserved todo state instead of the latest real user message
  • MEDIA: text from tool state can survive into later assistant-visible context as if it were ordinary text

Suspect locations

  • agent/context_compressor.py
    • _serialize_for_summary() currently serializes tool result content and tool-call args directly into the summarizer input
  • tools/todo_tool.py
    • format_for_injection() renders preserved todo state as natural-language text
  • run_agent.py
    • _compress_context() injects the todo snapshot back into the compressed message list as a user message

Why the existing gateway-side MEDIA: hardening is not enough

I know there was already work around stricter MEDIA: extraction in gateway parsing, but this bug happens earlier in the pipeline:

  • summary contamination / stale task carry-over
  • todo state being injected as if it were a user utterance
  • tool-state text containing control directives being preserved and resurfaced

So even if gateway extraction is stricter, the conversation state can still get semantically polluted after compaction.

Suggested fix directions

  1. Treat memory, session_search, todo and similar tool state as non-intent state, not current user intent, when building summary input
  2. Mask control directives like MEDIA: before tool outputs are fed into compaction summaries
  3. Do not inject preserved todo state as natural-language text that looks like a fresh user message
  4. Ensure preserved todo state does not outrank the latest real user message after compaction

Regression coverage that would be useful

  • summary input containing memory / session_search results with MEDIA: should not preserve raw MEDIA: tokens
  • preserved todo state should be clearly machine-generated state, not look like a new user request
  • after compaction, the latest real user message should remain the active request even when summary + preserved todo state are both present

If helpful, I can turn the local repro/fix into a PR next.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildertype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions