Bug description
context compression can preserve cross-session/tool state in a way that looks like a fresh user request in the new session.
In the failure mode I hit, three things stack together:
- the compaction summary carries forward an old
## Active Task
- the preserved todo list is injected as a normal
user message
- tool outputs such as
memory / session_search are serialized verbatim into the summarizer input, including strings like MEDIA:
That can cause the resumed assistant to follow an old task instead of the latest real user message, and can also make MEDIA: directives leak back into normal assistant text.
Why this matters
There are two separate bad outcomes here:
1) Wrong task resumption after compaction
The post-compaction todo injection currently looks like ordinary conversation text, so the model can treat it as the current user ask.
2) MEDIA: directive contamination
If memory / session_search / other tool results contain text like MEDIA:/tmp/foo.png, that text can be preserved in the compaction chain and later echoed by the model as plain content.
On gateway integrations that parse MEDIA: tags for file delivery, this can lead to bogus attachment attempts (for example trying to send a non-existent file path extracted from quoted prose or preference text).
Minimal repro shape
A deterministic repro can be built with a compressed conversation containing:
- a compaction summary with an old
## Active Task
- a preserved active todo snapshot
- a
memory or session_search tool result containing MEDIA: text
- a latest real user message that should be the only active request
Observed behavior:
- the assistant may resume the old task / preserved todo state instead of the latest real user message
MEDIA: text from tool state can survive into later assistant-visible context as if it were ordinary text
Suspect locations
agent/context_compressor.py
_serialize_for_summary() currently serializes tool result content and tool-call args directly into the summarizer input
tools/todo_tool.py
format_for_injection() renders preserved todo state as natural-language text
run_agent.py
_compress_context() injects the todo snapshot back into the compressed message list as a user message
Why the existing gateway-side MEDIA: hardening is not enough
I know there was already work around stricter MEDIA: extraction in gateway parsing, but this bug happens earlier in the pipeline:
- summary contamination / stale task carry-over
- todo state being injected as if it were a user utterance
- tool-state text containing control directives being preserved and resurfaced
So even if gateway extraction is stricter, the conversation state can still get semantically polluted after compaction.
Suggested fix directions
- Treat
memory, session_search, todo and similar tool state as non-intent state, not current user intent, when building summary input
- Mask control directives like
MEDIA: before tool outputs are fed into compaction summaries
- Do not inject preserved todo state as natural-language text that looks like a fresh
user message
- Ensure preserved todo state does not outrank the latest real user message after compaction
Regression coverage that would be useful
- summary input containing
memory / session_search results with MEDIA: should not preserve raw MEDIA: tokens
- preserved todo state should be clearly machine-generated state, not look like a new user request
- after compaction, the latest real user message should remain the active request even when summary + preserved todo state are both present
If helpful, I can turn the local repro/fix into a PR next.
Bug description
context compressioncan preserve cross-session/tool state in a way that looks like a fresh user request in the new session.In the failure mode I hit, three things stack together:
## Active Taskusermessagememory/session_searchare serialized verbatim into the summarizer input, including strings likeMEDIA:That can cause the resumed assistant to follow an old task instead of the latest real user message, and can also make
MEDIA:directives leak back into normal assistant text.Why this matters
There are two separate bad outcomes here:
1) Wrong task resumption after compaction
The post-compaction todo injection currently looks like ordinary conversation text, so the model can treat it as the current user ask.
2)
MEDIA:directive contaminationIf
memory/session_search/ other tool results contain text likeMEDIA:/tmp/foo.png, that text can be preserved in the compaction chain and later echoed by the model as plain content.On gateway integrations that parse
MEDIA:tags for file delivery, this can lead to bogus attachment attempts (for example trying to send a non-existent file path extracted from quoted prose or preference text).Minimal repro shape
A deterministic repro can be built with a compressed conversation containing:
## Active Taskmemoryorsession_searchtool result containingMEDIA:textObserved behavior:
MEDIA:text from tool state can survive into later assistant-visible context as if it were ordinary textSuspect locations
agent/context_compressor.py_serialize_for_summary()currently serializes tool result content and tool-call args directly into the summarizer inputtools/todo_tool.pyformat_for_injection()renders preserved todo state as natural-language textrun_agent.py_compress_context()injects the todo snapshot back into the compressed message list as ausermessageWhy the existing gateway-side
MEDIA:hardening is not enoughI know there was already work around stricter
MEDIA:extraction in gateway parsing, but this bug happens earlier in the pipeline:So even if gateway extraction is stricter, the conversation state can still get semantically polluted after compaction.
Suggested fix directions
memory,session_search,todoand similar tool state as non-intent state, not current user intent, when building summary inputMEDIA:before tool outputs are fed into compaction summariesusermessageRegression coverage that would be useful
memory/session_searchresults withMEDIA:should not preserve rawMEDIA:tokensIf helpful, I can turn the local repro/fix into a PR next.