Bug Description
After context compaction, the model treats the summary's ## Remaining Work / ## Pending User Asks sections as implicit scope expansion when the latest user message is topically consistent but vague. The SUMMARY_PREFIX (fixed in #35383) correctly handles contradictions and overt topic changes, but when the user says something like "fix the issue following the new steps" — vague but on-topic — the compaction body's actionable-sounding sections fill in the blanks. The model then executes a conflated task that exceeds what the user asked and may violate active skill filters.
Specific failure chain observed:
- A session lineage compacts, producing a summary with ## Remaining Work listing genuinely unfinished items (e.g. "check 8 PRs, investigate #89183").
- The user returns with a new instruction that is topically consistent but vague (e.g. "fix the issue following these new steps").
- The SUMMARY_PREFIX says "if consistent, use as background", which the model interprets as permission to use the ## Remaining Work items as the concrete task definition.
- The model conflates separate items into one query, violating skill-level filters (e.g. --draft=false) because the compaction's scope expansion overrides the skill's explicit filter.
The bug produces a report that appears correct to the user (all results are real) but violates the skill's explicit command — a silent scope expansion that's hard to catch because nothing visibly errors.
Steps to Reproduce
- Start a long session working on multiple PRs in a repo. Let the session produce genuinely unfinished items (e.g. "check PRs, investigate draft PR").
- Let automatic context compaction fire, producing a summary with ## Remaining Work and ## Pending User Asks sections listing those items.
- Return to the session and send a new message about the same general topic but with a vague instruction that could encompass the old items (e.g. "fix the issue following the new steps").
- Load a skill that has a specific filter (e.g. --draft=false) and execute it.
- Observe: the skill's explicit filter is violated because the compaction's "Remaining Work" expanded the scope before the skill was loaded.
Expected Behavior
SUMMARY_PREFIX's "latest message WINS" rule should cover not just overt contradictions but also scope creep. When the latest user message is vague, the model should not use ## Remaining Work / ## Pending User Asks compaction sections to fill in the specifics. If the user's instruction is ambiguous, the model should either ask for clarification or limit execution to only what the skill explicitly commands. The skill's filter (e.g. --draft=false) must always win over compaction scope.
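The precedence described above can be sketched in a few lines. This is a minimal illustration, not the agent's actual code: `TaskScope`, `resolve_scope`, and `resolve_flags` are hypothetical names, and the sketch assumes the latest message has already been parsed into a list of concrete items (empty when the message is too vague to name any).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskScope:
    """Hypothetical container for what the agent is about to act on."""
    items: list[str] = field(default_factory=list)
    needs_clarification: bool = False


def resolve_scope(user_items: list[str], summary_items: list[str]) -> TaskScope:
    """Build the task scope from the latest user message only.

    summary_items (e.g. entries under "## Remaining Work") are accepted as an
    argument to make the rule explicit: they are never copied into the scope.
    """
    if not user_items:
        # The message names no concrete work: ask instead of guessing.
        return TaskScope(needs_clarification=True)
    return TaskScope(items=list(user_items))


def resolve_flags(skill_flags: dict[str, str],
                  inferred_flags: dict[str, str]) -> dict[str, str]:
    """Merge flags so that the skill's explicit values always win."""
    merged = dict(inferred_flags)
    merged.update(skill_flags)  # skill values override summary-derived ones
    return merged


# A vague message yields no concrete items, so the summary must not fill in.
scope = resolve_scope([], ["check 8 PRs", "investigate #89183"])
print(scope.needs_clarification)  # True
print(resolve_flags({"--draft": "false"}, {"--draft": "true"}))  # {'--draft': 'false'}
```

The point of passing `summary_items` without using it is to show the asymmetry: compaction content may inform the model's understanding, but it contributes nothing to the executable scope.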
Actual Behavior
Compaction ## Remaining Work sections silently expand the task scope. The model runs a broader query than the skill specifies, and the violation goes undetected because the output looks correct (it just includes more than it should).
Affected Component
- Agent Core (conversation loop, context compression, memory)
Debug Report
N/A — internal model behavior, not a gateway deployment issue. The fix for #35344 updated the prefix but the body sections still carry actionable weight.
Operating System
N/A — cross-platform (observed on WSL Ubuntu)
Root Cause Analysis
The fix in #35383 correctly rewrote SUMMARY_PREFIX to say "latest message WINS — discard those stale items entirely." However, the prefix's carveout for consistency ("If the latest user message is consistent... you may use the summary as background") creates a trap:
- When the user changes topic → WINS triggers correctly, old items discarded.
- When the user gives a vague but topically consistent instruction → the ## Remaining Work sections look like "consistent background" and the model uses them to flesh out the goal.
The root cause is twofold:
- The compaction summary's body sections are generated by the LLM summarizer with actionable framing (## Remaining Work, ## Pending User Asks) that mirrors real task-management language.
- The model's natural behavior is to treat structured task-like sections as actionable — format beats framing. The SUMMARY_PREFIX's meta-instruction is weaker than the structured body that follows it.
Proposed Fix
Options (not mutually exclusive):
1. Remove ## Remaining Work and ## Pending User Asks from the compaction summary template, since these are the sections that trigger the misleading framing. Replace them with less actionable headings such as ## Historical Context or ## Previous Work.
2. Strengthen the compaction handoff body itself: add a structural marker after ## Remaining Work / ## Pending User Asks that the model cannot easily override, e.g. appending a separator like --- These items are historical and should not be treated as active tasks --- directly after each actionable section.
3. Add a pre-flight assertion in the agent loop: before executing any skill, check whether the model's planned query scope matches the skill's explicit filters. If the scope is wider, flag it.
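A pre-flight check of the kind described in the last option could look roughly like the sketch below. It assumes the skill's filters and the model's planned arguments are both available as flat flag-to-value mappings; that representation and the name `find_scope_violations` are illustrative assumptions, not part of the existing agent loop.

```python
from __future__ import annotations


def find_scope_violations(skill_filters: dict[str, str],
                          planned_args: dict[str, str]) -> list[str]:
    """Return human-readable mismatches; an empty list means the plan is in scope."""
    violations = []
    for flag, required in skill_filters.items():
        actual = planned_args.get(flag)
        if actual is None:
            # Dropping a filter widens the query just as much as changing it.
            violations.append(f"{flag} is required by the skill but missing from the plan")
        elif actual != required:
            violations.append(f"{flag}: skill requires {required!r}, plan uses {actual!r}")
    return violations


# Hypothetical plan that widened the query to include drafts.
problems = find_scope_violations({"--draft": "false"},
                                 {"--draft": "true", "--state": "open"})
for problem in problems:
    print("scope violation:", problem)
```

Extra flags in the plan that the skill does not mention are deliberately not flagged here; whether those count as scope expansion is a policy decision the sketch leaves open.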
The cleanest fix is option 1: rename the summary body sections to purely historical framing so the model cannot read them as active instructions even when the user's message is vague.
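Options 1 and 2 are both plain text transformations on the summary body, so they can be sketched as a post-processing step. The sketch assumes the summary is a string with one heading per line; the function names are hypothetical, and which neutral heading replaces which actionable one is an arbitrary choice made here for illustration.

```python
from __future__ import annotations

# Assumed mapping: the issue proposes these two neutral titles but does not
# say which one replaces which actionable heading.
NEUTRAL_HEADINGS = {
    "## Remaining Work": "## Previous Work",
    "## Pending User Asks": "## Historical Context",
}

HISTORICAL_MARKER = (
    "--- These items are historical and should not be treated as active tasks ---"
)


def neutralize_headings(summary: str) -> str:
    """Option 1: swap actionable headings for purely historical ones."""
    lines = [NEUTRAL_HEADINGS.get(line.strip(), line) for line in summary.splitlines()]
    return "\n".join(lines)


def mark_sections_historical(summary: str) -> str:
    """Option 2: keep the headings but close each actionable section with a marker."""
    out = []
    in_actionable = False
    for line in summary.splitlines():
        if line.startswith("## "):
            if in_actionable:
                out.append(HISTORICAL_MARKER)  # close the previous section
            in_actionable = line.strip() in NEUTRAL_HEADINGS
        out.append(line)
    if in_actionable:
        out.append(HISTORICAL_MARKER)  # close a section that runs to the end
    return "\n".join(out)


summary = "\n".join([
    "## Goal",
    "Ship the fix",
    "## Remaining Work",
    "- check 8 PRs",
    "## Pending User Asks",
    "- investigate #89183",
])
print(neutralize_headings(summary))
print(mark_sections_historical(summary))
```

Renaming at the template level, as the issue recommends, avoids the need for this post-processing entirely; the transformation above is mainly useful for summaries that were already generated with the old headings.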
Related Issues (non-overlap explanation)
- #35344 (fixed): "Resumed session can answer with stale Active Task instead of latest user message." That fix handles the case where the user changes topic and the old Active Task hijacks the reply. The bug here is the inverse: the user's topic stays the same but the compaction's Remaining Work silently expands the scope of a vague instruction. #35344 doesn't cover this because the user message there was clearly contradictory, whereas here it is consistent.
- #9631 (open): "Iterative context compaction summary keeps completed topics alive." That bug is about old completed topics overshadowing the current active topic in the summary body. The bug here involves unfinished items in ## Remaining Work that the model picks up as active work; the items are genuinely pending, not stale. #9631 addresses the summarizer's writing side (it keeps too much old content); this issue addresses the reading side (the model treats actionable-sounding sections as tasks even when the prefix says otherwise).
- #17344 (open): "Context compression + session resume causes model to re-execute original first task." That bug causes the model to restart from scratch using the original first user message. This bug doesn't re-execute: the model correctly uses the summary but expands scope beyond what the user said and the skill allows.
- #20293 (open): "Compressed summary injected as valid history into new session." That bug is about the summary appearing as live conversation in a session-split child. This bug is within the same lineage, not a split/new session.