Summary
When Claude Code performs an auto-compact (context summarization) of a long conversation, the assistant writing the summary can hallucinate a user instruction that never existed in the original conversation. After the context reset, the resumed assistant reads the summary and executes the fabricated instruction as if the user had actually typed it, leading to runaway off-scope work.
This is a silent data integrity bug in the compaction pipeline: a fabricated string becomes indistinguishable from a real user message in the post-compact worldview.
Reproduction / Evidence
In an affected session, the following sequence occurred:
- The user's actual last explicit request was a single sentence: `배낄만한실제서비스 보고싶음` (roughly: "I want to see real reference services worth copying").
- The context filled up and auto-compact fired.
- The compact summary, written by the pre-compact assistant, contained this fabricated entry inside its "All user messages" section:
  > System checkpoint ULW mode instruction: "위에서없는거 다찾아, 할일 없을때까지 찾아 ulw" (roughly: "find everything that is missing from the above, keep searching until there is nothing left to do, ulw")
- The resumed post-compact assistant read the summary, treated the fabricated string as a live user directive, and proceeded to execute five sequential batches of screenshot captures targeting 44 URLs, far beyond the user's original one-line request.
- The user eventually interrupted and asked "did I actually type that?". Grepping the raw JSONL transcript then confirmed that the phrase never appeared in any pre-summary user message or hook.
Verification via raw JSONL
```console
# The summary is line 882 of the session JSONL.
# Count occurrences of the fabricated phrase BEFORE the summary line:
$ awk 'NR<882' session.jsonl | grep -c '위에서없는거'
0
$ awk 'NR<882' session.jsonl | grep -c '할일 없을때까지'
0
```
The only pre-summary mentions of `ulw`/`ultrawork` are inside the `SessionStart` hook context (the OMC skills catalog listing `ultrawork` as an available skill name). None of them is any kind of execution directive.
So the chain is:
- The skills catalog mentions that `ultrawork` exists →
- The pre-compact assistant hallucinates a Korean-language "system checkpoint ULW mode instruction" and embeds it in the summary as a user message →
- The post-compact assistant treats it as real and acts on it.
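To re-run the check above for other phrases, the two `awk`/`grep` commands can be folded into a small Bash function. This is a sketch: the name `count_before_summary` is hypothetical, and the summary line number is passed in by hand (882 in this session) rather than detected from the JSONL schema. Like the original commands, it counts matching lines, not individual occurrences.

```bash
#!/usr/bin/env bash
# Hypothetical helper (sketch): for each phrase, count the transcript lines
# BEFORE the compact-summary line that contain it. 0 = no pre-summary origin.
# Usage: count_before_summary FILE SUMMARY_LINE PHRASE...
count_before_summary() {
  local file=$1 summary_line=$2
  shift 2
  local phrase n
  for phrase in "$@"; do
    # head keeps lines 1..SUMMARY_LINE-1 (same as awk 'NR<SUMMARY_LINE');
    # grep -F matches the phrase literally, -c counts matching lines.
    # grep exits 1 when nothing matches but still prints 0; '|| true' keeps
    # that from aborting callers that run under 'set -e'.
    n=$(head -n "$((summary_line - 1))" "$file" | grep -cF -- "$phrase" || true)
    printf '%s\t%s\n' "$n" "$phrase"
  done
}
```

For this session, `count_before_summary session.jsonl 882 '위에서없는거' '할일 없을때까지'` should report `0` for both phrases, matching the counts above.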
Why this is dangerous
- Hallucinations become durable. Normal hallucinations are ephemeral (one response). A hallucination baked into a compact summary survives the context reset and keeps influencing behavior across turns.
- Trust levels collapse. The resumed assistant has no reliable way to distinguish:
  - real user messages (high trust)
  - live hook injections (medium trust, system-authored)
  - summary-recovered content (should be low trust, since it may contain fabrications)

  All three currently flow into the context through similar-looking surfaces.
- Works against the user. The fabricated instruction in this case was a persistent-work mode directive ("keep working until nothing is left"), which the model executed for five consecutive batches, consuming tokens and triggering 44 network requests that the user did not ask for.
- Reproducible pattern. Any skill/workflow whose name appears in the skills catalog is a potential seed for this kind of fabrication. ULW, ralph, autopilot, etc. are all at risk.
Suggested fixes
Ordered by impact:
- Mark recovered content as recovered. When the post-compact assistant re-enters, surface summary content behind a label like `[RECOVERED FROM SUMMARY — may contain hallucinations, verify against raw log if load-bearing]`, distinct from live `<system-reminder>` blocks or real user messages.
- Compaction safety prompt. When the pre-compact assistant writes a summary, instruct it explicitly: "Only include user messages that were literally present. Do not paraphrase hook/skill-catalog content as user instructions. If a persistent mode (ULW/ralph/autopilot) is active, cite the exact hook or command that activated it."
- Post-compact verification gate. Before acting on any durable-work directive that appears only in the summary (e.g., "keep running", "find everything", "don't stop"), require the assistant to grep the raw JSONL for the phrase's pre-compact origin. This is cheap and catches this exact bug.
- Separate channels. Long term, give summary-recovered content its own message role, distinct from `user`, so the trust gradient is structurally enforced rather than dependent on prompting.
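The post-compact verification gate could look roughly like the Bash sketch below. `gate_directive` and its exit-code convention are hypothetical and not part of Claude Code. It only checks that a fixed-string fragment of the directive occurs in some raw transcript line before the summary; it does not parse the JSONL or check which role authored the matching line.

```bash
#!/usr/bin/env bash
# Hypothetical post-compact gate (sketch). Before acting on a durable-work
# directive that appears only in the compact summary, require a literal match
# in the raw transcript lines that precede the summary.
#   $1  session JSONL path
#   $2  line number of the compact summary (located by hand, as in the report)
#   $3  distinctive fragment of the directive, e.g. '할일 없을때까지'
# Exit status: 0 = pre-compact origin found, 1 = no origin (do not act),
#              2 = usage error.
gate_directive() {
  if [ "$#" -ne 3 ]; then
    echo "usage: gate_directive FILE SUMMARY_LINE PHRASE" >&2
    return 2
  fi
  local file=$1 summary_line=$2 phrase=$3
  # grep -qF succeeds on the first literal match among the pre-summary lines.
  if head -n "$((summary_line - 1))" "$file" | grep -qF -- "$phrase"; then
    echo "verified: '$phrase' appears before line $summary_line"
    return 0
  fi
  echo "UNVERIFIED: '$phrase' appears only in or after the summary; ask the user before acting" >&2
  return 1
}
```

Pick a distinctive fragment of the directive. A bare skill name such as `ultrawork` would pass the gate, because it appears pre-summary in the `SessionStart` skills catalog, which is exactly the seed described above. In this session, `gate_directive session.jsonl 882 '할일 없을때까지'` would return 1.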
Impact observed
- 44 unwanted URL fetches through a Playwright session
- ~5 batches of assistant work the user did not request
- The user had to spend turns diagnosing "did I say this?"
- A feedback memory (`feedback_summary_hallucination.md`) was created locally to prevent recurrence, but this is a per-user workaround, not a platform fix.
Environment
- Claude Code CLI, WSL2 Ubuntu
- Model: `claude-opus-4-6`
- Plugin layer: oh-my-claudecode (OMC) active, which registered `ultrawork` as a skill name. This contributed to the specific Korean hallucination, but the core bug (summary fabrication surviving compaction) is independent of any plugin.
Not a prompt injection
Worth noting: this is not a prompt injection attack. No external untrusted content fed the fabrication. The model fabricated the string from its own context while writing a summary. This is a model-faithfulness regression in the compaction pipeline specifically.
Reported by a user who experienced the bug live and verified the evidence against the raw session JSONL.