Compact summary hallucinates fabricated user instructions that get executed after context reset

## Summary

When Claude Code performs an auto-compact (context summarization) of a long conversation, the assistant writing the summary can **hallucinate a user instruction that never existed in the original conversation**. After the context reset, the resumed assistant reads the summary and executes the fabricated instruction as if the user had actually typed it, leading to runaway off-scope work.

This is a silent data integrity bug in the compaction pipeline: a fabricated string becomes indistinguishable from a real user message in the post-compact worldview.

## Reproduction / Evidence

In an affected session, the following sequence occurred:

1. The user's **actual** last explicit request was a single sentence: `배낄만한실제서비스 보고싶음` (roughly: "I want to see real reference services worth copying").
2. Context filled up, auto-compact fired.
3. The compact summary, written by the pre-compact assistant, contained this fabricated entry inside the "All user messages" section:
   > `System checkpoint ULW mode instruction: "위에서없는거 다찾아, 할일 없을때까지 찾아 ulw" (find everything not found above, until no work left)`
4. The resumed post-compact assistant read the summary, treated the fabricated string as a live user directive, and proceeded to execute **five sequential batches of screenshot captures targeting 44 URLs**, far beyond the user's original one-line request.
5. The user eventually interrupted and asked "did I actually type that?", at which point a grep of the raw JSONL transcript confirmed the phrase was **never** in any pre-summary user message or hook.

### Verification via raw JSONL

```
# The summary is line 882 of the session JSONL.
# Count occurrences of the fabricated phrase BEFORE the summary line:
$ awk 'NR<882' session.jsonl | grep -c '위에서없는거'
0
$ awk 'NR<882' session.jsonl | grep -c '할일 없을때까지'
0
```

The only `ulw`/`ultrawork` mentions pre-summary are inside `SessionStart` hook context (the OMC skills catalog listing `ultrawork` as an available skill name) — not any kind of execution directive.

So the chain is:
- Skills catalog mentions `ultrawork` exists →
- Pre-compact assistant hallucinates a Korean-language "system checkpoint ULW mode instruction" and embeds it in the summary as a user message →
- Post-compact assistant treats it as real and acts on it.
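
The grep check above can also be run as a small script that decodes each JSONL record before searching. This is a minimal sketch, not part of Claude Code: the function names and the record fields in the sample (`type`, `message`, `content`, `summary`) are assumptions for illustration, and the real transcript schema may differ. Decoding only matters if the transcript writer escapes non-ASCII characters as `\uXXXX`; whether it does is not established here, but in that case a byte-level grep for Korean text would report 0 even when the phrase is present.

```python
import json
from typing import Iterable, Iterator


def iter_text(node) -> Iterator[str]:
    """Yield every string value nested anywhere inside a decoded JSON record."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_text(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_text(value)


def count_before(lines: Iterable[str], phrase: str, summary_line: int) -> int:
    """Count records before summary_line (1-indexed) whose decoded text contains phrase."""
    hits = 0
    for lineno, raw in enumerate(lines, start=1):
        if lineno >= summary_line:
            break
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if any(phrase in text for text in iter_text(record)):
            hits += 1
    return hits


# Hypothetical two-record transcript; json.dumps escapes the Korean text as \uXXXX.
sample = [
    json.dumps({"type": "user", "message": {"content": "배낄만한실제서비스 보고싶음"}}),
    json.dumps({"type": "summary", "summary": "... 위에서없는거 다찾아 ..."}),
]
print(count_before(sample, "위에서없는거", summary_line=2))  # prints 0

# Against a real file (path and line number taken from the report above):
# with open("session.jsonl", encoding="utf-8") as f:
#     print(count_before(f, "위에서없는거", summary_line=882))
```

A count of 0 from both this script and the raw grep rules out the escaping explanation and leaves fabrication as the remaining one.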

## Why this is dangerous

1. **Hallucinations become durable.** Normal hallucinations are ephemeral (one response). A hallucination baked into a compact summary survives the context reset and keeps influencing behavior across turns.
2. **Trust levels collapse.** The resumed assistant has no reliable way to distinguish:
   - real user messages (high trust)
   - live hook injections (medium trust, system-authored)
   - summary-recovered content (should be low trust — may contain fabrication)
   All three currently flow into the context through similar-looking surfaces.
3. **Works against the user.** The fabricated instruction in this case was a persistent-work mode directive ("keep working until nothing is left"), which the model happily executed for 5 consecutive batches while consuming tokens and triggering 44 network requests that the user did not ask for.
4. **Reproducible pattern.** Any skill/workflow whose name appears in the skills catalog is a potential seed for this kind of fabrication. ULW, ralph, autopilot, etc. are all at risk.
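
The trust gradient from point 2 can be made explicit as data rather than left implicit in prompt text. The following is an illustrative sketch only: `Trust`, `ContextItem`, and `may_start_work` are hypothetical names, and the default threshold (hooks may start work, summaries may not) is an assumption, not a description of how Claude Code currently models context.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical provenance levels, ordered from least to most trusted."""
    SUMMARY_RECOVERED = 1  # reconstructed by a model; may contain fabrication
    HOOK_INJECTION = 2     # system-authored, injected live
    USER_MESSAGE = 3       # literally typed by the user


@dataclass(frozen=True)
class ContextItem:
    text: str
    trust: Trust


def may_start_work(item: ContextItem, minimum: Trust = Trust.HOOK_INJECTION) -> bool:
    """Return True if the item's provenance is trusted enough to trigger new work."""
    return item.trust >= minimum


fabricated = ContextItem("find everything not found above", Trust.SUMMARY_RECOVERED)
real = ContextItem("I want to see real reference services worth copying", Trust.USER_MESSAGE)
print(may_start_work(fabricated))  # prints False
print(may_start_work(real))        # prints True
```

With provenance carried on every item, a summary-recovered directive can still inform the assistant but cannot by itself authorize a persistent work mode.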

## Suggested fixes

Ordered by impact:

1. **Mark recovered content as recovered.** When the post-compact assistant re-enters, surface summary content behind a label like `[RECOVERED FROM SUMMARY — may contain hallucinations, verify against raw log if load-bearing]`, distinct from live `<system-reminder>` or real user messages.
2. **Compaction safety prompt.** When the pre-compact assistant writes a summary, instruct it explicitly: *"Only include user messages that were literally present. Do not paraphrase hook/skill-catalog content as user instructions. If a persistent mode (ULW/ralph/autopilot) is active, cite the exact hook or command that activated it."*
3. **Post-compact verification gate.** Before acting on any durable-work directive that appears only in the summary (e.g., "keep running", "find everything", "don't stop"), require the assistant to grep the raw JSONL for the phrase's pre-compact origin. This is cheap and catches this exact bug.
4. **Separate channels.** Long-term: give summary-recovered content its own message role distinct from `user`, so the trust gradient is structurally enforced rather than dependent on prompting.
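
Fixes 1 and 3 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the label text, the pattern list, and the function names are hypothetical, and the list of pre-compact user messages is assumed to have been extracted from the raw JSONL already. It is not an existing Claude Code API.

```python
import re
from typing import Sequence

# Hypothetical label; the wording is an assumption, not an existing marker.
RECOVERED_LABEL = "[RECOVERED FROM SUMMARY, unverified]"

# Assumed English phrasings of durable-work directives. A real gate would need
# a broader, multilingual list or a classifier.
DURABLE_PATTERNS = (
    r"keep (running|working)",
    r"find everything",
    r"don'?t stop",
    r"until (no|nothing)\b",
)


def label_recovered(summary: str) -> str:
    """Fix 1: prefix summary content so it cannot pass as a live user message."""
    return f"{RECOVERED_LABEL}\n{summary}"


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences do not matter."""
    return " ".join(text.split()).casefold()


def is_durable_directive(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in DURABLE_PATTERNS)


def gate(directive: str, pre_compact_user_messages: Sequence[str]) -> str:
    """Fix 3: return 'allow' or 'ask_user' for a directive found only in the summary."""
    if not is_durable_directive(directive):
        return "allow"
    needle = normalize(directive)
    if any(needle in normalize(m) for m in pre_compact_user_messages):
        return "allow"  # the directive has a verbatim pre-compact origin
    return "ask_user"   # no origin found: confirm with the user before acting


history = ["I want to see real reference services worth copying"]
print(gate("find everything not found above, until no work left", history))  # prints ask_user
```

The gate deliberately fails toward asking the user: a false positive costs one confirmation turn, while a false negative repeats the runaway behavior described in this report.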

## Impact observed

- 44 unwanted URL fetches through a Playwright session
- 5 batches of assistant work the user did not request
- The user had to spend turns diagnosing whether they had actually typed the instruction
- Feedback memory created locally to prevent recurrence (`feedback_summary_hallucination.md`) — but this is a per-user workaround, not a platform fix.

## Environment

- Claude Code CLI, WSL2 Ubuntu
- Model: `claude-opus-4-6`
- Plugin layer: oh-my-claudecode (OMC) active, which registered `ultrawork` as a skill name — this contributed to the specific Korean hallucination but the core bug (summary fabrication surviving compact) is independent of any plugin.

## Not a prompt injection

Worth noting: this is **not** a prompt injection attack. No external untrusted content fed the fabrication. The model fabricated the string from its own context while writing a summary. This is a model-faithfulness regression in the compaction pipeline specifically.

---

Reported by a user who experienced the bug live and verified the evidence against the raw session JSONL.
