Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
The bundled session-memory hook saves raw model output to ~/.openclaw/workspace-/memory/.md without sanitizing chat-template control tokens (<|im_end|>, <|endoftext|>), unmatched chat role markers (assistant:, user:), or unparsed <tool_call> XML emitted by quantized local models. The saved file is then re-injected as "Conversation Summary" context on the next /new or /reset, where the model interprets the embedded role markers as in-progress chat-template scaffolding and produces more malformed output (more orphaned role markers, NO_REPLY-only completions, or further raw tool-call XML). The hook then saves THAT malformed output as the next "memory" file, which gets re-injected on the following /new. Each /new degrades the agent further until it is functionally non-responsive. Disabling the hook globally (openclaw hooks disable session-memory) and quarantining the existing memory files is the only working workaround, since /new itself triggers re-injection.
This is distinct from #42112 (orphaned tool_use blocks rejected by the provider API) — there the corruption is in the session JSONL transcript and the provider returns 400. Here the corruption is in the hook-managed memory file, persists across /new (which #42112 says fixes that bug), and propagates across agents because every agent has its own session-memory file accumulating its own poison.
Observed across 5 of 6 active agent workspaces in our deployment, including both 9B (mlx-community/Qwen3.5-9B-OptiQ-4bit on rapid-mlx) and 27B (mlx-community/Qwen3.5-27B-4bit-DWQ on rapid-mlx) brain agents. Both quantized local models periodically leak chat-template tokens during streaming; the hook captures and persists those leaks indiscriminately.
Steps to reproduce
Preconditions: agent on a quantized local model server (we used mlx-community/Qwen3.5-9B-OptiQ-4bit served by rapid-mlx 0.3.12 on port 8001, but the saved-memory pattern below is provider-agnostic).
- With the session-memory hook enabled (default):
openclaw hooks check
confirm: 💾 session-memory enabled
- Trigger any agent reply that emits a tool call. With Qwen3-family models on quantized backends, the SSE stream periodically leaks <|im_end|> and the raw <tool_call>...</tool_call> XML into the assistant content channel before the parser converts it. Auto-recovery handles tool execution correctly, but the leaked tokens reach the assistant transcript.
- Send /new in the same chat. The session-memory hook fires on command:new and writes ~/.openclaw/workspace-/memory/.md containing a "## Conversation Summary" section with the raw leaked tokens preserved verbatim. Example saved content (real, redacted):
assistant: Case subagent is active. Dispatching task.
<tool_call>
<function=sessions_send>
<parameter=message>
Write a Python script to parse a CSV file.
...
</tool_call><|im_end|>
assistant: <tool_call>
...
</tool_call><|im_end|>
- Send any user message. The hook injects the saved memory file as bootstrap context. The model, seeing what looks like an in-progress chat template inside its conversation history, emits orphaned role tokens as plain content.
Example of real output after /new once the memory file is corrupted (redacted):
[name]. What's the mission?
user
I need
assistant
Incomplete, [name]. Specify the task.
- The hook saves THIS reply on the next /new as the new memory. Subsequent /new produces only:
NO_REPLY
user
NO_REPLY
assistant
NO_REPLY
user
NO_REPLY
- Quarantining the bad memory file alone is not sufficient: the hook fires on the first /new after quarantine and saves the next malformed reply, re-establishing the poison. Only disabling the hook globally breaks the loop:
mv ~/.openclaw/workspace-/memory/.md{,.quarantine}
openclaw hooks disable session-memory
openclaw gateway restart
With the hook disabled, /new produces clean output and the agent recovers.
A synthetic minimal reproducer would be: configure any agent with the session-memory hook enabled, manually write a memory file containing literal <|im_end|> tokens and assistant: / user: markers in the Conversation Summary section, send /new, and observe that the model echoes the role-token pattern in its reply.
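The synthetic reproducer could be scripted roughly as follows. This is a minimal sketch, not a verified fixture: the function name `write_poisoned_memory` is hypothetical, the target path must be supplied by the caller (the path in this report is redacted), and it assumes the hook needs nothing beyond the "## Conversation Summary" heading and body.

```python
from pathlib import Path

# Leaked-token pattern modelled on the saved-content example in this report.
POISONED_SUMMARY = "\n".join([
    "assistant: Case subagent is active. Dispatching task.",
    "<tool_call>",
    "<function=sessions_send>",
    "</tool_call><|im_end|>",
    "user",
    "assistant",
])


def write_poisoned_memory(path: Path) -> str:
    """Write a synthetic poisoned memory file and return its content."""
    # Assumption: the real file may carry extra metadata that this
    # sketch does not reproduce; only the summary section is written.
    content = "## Conversation Summary\n\n" + POISONED_SUMMARY + "\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return content
```

After writing the file to the agent's memory directory, sending /new should be enough to see whether the reply echoes the role-token pattern.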
Expected behavior
The session-memory hook should sanitize captured assistant turns before persisting:
- Strip all chat-template control tokens of the active model family from saved transcripts (<|im_end|>, <|endoftext|>, <|im_start|>, etc.). The hook already knows or can know the active model from the session metadata it writes; the corresponding template tokens are knowable.
- Strip or escape any orphaned role markers (assistant:, user:, system: at line start) that are not part of the hook's own structural formatting.
- Strip raw <tool_call>...</tool_call> blocks that reached the assistant content channel. These are parser-leak artifacts, not real conversation content. The actual structured tool call (if any) is already represented elsewhere in the session record.
- Detect and skip turns whose content is dominated by such artifacts (e.g., turns that are >50% control tokens, or turns whose content equals only "NO_REPLY" or similar housekeeping conventions).
A turn that fails sanitization should either be saved as [malformed turn elided] or skipped entirely. Either is preferable to faithfully persisting and re-injecting the garbage.
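The stripping rules above can be sketched as a single pass over the raw turn content. This is an illustration in Python rather than the hook's own TypeScript, and everything in it is an assumption: the name `sanitize_turn`, the fixed ChatML-style token list, and the premise that it runs on raw assistant content before the hook adds its own structural prefixes.

```python
import re

# Assumed ChatML-style token set; these three tokens are named in this report.
CONTROL_TOKENS = ("<|im_start|>", "<|im_end|>", "<|endoftext|>")
TOOL_CALL_BLOCK = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
STRAY_TOOL_TAG = re.compile(r"</?tool_call>")
# Role marker at line start: "assistant: ..." or a bare "user" line.
ROLE_MARKER = re.compile(
    r"^(assistant|user|system)(:[ \t]*|[ \t]*$)", re.MULTILINE
)
HOUSEKEEPING = {"NO_REPLY"}
ELIDED = "[malformed turn elided]"


def sanitize_turn(text: str) -> str:
    """Return cleaned turn content, or ELIDED if nothing usable remains."""
    cleaned = TOOL_CALL_BLOCK.sub("", text)      # leaked tool-call XML
    cleaned = STRAY_TOOL_TAG.sub("", cleaned)    # unpaired tags
    for token in CONTROL_TOKENS:                 # template terminators
        cleaned = cleaned.replace(token, "")
    cleaned = ROLE_MARKER.sub("", cleaned)       # orphaned role markers
    lines = [line.strip() for line in cleaned.splitlines()]
    kept = [line for line in lines if line and line not in HOUSEKEEPING]
    return "\n".join(kept) if kept else ELIDED
```

Ordinary prose that merely mentions "user" mid-sentence is left alone because the role-marker pattern only matches at the start of a line.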
Actual behavior
The hook captures whatever appears in the assistant content channel, including:
- <|im_end|> and <|endoftext|> chat-template terminators leaked by upstream parser bugs
- Multiple assistant: / user: role-marker lines on consecutive lines from a single turn
- Raw <tool_call> XML that was supposed to be parsed and converted to structured tool_calls
- Single-token "NO_REPLY" completions that themselves resulted from prior poisoned context
It writes these verbatim under "## Conversation Summary" in the memory file. On the next /new, the session-memory hook injects the file content as context. The model interprets the embedded role-token scaffolding as an active chat template and continues the pattern, producing more of the same.
The loop is observable in the saved file timestamps: every /new produces a new memory file containing more degraded content than the previous one, until the agent emits only NO_REPLY.
Audit of our deployment (grep -l '<|im_end|>\|<tool_call>\|^NO_REPLY\|^assistant$\|^user$' ~/.openclaw/workspace-*/memory/*.md) found poisoned memory files in 5 of 6 active workspaces, spanning 4 days for one agent (Maelcum, our DevOps agent on the 9B model with a 3h heartbeat) where the heartbeat-driven /new cycle accumulated 5 progressively-degraded memory files between Apr 18 and Apr 22. Brain-tier agents (27B DWQ) were also affected, indicating this is not a small-quantized-model-only issue.
OpenClaw version
2026.4.14 (323493f)
Operating system
macOS 26.4.1 (Apple Silicon, M4 Pro)
Install method
Homebrew
Model
Both 9B and 27B Qwen3.5 family observed. Specific:
- mlx-community/Qwen3.5-9B-OptiQ-4bit (Armitage, Maelcum, others' heartbeats)
- mlx-community/Qwen3.5-27B-4bit-DWQ (Wintermute, Case, Finn; brain tier)
Provider / routing chain
Local rapid-mlx 0.3.12 servers:
- vllm-fast/mlx-community/Qwen3.5-9B-OptiQ-4bit on http://localhost:8001
- vllm-brain/mlx-community/Qwen3.5-27B-4bit-DWQ on http://localhost:8000
OpenClaw provider definitions point to localhost. Fallbacks (openrouter/* models) were not exercised during the failure: the primary always returned 200 with malformed content, which is the trigger for this bug rather than a network/provider error.
Additional provider/model setup details
Both rapid-mlx servers run with:
--enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 --no-thinking --pin-system-prompt --stream-interval 5
The <|im_end|> and <tool_call> leakage into the assistant content channel is a known cosmetic SSE streaming-loop race in rapid-mlx, which we will file separately against rapid-mlx. From OpenClaw's perspective the takeaway is: any provider that ever streams chat-template tokens into the content channel (rapid-mlx today, or any future provider with the same parser race) will cause this hook to capture and persist them. Sanitization is the right layer to fix this, regardless of upstream parser behavior.
Logs, screenshots, and evidence
Three real, redacted memory-file excerpts attached as evidence files. All from the same Telegram-direct agent over a single evening:
evidence-1-armitage-memory-2026-04-22-0137.md — early poisoning, raw <tool_call> XML and <|im_end|> from prior turns
evidence-2-armitage-memory-2026-04-22-0152.md — full degradation, model emits "user / I need / assistant / Incomplete" role-token leakage
evidence-3-armitage-post-fix-screenshot.png — Telegram screenshot of NO_REPLY-only output before fix, and clean output after disabling the hook
Audit script demonstrating cross-workspace prevalence:
for d in ~/.openclaw/workspace-*/memory/; do
  # workspace directory name, e.g. workspace-armitage
  agent=$(basename "$(dirname "$d")")
  # newest memory file by modification time
  latest=$(ls -t "$d" 2>/dev/null | head -1)
  [ -n "$latest" ] && grep -q '<|im_end|>\|<tool_call>\|^NO_REPLY\|^assistant$\|^user$' "$d$latest" 2>/dev/null && echo "POISONED: $agent → $latest"
done
Output (5 of 6 active workspaces):
POISONED: workspace-armitage → 2026-04-22-0152.md
POISONED: workspace-case → 2026-04-22-0137.md
POISONED: workspace-finn → 2026-04-22-0136.md
POISONED: workspace-maelcum → 2026-04-22-0136.md (+ 4 older files Apr 18-19)
POISONED: workspace-wintermute → 2026-04-22-0137.md
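The script above only inspects the newest file per workspace. A variant that lists every matching file (which is how older poisoned files such as Maelcum's would surface) might look like the sketch below. The function name `find_poisoned` is hypothetical, the regex is a Python translation of the grep pattern, and the root directory is passed in by the caller.

```python
import re
from pathlib import Path

# Python translation of the grep pattern used in the audit script.
POISON = re.compile(
    r"<\|im_end\|>|<tool_call>|^NO_REPLY|^assistant$|^user$",
    re.MULTILINE,
)


def find_poisoned(root: Path) -> dict[str, list[str]]:
    """Map workspace directory name -> names of matching memory files."""
    hits: dict[str, list[str]] = {}
    for memory_file in sorted(root.glob("workspace-*/memory/*.md")):
        text = memory_file.read_text(encoding="utf-8", errors="replace")
        if POISON.search(text):
            workspace = memory_file.parent.parent.name
            hits.setdefault(workspace, []).append(memory_file.name)
    return hits
```

Calling `find_poisoned(Path.home() / ".openclaw")` would scan the same directories as the shell loop.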
Impact and severity
Severity: High. Breaks all chat agents on quantized local models silently and progressively. Single-agent recovery requires manual intervention (file quarantine + global hook disable). No log line surfaces "the hook just saved chat template tokens"; symptom only appears on the next /new as malformed agent output.
Impact:
- Affects any deployment using the session-memory hook with a provider that occasionally streams chat-template tokens into the content channel. This is a structural risk for every quantized local model integration — not a bug in any one provider.
- Brain agents (larger models, more sophisticated outputs) are equally vulnerable in our deployment, refuting the assumption that this is small-model-only.
- Heartbeat-driven /new cycles compound the problem: each scheduled heartbeat saves another memory file, so an unattended agent silently degrades over hours/days. Our Maelcum agent (3h heartbeat) accumulated 5 progressively-poisoned files over ~4 days.
- Recovery loses all legitimate session memory for the affected agent (quarantining the file is the only safe option, since the file mixes valid summary content with embedded poison).
Workaround: openclaw hooks disable session-memory globally + manual file quarantine. No per-agent disable in the documented config schema (verified Apr 21 against /openclaw/openclaw Context7 docs), so the workaround disables memory for all agents, not just the affected ones.
Additional information
Related but separate (will file against the upstream provider, not OpenClaw):
- rapid-mlx SSE streaming leaks <|im_end|> and <tool_call> XML into the content channel before the parser converts them. This is the upstream cause of the malformed content, but the OpenClaw hook is what makes it persistent and recurring.
Suggested fix locations (from a quick read of the docs):
- src/hooks/bundled/session-memory/handler.ts — add a sanitization pass before writing to the memory file
- Sanitization should be model-family-aware: derive the chat-template token set from the active model's config and strip those tokens from saved content
- Plus a generic regex strip for orphaned role markers at line starts
- Plus a "skip turn" rule for content that is structurally malformed (>X% template tokens, or matches known housekeeping-only patterns like "NO_REPLY")
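The model-family-aware token set and the "skip turn" rule from the list above could be combined along these lines. This is a sketch under stated assumptions, written in Python for illustration rather than as a patch to handler.ts: the `TEMPLATE_TOKENS` table, the family key "chatml", the function names, and the 0.5 default threshold are all hypothetical placeholders.

```python
import re

# Hypothetical family-to-token table; only the ChatML-style tokens below
# are named in this report, other families would need their own entries.
TEMPLATE_TOKENS = {
    "chatml": ("<|im_start|>", "<|im_end|>", "<|endoftext|>"),
}
HOUSEKEEPING_ONLY = {"NO_REPLY"}


def _visible_len(text: str) -> int:
    """Count non-whitespace characters."""
    return len(re.sub(r"\s+", "", text))


def artifact_ratio(text: str, family: str = "chatml") -> float:
    """Fraction of non-whitespace characters that are template artifacts."""
    total = _visible_len(text)
    if total == 0:
        return 1.0  # an empty turn has nothing worth saving
    parts = [r"<tool_call>.*?</tool_call>"]
    parts += [re.escape(token) for token in TEMPLATE_TOKENS[family]]
    artifacts = re.compile("|".join(parts), re.DOTALL)
    covered = sum(_visible_len(m.group(0)) for m in artifacts.finditer(text))
    return covered / total


def should_skip_turn(text: str, family: str = "chatml",
                     threshold: float = 0.5) -> bool:
    """True when a turn is housekeeping-only or mostly template artifacts."""
    if text.strip() in HOUSEKEEPING_ONLY:
        return True
    return artifact_ratio(text, family) > threshold
```

Measuring the ratio on non-whitespace characters keeps the decision independent of how the leaked XML happens to be indented or line-wrapped.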
I'm happy to test a fix on our deployment — we're running the exact pattern that triggers this bug across 5 agents, so we can validate sanitization both reactively (against our quarantined files) and prospectively (by re-enabling the hook and watching whether new memory files stay clean).
evidence-1-armitage-memory-2026-04-22-0137.md
evidence-2-armitage-memory-2026-04-22-0152.md
Not a duplicate of:
- #42112 (Bug: persisted orphaned toolCall poisons session replay and makes chat agent stop responding): orphaned tool_use blocks in session JSONL cause the provider to reject with 400. That bug is in the session transcript JSONL, manifests as the provider rejecting the request, and is fixed by /new (which starts a new transcript). The bug filed here is in the hook-managed memory FILE, persists across /new (because /new triggers the hook that re-injects the bad file), and only resolves when the hook itself is disabled. Different file, different mechanism, different recovery path.
- #56072 (Daily session reset silently discards context without memory flush or compaction): daily session reset doesn't trigger memory flush. That bug is about content NOT being saved when it should be. This bug is about content being saved that should NEVER be saved.
- #26293 ([Bug]: session-memory hook silently skipped for /new via Discord, command:new event not emitted): session-memory hook silently skipped on Discord because command:new is not emitted. That bug is the hook not running. This bug is the hook running and persisting harmful content.
- #31266 (session-memory hook should support session:end events) and #40418 (Feature Request: Automated Session Memory Preservation & Synthesis): feature requests for the hook to support more event types. Unrelated to content sanitization.