Description
Summary
Upgrading from 2026.2.14 → 2026.2.15 causes prompt cache invalidation on every turn for local model providers. The cache is trimmed/rebuilt from scratch on every message, even when no workspace files have changed and the conversation is short. Downgrading to 2026.2.14 immediately restores normal cache behavior.
Steps to reproduce
- OpenClaw: 2026.2.15 (3fe22ea) — regression confirmed; 2026.2.14 works correctly
- OS: macOS 26.3 (arm64), Mac Studio M3 Ultra 512GB
- Channel: Telegram (DM, not a group chat)
- Models tested:
  - MiniMax-M2.5 via LM Studio 0.4.2+2 (MLX backend)
  - Qwen3.5-397B-A17B via llama-server (llama.cpp built from latest main)
Reproduction
- Install openclaw@2026.2.15
- Start a fresh conversation via Telegram DM with a local model (LM Studio or llama-server)
- Send a simple message (e.g., "hi")
- Wait for the response
- Send another simple message (e.g., "what are you up to?")
- Observe: the entire prompt cache is invalidated and rebuilt from scratch
Expected behavior
The second message reuses the cached prompt prefix, so only the new tokens (the user's latest message plus the assistant's response) are processed. LM Studio shows no cache trim; llama-server restores its context checkpoint.
Actual behavior
Every message triggers a full cache rebuild:
LM Studio (MiniMax-M2.5):
[cache_wrapper][INFO]: Trimmed 21283 tokens from the prompt cache
[minimax-m2.5] Prompt processing progress: 0.0%
[minimax-m2.5] Prompt processing progress: 2.4%
...
The trim amount is nearly identical every turn (~21,242–21,283 tokens) regardless of message content, even though the conversation totals under 30K tokens. No workspace files were modified between messages.
llama-server (Qwen3.5-397B):
slot update_slots: id 3 | task 13076 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 13076 | prompt processing progress, n_tokens = 2048, ...
Full cache wipe (memory_seq_rm [0, end)) on every turn. Context checkpoints are erased and never restored.
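One way to narrow this down (a diagnostic sketch, not part of OpenClaw — the prompt strings below are made up) is to dump the serialized prompt on two consecutive turns and find where they first diverge. A prefix cache can only reuse tokens up to the first differing byte, so a divergence early in the system prompt would explain a near-full trim:

```python
# Hypothetical diagnostic: locate where two consecutive serialized prompts
# diverge. A prefix cache reuses tokens only up to the first difference,
# so divergence inside the system prompt implies a near-full rebuild.


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared character prefix of two prompt strings."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


# Illustrative prompts: identical except for a time-variant line.
turn1 = "SYSTEM: You are a helpful assistant.\nCurrent time: 10:41:02\nUSER: hi\n"
turn2 = (
    "SYSTEM: You are a helpful assistant.\nCurrent time: 10:41:37\n"
    "USER: hi\nASSISTANT: ...\nUSER: what are you up to?\n"
)

split = common_prefix_len(turn1, turn2)
print(f"prompts diverge at character {split}:")
print(repr(turn2[split:split + 30]))
```

Running this against real captured prompts from 2026.2.14 vs 2026.2.15 would show exactly which injected line moves the divergence point to the front of the context.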
Verification
# Downgrade — cache works immediately
npm install -g openclaw@2026.2.14
# Upgrade — cache broken again
npm install -g openclaw@2026.2.15

No configuration changes, no workspace file edits, and no LM Studio updates between tests. The only variable is the OpenClaw version.
Suspected cause
Two changes in 2026.2.15 modify per-turn prompt assembly:
- Group chat context injection (#14447): "Group chats: always inject group chat context (name, participants, reply guidance) into the system prompt on every turn, not just the first." — If this triggers for DM conversations, or if the injected content varies per turn, it would invalidate the cache.
- Memory-flush `Current time:` line (#17603, #17633): "append a `Current time:` line to memory-flush turns" — The changelog claims this is "without making the system prompt time-variant," but if the time line leaks into normal (non-flush) turns, it would change the prompt every second.
Code references in pi-embedded-n26FO9Pa.js:
- Line 36106: timeLine: `Current time: ${formattedTime} (${userTimezone})`
- Line 23571: "Build a persistent group-chat context block that is always included in the
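If the second hypothesis is right, the fix the changelog describes amounts to keeping the base system prompt byte-stable on ordinary turns and confining time-variant content to memory-flush turns. A minimal sketch of that invariant (`build_system_prompt` and `BASE_PROMPT` are illustrative names, not OpenClaw's actual code):

```python
# Sketch of the cache-friendly behavior the changelog claims: the
# "Current time:" line is appended only on memory-flush turns, so
# ordinary turns always produce an identical (cacheable) system prompt.
from datetime import datetime, timezone

BASE_PROMPT = "You are a helpful assistant."


def build_system_prompt(is_memory_flush: bool, now: datetime) -> str:
    prompt = BASE_PROMPT
    if is_memory_flush:
        # Time-variant content stays confined to flush turns. If this line
        # leaked into normal turns, the prompt would differ every second
        # and the prefix cache would be invalidated from here onward.
        prompt += f"\nCurrent time: {now.isoformat()}"
    return prompt


t1 = datetime(2026, 2, 15, 10, 41, 2, tzinfo=timezone.utc)
t2 = datetime(2026, 2, 15, 10, 41, 37, tzinfo=timezone.utc)

# Normal turns: identical prompts, so the cached prefix is reusable.
assert build_system_prompt(False, t1) == build_system_prompt(False, t2)
# Flush turns: prompts differ, so a rebuild there is expected behavior.
assert build_system_prompt(True, t1) != build_system_prompt(True, t2)
```

The observed logs are consistent with the first assertion failing in practice, i.e. per-turn-variant content ending up in every prompt.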
Workaround
Downgrade to 2026.2.14:
npm install -g openclaw@2026.2.14

OpenClaw version
2026.2.15
Operating system
macOS
Install method
npm install -g openclaw@2026.2.15
Logs, screenshots, and evidence
Impact and severity
This is a severe performance regression for all local model users. Without prompt caching:
- Every message requires full prompt processing from scratch
- A 40K-token context takes ~200 seconds to process on llama-server (vs ~5 seconds with cache)
- LM Studio/MLX is similarly affected — multi-minute waits for simple replies
- Makes local models essentially unusable for multi-turn conversations
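The slowdown follows directly from the figures above. A rough back-of-the-envelope (assuming ~100 new tokens per turn, a number chosen for illustration):

```python
# Back-of-the-envelope cost of losing the prefix cache, using the
# figures reported above: a 40K-token context takes ~200 s to fully
# reprocess on llama-server.
context_tokens = 40_000
full_reprocess_s = 200.0
prefill_rate = context_tokens / full_reprocess_s  # tokens/s of prefill

# With a working cache, only the new turn (~100 tokens, illustrative)
# needs prefill; without it, the entire context is reprocessed.
new_tokens = 100
cached_turn_s = new_tokens / prefill_rate

print(f"prefill rate: {prefill_rate:.0f} tok/s")
print(f"per-turn prefill: cached ~{cached_turn_s:.1f} s, "
      f"uncached ~{full_reprocess_s:.0f} s")
```

That is a ~400x difference in prefill time per turn, which matches the "~5 seconds with cache vs ~200 seconds without" observation once generation time is included.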
Additional information
No response