[Bug]: 2026.2.15 breaks prompt cache for local model providers (llama-server, LM Studio/MLX) #19892

@aidiffuser

Description

Summary

Upgrading from 2026.2.14 → 2026.2.15 causes prompt cache invalidation on every turn for local model providers. The cache is trimmed/rebuilt from scratch on every message, even when no workspace files have changed and the conversation is short. Downgrading to 2026.2.14 immediately restores normal cache behavior.

Environment

  • OpenClaw: 2026.2.15 (3fe22ea) — regression confirmed; 2026.2.14 works correctly
  • OS: macOS 26.3 (arm64), Mac Studio M3 Ultra 512GB
  • Channel: Telegram (DM, not a group chat)
  • Models tested:
    • MiniMax-M2.5 via LM Studio 0.4.2+2 (MLX backend)
    • Qwen3.5-397B-A17B via llama-server (llama.cpp built from latest main)

Reproduction

  1. Install openclaw@2026.2.15
  2. Start a fresh conversation via Telegram DM with a local model (LM Studio or llama-server)
  3. Send a simple message (e.g., "hi")
  4. Wait for the response
  5. Send another simple message (e.g., "what are you up to?")
  6. Observe: the entire prompt cache is invalidated and rebuilt from scratch

Expected behavior

The second message should reuse the cached prompt prefix, so only the new tokens (the user's latest message plus the assistant's response) are processed. LM Studio should report no cache trim, and llama-server should report a restored context checkpoint.
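For reference, a prompt cache can only reuse tokens up to the first position where the new prompt differs from the previous one. A minimal sketch of that invariant (illustrative helper, not the servers' actual code):

```python
def reusable_prefix_len(prev_tokens, next_tokens):
    """Length of the shared token prefix a prompt/KV cache can reuse.

    Hypothetical helper for illustration; llama.cpp and LM Studio/MLX
    implement the equivalent comparison internally over token IDs.
    """
    n = 0
    for a, b in zip(prev_tokens, next_tokens):
        if a != b:
            break
        n += 1
    return n

# Cache-friendly case: the prompt only grows, so the entire previous
# prompt is reusable and only the suffix needs processing.
prev = [1, 2, 3, 4]
grown = [1, 2, 3, 4, 5, 6]
assert reusable_prefix_len(prev, grown) == 4

# Cache-hostile case: a change near the start of the prompt (e.g. a
# timestamp in the system prompt) invalidates almost everything after it.
mutated = [1, 9, 3, 4, 5, 6]
assert reusable_prefix_len(prev, mutated) == 1
```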

Actual behavior

Every message triggers a full cache rebuild:

LM Studio (MiniMax-M2.5):

[cache_wrapper][INFO]: Trimmed 21283 tokens from the prompt cache
[minimax-m2.5] Prompt processing progress: 0.0%
[minimax-m2.5] Prompt processing progress: 2.4%
...

The trim amount is nearly identical every turn (~21,242–21,283 tokens) regardless of message content, even in a conversation under 30K tokens total. No workspace files were modified between messages.

llama-server (Qwen3.5-397B):

slot update_slots: id 3 | task 13076 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 13076 | prompt processing progress, n_tokens = 2048, ...

Full cache wipe (memory_seq_rm [0, end)) on every turn. Context checkpoints are erased and never restored.

Verification

# Downgrade — cache works immediately
npm install -g openclaw@2026.2.14

# Upgrade — cache broken again
npm install -g openclaw@2026.2.15

No configuration changes, no workspace file edits, no LM Studio updates between tests. The only variable is the OpenClaw version.

Suspected cause

Two changes in 2026.2.15 modify per-turn prompt assembly:

  1. Group chat context injection (#14447): "Group chats: always inject group chat context (name, participants, reply guidance) into the system prompt on every turn, not just the first." — If this triggers for DM conversations or if the injected content varies per turn, it would invalidate the cache.

  2. Memory-flush Current time: line (#17603, #17633): "append a Current time: line to memory-flush turns" — The changelog claims this is "without making the system prompt time-variant," but if the time line leaks into normal (non-flush) turns, it would change the prompt every second.
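If cause 2 is right, the failure mode is easy to demonstrate: a per-turn timestamp near the top of the system prompt moves the first point of divergence in front of the entire conversation history, so the cache can reuse almost nothing. A minimal, self-contained sketch (hypothetical prompt text, not OpenClaw's actual template):

```python
def first_divergence(prompt_a: str, prompt_b: str) -> int:
    """Index of the first differing character between two rendered prompts."""
    for i, (a, b) in enumerate(zip(prompt_a, prompt_b)):
        if a != b:
            return i
    return min(len(prompt_a), len(prompt_b))

# Hypothetical system prompt with a time-variant line, as suspected above.
sys_template = "You are a helpful assistant.\nCurrent time: {t}\n"
turn1 = sys_template.format(t="2026-02-15 10:00:00") + "User: hi\n"
turn2 = (sys_template.format(t="2026-02-15 10:00:41")
         + "User: hi\nAssistant: ...\nUser: what are you up to?\n")

# The prompts diverge inside the system prompt, before the first "User:"
# line, so the whole conversation history falls out of the cache.
print(first_divergence(turn1, turn2))
```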

Code references in pi-embedded-n26FO9Pa.js:

  • Line 36106: timeLine: `Current time: ${formattedTime} (${userTimezone})`
  • Line 23571: comment reading "Build a persistent group-chat context block that is always included in the …"

Workaround

Downgrade to 2026.2.14:

npm install -g openclaw@2026.2.14

OpenClaw version

2026.2.15

Operating system

macOS

Install method

npm install -g openclaw@2026.2.15

Logs, screenshots, and evidence

Impact and severity

This is a severe performance regression for all local model users. Without prompt caching:

  • Every message requires full prompt processing from scratch
  • A 40K-token context takes ~200 seconds to process on llama-server (vs ~5 seconds with cache)
  • LM Studio/MLX similarly affected — multi-minute waits for simple replies
  • Makes local models essentially unusable for multi-turn conversations
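The ~200 s vs ~5 s figures above are consistent with a prompt-processing throughput on the order of 200 tokens/s on this hardware. A quick back-of-the-envelope check (assumed, illustrative numbers, not measurements):

```python
# Rough sanity check of the reported timings; throughput and per-turn
# token counts are assumptions for illustration only.
context_tokens = 40_000
prompt_tps = 200             # assumed prompt-processing speed, tokens/s
new_tokens_per_turn = 1_000  # assumed new tokens added per turn

cold = context_tokens / prompt_tps       # full rebuild: every token reprocessed
warm = new_tokens_per_turn / prompt_tps  # cached prefix: only the new suffix

print(f"cold (no cache): {cold:.0f}s, warm (cached prefix): {warm:.0f}s")
# cold (no cache): 200s, warm (cached prefix): 5s
```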

Additional information

No response
