Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
prompt_cache_key is no longer emitted on openai-completions requests in 2026.5.x; caching worked correctly on 2026.3.2 (confirmed on same hardware with same config).
Steps to reproduce
- Deploy OpenClaw with an openai-completions provider configured with compat.supportsPromptCacheKey: true and cacheRetention: "long".
- Send repeated identical prompts through OpenClaw → oMLX (or any completions backend with prefix caching).
- Observe cached_tokens in backend responses — always 0 on 2026.5.x.
- Downgrade to 2026.3.2 with identical config and hardware.
- Repeat same prompts — cached_tokens correctly populate from request 2 onward.
Expected behavior
As observed in 2026.3.2: outgoing openai-completions requests include prompt_cache_key, and the backend reports cached_tokens > 0 on repeated requests with identical prefixes.
Actual behavior
On 2026.5.x: prompt_cache_key is absent from outgoing requests. Backend reports cached_tokens: 0 on every request. Downgrading to 2026.3.2 restores caching immediately with no config changes.
OpenClaw version
2026.5.7
Operating system
Linux (Docker container/portainer on UGREEN NAS)
Install method
docker
Model
omlx/local_model (Qwen3.6-35B-A3B-RotorQuant-MLX-4bit via oMLX)
Provider / routing chain
openclaw → oMLX (openai-completions, http://cerebro-mac:8080/v1)
Additional provider/model setup details
Provider api: openai-completions
compat.supportsPromptCacheKey: true
cacheRetention: "long" (set at defaults, model, and per-model override levels)
contextInjection: "continuation-skip"
API keys and URLs redacted.
Same config file used on both 2026.3.2 (working) and 2026.5.7 (broken).
oMLX backend confirmed working: direct repeated requests to the same endpoint
produce cached_tokens > 0 from request 2 onward, bypassing OpenClaw entirely.
Logs, screenshots, and evidence
oMLX dashboard on 2026.5.7:
Total Prefill Tokens: 748,811
Cached Tokens: 0
Cache Efficiency: 0.0%
Direct requests to same backend (bypassing OpenClaw):
Request 1: cached_tokens = 0
Request 2: cached_tokens = 71680
Request 3: cached_tokens = 71680
Outgoing request keys observed from OpenClaw on 2026.5.7:
model, messages, stream, max_completion_tokens, tools, reasoning_effort, metadata
(prompt_cache_key absent)
2026.3.2 tested on same UGREEN NAS hardware: caching works correctly.
No config changes between versions.
Impact and severity
Affected: any user of openai-completions with a prefix-caching-capable backend (oMLX, llama.cpp, etc.)
Severity: high — defeats prefix caching entirely, causing full prefill on every request
Frequency: 100% reproducible on 2026.5.7, never occurs on 2026.3.2
Consequence: significantly increased latency and compute cost per request;
on local hardware this is the difference between ~3s and ~60s TTFT for long contexts.
Additional information
Last known good version: 2026.3.2
First known bad version: 2026.5.x (exact first-bad version not tested between 2026.3.2 and 2026.5.7)
No workaround found on 2026.5.x short of downgrading.
Related: #69272, PR #69411 — those addressed the transport condition;
this regression suggests something in that chain changed between 2026.3.2 and 2026.5.x.
Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
prompt_cache_key is no longer emitted on openai-completions requests in 2026.5.x; caching worked correctly on 2026.3.2 (confirmed on same hardware with same config).
Steps to reproduce
Expected behavior
As observed in 2026.3.2: outgoing openai-completions requests include prompt_cache_key, and the backend reports cached_tokens > 0 on repeated requests with identical prefixes.
Actual behavior
On 2026.5.x: prompt_cache_key is absent from outgoing requests. Backend reports cached_tokens: 0 on every request. Downgrading to 2026.3.2 restores caching immediately with no config changes.
OpenClaw version
2026.5.7
Operating system
Linux (Docker container/portainer on UGREEN NAS)
Install method
docker
Model
omlx/local_model (Qwen3.6-35B-A3B-RotorQuant-MLX-4bit via oMLX)
Provider / routing chain
openclaw → oMLX (openai-completions, http://cerebro-mac:8080/v1)
Additional provider/model setup details
Provider api: openai-completions
compat.supportsPromptCacheKey: true
cacheRetention: "long" (set at defaults, model, and per-model override levels)
contextInjection: "continuation-skip"
API keys and URLs redacted.
Same config file used on both 2026.3.2 (working) and 2026.5.7 (broken).
oMLX backend confirmed working: direct repeated requests to the same endpoint
produce cached_tokens > 0 from request 2 onward, bypassing OpenClaw entirely.
Logs, screenshots, and evidence
Impact and severity
Affected: any user of openai-completions with a prefix-caching-capable backend (oMLX, llama.cpp, etc.)
Severity: high — defeats prefix caching entirely, causing full prefill on every request
Frequency: 100% reproducible on 2026.5.7, never occurs on 2026.3.2
Consequence: significantly increased latency and compute cost per request;
on local hardware this is the difference between ~3s and ~60s TTFT for long contexts.
Additional information
Last known good version: 2026.3.2
First known bad version: 2026.5.x (exact first-bad version not tested between 2026.3.2 and 2026.5.7)
No workaround found on 2026.5.x short of downgrading.
Related: #69272, PR #69411 — those addressed the transport condition;
this regression suggests something in that chain changed between 2026.3.2 and 2026.5.x.