Summary
buildOpenAIResponsesParams forwards prompt_cache_key: options?.sessionId on every Responses API turn (src/agents/openai-transport-stream.ts:794), and parseTransportChunkUsage already reads prompt_tokens_details.cached_tokens on Completions responses (src/agents/openai-transport-stream.ts:1546).
buildOpenAICompletionsParams (src/agents/openai-transport-stream.ts:1486-1540) does not set prompt_cache_key or any other cache-routing field. Any model routed through openai-completions depends purely on the provider-side content hash for cache lookup, with no opt-in way for operators to pin a stable cache key across turns.
OpenAI's prompt caching guide documents prompt_cache_key as valid on chat.completions.create as well as responses.create, and several OpenAI-compatible providers (vLLM, SGLang, Chutes, and similar) honor it on the completions path when routed through a stable session.
Why this is not covered by #67427
#67427 added compat.supportsPromptCacheKey to gate the Responses API strip. That is the right mechanism, and it is already wired into resolveProviderRequestCapabilities, ModelCompatConfig, and the generated schema. But it only affects Responses API payloads. Operators who run OpenAI-compatible proxies on openai-completions get no knob today: the Completions builder simply never emits prompt_cache_key.
Proposal (conservative)
Extend buildOpenAICompletionsParams so that when compat.supportsPromptCacheKey === true AND cacheRetention !== "none" AND options?.sessionId is present, the field is forwarded:
    if (
      compat.supportsPromptCacheKey === true &&
      resolveCacheRetention(options?.cacheRetention) !== "none" &&
      options?.sessionId
    ) {
      params.prompt_cache_key = options.sessionId;
    }
When compat.supportsPromptCacheKey is undefined (the default) or false, behavior stays as it is today: the Completions builder emits no prompt_cache_key. This preserves the conservative default from #49877 / #48155 (Volcano Engine DeepSeek rejects unknown fields) and the scope boundary maintainers set in #67427.
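To make the three-way gate easy to review in isolation, here is a minimal, self-contained sketch of it as a pure function. The types CompatSketch and OptionsSketch, the CacheRetention values other than "none", the stand-in resolveCacheRetentionSketch (including its "short" fallback), and the helper name applyPromptCacheKey are all hypothetical; the real shapes in src/agents/openai-transport-stream.ts may differ.

```typescript
// Hypothetical, simplified shapes for illustration only.
type CacheRetention = "none" | "short" | "long";

interface CompatSketch {
  supportsPromptCacheKey?: boolean;
}

interface OptionsSketch {
  sessionId?: string;
  cacheRetention?: CacheRetention;
}

// Stand-in for resolveCacheRetention. Assumption: an unset value resolves
// to "short"; the real resolver may pick a different default.
function resolveCacheRetentionSketch(value?: CacheRetention): CacheRetention {
  return value ?? "short";
}

// Returns a copy of params with prompt_cache_key added only when all three
// conditions hold: explicit opt-in, caching not disabled, session id present.
function applyPromptCacheKey(
  params: Record<string, unknown>,
  compat: CompatSketch,
  options?: OptionsSketch,
): Record<string, unknown> {
  const sessionId = options?.sessionId;
  if (
    compat.supportsPromptCacheKey === true &&
    resolveCacheRetentionSketch(options?.cacheRetention) !== "none" &&
    sessionId
  ) {
    return { ...params, prompt_cache_key: sessionId };
  }
  // Default path: payload is returned unchanged, nothing extra is emitted.
  return params;
}
```

Under these assumptions, a model without the compat flag produces exactly the payload it produces today, so providers that reject unknown fields are unaffected unless an operator opts in.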
Out of scope
- No new provider surface
- No change to default behavior
- No change to Responses API, Anthropic cacheRetention, or Gemini cachedContents
- No change to tool assembly or system prompt shaping
Motivation
A sibling effort on a separate OpenClaw deployment measured, on turn 2+ of multi-turn sessions routed through an OpenAI-compatible provider that honors prompt_cache_key on completions:
- TTFT: ~22-31% lower
- cached_tokens / prompt_tokens: ~0.98
- Pricing: cached input tokens at half rate vs fresh input
The current behavior on openai-completions leaves that on the table with no operator opt-in.
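For reference, the cache-hit ratio quoted above can be derived from the usage object that Completions responses already return. The sketch below is illustrative only: UsageSketch is a hypothetical subset of that object (the field names prompt_tokens and prompt_tokens_details.cached_tokens follow the OpenAI API), and the helper names and the cachedRate parameter are assumptions, not existing code.

```typescript
// Hypothetical subset of the Chat Completions usage object.
interface UsageSketch {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Fraction of prompt tokens served from cache; 0 when nothing was reported.
function cacheHitRatio(usage: UsageSketch): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return usage.prompt_tokens > 0 ? cached / usage.prompt_tokens : 0;
}

// Input cost relative to a fully uncached request, assuming cached tokens
// are billed at cachedRate times the fresh rate (0.5 = half rate).
function relativeInputCost(hitRatio: number, cachedRate = 0.5): number {
  return 1 - hitRatio * (1 - cachedRate);
}
```

With a hit ratio of 0.98 and half-rate pricing, relativeInputCost gives roughly 0.51, meaning input tokens on those turns would cost about half of what they cost uncached. Actual savings depend on each provider's pricing.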
Before I open a PR
CONTRIBUTING.md says to start here first for anything that looks like a feature. Would maintainers prefer:
- The conservative variant above (default off, gated on the existing compat.supportsPromptCacheKey), or
- A tri-state compat.supportsPromptCacheKey that also covers completions when true, mirroring the existing Responses semantics, or
- Keep completions off that path entirely and point users at Responses API?
Happy to send the PR in whichever shape you prefer. Code-path pointers above are against main at 042c117342.
AI-assisted disclosure
Designed by a human, delivered with Claude Opus 4.7.