Commit 3e351b7

steipete and lonexreb authored
fix(agents): honor OpenAI-compatible cache retention
Carry over #82973 and fix #81281 by preserving explicit cacheRetention for OpenAI-compatible completions providers that opt into prompt-cache-key support.

The change keeps explicit cacheRetention suppressed for OpenAI-compatible providers without compat.supportsPromptCacheKey, adds regression coverage for both paths, and updates prompt-caching docs for prompt_cache_key / prompt_cache_retention behavior.

Fixes #81281.
Supersedes #82973.

Co-authored-by: lonexreb <reach2shubhankar@gmail.com>
1 parent 517ce3d commit 3e351b7

7 files changed

Lines changed: 237 additions & 5 deletions

docs/reference/prompt-caching.md

Lines changed: 2 additions & 2 deletions
@@ -104,8 +104,8 @@ Per-agent heartbeat is supported at `agents.list[].heartbeat`.
 ### OpenAI (direct API)

 - Prompt caching is automatic on supported recent models. OpenClaw does not need to inject block-level cache markers.
-- OpenClaw uses `prompt_cache_key` to keep cache routing stable across turns and uses `prompt_cache_retention: "24h"` only when `cacheRetention: "long"` is selected on direct OpenAI hosts.
-- OpenAI-compatible Completions providers receive `prompt_cache_key` only when their model config explicitly sets `compat.supportsPromptCacheKey: true`; `cacheRetention: "none"` still suppresses it.
+- OpenClaw uses `prompt_cache_key` to keep cache routing stable across turns. Direct OpenAI hosts use `prompt_cache_retention: "24h"` when `cacheRetention: "long"` is selected.
+- OpenAI-compatible Completions providers receive `prompt_cache_key` only when their model config explicitly sets `compat.supportsPromptCacheKey: true`; with that same opt-in, explicit `cacheRetention: "long"` also forwards `prompt_cache_retention: "24h"`, and `cacheRetention: "none"` suppresses both fields.
 - OpenAI responses expose cached prompt tokens via `usage.prompt_tokens_details.cached_tokens` (or `input_tokens_details.cached_tokens` on Responses API events). OpenClaw maps that to `cacheRead`.
 - OpenAI does not expose a separate cache-write token counter, so `cacheWrite` stays `0` on OpenAI paths even when the provider is warming a cache.
 - OpenAI returns useful tracing and rate-limit headers such as `x-request-id`, `openai-processing-ms`, and `x-ratelimit-*`, but cache-hit accounting should come from the usage payload, not from headers.
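
The opt-in rules for OpenAI-compatible Completions providers described in the bullets above can be summarized as a small decision function. This is a minimal sketch under stated assumptions, not OpenClaw's implementation: `sketchCacheFields` and its input shape are hypothetical names introduced for illustration. Only the wire field names `prompt_cache_key` and `prompt_cache_retention`, the `"24h"` value, and the `cacheRetention` values are taken from this commit.

```typescript
type CacheRetention = "none" | "short" | "long";

// Hypothetical input shape; `supportsPromptCacheKey` mirrors the
// `compat.supportsPromptCacheKey` model-config flag from the docs.
interface CacheFieldInputs {
  supportsPromptCacheKey: boolean;
  cacheRetention?: CacheRetention;
  sessionId?: string;
}

interface CacheFields {
  prompt_cache_key?: string;
  prompt_cache_retention?: "24h";
}

// Hypothetical helper: decides which cache fields reach the request body.
function sketchCacheFields(input: CacheFieldInputs): CacheFields {
  const fields: CacheFields = {};
  // No opt-in, an explicit "none", or a missing session id: emit nothing.
  if (!input.supportsPromptCacheKey || input.cacheRetention === "none" || !input.sessionId) {
    return fields;
  }
  fields.prompt_cache_key = input.sessionId;
  // Only an explicit "long" adds the retention field; "short" and unset do not.
  if (input.cacheRetention === "long") {
    fields.prompt_cache_retention = "24h";
  }
  return fields;
}

console.log(
  sketchCacheFields({ supportsPromptCacheKey: true, cacheRetention: "long", sessionId: "session-123" }),
);
```

With the opt-in flag set, `"short"` or an unset retention yields only the key, and `"none"` yields an empty object, which matches the behavior the regression tests in this commit assert.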

src/agents/openai-transport-stream.test.ts

Lines changed: 61 additions & 0 deletions
@@ -4637,6 +4637,67 @@ describe("openai transport stream", () => {
     expect(notOptedIn.prompt_cache_key).toBeUndefined();
   });

+  it("emits prompt_cache_retention=24h for completions when cacheRetention is long", () => {
+    const model = {
+      id: "custom-model",
+      name: "Custom Model",
+      api: "openai-completions",
+      provider: "custom-cpa",
+      baseUrl: "https://proxy.example.com/v1",
+      compat: { supportsPromptCacheKey: true },
+      reasoning: false,
+      input: ["text"],
+      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+      contextWindow: 32768,
+      maxTokens: 8192,
+    } as unknown as Model<"openai-completions">;
+    const context = {
+      systemPrompt: "system",
+      messages: [],
+      tools: [],
+    } as never;
+
+    const longRetention = buildOpenAICompletionsParams(model, context, {
+      sessionId: "session-123",
+      cacheRetention: "long",
+    }) as { prompt_cache_key?: string; prompt_cache_retention?: string };
+
+    expect(longRetention.prompt_cache_key).toBe("session-123");
+    expect(longRetention.prompt_cache_retention).toBe("24h");
+  });
+
+  it("omits prompt_cache_retention for completions when cacheRetention is short or unset", () => {
+    const model = {
+      id: "custom-model",
+      name: "Custom Model",
+      api: "openai-completions",
+      provider: "custom-cpa",
+      baseUrl: "https://proxy.example.com/v1",
+      compat: { supportsPromptCacheKey: true },
+      reasoning: false,
+      input: ["text"],
+      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+      contextWindow: 32768,
+      maxTokens: 8192,
+    } as unknown as Model<"openai-completions">;
+    const context = {
+      systemPrompt: "system",
+      messages: [],
+      tools: [],
+    } as never;
+
+    const shortRetention = buildOpenAICompletionsParams(model, context, {
+      sessionId: "session-123",
+      cacheRetention: "short",
+    });
+    const defaultRetention = buildOpenAICompletionsParams(model, context, {
+      sessionId: "session-123",
+    });
+
+    expect(shortRetention).not.toHaveProperty("prompt_cache_retention");
+    expect(defaultRetention).not.toHaveProperty("prompt_cache_retention");
+  });
+
   it("sorts Chat Completions tools by function name for stable prompt-cache payloads", () => {
     const model = {
       id: "custom-model",

src/agents/openai-transport-stream.ts

Lines changed: 9 additions & 0 deletions
@@ -3499,6 +3499,15 @@ export function buildOpenAICompletionsParams(
   }
   if (compat.supportsPromptCacheKey && cacheRetention !== "none" && options?.sessionId) {
     params.prompt_cache_key = options.sessionId;
+    // When the caller explicitly opted into long retention, forward the
+    // canonical prompt_cache_retention value alongside the cache key so
+    // OpenAI-compatible completions backends (oMLX, llama.cpp, official
+    // OpenAI, etc.) can honor the 24h prefix-cache lifetime. Without this
+    // the key reaches the wire but the retention preference is silently
+    // dropped (issue #81281).
+    if (cacheRetention === "long") {
+      params.prompt_cache_retention = "24h";
+    }
   }
   if (options?.temperature !== undefined) {
     params.temperature = options.temperature;

src/agents/pi-embedded-runner-extraparams.test.ts

Lines changed: 49 additions & 0 deletions
@@ -2647,6 +2647,55 @@ describe("applyExtraParamsToAgent", () => {
     expect(calls[0]?.cacheRetention).toBe("long");
   });

+  it("passes through explicit cacheRetention for prompt-cache-key openai-completions providers", () => {
+    const { calls, agent } = createOptionsCaptureAgent();
+    const cfg = buildModelConfig("omlx-local/local_model", {
+      cacheRetention: "long",
+    });
+
+    applyExtraParamsToAgent(agent, cfg, "omlx-local", "local_model");
+
+    const model = {
+      api: "openai-completions",
+      provider: "omlx-local",
+      id: "local_model",
+      compat: { supportsPromptCacheKey: true },
+    } as unknown as Model<"openai-completions">;
+    const context: Context = { messages: [] };
+
+    void agent.streamFn?.(model, context, {
+      sessionId: "session-81281",
+    });
+
+    expect(calls).toHaveLength(1);
+    expect(calls[0]?.cacheRetention).toBe("long");
+    expect(calls[0]?.sessionId).toBe("session-81281");
+  });
+
+  it("keeps explicit cacheRetention off openai-completions providers without prompt-cache-key support", () => {
+    const { calls, agent } = createOptionsCaptureAgent();
+    const cfg = buildModelConfig("omlx-local/local_model", {
+      cacheRetention: "long",
+    });
+
+    applyExtraParamsToAgent(agent, cfg, "omlx-local", "local_model");
+
+    const model = {
+      api: "openai-completions",
+      provider: "omlx-local",
+      id: "local_model",
+    } as Model<"openai-completions">;
+    const context: Context = { messages: [] };
+
+    void agent.streamFn?.(model, context, {
+      sessionId: "session-81281",
+    });
+
+    expect(calls).toHaveLength(1);
+    expect(calls[0]?.cacheRetention).toBeUndefined();
+    expect(calls[0]?.sessionId).toBe("session-81281");
+  });
+
   it("passes through explicit cacheRetention for custom anthropic-messages providers", () => {
     const { calls, agent } = createOptionsCaptureAgent();
     const cfg = {

src/agents/pi-embedded-runner/extra-params.ts

Lines changed: 10 additions & 0 deletions
@@ -494,11 +494,20 @@ function createStreamFnWithExtraParams(
     streamParams.seed = resolvedSeed;
   }

+  const readSupportsPromptCacheKey = (m: unknown): boolean => {
+    const compat = (m as { compat?: unknown })?.compat;
+    if (!compat || typeof compat !== "object") {
+      return false;
+    }
+    return (compat as Record<string, unknown>).supportsPromptCacheKey === true;
+  };
+
   const initialCacheRetention = resolveCacheRetention(
     extraParams,
     provider,
     typeof model?.api === "string" ? model.api : undefined,
     typeof model?.id === "string" ? model.id : undefined,
+    readSupportsPromptCacheKey(model),
   );
   if (Object.keys(streamParams).length > 0 || initialCacheRetention) {
     const debugParams = initialCacheRetention
@@ -514,6 +523,7 @@ function createStreamFnWithExtraParams(
       provider,
       typeof callModel.api === "string" ? callModel.api : undefined,
       typeof callModel.id === "string" ? callModel.id : undefined,
+      readSupportsPromptCacheKey(callModel),
     );
     const hasStreamParams = Object.keys(streamParams).length > 0 || cacheRetention;
     if (!hasStreamParams) {

src/agents/pi-embedded-runner/prompt-cache-retention.test.ts

Lines changed: 94 additions & 0 deletions
@@ -30,6 +30,100 @@ describe("prompt cache retention", () => {
     ).toBeUndefined();
   });

+  it("passes explicit cacheRetention through for openai-completions providers when supportsPromptCacheKey (issue #81281)", () => {
+    // Regression: openai-completions providers with prefix-caching backends
+    // (oMLX, llama.cpp, etc.) set compat.supportsPromptCacheKey: true and
+    // cacheRetention: "long" but the wrapper was silently dropping the
+    // user's explicit cacheRetention because the provider is neither in the
+    // anthropic family nor google-eligible.
+    expect(
+      resolveCacheRetention(
+        { cacheRetention: "long" },
+        "omlx-local",
+        "openai-completions",
+        "local_model",
+        true,
+      ),
+    ).toBe("long");
+    expect(
+      resolveCacheRetention(
+        { cacheRetention: "short" },
+        "omlx-local",
+        "openai-completions",
+        "local_model",
+        true,
+      ),
+    ).toBe("short");
+    expect(
+      resolveCacheRetention(
+        { cacheRetention: "none" },
+        "omlx-local",
+        "openai-completions",
+        "local_model",
+        true,
+      ),
+    ).toBe("none");
+  });
+
+  it("does not honor explicit cacheRetention for openai-completions without supportsPromptCacheKey", () => {
+    // Providers that route via openai-completions but do not advertise prompt
+    // caching (e.g. amazon-bedrock proxying amazon.* nova models) must keep
+    // the explicit cacheRetention from leaking into the outgoing payload.
+    expect(
+      resolveCacheRetention(
+        { cacheRetention: "long" },
+        "amazon-bedrock",
+        "openai-completions",
+        "amazon.nova-micro-v1:0",
+      ),
+    ).toBeUndefined();
+    expect(
+      resolveCacheRetention(
+        { cacheRetention: "long" },
+        "omlx-local",
+        "openai-completions",
+        "local_model",
+        false,
+      ),
+    ).toBeUndefined();
+  });
+
+  it("returns undefined for openai-completions without explicit cacheRetention", () => {
+    // Without an explicit user choice, openai-completions providers fall back
+    // to the transport-level default ("short") rather than receiving a
+    // wrapper-injected value.
+    expect(
+      resolveCacheRetention(undefined, "omlx-local", "openai-completions", "local_model", true),
+    ).toBeUndefined();
+    expect(
+      resolveCacheRetention({}, "omlx-local", "openai-completions", "local_model", true),
+    ).toBeUndefined();
+  });
+
+  it("does not map legacy cacheControlTtl for openai-completions prompt-cache-key providers", () => {
+    // Legacy TTL aliases were Anthropic/Google semantics; OpenAI-compatible
+    // completions providers need an explicit cacheRetention value before the
+    // wrapper forwards retention to the transport.
+    expect(
+      resolveCacheRetention(
+        { cacheControlTtl: "1h" },
+        "omlx-local",
+        "openai-completions",
+        "local_model",
+        true,
+      ),
+    ).toBeUndefined();
+    expect(
+      resolveCacheRetention(
+        { cacheControlTtl: "5m" },
+        "omlx-local",
+        "openai-completions",
+        "local_model",
+        true,
+      ),
+    ).toBeUndefined();
+  });
+
   it("identifies supported direct Google cache families", () => {
     expect(
       isGooglePromptCacheEligible({

src/agents/pi-embedded-runner/prompt-cache-retention.ts

Lines changed: 12 additions & 3 deletions
@@ -19,6 +19,7 @@ export function resolveCacheRetention(
   provider: string,
   modelApi?: string,
   modelId?: string,
+  supportsPromptCacheKey?: boolean,
 ): CacheRetention | undefined {
   const hasExplicitCacheConfig =
     extraParams?.cacheRetention !== undefined || extraParams?.cacheControlTtl !== undefined;
@@ -29,8 +30,16 @@ export function resolveCacheRetention(
     hasExplicitCacheConfig,
   });
   const googleEligible = isGooglePromptCacheEligible({ modelApi, modelId });
+  // OpenAI-compatible completions backends (oMLX, llama.cpp, etc.) opt into
+  // prompt caching via `compat.supportsPromptCacheKey: true`. Without that
+  // flag they sit outside the anthropic/google family gates, so issue #81281
+  // dropped the user's explicit `cacheRetention` before the transport layer
+  // could emit it. Proxies that route non-cacheable models via the same
+  // openai-completions wire (amazon-bedrock + amazon.* nova models) leave
+  // the flag unset, so the existing family gate still applies to them.
+  const cacheKeyEligible = supportsPromptCacheKey === true;

-  if (!family && !googleEligible) {
+  if (!family && !googleEligible && !cacheKeyEligible) {
     return undefined;
   }

@@ -40,10 +49,10 @@ export function resolveCacheRetention(
   }

   const legacy = extraParams?.cacheControlTtl;
-  if (legacy === "5m") {
+  if (legacy === "5m" && (family || googleEligible)) {
     return "short";
   }
-  if (legacy === "1h") {
+  if (legacy === "1h" && (family || googleEligible)) {
     return "long";
   }
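
The gating order in the hunks above can be sketched in isolation. This is a simplified illustration, not the real `resolveCacheRetention`: `sketchResolveCacheRetention` and the `Gates` shape are hypothetical, the `family` and `googleEligible` booleans stand in for values the real function derives from `provider`, `modelApi`, and `modelId`, and the explicit-value branch is an assumption inferred from the regression tests because those lines are not part of the diff.

```typescript
type CacheRetention = "none" | "short" | "long";

interface ExtraParams {
  cacheRetention?: CacheRetention;
  cacheControlTtl?: string;
}

// Hypothetical stand-in for the gates the real function computes itself.
interface Gates {
  family: boolean;
  googleEligible: boolean;
  supportsPromptCacheKey?: boolean;
}

function sketchResolveCacheRetention(
  extraParams: ExtraParams | undefined,
  gates: Gates,
): CacheRetention | undefined {
  const cacheKeyEligible = gates.supportsPromptCacheKey === true;

  // Gate 1: providers outside every eligible group never receive a value.
  if (!gates.family && !gates.googleEligible && !cacheKeyEligible) {
    return undefined;
  }

  // Assumption: an explicit cacheRetention is returned unchanged once a gate passes.
  const explicit = extraParams?.cacheRetention;
  if (explicit !== undefined) {
    return explicit;
  }

  // Gate 2: legacy TTL aliases map only for the family/Google groups,
  // not for providers that are eligible solely through the cache-key flag.
  const legacyAllowed = gates.family || gates.googleEligible;
  const legacy = extraParams?.cacheControlTtl;
  if (legacy === "5m" && legacyAllowed) {
    return "short";
  }
  if (legacy === "1h" && legacyAllowed) {
    return "long";
  }

  // The real function continues past this point; the sketch stops here.
  return undefined;
}

const flagOnly: Gates = { family: false, googleEligible: false, supportsPromptCacheKey: true };
console.log(sketchResolveCacheRetention({ cacheRetention: "long" }, flagOnly));
console.log(sketchResolveCacheRetention({ cacheControlTtl: "1h" }, flagOnly));
```

Keeping the legacy aliases behind the original family and Google gates means the new flag widens only the explicit `cacheRetention` path, which is the distinction the `cacheControlTtl` regression test checks.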
