fix(agents): honor OpenAI-compatible cache retention#87274
Conversation
Wrapper `resolveCacheRetention` returned undefined for non-anthropic-family, non-google providers, silently dropping the user's explicit `cacheRetention` value before it reached the openai-completions transport. This affected prefix-caching backends (oMLX, llama.cpp, etc.) that opt in via `compat.supportsPromptCacheKey: true`. Now honors any explicit "none" | "short" | "long" regardless of family while keeping legacy `cacheControlTtl` aliasing scoped to anthropic/google families (which is where it was previously meaningful). Fixes #81281.
…tions payload Codex P2 follow-up on PR #82973 / issue #81281. buildOpenAICompletionsParams previously only set prompt_cache_key when supportsPromptCacheKey was true and silently dropped the resolved cacheRetention=long preference. The wire payload reaching oMLX, llama.cpp, and other OpenAI-compatible completions backends therefore never carried the canonical 24h prompt_cache_retention value, even though the resolver returned long. Forward prompt_cache_retention=24h alongside prompt_cache_key whenever the caller opts into long retention on a compat-flagged backend. Short and unset retention continue to omit the field.
…or non-family providers Issue #82974 / regression in 81281 fix: the original removed the '!family && !googleEligible' early-return entirely so openai-completions backends with compat.supportsPromptCacheKey: true (oMLX, llama.cpp) could pass user-set cacheRetention through to the transport. But that also let providers proxying non-cacheable models via openai-completions (amazon-bedrock + amazon.* nova) leak the explicit value into payloads the backend cannot honor. Restore the family/google gate but extend it with a new supportsPromptCacheKey parameter. Callers in extra-params.ts read the flag from the model's compat.supportsPromptCacheKey === true. Without that flag, non-family/non-google providers still drop explicit cacheRetention as before — preserving the new regression test on main ('does not treat non-Anthropic Bedrock models as cache-retention eligible') while keeping the #81281 fix in place for actual prefix-caching backends. Also drop two unnecessary 'as Record<string, unknown>' casts in openai-transport-stream.test.ts that oxlint flagged as redundant in typescript-eslint/no-unnecessary-type-assertion.
|
ClawSweeper status: review started. I am starting a fresh review of this pull request: fix(agents): honor OpenAI-compatible cache retention This is item 1/1 in the current shard. Shard 0/1. This placeholder means the worker is alive and reading the current context. I will edit this same comment with the actual review when the claws are done clicking. Crustacean status: shell secured, claws on keyboard, evidence pebbles being sorted. |
|
Verification update before merge:
Known gap: no live oMLX/llama.cpp backend payload trace; covered by transport/extra-param regression tests and docs proof. |
|
Really solid compatibility fix. I especially like that this doesn’t just blindly pass through The regression coverage is also very well thought out — testing both the positive and negative paths here is huge for long-term stability, especially in multi-provider systems where adapter behavior tends to drift over time. Also appreciate that the docs were updated alongside the runtime behavior. Keeping compatibility docs aligned with actual transport semantics saves a surprising amount of future debugging pain. Overall this feels like a very clean restoration of expected behavior rather than a narrow patch. |
Summary
cacheRetentionpassthrough only whencompat.supportsPromptCacheKeyis set, plus the negative path when it is not.prompt_cache_key/prompt_cache_retention.Fixes #81281.
Supersedes #82973 because GitHub stopped syncing the updated fork branch after maintainer fixups; the fork ref is at the fixed SHA, but the PR ref stayed pinned to the prior red SHA.
Verification
pnpm test src/agents/pi-embedded-runner/prompt-cache-retention.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/openai-transport-stream.test.ts -- --reporter=verbosepnpm check:test-typespnpm check:docsgit diff --check/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode localReal behavior proof
Behavior addressed: OpenAI-compatible completions providers with
compat.supportsPromptCacheKey: truenow keep explicitcacheRetention: "long"through extra params and emitprompt_cache_retention: "24h"withprompt_cache_key.Real environment tested: local OpenClaw source checkout on Node/pnpm plus GitHub CI on the original PR SHA before the fork PR ref stopped syncing.
Exact steps or command run after this patch: focused Vitest command above,
pnpm check:test-types,pnpm check:docs,git diff --check, and local autoreview.Evidence after fix: focused Vitest passed 327 tests;
check:test-typespassed; docs formatting, markdownlint, MDX, i18n glossary, and link audit passed withbroken_links=0; autoreview reported no actionable findings.Observed result after fix: regression tests prove explicit cache retention reaches prompt-cache-key OpenAI-compatible completions providers and stays suppressed for providers without that compat flag.
What was not tested: live oMLX/llama.cpp backend payload trace.
Co-authored-by: lonexreb reach2shubhankar@gmail.com