fix(agents): honor explicit cacheRetention for openai-completions providers#82973
fix(agents): honor explicit cacheRetention for openai-completions providers#82973lonexreb wants to merge 4 commits into
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 27, 2026, 7:49 AM ET / 11:49 UTC. Summary PR surface: Source +28, Tests +204, Docs 0. Total +232 across 7 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model gpt-5.5, reasoning high; reviewed against 513a223c15f6. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +28, Tests +204, Docs 0. Total +232 across 7 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
…tions payload Codex P2 follow-up on PR openclaw#82973 / issue openclaw#81281. buildOpenAICompletionsParams previously only set prompt_cache_key when supportsPromptCacheKey was true and silently dropped the resolved cacheRetention=long preference. The wire payload reaching oMLX, llama.cpp, and other OpenAI-compatible completions backends therefore never carried the canonical 24h prompt_cache_retention value, even though the resolver returned long. Forward prompt_cache_retention=24h alongside prompt_cache_key whenever the caller opts into long retention on a compat-flagged backend. Short and unset retention continue to omit the field.
…tions payload Codex P2 follow-up on PR openclaw#82973 / issue openclaw#81281. buildOpenAICompletionsParams previously only set prompt_cache_key when supportsPromptCacheKey was true and silently dropped the resolved cacheRetention=long preference. The wire payload reaching oMLX, llama.cpp, and other OpenAI-compatible completions backends therefore never carried the canonical 24h prompt_cache_retention value, even though the resolver returned long. Forward prompt_cache_retention=24h alongside prompt_cache_key whenever the caller opts into long retention on a compat-flagged backend. Short and unset retention continue to omit the field.
cb0503e to
c479536
Compare
|
ClawSweeper PR egg 🥚 Incubating: this PR egg is tucked into the review nest. Hatch commandComment Hatchability rules:
What is this egg doing here?
|
Wrapper `resolveCacheRetention` returned undefined for non-anthropic-family, non-google providers, silently dropping the user's explicit `cacheRetention` value before it reached the openai-completions transport. This affected prefix-caching backends (oMLX, llama.cpp, etc.) that opt in via `compat.supportsPromptCacheKey: true`. Now honors any explicit "none" | "short" | "long" regardless of family while keeping legacy `cacheControlTtl` aliasing scoped to anthropic/google families (which is where it was previously meaningful). Fixes openclaw#81281.
…tions payload Codex P2 follow-up on PR openclaw#82973 / issue openclaw#81281. buildOpenAICompletionsParams previously only set prompt_cache_key when supportsPromptCacheKey was true and silently dropped the resolved cacheRetention=long preference. The wire payload reaching oMLX, llama.cpp, and other OpenAI-compatible completions backends therefore never carried the canonical 24h prompt_cache_retention value, even though the resolver returned long. Forward prompt_cache_retention=24h alongside prompt_cache_key whenever the caller opts into long retention on a compat-flagged backend. Short and unset retention continue to omit the field.
…or non-family providers Issue openclaw#82974 / regression in 81281 fix: the original removed the '!family && !googleEligible' early-return entirely so openai-completions backends with compat.supportsPromptCacheKey: true (oMLX, llama.cpp) could pass user-set cacheRetention through to the transport. But that also let providers proxying non-cacheable models via openai-completions (amazon-bedrock + amazon.* nova) leak the explicit value into payloads the backend cannot honor. Restore the family/google gate but extend it with a new supportsPromptCacheKey parameter. Callers in extra-params.ts read the flag from the model's compat.supportsPromptCacheKey === true. Without that flag, non-family/non-google providers still drop explicit cacheRetention as before — preserving the new regression test on main ('does not treat non-Anthropic Bedrock models as cache-retention eligible') while keeping the openclaw#81281 fix in place for actual prefix-caching backends. Also drop two unnecessary 'as Record<string, unknown>' casts in openai-transport-stream.test.ts that oxlint flagged as redundant in typescript-eslint/no-unnecessary-type-assertion.
ccde6e6 to
9cedb4a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9cedb4a818
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| provider: string, | ||
| modelApi?: string, | ||
| modelId?: string, | ||
| supportsPromptCacheKey?: boolean, |
There was a problem hiding this comment.
Pass cache-key eligibility to attempt-level retention
When a cache-key OpenAI-compatible completions model sets cacheRetention: "long", the stream wrapper now resolves and forwards long, but the attempt-level resolver call in src/agents/pi-embedded-runner/run/attempt.ts:3020 still omits this new supportsPromptCacheKey argument. In that scenario effectivePromptCacheRetention remains undefined, so the prompt-cache runtime context and cache observability built from it omit the 24h retention even though the outgoing payload uses it; cache-aware context engines can then make decisions with stale/missing TTL information. Please plumb the model compat flag through the attempt-level call as well.
Useful? React with 👍 / 👎.
Carry over #82973 and fix #81281 by preserving explicit cacheRetention for OpenAI-compatible completions providers that opt into prompt-cache-key support. The change keeps explicit cacheRetention suppressed for OpenAI-compatible providers without compat.supportsPromptCacheKey, adds regression coverage for both paths, and updates prompt-caching docs for prompt_cache_key / prompt_cache_retention behavior. Fixes #81281. Supersedes #82973. Co-authored-by: lonexreb <reach2shubhankar@gmail.com>
|
Thanks again @lonexreb. I carried this over and landed it through #87274 because this fork PR stopped syncing after maintainer fixups: the fork branch advanced, but GitHub kept Landed as 3e351b7 with your original commits credited via co-author. The landed change includes the OpenAI-compatible cache-retention fix, additional regression coverage for the Future PRs should be able to follow the normal maintainer-fixup path; this one just hit a GitHub PR-ref sync failure after the branch update. |
Carry over openclaw#82973 and fix openclaw#81281 by preserving explicit cacheRetention for OpenAI-compatible completions providers that opt into prompt-cache-key support. The change keeps explicit cacheRetention suppressed for OpenAI-compatible providers without compat.supportsPromptCacheKey, adds regression coverage for both paths, and updates prompt-caching docs for prompt_cache_key / prompt_cache_retention behavior. Fixes openclaw#81281. Supersedes openclaw#82973. Co-authored-by: lonexreb <reach2shubhankar@gmail.com>
Carry over openclaw#82973 and fix openclaw#81281 by preserving explicit cacheRetention for OpenAI-compatible completions providers that opt into prompt-cache-key support. The change keeps explicit cacheRetention suppressed for OpenAI-compatible providers without compat.supportsPromptCacheKey, adds regression coverage for both paths, and updates prompt-caching docs for prompt_cache_key / prompt_cache_retention behavior. Fixes openclaw#81281. Supersedes openclaw#82973. Co-authored-by: lonexreb <reach2shubhankar@gmail.com>
Carry over openclaw#82973 and fix openclaw#81281 by preserving explicit cacheRetention for OpenAI-compatible completions providers that opt into prompt-cache-key support. The change keeps explicit cacheRetention suppressed for OpenAI-compatible providers without compat.supportsPromptCacheKey, adds regression coverage for both paths, and updates prompt-caching docs for prompt_cache_key / prompt_cache_retention behavior. Fixes openclaw#81281. Supersedes openclaw#82973. Co-authored-by: lonexreb <reach2shubhankar@gmail.com>
Summary
resolveCacheRetentionin the embedded runner wrapper silently dropped an explicit user-providedcacheRetentionfor any provider that was neither in the anthropic family nor google-eligible. That meant openai-completions providers with prefix-caching backends (oMLX, llama.cpp, etc.) — the ones documented to opt in viacompat.supportsPromptCacheKey: true— never received the user's chosen retention. The wrapper omitted it, so the transport could not propagateprompt_cache_retention: "24h"for long-retention setups, and the explicit user intent was lost on the way tobuildOpenAICompletionsParams.Now any explicit
"none" | "short" | "long"is honored regardless of provider family. LegacycacheControlTtl("5m"/"1h") aliasing stays scoped to anthropic/google families where it was previously meaningful — that path was already gated by the family check.Fixes #81281.
Verification
CI=true pnpm test src/agents/pi-embedded-runner/prompt-cache-retention.test.ts— 6 passed (added regression coverage for openai-completions explicitlong/short/noneand for the undefined fallback that keeps embedded-wrapper behavior unchanged).src/agents/openai-transport-stream.test.tssuite (165 passing); the single pre-existing flake on the unrelatedyields to aborts during bursty Responses streamstest reproduces onorigin/mainwithout these changes and is not touched by this PR.Test plan
openai-completionsrun withcompat.supportsPromptCacheKey: trueandcacheRetention: "long"to confirmprompt_cache_keyandprompt_cache_retention: "24h"reach the outgoing payload end-to-end.Real behavior proof
Behavior addressed:
resolveCacheRetentioninsrc/agents/pi-embedded-runner/prompt-cache-retention.tssilently dropped an explicitcacheRetention: "long" | "short" | "none"whenever the provider was not in the anthropic family and not google-eligible. openai-completions providers that opt into prompt caching viacompat.supportsPromptCacheKey: truetherefore lost the user's intent before it reachedbuildOpenAICompletionsParams, so the outgoing payload never carriedprompt_cache_retention: "24h"for long retention (issue #81281).Real environment tested: branch
fix/81281-openai-prompt-cache-keyat HEADcb0503e19echecked out against this OpenClaw working copy, plus a verbatim re-implementation of theorigin/mainresolver (HEADaef93881af) to drive a before/after comparison throughnodedirectly against the on-branch module.Exact steps or command run after this patch:
The demo imports the on-branch
resolveCacheRetentionand a pure-JS port of theorigin/mainbody (preserving the originalif (!family && !googleEligible) return undefined;early return that caused the bug), then exercises the openai-completions opt-in scenario plus regression guards.Evidence after fix: live
nodestdout against the on-branch source (HEADcb0503e19e):(
{}isJSON.stringify({ resolved: undefined })— theresolvedkey is omitted when the function returnsundefined. The first block is the actual fix: a user-setcacheRetention: "long"against an openai-completions provider that previously resolved toundefinednow resolves to"long", which is the valuebuildOpenAICompletionsParamsneeds in order to emitprompt_cache_retention: "24h"downstream.)Observed result after fix: an openai-completions provider with
compat.supportsPromptCacheKey: trueandcacheRetention: "long"now receives the resolved"long"from the embedded-runner wrapper instead ofundefined. The downstream transport atsrc/agents/openai-transport-stream.ts:1717and:3009already maps that value ontoprompt_cache_retentionin the outgoing OpenAI-completions payload (see commitcb0503e19e). Anthropic-family and google-eligible callers still receive the same value they did onorigin/main; the legacycacheControlTtl: "1h"shortcut remains gated to those families so callers withoutcompat.supportsPromptCacheKeyare unchanged.What was not tested: end-to-end traffic against a live oMLX or llama.cpp
openai-completionsbackend with prompt caching enabled. The transport-side payload emission is covered by the existingsrc/agents/openai-transport-stream.test.tssuite (165 passing on this branch). A maintainer-side oMLX/llama.cpp live trace would be the natural next step and is what the Test plan TODO above flags.