Respect cacheRetention for OpenRouter Anthropic models #42961
Greptile Summary
This PR fixes a real bug where Key changes:
Confidence Score: 4/5
Last reviewed commit: d0e571c
Anthropic models provided via OpenRouter have had system-prompt caching enabled, similarly to those provided directly via Anthropic. But they didn't respect the cacheRetention setting: they always added a 5-minute cache_control marker (i.e. the "short" option), even if cacheRetention was explicitly off. The setting is now respected, using a 1h TTL for the "long" option and disabling the cache on "none". The default behavior (cacheRetention not specified) is the "short" cache, like the direct Anthropic models.
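To make the mapping described above concrete, here is a minimal sketch. It is not the code from this PR: the names `CacheRetention`, `cacheControlFor`, and `systemBlock` are hypothetical, and the `{ type: "ephemeral" }` marker shape is assumed from Anthropic's documented cache_control format.

```typescript
// Hypothetical names throughout; an illustrative sketch, not the project's code.
type CacheRetention = "none" | "short" | "long";

// Assumed marker shape: an "ephemeral" marker, with an optional 1h TTL.
interface CacheControl {
  type: "ephemeral";
  ttl?: "1h";
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Returns the marker for the system prompt, or undefined when caching is off.
// An unspecified setting falls back to "short", i.e. the 5-minute default.
function cacheControlFor(retention?: CacheRetention): CacheControl | undefined {
  const effective: CacheRetention = retention ?? "short";
  if (effective === "none") {
    return undefined;
  }
  if (effective === "long") {
    return { type: "ephemeral", ttl: "1h" };
  }
  return { type: "ephemeral" };
}

// Builds a system-prompt content block, attaching the marker only when present.
function systemBlock(text: string, retention?: CacheRetention): TextBlock {
  const marker = cacheControlFor(retention);
  return marker ? { type: "text", text, cache_control: marker } : { type: "text", text };
}
```

The point of the sketch is that "none" must produce no marker at all, rather than a marker with some other TTL, which is the case the previous behavior got wrong.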
95bac60 to b27a02d
This pull request has been automatically marked as stale due to inactivity.
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as superseded: the reported cache-retention bug is real, but this branch is now stale, merge-conflicting, and targets an older wrapper structure that would lose current endpoint-class OpenRouter gating; the same remaining work is now tracked in a newer current-architecture PR.

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Review details

Best possible solution: Land a current-main fix that preserves endpoint-class OpenRouter gating while making explicit

Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible. Configure

Is this the best way to solve the issue? No. The branch was reasonable when written, but current main moved the wrapper boundary to endpoint-class gating; the safer path is the newer current-architecture PR rather than rebasing this stale branch directly.

Security review: Security review cleared: The diff changes provider request payload cache markers and tests only, with no new dependencies, workflows, permissions, downloads, secrets handling, or code execution surface.

What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 3ba2ab7a0950.
Summary
#17473 introduced caching of the system prompt for Anthropic models provided via OpenRouter, similarly to those provided directly via Anthropic. But that implementation doesn't respect the cacheRetention setting: it always adds a 5-minute cache_control marker (i.e. the "short" option), even if cacheRetention is explicitly off. The "long" option would be very useful to keep the cache warm across heartbeats and save up to 90% of costs.
This PR checks the cacheRetention setting for OpenRouter Anthropic before setting cache_control (adding ttl: "1h" for the "long" option, as per the OpenRouter docs, or disabling cache on "none"). The default behavior (cacheRetention not specified) is the "short" cache, like the direct Anthropic models.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
The cacheRetention setting is now respected for Anthropic models provided via OpenRouter.
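As an illustration of where the setting lives, the sketch below mirrors the config path used in the repro steps (`agents.defaults.models[...].params.cacheRetention`). The `Config` and `ModelEntry` types, the `resolveCacheRetention` helper, and the model key `openrouter/anthropic/example-model` are all hypothetical; the real config schema may differ.

```typescript
// Hypothetical types and names; only the config path comes from this PR's repro steps.
type CacheRetention = "none" | "short" | "long";

interface ModelEntry {
  params?: { cacheRetention?: CacheRetention };
}

interface Config {
  agents?: { defaults?: { models?: Record<string, ModelEntry> } };
}

// Example config: one OpenRouter Anthropic model opted into the "long" cache.
const config: Config = {
  agents: {
    defaults: {
      models: {
        "openrouter/anthropic/example-model": { params: { cacheRetention: "long" } },
      },
    },
  },
};

// Looks up the per-model setting, falling back to "short" when it is not specified.
function resolveCacheRetention(cfg: Config, modelKey: string): CacheRetention {
  return cfg.agents?.defaults?.models?.[modelKey]?.params?.cacheRetention ?? "short";
}
```

A model with no entry, or an entry without `params.cacheRetention`, resolves to "short", which matches the default behavior described in the summary.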
Security Impact (required)
Repro + Verification
Environment
Steps
Set agents.defaults.models["openrouter/anthropic/<any>"].params.cacheRetention to "long"
Expected
Actual
Evidence
See below.
Human Verification (required)
I've observed the broken behavior (described above) in the OpenRouter logs (30m heartbeats or requests >5m apart costing full price). With these changes, cache discounts are applied for these requests.
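The effect of the TTL on 30-minute heartbeats can be checked with a small simulation. This is a simplified model, not provider behavior verified here: it assumes every request either writes the cache (on a miss) or refreshes its TTL (on a hit), and it ignores pricing entirely. The function name `countCacheHits` is hypothetical.

```typescript
// Counts cache hits for `requests` requests spaced `intervalMin` minutes apart.
// Assumption: each request resets the expiry window (write on miss, refresh on hit).
function countCacheHits(requests: number, intervalMin: number, ttlMin: number): number {
  let hits = 0;
  let expiresAt = Number.NEGATIVE_INFINITY; // no cache entry exists yet
  for (let i = 0; i < requests; i++) {
    const now = i * intervalMin;
    if (now < expiresAt) {
      hits++;
    }
    expiresAt = now + ttlMin;
  }
  return hits;
}

// Five heartbeats, 30 minutes apart.
const shortHits = countCacheHits(5, 30, 5); // 5-minute TTL: 0 hits
const longHits = countCacheHits(5, 30, 60); // 1-hour TTL: 4 hits (all but the first)
```

Under these assumptions, a 5-minute TTL never produces a hit for heartbeats spaced 30 minutes apart, while a 1-hour TTL hits on every request after the first.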
Also added unit tests for the new behavior.
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Failure Recovery (if this breaks)
Just revert.
Risks and Mitigations
None