Respect cacheRetention for OpenRouter Anthropic models #437
Open
BingqingLyu wants to merge 1 commit into
Anthropic models provided via OpenRouter have had system-prompt caching enabled, as with those provided directly via Anthropic. However, they did not respect the cacheRetention setting: a 5-minute cache_control marker (i.e. the "short" option) was always added, even when cacheRetention was explicitly off. The setting is now respected: "long" uses a 1h ttl and "none" disables caching. The default behavior (cacheRetention not specified) remains the "short" cache, as for the direct Anthropic models.
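The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `CacheRetention`, `CacheControl`, and `resolveCacheControl` are hypothetical, and the marker shape assumes the usual Anthropic-style `cache_control` object with `type: "ephemeral"`.

```typescript
// Hypothetical names for illustration only; not taken from the openclaw codebase.
type CacheRetention = "none" | "short" | "long";

// Assumed shape of the cache_control marker attached to the system prompt.
interface CacheControl {
  type: "ephemeral";
  ttl?: "1h";
}

// Returns the marker to attach, or undefined when caching should be disabled.
function resolveCacheControl(cacheRetention?: CacheRetention): CacheControl | undefined {
  // An unspecified setting falls back to "short", matching the default behavior.
  switch (cacheRetention ?? "short") {
    case "none":
      // Caching disabled: no marker is added at all.
      return undefined;
    case "long":
      // "long" requests the 1h ttl.
      return { type: "ephemeral", ttl: "1h" };
    default:
      // "short": plain marker with no ttl, i.e. the 5-minute cache.
      return { type: "ephemeral" };
  }
}
```

Returning `undefined` for "none" keeps the caller simple: it only attaches `cache_control` when a marker is returned, so the disabled case needs no separate flag.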
Summary
openclaw#17473 introduced caching of the system prompt for Anthropic models provided via OpenRouter, similarly to those provided directly via Anthropic. But that implementation doesn't respect the cacheRetention setting: it always adds a 5-minute cache_control marker (i.e. the "short" option), even if cacheRetention was explicitly off. The "long" option would be very useful for keeping the cache warm across heartbeats and saving up to 90% of costs.
This PR checks the cacheRetention setting for OpenRouter Anthropic before setting cache_control (adding ttl: "1h" for the "long" option, as per the OpenRouter docs, or disabling cache on "none"). The default behavior (cacheRetention not specified) is the "short" cache, like the direct Anthropic models.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
The cacheRetention setting is now respected for Anthropic models provided via OpenRouter.
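As a rough illustration of where the setting lives, the sketch below mirrors the config path named in this PR (agents.defaults.models[...].params.cacheRetention) as a TypeScript object. The `ModelEntry` and `Config` interfaces and the `readCacheRetention` helper are hypothetical, the on-disk config format is not shown here, and `<any>` is kept as the placeholder for a model id.

```typescript
// Hypothetical types for illustration; the real config schema may differ.
interface ModelEntry {
  params?: { cacheRetention?: "none" | "short" | "long" };
}

interface Config {
  agents: { defaults: { models: Record<string, ModelEntry> } };
}

// "<any>" is the placeholder from the PR description, not a real model id.
const config: Config = {
  agents: {
    defaults: {
      models: {
        "openrouter/anthropic/<any>": { params: { cacheRetention: "long" } },
      },
    },
  },
};

// Looks up the per-model setting; undefined means "not specified",
// which the PR treats as the "short" default.
function readCacheRetention(cfg: Config, modelKey: string): "none" | "short" | "long" | undefined {
  return cfg.agents.defaults.models[modelKey]?.params?.cacheRetention;
}
```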
Security Impact (required)
Repro + Verification
Environment
Steps
Set agents.defaults.models["openrouter/anthropic/<any>"].params.cacheRetention to "long"
Expected
Actual
Evidence
See below.
Human Verification (required)
I've observed the broken behavior (described above) in the OpenRouter logs (30m heartbeats or requests >5m apart costing full price). With these changes, cache discounts are applied for these requests.
Also added unit tests for the new behavior.
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Failure Recovery (if this breaks)
Just revert.
Risks and Mitigations
None