Respect cacheRetention for OpenRouter Anthropic models#437

Open
BingqingLyu wants to merge 1 commit into
mainfrom
fork-pr-42961-fix-openrouter-anthropic-cache-settings
Conversation

@BingqingLyu BingqingLyu commented Apr 27, 2026

Owner

Summary

openclaw#17473 introduced system-prompt caching for Anthropic models served via OpenRouter, matching the behavior of models served directly by Anthropic. That implementation ignores the cacheRetention setting, however: it always adds a 5-minute cache_control marker (i.e. the "short" option), even when cacheRetention is explicitly set to "none". The "long" option would be very useful for keeping the cache warm across heartbeats, saving up to 90% of costs.

This PR checks the cacheRetention setting for OpenRouter Anthropic models before setting cache_control: it adds ttl: "1h" for the "long" option, as per the OpenRouter docs, and disables caching for "none". The default behavior (cacheRetention not specified) is the "short" cache, as with the direct Anthropic models.
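The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR's diff: the names `CacheRetention`, `CacheControl`, `TextBlock`, `cacheControlFor`, and `markSystemPrompt` are hypothetical, and the sketch assumes the marker takes the form `{ type: "ephemeral" }` with an optional `ttl: "1h"`.

```typescript
// Hypothetical names throughout; only the "none" / "short" / "long" values
// and the ttl: "1h" marker come from the PR description.
type CacheRetention = "none" | "short" | "long";

interface CacheControl {
  type: "ephemeral";
  ttl?: "1h";
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Returns the cache_control marker for a given setting, or undefined
// when caching should be disabled.
function cacheControlFor(retention?: CacheRetention): CacheControl | undefined {
  switch (retention) {
    case "none":
      return undefined; // caching explicitly disabled: no marker at all
    case "long":
      return { type: "ephemeral", ttl: "1h" }; // 1-hour cache
    case "short":
    default:
      return { type: "ephemeral" }; // default 5-minute cache
  }
}

// Builds the system prompt block, attaching the marker only when one applies.
function markSystemPrompt(text: string, retention?: CacheRetention): TextBlock {
  const cacheControl = cacheControlFor(retention);
  return cacheControl
    ? { type: "text", text, cache_control: cacheControl }
    : { type: "text", text };
}
```

Under this sketch, an unspecified setting and "short" produce the same marker, so existing configurations keep their current behavior.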

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

The cacheRetention setting is now respected for Anthropic models provided via OpenRouter.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Docker
  • Model/provider: openrouter/anthropic/*

Steps

  1. Set agents.defaults.models["openrouter/anthropic/<any>"].params.cacheRetention to "long"
  2. Tell the agent to say hi
  3. Wait >5 minutes
  4. Tell the agent to say hi again
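For reference, the nesting implied by the config path in step 1 looks roughly like this. The `ModelEntry` and `Config` types are assumptions inferred only from that path, not the project's real config schema, and `<any>` is left as the placeholder from step 1.

```typescript
// Assumed shape, inferred only from the config path in step 1.
interface ModelEntry {
  params?: { cacheRetention?: "none" | "short" | "long" };
}

interface Config {
  agents: { defaults: { models: Record<string, ModelEntry> } };
}

const config: Config = {
  agents: {
    defaults: {
      models: {
        // "<any>" is the placeholder from step 1; substitute a concrete model id.
        "openrouter/anthropic/<any>": { params: { cacheRetention: "long" } },
      },
    },
  },
};
```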

Expected

  • The OpenRouter logs show a cache write on the first request and a steep cache-read discount on the second

Actual

  • Both requests cost full price (plus a useless cache write)

Evidence

See below.

Human Verification (required)

I've observed the broken behavior (described above) in the OpenRouter logs (30m heartbeats or requests >5m apart costing full price). With these changes, cache discounts are applied for these requests.

Also added unit tests for the new behavior.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

Just revert.

Risks and Mitigations

None

Anthropic models provided via OpenRouter have had caching of the system
prompt enabled similarly to those provided directly via Anthropic. But
they didn't respect the cacheRetention setting, instead always adding a
5 minute cache_control marker (i.e. the "short" option), even if
cacheRetention was explicitly off.

The setting is now respected, using 1h ttl for the "long" option or
disabling cache on "none". The default behavior (cacheRetention not
specified) is the "short" cache, like the direct Anthropic models.

Development

Successfully merging this pull request may close these issues.

Feature Request: Prompt Caching support for Anthropic API