Respect cacheRetention for OpenRouter Anthropic models #42961

Closed
dddsnn wants to merge 1 commit into openclaw:main from dddsnn:fix/openrouter-anthropic-cache-settings

Conversation

@dddsnn dddsnn commented Mar 11, 2026

Summary

#17473 introduced system-prompt caching for Anthropic models served via OpenRouter, similar to the caching for models served directly by Anthropic. But that implementation doesn't respect the cacheRetention setting: it always adds a 5-minute cache_control marker (i.e. the "short" option), even when cacheRetention is explicitly set to "none". The "long" option would be very useful for keeping the cache warm across heartbeats and saving up to 90% of costs.

This PR checks the cacheRetention setting for OpenRouter Anthropic models before setting cache_control: it adds ttl: "1h" for the "long" option, as per the OpenRouter docs, and disables the cache for "none". The default behavior (cacheRetention not specified) is the "short" cache, as for the direct Anthropic models.
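The mapping from retention setting to marker described above can be sketched as follows. This is a minimal illustration rather than the code in this PR: the function name `cacheControlFor` and the `CacheControl` interface are hypothetical, and only the three retention values and the `{ type: "ephemeral", ttl: "1h" }` shape come from the description.

```typescript
// Hypothetical sketch; names are illustrative and not taken from the PR diff.
type CacheRetention = "none" | "short" | "long";

interface CacheControl {
  type: "ephemeral";
  ttl?: "1h";
}

// Map a retention setting to the cache_control marker to attach,
// or undefined when caching is disabled.
function cacheControlFor(
  retention: CacheRetention | undefined,
): CacheControl | undefined {
  // An unspecified setting falls back to "short", like direct Anthropic models.
  const effective: CacheRetention = retention ?? "short";
  switch (effective) {
    case "none":
      return undefined;
    case "short":
      return { type: "ephemeral" };
    case "long":
      return { type: "ephemeral", ttl: "1h" };
  }
}
```

Returning `undefined` for "none" lets a caller skip the marker entirely instead of sending an empty one.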

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

The cacheRetention setting is now respected for Anthropic models provided via OpenRouter.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Docker
  • Model/provider: openrouter/anthropic/*

Steps

  1. Set agents.defaults.models["openrouter/anthropic/<any>"].params.cacheRetention to "long"
  2. Tell the agent to say hi
  3. Wait >5 minutes
  4. Tell the agent to say hi again
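For reference, the setting from step 1 can be pictured as the nested object below. It is written as a TypeScript literal purely to show the key path; the on-disk config format is not shown in this PR, so the surrounding shape is an assumption, and `<any>` is kept as the placeholder from the steps.

```typescript
// Mirrors only the key path from step 1; the enclosing file format is an assumption.
const config = {
  agents: {
    defaults: {
      models: {
        // "<any>" is the placeholder used in the steps; substitute a real model id.
        "openrouter/anthropic/<any>": {
          params: { cacheRetention: "long" },
        },
      },
    },
  },
};
```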

Expected

  • The OpenRouter logs show a cache write on the first request and a steep discount for a cache read on the second

Actual

  • Both requests cost full price (plus a useless cache write)

Evidence

See below.

Human Verification (required)

I've observed the broken behavior (described above) in the OpenRouter logs (30m heartbeats or requests >5m apart costing full price). With these changes, cache discounts are applied for these requests.

Also added unit tests for the new behavior.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

Just revert.

Risks and Mitigations

None

openclaw-barnacle Bot added the agents (Agent runtime and tooling) and size: S labels Mar 11, 2026
greptile-apps Bot commented Mar 11, 2026

Contributor

Greptile Summary

This PR fixes a real bug where cacheRetention config was ignored for OpenRouter Anthropic models — they always received a 5-minute ephemeral cache regardless of the setting. The fix threads the resolved CacheRetention value from applyExtraParamsToAgent down into createOpenRouterSystemCacheWrapper, so "none" suppresses the cache header entirely and "long" emits { type: "ephemeral", ttl: "1h" } per the OpenRouter documentation. The default behavior (no config → "short") is preserved. The change is consistent with how direct Anthropic caching already works and comes with good unit-test coverage for all three retention values.

Key changes:

  • resolveCacheRetention now accepts modelId and recognises openrouter + anthropic/* as a cache-eligible combination
  • CacheRetention type is exported from anthropic-stream-wrappers.ts so it can be shared across modules
  • createOpenRouterSystemCacheWrapper now skips payload injection when cacheRetention is undefined or "none", and adds ttl: "1h" for "long"
  • Two minor style inconsistencies in the new code: proxy-stream-wrappers.ts imports using a .ts extension (all other relative imports use .js) and extra-params.ts imports CacheRetention without the type modifier (inconsistent with the companion import in proxy-stream-wrappers.ts)
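The first and third changes listed above can be sketched roughly as below. This is a hedged illustration, not the PR's code: `resolveOpenRouterRetention`, `markSystemMessages`, and the message types are hypothetical names, and placing the marker on the last text part of each system message is an assumption about the payload shape.

```typescript
// Hypothetical sketch; function and type names are illustrative, not the PR's code.
type CacheRetention = "none" | "short" | "long";
type CacheControl = { type: "ephemeral"; ttl?: "1h" };
type TextPart = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: string; content: string | TextPart[] };

// Treat openrouter + anthropic/* as cache-eligible; default to "short" when unset.
function resolveOpenRouterRetention(
  provider: string,
  modelId: string,
  configured?: CacheRetention,
): CacheRetention | undefined {
  const eligible = provider === "openrouter" && modelId.startsWith("anthropic/");
  return eligible ? (configured ?? "short") : undefined;
}

// Attach a cache_control marker to the last text part of each system message.
function markSystemMessages(
  messages: Message[],
  retention: CacheRetention | undefined,
): Message[] {
  // Skip injection entirely when caching is unresolved or disabled.
  if (retention === undefined || retention === "none") return messages;
  const marker: CacheControl =
    retention === "long" ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
  return messages.map((message) => {
    if (message.role !== "system") return message;
    // Normalize a plain string prompt into a single text part.
    const parts: TextPart[] =
      typeof message.content === "string"
        ? [{ type: "text", text: message.content }]
        : [...message.content];
    const last = parts[parts.length - 1];
    if (last === undefined) return message;
    parts[parts.length - 1] = { ...last, cache_control: marker };
    return { ...message, content: parts };
  });
}
```

The sketch returns new message objects rather than mutating its input, so a caller can compare the payload before and after injection.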

Confidence Score: 4/5

  • Safe to merge — the fix is logically correct, well-tested, and backward-compatible; only minor style nits remain.
  • The core logic is correct and all edge cases (none/short/long/default) are covered by tests. Two small style inconsistencies (.ts vs .js import extension, missing type modifier) are worth cleaning up but do not affect runtime behaviour given the project's allowImportingTsExtensions: true tsconfig setting.
  • No files require special attention beyond the two minor style issues flagged in proxy-stream-wrappers.ts (line 4) and extra-params.ts (line 7).

Last reviewed commit: d0e571c

Comment thread src/agents/pi-embedded-runner/proxy-stream-wrappers.ts Outdated
Comment thread src/agents/pi-embedded-runner/extra-params.ts Outdated
Anthropic models provided via OpenRouter have had caching of the system
prompt enabled similarly to those provided directly via Anthropic. But
they didn't respect the cacheRetention setting, instead always adding a
5 minute cache_control marker (i.e. the "short" option), even if
cacheRetention was explicitly off.

The setting is now respected, using 1h ttl for the "long" option or
disabling cache on "none". The default behavior (cacheRetention not
specified) is the "short" cache, like the direct Anthropic models.
@dddsnn dddsnn force-pushed the fix/openrouter-anthropic-cache-settings branch from 95bac60 to b27a02d Compare March 12, 2026 15:46
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Apr 27, 2026
clawsweeper Bot commented Apr 27, 2026

Contributor

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as superseded: the reported cache-retention bug is real, but this branch is now stale, merge-conflicting, and targets an older wrapper structure that would lose current endpoint-class OpenRouter gating; the same remaining work is now tracked in a newer current-architecture PR.

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Review details

Best possible solution:

Land a current-main fix that preserves endpoint-class OpenRouter gating while making explicit cacheRetention drive none, short, and long marker behavior.
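One way to picture that combination is the sketch below, which applies the endpoint gate first and the retention setting second. It is only an illustration under stated assumptions: the `EndpointClass` values and the function name `openRouterMarker` are hypothetical and are not taken from the codebase, which derives the endpoint class from resolveProviderRequestPolicy().

```typescript
// Hypothetical sketch: endpoint-class values and the function name are illustrative only.
type CacheRetention = "none" | "short" | "long";
type EndpointClass = "openrouter" | "other"; // assumed values, not from the codebase

function openRouterMarker(
  endpointClass: EndpointClass,
  retention: CacheRetention | undefined,
): { type: "ephemeral"; ttl?: "1h" } | undefined {
  // Keep the existing boundary: no markers for arbitrary OpenAI-compatible proxy URLs.
  if (endpointClass !== "openrouter") return undefined;
  // An explicit opt-out wins over the default.
  if (retention === "none") return undefined;
  // "long" adds the 1h ttl; "short" and an unset value use the plain marker.
  return retention === "long"
    ? { type: "ephemeral", ttl: "1h" }
    : { type: "ephemeral" };
}
```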

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible. Configure cacheRetention: "none" or "long" for an openrouter/anthropic/* model on a verified OpenRouter route; current main still resolves no OpenRouter retention and injects only short { type: "ephemeral" } system/developer markers.

Is this the best way to solve the issue?

No. The branch was reasonable when written, but current main moved the wrapper boundary to endpoint-class gating; the safer path is the newer current-architecture PR rather than rebasing this stale branch directly.

Security review:

Security review cleared: The diff changes provider request payload cache markers and tests only, with no new dependencies, workflows, permissions, downloads, secrets handling, or code execution surface.

What I checked:

  • Current OpenRouter marker boundary: Current main gates OpenRouter Anthropic marker injection on resolveProviderRequestPolicy().endpointClass, preserving verified OpenRouter routes while excluding arbitrary OpenAI-compatible proxy URLs. (src/agents/pi-embedded-runner/proxy-stream-wrappers.ts:153, 3ba2ab7a0950)
  • Current bug remains source-reproducible: Current main still returns before explicit OpenRouter cacheRetention is read for the openai-completions OpenRouter Anthropic path, and the system/developer marker helper still hardcodes { type: "ephemeral" }. (src/agents/pi-embedded-runner/prompt-cache-retention.ts:17, 3ba2ab7a0950)
  • Current docs define the OpenRouter boundary: OpenClaw docs say OpenRouter Anthropic cache markers are injected only on verified OpenRouter routes, and stop when the model is repointed at arbitrary OpenAI-compatible proxy URLs. Public docs: docs/reference/prompt-caching.md. (docs/reference/prompt-caching.md:128, 3ba2ab7a0950)
  • This PR is stale and conflicting: Live PR metadata reports the branch head b27a02dde8a96c23e7d91626ffc7135e431d2cc5 as mergeable: CONFLICTING; the diff changes older anthropic-stream-wrappers.ts retention plumbing rather than the current prompt-cache-retention.ts/post-plugin wrapper boundary. (b27a02dde8a9)
  • Newer canonical PR tracks the same work: The open replacement PR describes the same OpenRouter Anthropic cacheRetention: "long"/"none" bug, touches current-main files (prompt-cache-retention.ts, anthropic-payload-policy.ts, proxy-stream-wrappers.ts, extra-params.ts), and explicitly lists this PR as related older-architecture work. (535a019d698c)
  • Feature history and routing candidates: History around the central provider/cache files shows Vincent Koc authored the endpoint-gating refactor, Aleksandrs Tihenko authored the original OpenRouter Anthropic cache marker merge, and Peter Steinberger has recent adjacent refactors in the same files. (0a3211df2d17)

Likely related people:

  • vincentkoc: Authored the OpenRouter endpoint-class cache marker gating and much of the recent provider attribution/cache refactor history that defines the current boundary. (role: recent adjacent owner; confidence: high; commits: 0a3211df2d17, 067496b12934, 5572e6965a76; files: src/agents/pi-embedded-runner/proxy-stream-wrappers.ts, src/agents/pi-embedded-runner/prompt-cache-retention.ts, src/agents/anthropic-payload-policy.ts)
  • rrenamed: Authored the merged OpenRouter Anthropic system-prompt cache_control marker feature that this PR attempts to refine. (role: introduced behavior; confidence: medium; commits: c52b2ad5c389; files: src/agents/pi-embedded-runner/proxy-stream-wrappers.ts, src/agents/pi-embedded-runner/extra-params.openrouter-cache-control.test.ts, src/agents/pi-embedded-runner/extra-params.ts)
  • steipete: Recent history shows adjacent refactors and release-adjacent touches in the agent provider/cache files, making this a fallback routing candidate if the endpoint-boundary owner is unavailable. (role: recent area contributor; confidence: medium; commits: 8e4eaec394f8, 05e89ff11707, 52bc809143c6; files: src/agents/pi-embedded-runner/proxy-stream-wrappers.ts, src/agents/pi-embedded-runner/prompt-cache-retention.ts, src/agents/anthropic-payload-policy.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3ba2ab7a0950.

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Prompt Caching support for Anthropic API

1 participant