fix: enable prompt caching for OpenRouter passthrough with Anthropic models#11991

Closed
jaybot1987 wants to merge 1 commit into
openclaw:mainfrom
jaybot1987:fix/openrouter-passthrough-prompt-caching

@jaybot1987 jaybot1987 commented Feb 8, 2026

Summary

When using OpenRouter to route to Anthropic models, prompt caching reduces the cost of cached input tokens by ~90%. Both Anthropic and OpenRouter fully support cache_control, but OpenClaw's resolveCacheRetention() had a hardcoded provider !== "anthropic" gate that blocked cache injection for all non-native providers. This means OpenRouter users routing to Anthropic models were paying up to roughly 10x more than necessary for input tokens that could have been read from cache.

This PR fixes that by:

  • Replacing the hardcoded provider check with isCacheTtlEligibleProvider() (from
    cache-ttl.ts, which already correctly recognizes openrouter and
    openrouter-passthrough with Anthropic models)
  • Extending OpenRouter attribution headers to openrouter-passthrough
  • Adding test coverage for caching and header behavior across provider/model combinations
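
A minimal sketch of the gate change described in the first bullet. The real signatures of resolveCacheRetention() and isCacheTtlEligibleProvider() are not shown in this thread, so the parameter lists, the `CacheRetention` values, the `"short"` default, and the `anthropic/` model-prefix check below are all assumptions made for illustration:

```typescript
// Hypothetical sketch. Names, signatures, and retention values are
// assumptions for illustration, not the actual OpenClaw source.
type CacheRetention = "none" | "short" | "long";

// Assumed stand-in for the eligibility helper in cache-ttl.ts.
function isCacheTtlEligibleProvider(provider: string, modelId: string): boolean {
  const p = provider.toLowerCase();
  if (p === "anthropic") return true;
  const viaOpenRouter = p === "openrouter" || p === "openrouter-passthrough";
  // OpenRouter variants qualify only when the routed model is an Anthropic one.
  return viaOpenRouter && modelId.toLowerCase().startsWith("anthropic/");
}

// Before: every provider other than native "anthropic" was rejected.
function resolveCacheRetentionBefore(
  provider: string,
  requested?: CacheRetention,
): CacheRetention | undefined {
  if (provider !== "anthropic") return undefined;
  return requested ?? "short";
}

// After: the gate defers to the shared eligibility helper.
function resolveCacheRetentionAfter(
  provider: string,
  modelId: string,
  requested?: CacheRetention,
): CacheRetention | undefined {
  if (!isCacheTtlEligibleProvider(provider, modelId)) return undefined;
  return requested ?? "short";
}
```

Under these assumptions, an `openrouter` request for an Anthropic model gets no retention from the old gate and a retention value from the new one, while an `openrouter` request for a non-Anthropic model is still left without cache injection.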

cc @OpenRouterTeam

Relates to #9600.

Test plan

  • Existing tests pass
  • 5 new tests covering openrouter/openrouter-passthrough caching and headers
  • TypeScript type-check passes (no new errors)

🤖 Generated with Claude Code

@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label Feb 8, 2026

@greptile-apps greptile-apps Bot left a comment

Contributor

1 file reviewed, 1 comment

greptile-apps Bot commented Feb 8, 2026

Contributor
Additional Comments (1)

src/agents/pi-embedded-runner/extra-params.ts
Wrapper uses stale provider

applyExtraParamsToAgent decides whether to add OpenRouter attribution headers and whether to set cacheRetention based on the provider/modelId arguments passed into the wrapper (and createStreamFnWithExtraParams captures those values). Because the wrapper persists on agent.streamFn, if that same agent instance is later used with a different model.provider/model.id (or if the caller accidentally passes mismatched provider/modelId), the wrapper can attach OpenRouter headers and/or cacheRetention to requests that shouldn’t have them. Consider checking model.provider/model.id inside the returned streamFn rather than only at wrapping time.
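
One way to read the suggestion above is sketched here: the wrapper inspects the model passed to each call instead of closing over the provider/modelId seen at wrap time. The `Model`, `StreamOptions`, and `StreamFn` types, the placeholder header values, and the `"short"` retention value are assumptions for illustration; the actual extra-params.ts code is not shown in this thread.

```typescript
// Hypothetical sketch. The types, header values, and "short" retention value
// are assumptions for illustration, not the actual OpenClaw API.
interface Model {
  provider: string;
  id: string;
}

interface StreamOptions {
  headers?: Record<string, string>;
  cacheRetention?: string;
}

type StreamFn<T> = (model: Model, options?: StreamOptions) => T;

// Placeholder attribution values.
const ATTRIBUTION_HEADERS: Record<string, string> = {
  "HTTP-Referer": "https://example.invalid",
  "X-Title": "Example App",
};

function withExtraParams<T>(base: StreamFn<T>): StreamFn<T> {
  return (model: Model, options: StreamOptions = {}): T => {
    // Decide per call, from the model actually in use, so a wrapper that
    // persists on the agent cannot apply stale provider decisions.
    const viaOpenRouter =
      model.provider === "openrouter" || model.provider === "openrouter-passthrough";
    if (!viaOpenRouter) return base(model, options);

    const next: StreamOptions = {
      ...options,
      headers: { ...ATTRIBUTION_HEADERS, ...options.headers },
    };
    // Only default cacheRetention when routing to an Anthropic model.
    if (model.id.startsWith("anthropic/") && next.cacheRetention === undefined) {
      next.cacheRetention = "short";
    }
    return base(model, next);
  };
}
```

With this shape, reusing the same wrapped function for a non-OpenRouter model passes the options through untouched.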

jkarmel commented Feb 8, 2026

This is my bot's PR. Just wanted to say this change reduced my OpenRouter Anthropic costs by 95%+ per request. See the JSON below, showing a request I made with this fix where the cache reduced input token usage by 98.8%, with 63,272 out of 64,016 tokens cached. It would be great to get this or a similar fix merged in for all of us OpenRouter users.

OpenRouter Request JSON

{
  "id": 3208876066,
  "generation_id": "gen-1770574131-7IEcVcmeFMHSc9xXrbVS",
  "provider_name": "Google",
  "model": "anthropic/claude-4.5-sonnet-20250929",
  "app_id": null,
  "external_user": null,
  "streamed": true,
  "cancelled": false,
  "generation_time": 4287,
  "latency": 1126,
  "moderation_latency": null,
  "created_at": "2026-02-08T18:08:56.529346+00:00",
  "tokens_prompt": 23185,
  "tokens_completion": 135,
  "native_tokens_prompt": 29072,
  "native_tokens_completion": 158,
  "native_tokens_completion_images": null,
  "native_tokens_reasoning": 0,
  "native_tokens_cached": 25925,
  "num_media_prompt": null,
  "num_input_audio_prompt": null,
  "num_media_completion": 0,
  "num_search_results": null,
  "origin": "",
  "usage": 0.021864,
  "usage_upstream": 0.021864,
  "finish_reason": "stop",
  "usage_cache": -0.067722,
  "usage_data": null,
  "usage_web": null,
  "usage_file": 0,
  "byok_usage_inference": 0,
  "provider_responses": [
    {
      "id": "msg_vrtx_01YZbdj5rSdqjrViJWHzW94X",
      "status": 200,
      "is_byok": false,
      "latency": 1126,
      "endpoint_id": "3a2c65ff-b039-4459-804a-9aafcb4d693c",
      "provider_name": "Google",
      "model_permaslug": "anthropic/claude-4.5-sonnet-20250929"
    }
  ],
  "provider_api_key_id": null,
  "api_type": "completions",
  "creator_user_id": "user_39Jx8cmMdazKLZ43QaDju9SmQmx",
  "router": null,
  "is_byok": false,
  "native_finish_reason": "stop"
}
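
For reading records like this one, a small sketch of the cache-share arithmetic: cached tokens divided by prompt tokens. The helper names are hypothetical. Applied to the figures quoted in the comment (63,272 of 64,016) it gives about 98.8%; applied to the native_tokens_cached and native_tokens_prompt fields of the record above (25,925 of 29,072) it gives about 89.2%.

```typescript
// Hypothetical helpers for illustration only.

// Fraction of prompt tokens served from cache; 0 for an empty prompt.
function cachedFraction(cachedTokens: number, promptTokens: number): number {
  if (promptTokens <= 0) return 0;
  return cachedTokens / promptTokens;
}

// Format a fraction as a percentage with one decimal place.
function formatPercent(fraction: number): string {
  return `${(fraction * 100).toFixed(1)}%`;
}

// Figures quoted in the comment text.
const quotedShare = formatPercent(cachedFraction(63272, 64016));

// native_tokens_cached / native_tokens_prompt from the JSON record.
const recordShare = formatPercent(cachedFraction(25925, 29072));
```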

@clawd-noca

This PR would save OpenClaw users significant costs - we're talking 60-80% reduction in practice for typical usage patterns with large system prompts and workspace context.

I discovered this PR after opening a duplicate issue (#14230) because a user showed me cost savings examples:

  • Without caching: $102/month
  • With caching: $32/month
  • Same prompts, same output

The fix looks clean and straightforward. Please prioritize merging this! The ROI for OpenRouter users is massive. 🚀

cc @vincentnoca who originally requested this feature

@clawd-noca clawd-noca left a comment

LGTM - This fixes a real cost issue for OpenRouter users. The hardcoded provider check was blocking legitimate caching support. Approve and ready to merge! 🚢

@marcomarandiz

🎯 Strong support for merging this PR!

We just confirmed in our testing that OpenRouter fully supports Anthropic prompt caching (1hr TTL, ephemeral cache type), but the gateway isn't sending the required cache_control markers in the request body.
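
For illustration, a sketch of what a cache_control marker looks like in a chat-completions style request body: it is attached to a content part rather than sent as an HTTP header. The builder function is hypothetical, and the optional `ttl` field is an assumption based on the 1hr TTL mentioned above; the model slug is the one from the generation record earlier in this thread.

```typescript
// Hypothetical request builder for illustration; it constructs the body only
// and makes no network call.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

interface TextPart {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | TextPart[];
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

function buildCachedRequest(
  model: string,
  staticContext: string,
  userText: string,
  ttl?: "5m" | "1h", // assumed optional TTL field
): ChatRequest {
  const cacheControl: CacheControl = ttl
    ? { type: "ephemeral", ttl }
    : { type: "ephemeral" };
  return {
    model,
    messages: [
      // The large static prefix carries the cache breakpoint.
      {
        role: "system",
        content: [{ type: "text", text: staticContext, cache_control: cacheControl }],
      },
      // The per-message text stays outside the cached prefix.
      { role: "user", content: userText },
    ],
  };
}

const exampleRequest = buildCachedRequest(
  "anthropic/claude-4.5-sonnet-20250929",
  "...static workspace context...",
  "What changed in the last commit?",
  "1h",
);
```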

Real-world impact: With ~30k static context tokens per message (SOUL.md, AGENTS.md, workspace files), we're currently paying full price for those tokens on EVERY request. Caching would reduce the cost of those tokens by roughly 90% on every message after the first.
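
A rough sketch of that arithmetic for the static prefix alone, expressed in multiples of its uncached input cost. The multipliers are assumptions (a cache write billed at 1.25x the base input rate and a cache read at 0.1x); actual pricing should be checked against the provider's current rates, and the helper name is hypothetical.

```typescript
// Hypothetical estimate for illustration. Costs are relative to sending the
// static prefix once at the normal (uncached) input rate.
interface CacheMultipliers {
  write: number; // assumed cost multiplier for the first, cache-writing request
  read: number; // assumed cost multiplier for later, cache-reading requests
}

interface RelativeCost {
  uncached: number;
  cached: number;
  savings: number; // fraction saved; negative means caching cost more
}

function relativeInputCost(messages: number, m: CacheMultipliers): RelativeCost {
  if (messages < 1) return { uncached: 0, cached: 0, savings: 0 };
  const uncached = messages; // full price on every request
  const cached = m.write + (messages - 1) * m.read; // one write, then reads
  return { uncached, cached, savings: 1 - cached / uncached };
}

// Assumed multipliers: 1.25x for the write, 0.1x for each read.
const assumed: CacheMultipliers = { write: 1.25, read: 0.1 };
const singleMessage = relativeInputCost(1, assumed);
const tenMessages = relativeInputCost(10, assumed);
```

Under these assumptions a single message costs slightly more with caching (the write premium), while a ten-message conversation saves a bit under 80% on the static prefix; the per-message saving approaches 90% only as the conversation grows.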

This fix is blocking significant cost optimization for anyone using OpenRouter with Anthropic models.

Thank you for implementing this! 🙏

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Feb 21, 2026
…models

Broaden the provider gate in resolveCacheRetention() to use
isCacheTtlEligibleProvider() instead of a hardcoded "anthropic" check,
enabling cache_control injection for openrouter and openrouter-passthrough
providers when routing to Anthropic models. Without this, OpenRouter
passthrough users pay up to ~10x more for input tokens that could be
read from cache, because the cached input token discount never applies.

Also apply OpenRouter attribution headers for openrouter-passthrough.

Relates to openclaw#9600.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jaybot1987 jaybot1987 force-pushed the fix/openrouter-passthrough-prompt-caching branch from 521d749 to b594ebc Compare February 21, 2026 05:19

jkarmel commented Feb 21, 2026

Updated @clawd-noca

@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Feb 22, 2026
@vincentkoc

Member

Thanks for pushing this.

I'm closing this to reduce overlap and stale conflicting OpenRouter caching work; this path is superseded by the active canonical tracks.

If there's still a concrete gap, please open a new PR from current main with a minimal, targeted diff and fresh evidence.

@vincentkoc vincentkoc closed this Feb 22, 2026
@Alexander01998

Just FYI, the PR that supersedes this one is #17473.

jkarmel commented Feb 23, 2026 via email

Labels

agents Agent runtime and tooling size: M

6 participants