fix(cache): honour explicit cacheRetention for OpenRouter→Anthropic models #79370

Closed
mene-crab wants to merge 2 commits into openclaw:main from mene-crab:fix/openrouter-cache-ttl-pr

Conversation

mene-crab commented May 8, 2026

Problem

When cacheRetention: "long" is explicitly configured for Anthropic models routed through OpenRouter, the resulting requests carry cache_control: { type: "ephemeral" } without ttl: "1h". The prompt cache therefore lives for only 5 minutes instead of the expected 1 hour, which is a significant cost regression for heavy users. cacheRetention: "none" was also silently ignored: the OpenRouter wrapper always injected ephemeral cache markers regardless of the setting.

The same cacheRetention: "long" config works correctly when using the direct Anthropic API.

Root Cause

Three bugs in sequence:

1. resolveCacheRetention() — early return ignores explicit config

src/agents/pi-embedded-runner/prompt-cache-retention.ts

The function checked resolveAnthropicCacheRetentionFamily() first. For provider: "openrouter", this returns undefined (not a recognized Anthropic/Google cache family). The subsequent early return:

if (!family && !googleEligible) {
  return undefined;
}

exited before ever checking extraParams.cacheRetention. Even with cacheRetention: "long" explicitly set in config, the function never reached that check.

Fix: Move explicit-config checks (cacheRetention / legacy cacheControlTtl) before the family-based early return, so operator-specified retention always wins.
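
The reordering can be illustrated with a minimal sketch. The types and the helper resolveFamilySketch below are hypothetical stand-ins, not the real signatures in prompt-cache-retention.ts. The Google eligibility check and the legacy cacheControlTtl key are left out, and the "short" default for recognised families is an assumption made only for this example.

```typescript
// Simplified, hypothetical types for illustration only.
type CacheRetention = "none" | "short" | "long";

interface ExtraParams {
  cacheRetention?: CacheRetention;
}

// Stand-in for resolveAnthropicCacheRetentionFamily(): "openrouter" is not a
// recognised cache family, so it yields undefined.
function resolveFamilySketch(provider: string): "anthropic" | undefined {
  return provider === "anthropic" ? "anthropic" : undefined;
}

// Before: the family check runs first, so the function returns before the
// explicit setting is ever read.
function resolveBefore(extra: ExtraParams, provider: string): CacheRetention | undefined {
  if (!resolveFamilySketch(provider)) {
    return undefined;
  }
  return extra.cacheRetention ?? "short"; // default value is an assumption
}

// After: an explicit setting is returned before the family-based early return.
function resolveAfter(extra: ExtraParams, provider: string): CacheRetention | undefined {
  if (extra.cacheRetention !== undefined) {
    return extra.cacheRetention;
  }
  if (!resolveFamilySketch(provider)) {
    return undefined;
  }
  return "short"; // default value is an assumption
}
```

Under these assumptions, resolveBefore({ cacheRetention: "long" }, "openrouter") yields undefined, while resolveAfter returns "long" for the same arguments. The commits in this PR additionally restrict the explicit path to eligible routes; that gate is not modelled here.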

2. applyAnthropicEphemeralCacheControlMarkers() — hardcoded no-TTL markers

src/agents/anthropic-payload-policy.ts

Called by createOpenRouterSystemCacheWrapper, this function always wrote { type: "ephemeral" } — even if the wrapper had resolved a cacheControl with ttl: "1h", the TTL was discarded.

Fix: Accept an optional cacheControl: AnthropicEphemeralCacheControl parameter (defaults to { type: "ephemeral" } for backward compat). Also export resolveAnthropicEphemeralCacheControl() so the wrapper can build the proper marker with TTL.
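
A rough sketch of what an optional cacheControl parameter could look like follows. The message shapes, the function name applyMarkersSketch, and the exact definition of the AnthropicEphemeralCacheControl type are assumptions for illustration; only the default value { type: "ephemeral" } and the idea of passing the TTL through come from the description above.

```typescript
// Assumed shape; the real AnthropicEphemeralCacheControl type may differ.
type AnthropicEphemeralCacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

interface TextPart {
  type: "text";
  text: string;
  cache_control?: AnthropicEphemeralCacheControl;
}

interface Message {
  role: string;
  content: string | TextPart[];
}

// Writes the given marker on system/developer messages. Omitting the second
// argument keeps the old behaviour (a marker without ttl).
function applyMarkersSketch(
  messages: Message[],
  cacheControl: AnthropicEphemeralCacheControl = { type: "ephemeral" },
): void {
  for (const msg of messages) {
    if (msg.role !== "system" && msg.role !== "developer") {
      continue;
    }
    if (typeof msg.content === "string") {
      // Promote plain-string content to a single text part carrying the marker.
      msg.content = [{ type: "text", text: msg.content, cache_control: { ...cacheControl } }];
    } else {
      // Mark the last part of structured content, if there is one.
      const last = msg.content[msg.content.length - 1];
      if (last) {
        last.cache_control = { ...cacheControl };
      }
    }
  }
}
```

Calling applyMarkersSketch(messages, { type: "ephemeral", ttl: "1h" }) keeps the TTL on the system message, whereas calling it with one argument reproduces the previous no-TTL marker.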

3. cacheRetention: "none" still injected markers

src/agents/pi-embedded-runner/proxy-stream-wrappers.ts

When cacheRetention is "none", resolveAnthropicEphemeralCacheControl correctly returns undefined, but the wrapper's cacheControl ?? { type: "ephemeral" } fallback still injected a short-cache marker — contradicting the operator's explicit opt-out.

Fix: Skip marker insertion entirely when cacheRetention === "none".
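
The problem with the ?? fallback is that it cannot tell "resolved to undefined because the operator chose none" apart from "nothing was configured". A small sketch, with hypothetical names and a simplified resolver standing in for resolveAnthropicEphemeralCacheControl:

```typescript
type CacheRetention = "none" | "short" | "long";
type CacheControlSketch = { type: "ephemeral"; ttl?: "1h" };

// Simplified stand-in for the resolver; the real function takes other arguments.
function resolveControlSketch(
  retention: CacheRetention | undefined,
): CacheControlSketch | undefined {
  if (retention === "none") return undefined;
  if (retention === "long") return { type: "ephemeral", ttl: "1h" };
  return { type: "ephemeral" };
}

// Before: the fallback turns the "none" result back into a short-cache marker.
function markerBefore(retention?: CacheRetention): CacheControlSketch | undefined {
  return resolveControlSketch(retention) ?? { type: "ephemeral" };
}

// After: "none" is handled before the fallback, so no marker is produced.
function markerAfter(retention?: CacheRetention): CacheControlSketch | undefined {
  if (retention === "none") {
    return undefined;
  }
  return resolveControlSketch(retention) ?? { type: "ephemeral" };
}
```

In this sketch markerBefore("none") still returns { type: "ephemeral" }, while markerAfter("none") returns undefined and the caller can skip insertion.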

Before / After

Before (OpenRouter→Anthropic, cacheRetention: "long"):

"cache_control": { "type": "ephemeral" }

After (same config):

"cache_control": { "type": "ephemeral", "ttl": "1h" }

After (cacheRetention: "none"): no cache_control markers at all.

Review Fix: endpoint-class gate in resolveCacheRetention

The initial patch checked provider === "openrouter" in resolveCacheRetention() but did not account for baseUrl. If a user repoints the OpenRouter provider at a custom OpenAI-compatible proxy, the system wrapper correctly skips markers via the endpoint-class gate, but resolveCacheRetention() would still pass cacheRetention through to pi-ai, leaking cache_control with TTL to that proxy path.

Fix: resolveCacheRetention() now accepts an optional baseUrl parameter and uses resolveProviderRequestPolicy to determine endpointClass, mirroring the same gate as createOpenRouterSystemCacheWrapper. Both call sites in extra-params.ts pass model.baseUrl / callModel.baseUrl.

Regression tests added:

  • OpenRouter provider + non-OpenRouter baseUrl → undefined (no leak)
  • OpenRouter provider + default (unset) baseUrl → honoured
  • OpenRouter provider + openrouter.ai baseUrl → honoured
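
The three regression cases can be pictured with the sketch below. The classifier classifyEndpointSketch is a hypothetical replacement for resolveProviderRequestPolicy, whose real signature and return type are not shown in this PR description; the hostname matching rule is likewise an assumption. The "anthropic/" model-ref check follows the commit message.

```typescript
type CacheRetention = "none" | "short" | "long";
type EndpointClassSketch = "openrouter" | "other";

// Hypothetical classifier: an unset baseUrl means the provider default.
function classifyEndpointSketch(baseUrl: string | undefined): EndpointClassSketch {
  if (baseUrl === undefined) {
    return "openrouter";
  }
  const match = /^https?:\/\/([^/:]+)/i.exec(baseUrl);
  const host = (match?.[1] ?? "").toLowerCase();
  return host === "openrouter.ai" || host.endsWith(".openrouter.ai") ? "openrouter" : "other";
}

// Only the OpenRouter branch is modelled; other providers are out of scope here.
function resolveGatedRetentionSketch(
  retention: CacheRetention | undefined,
  provider: string,
  modelRef: string,
  baseUrl?: string,
): CacheRetention | undefined {
  if (retention === undefined || provider !== "openrouter") {
    return undefined;
  }
  if (!modelRef.startsWith("anthropic/")) {
    return undefined;
  }
  // Same gate as the system wrapper: honour the setting only on OpenRouter endpoints.
  return classifyEndpointSketch(baseUrl) === "openrouter" ? retention : undefined;
}
```

With a baseUrl such as https://proxy.example.com/v1 (an invented host) the sketch returns undefined, and with an unset or openrouter.ai baseUrl it returns the configured value.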

Real Behavior Proof

Behavior or issue addressed: cacheRetention: "long" config was silently ignored for OpenRouter→Anthropic models. Requests always got 5-minute ephemeral cache markers instead of 1-hour TTL. cacheRetention: "none" was also ignored — markers were always injected.

Real environment tested: Self-hosted OpenClaw v2026.5.7 on VMware VM (Linux Mint), OpenRouter provider, model openrouter/anthropic/claude-haiku-4.5 with params.cacheRetention: "long". Outbound request payloads captured via HTTPS logging proxy.

Exact steps or command run after the patch:

  1. Built patched dist from fork branch cherry-picked onto v2026.5.7 tag
  2. Replaced dist/ in production install and restarted gateway
  3. Sent test message to Haiku model via Telegram
  4. Captured outbound request payload through logging proxy

Evidence after fix:

=== Captured request payload (after patch, v2026.5.7) ===
model: anthropic/claude-haiku-4.5
system message: cache_control: { "type": "ephemeral", "ttl": "1h" }
last user message: cache_control: { "type": "ephemeral", "ttl": "1h" }
last tool definition: cache_control: { "type": "ephemeral", "ttl": "1h" }

Observed result after the fix: All 3 cache_control markers in the Anthropic-format payload now include "ttl": "1h". Before the patch (same config, unpatched v2026.5.7), all 3 markers had only { "type": "ephemeral" } without ttl. The fix restores the expected 1-hour prompt cache lifetime on the Anthropic backend, matching the direct Anthropic API behavior. When cacheRetention: "none" is set, no cache markers are injected.
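
For anyone repeating this check, a captured payload can be scanned mechanically rather than by eye. The helper below is a sketch that is not part of the PR: it assumes the payload has already been parsed from JSON and simply collects every cache_control value it finds, wherever it sits in the request body.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively collect the values of every "cache_control" key in the payload.
function collectCacheControl(node: Json, found: Json[] = []): Json[] {
  if (Array.isArray(node)) {
    for (const item of node) {
      collectCacheControl(item, found);
    }
  } else if (node !== null && typeof node === "object") {
    for (const key of Object.keys(node)) {
      const value = node[key];
      if (value === undefined) continue;
      if (key === "cache_control") {
        found.push(value);
      } else {
        collectCacheControl(value, found);
      }
    }
  }
  return found;
}

// True only if at least one marker exists and every marker carries the given ttl.
function allMarkersHaveTtl(payload: Json, ttl: string): boolean {
  const markers = collectCacheControl(payload);
  return (
    markers.length > 0 &&
    markers.every(
      (m) => m !== null && typeof m === "object" && !Array.isArray(m) && m["ttl"] === ttl,
    )
  );
}
```

Applied to a payload shaped like the capture above, allMarkersHaveTtl(payload, "1h") would be expected to hold after the patch and fail before it.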

What was not tested: Cache hit/miss ratio on the Anthropic backend side (only verified outbound cache_control markers). Did not test with provider: "anthropic" direct API (already working). Did not test cacheRetention: "none" end-to-end (unit test coverage only).

Changes

| File | Change |
| --- | --- |
| prompt-cache-retention.ts | Reorder: explicit config checks before family early-return; add baseUrl param; use resolveProviderRequestPolicy for endpoint-class gate |
| anthropic-payload-policy.ts | Export resolveAnthropicEphemeralCacheControl; add cacheControl param to marker function |
| proxy-stream-wrappers.ts | Accept extraParams; resolve + pass cacheControl with TTL; skip markers when "none" |
| extra-params.ts | Pass effectiveExtraParams to wrapper; pass baseUrl to resolveCacheRetention |

Tests

| Test file | New cases |
| --- | --- |
| prompt-cache-retention.test.ts | explicit long/short/none for OpenRouter Anthropic; undefined without config; endpoint-class baseUrl gating (3 tests) |
| anthropic-cache-control-payload.test.ts | preserves ttl in custom cacheControl |
| proxy-stream-wrappers.test.ts | ttl: "1h" with long; no ttl with short/default; no markers with "none"; custom provider pointing to openrouter.ai |

Related

openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), size: S, and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) May 8, 2026
clawsweeper Bot commented May 8, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 1:10 AM ET / 05:10 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +63, Tests +207. Total +270 across 7 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c0195f7ed579.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove proof: sufficient: Current real behavior proof status is not_applicable, not sufficient.
  • remove P2: Current review triage priority is none.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 compatibility: Current PR review selected no merge-risk labels.
  • remove status: 👀 ready for maintainer look: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.

Evidence reviewed

PR surface:

Source +63, Tests +207. Total +270 across 7 files.

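| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 4 | 102 | 39 | +63 |
| Tests | 3 | 209 | 2 | +207 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 7 | 311 | 41 | +270 |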

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

mene-crab commented May 8, 2026

Author

Real Behavior Proof

Reproduction

Setup: OpenClaw v2026.5.4, OpenRouter→Anthropic model with cacheRetention: "long".

Actual model config (from openclaw.json):

"openrouter/anthropic/claude-haiku-4.5": {
  "alias": "Haiku",
  "params": {
    "cacheRetention": "long"
  }
}

The cacheRetention: "long" is set at the model params level. OpenRouter is the provider (resolved from the model ID prefix openrouter/anthropic/), API is the default OpenAI-completions transport.
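
To make the prefix resolution concrete, here is a sketch of how a model ID of this form could be split into a provider and a model ref. The function names are hypothetical and OpenClaw's actual resolution logic is not shown in this thread; the "anthropic/" prefix test mirrors the route check described in the commit message.

```typescript
interface ModelRoute {
  provider: string;
  modelRef: string;
}

// Split "provider/rest" at the first slash; returns undefined for malformed IDs.
function splitModelIdSketch(modelId: string): ModelRoute | undefined {
  const slash = modelId.indexOf("/");
  if (slash <= 0 || slash === modelId.length - 1) {
    return undefined;
  }
  return { provider: modelId.slice(0, slash), modelRef: modelId.slice(slash + 1) };
}

// An OpenRouter→Anthropic route: provider "openrouter", model ref under "anthropic/".
function isOpenRouterAnthropicRouteSketch(modelId: string): boolean {
  const route = splitModelIdSketch(modelId);
  return (
    route !== undefined &&
    route.provider === "openrouter" &&
    route.modelRef.startsWith("anthropic/")
  );
}
```

For the configured ID openrouter/anthropic/claude-haiku-4.5 this yields provider openrouter and model ref anthropic/claude-haiku-4.5, so the route check passes.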

Before (unpatched v2026.5.4)

Captured request payload via HTTPS proxy dump (OpenRouter /chat/completions):

system message:         cache_control: { "type": "ephemeral" }
last user message:      cache_control: { "type": "ephemeral" }
last tool definition:   cache_control: { "type": "ephemeral" }

The pi-ai SDK places cache_control on the last user message and the last tool definition. OpenClaw's createOpenRouterSystemCacheWrapper adds one on the system message. None of the three includes ttl, so when OpenRouter forwards the request to the Anthropic backend the cache lifetime is 5 minutes.

After (patched, same config)

system message:         cache_control: { "type": "ephemeral", "ttl": "1h" }
last user message:      cache_control: { "type": "ephemeral", "ttl": "1h" }
last tool definition:   cache_control: { "type": "ephemeral", "ttl": "1h" }

All 3 markers now include ttl: "1h". The system message marker is fixed by this PR (patches 1+2). The other two markers already went through getCompatCacheControl() which was returning { type: "ephemeral" } without TTL because resolveCacheRetention() returned undefined for OpenRouter — also fixed by patch 1.

Why the bug happens (trace)

  1. resolveCacheRetention({ cacheRetention: "long" }, "openrouter", "openai-completions", ...) — provider "openrouter" does not match any Anthropic cache family → resolveAnthropicCacheRetentionFamily returns undefined → early return undefined before checking extraParams.cacheRetention

  2. createOpenRouterSystemCacheWrapper receives undefined → resolveAnthropicEphemeralCacheControl(undefined, undefined) → { type: "ephemeral" } (no ttl)

  3. applyAnthropicEphemeralCacheControlMarkers(payload) writes { type: "ephemeral" } on the system message — ttl never appears

After patch, step 1 returns "long" (explicit config checked first), step 2 produces { type: "ephemeral", ttl: "1h" }, step 3 preserves it.

Cost impact

With anthropic/claude-sonnet-4.6 at scale:

  • Cache miss input: ~$3.00/M tokens
  • Cache read: ~$0.30/M tokens (10× cheaper)
  • 5-min TTL → frequent re-reads from scratch; 1-hour TTL → sustained cache hits across conversation turns
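
A back-of-the-envelope calculation shows the scale of the difference. The sketch uses only the two approximate rates quoted above; the 100,000-token prefix, the 10 turns, and the assumption that turns are spaced more than 5 minutes but less than 1 hour apart are invented for illustration, and any cache-write surcharge is ignored.

```typescript
// Approximate rates quoted above, in USD per million input tokens.
const CACHE_MISS_USD_PER_MTOK = 3.0;
const CACHE_READ_USD_PER_MTOK = 0.3;

function inputCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1e6) * usdPerMillionTokens;
}

// Cost of re-sending the same prompt prefix over a conversation, given how
// many of the turns find the prefix still cached.
function prefixCostUsd(prefixTokens: number, turns: number, cachedTurns: number): number {
  const missTurns = turns - cachedTurns;
  return (
    missTurns * inputCostUsd(prefixTokens, CACHE_MISS_USD_PER_MTOK) +
    cachedTurns * inputCostUsd(prefixTokens, CACHE_READ_USD_PER_MTOK)
  );
}

// Hypothetical scenario: every turn misses with a 5-minute TTL, while with a
// 1-hour TTL only the first turn misses.
const shortTtlCost = prefixCostUsd(100_000, 10, 0);
const longTtlCost = prefixCostUsd(100_000, 10, 9);
```

Under these assumptions the prefix costs about $3.00 with the 5-minute TTL and about $0.57 with the 1-hour TTL.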

openclaw-barnacle Bot added the proof: supplied label (External PR includes structured after-fix real behavior proof.) and removed the triage: needs-real-behavior-proof label May 9, 2026
clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 9, 2026
mene-crab force-pushed the fix/openrouter-cache-ttl-pr branch from 4f0ab92 to 5b5445d May 9, 2026 09:29
openclaw-barnacle Bot removed the proof: sufficient label May 9, 2026
clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 9, 2026
mene-crab force-pushed the fix/openrouter-cache-ttl-pr branch from 5b5445d to 3f25c10 May 9, 2026 11:03
mene-crab requested review from a team as code owners May 9, 2026 11:03
openclaw-barnacle Bot added the labels docs, channel: discord, channel: googlechat, channel: imessage, channel: line, channel: matrix, channel: mattermost, channel: msteams, channel: nextcloud-talk, channel: nostr, channel: signal, channel: slack, channel: telegram, channel: tlon, channel: voice-call, and channel: whatsapp-web May 9, 2026
mene-crab added 2 commits May 11, 2026 15:23
…odels

cacheRetention: "long" was silently ignored for models routed through
OpenRouter, producing only 5-minute ephemeral cache markers instead of
the expected 1-hour TTL. cacheRetention: "none" was also ignored.

Three issues in the chain:

1. resolveCacheRetention() bailed out early when the provider didn't
   match a known Anthropic/Google cache family, even when the operator
   had explicitly configured cacheRetention in model params. The
   explicit-config checks now run before the family-based early return,
   but only for eligible providers (Anthropic family, Google, or
   verified OpenRouter→Anthropic routes where provider is "openrouter"
   and the model ref starts with "anthropic/").

2. createOpenRouterSystemCacheWrapper() now accepts optional extraParams
   and resolves cacheRetention internally after confirming the request
   targets a verified OpenRouter→Anthropic route (using the same
   endpoint-class + model-ref boundary as marker insertion). This
   ensures custom provider ids pointing to openrouter.ai are covered.

3. applyAnthropicEphemeralCacheControlMarkers() always hardcoded
   { type: "ephemeral" }, discarding any TTL the wrapper resolved.
   It now accepts an optional cacheControl parameter (defaults to
   { type: "ephemeral" } for backward compat) and a
   skipMarkerInsertion flag for the "none" case.

When cacheRetention is "none", no new cache_control markers are
inserted on system/developer messages, but the thinking-block
sanitizer (stripping stale cache_control from thinking/redacted_thinking
blocks) still runs.

Before: OpenRouter→Anthropic requests always got cache_control without
ttl (or with markers despite "none"), regardless of config.

After: cacheRetention: "long" → { type: "ephemeral", ttl: "1h" };
cacheRetention: "none" → no new markers, sanitizer still active.

Address review finding: resolveCacheRetention() previously checked only
provider === "openrouter", missing the baseUrl/endpoint-class gate used
by createOpenRouterSystemCacheWrapper. If a user repointed the OpenRouter
provider at a custom OpenAI-compatible proxy, the system wrapper would
correctly skip cache markers, but resolveCacheRetention would still pass
cacheRetention through to pi-ai, which then emitted cache_control with
ttl for that proxy path.

Now resolveCacheRetention accepts an optional baseUrl parameter and uses
resolveProviderRequestPolicy to determine endpointClass, mirroring the
same gate as the system wrapper. Added regression tests for:
- OpenRouter provider on non-OpenRouter baseUrl → undefined
- OpenRouter provider on default (unset) baseUrl → honoured
- OpenRouter provider with openrouter.ai baseUrl → honoured
@barnacle-openclaw

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

steipete commented Jun 2, 2026

Contributor

Thanks for the PR. I landed the current-source version of this fix in #89347 / commit 732d697, with exact-head CI and focused provider/model proof recorded there.

Closing this as superseded by the merged maintainer batch.

Labels

agents Agent runtime and tooling merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: M