
fix(cache): enable prompt cache retention for Anthropic Vertex AI#60888

Merged
vincentkoc merged 5 commits into openclaw:main from affsantos:fix/vertex-ai-prompt-cache-retention
Apr 5, 2026

@affsantos affsantos commented Apr 4, 2026

Summary

  • Problem: resolveAnthropicCacheRetentionFamily does not recognize anthropic-vertex as a first-class Anthropic provider. Without explicit user config, resolveCacheRetention returns undefined instead of defaulting to "short" — unlike the direct anthropic provider which defaults automatically. Additionally, resolveAnthropicEphemeralCacheControl gates the 1-hour TTL ("1h") behind a URL check for api.anthropic.com, silently blocking the 1-hour cache for Vertex AI even when explicitly requested via cacheRetention: "long".
  • Why it matters: Vertex AI users who opt into cacheRetention: "long" silently get the 5-minute TTL instead of the expected 1-hour TTL. The cache diagnostics added in #60707 ("feat(agents): add prompt cache break diagnostics") also cannot observe retention correctly for Vertex AI, because cacheRetention resolves to undefined through the canonical path. Both the Anthropic and the Google Vertex AI docs confirm that the 1-hour TTL is supported for current Claude models on Vertex AI.
  • What changed: (1) Added "anthropic-vertex" to resolveAnthropicCacheRetentionFamily so it returns "anthropic-direct" — matching the direct Anthropic provider behavior. This gives Vertex AI default cacheRetention: "short" through the canonical path. (2) Expanded the URL check in resolveAnthropicEphemeralCacheControl to also allow aiplatform.googleapis.com endpoints for the 1-hour TTL.
  • What did NOT change (scope boundary): No changes to the Vertex AI stream function, the enableCacheControl: true hardcoding, payload shaping, system prompt boundary splitting, or any other provider transport. The 5-minute cache was already working for Vertex AI via a fallback in resolveAnthropicEphemeralCacheControl — this fix makes the canonical path consistent and unblocks 1-hour TTL.
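To make the first change concrete, here is a minimal TypeScript sketch of the classification before and after this PR. It assumes a simplified resolver that only distinguishes the two families named above. The provider ids and family labels come from the PR; resolveFamilyBefore and resolveFamilyAfter are hypothetical names, and the real resolveAnthropicCacheRetentionFamily handles cases this sketch ignores.

```typescript
// Hypothetical sketch: only the provider ids and family labels come from the PR.
type CacheRetentionFamily = "anthropic-direct" | "custom-anthropic-api";

// Before: only the direct provider counted as first-class Anthropic,
// so "anthropic-vertex" fell through to the custom branch.
function resolveFamilyBefore(provider: string): CacheRetentionFamily {
  return provider === "anthropic" ? "anthropic-direct" : "custom-anthropic-api";
}

// After: Vertex AI is classified exactly like the direct provider.
function resolveFamilyAfter(provider: string): CacheRetentionFamily {
  return provider === "anthropic" || provider === "anthropic-vertex"
    ? "anthropic-direct"
    : "custom-anthropic-api";
}

console.log(resolveFamilyBefore("anthropic-vertex")); // custom-anthropic-api
console.log(resolveFamilyAfter("anthropic-vertex")); // anthropic-direct
```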

🤖 AI-assisted

  • Marked as AI-assisted
  • Degree of testing: fully tested (5 new tests; all 36 tests across the touched test files pass)
  • I understand what the code does
  • Bot review conversations will be resolved

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: resolveAnthropicCacheRetentionFamily only checks for provider === "anthropic" for the "anthropic-direct" family. The "anthropic-vertex" provider falls through to the "custom-anthropic-api" branch, which requires hasExplicitCacheConfig: true — so without explicit user config, retention is undefined. Separately, resolveAnthropicEphemeralCacheControl only allows "1h" TTL when the base URL contains api.anthropic.com, which excludes the aiplatform.googleapis.com Vertex AI endpoints.
  • Missing detection / guardrail: No test coverage for anthropic-vertex in the cache retention resolution path. No test for Vertex AI URLs in the payload policy TTL gate.
  • Contributing context: The Vertex AI transport (anthropic-vertex-stream.ts) correctly sets enableCacheControl: true and passes through cacheRetention from stream options, so the transport itself is fine. The gap is in the resolution layer that feeds the transport.
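The retention defaulting described in the root cause can be sketched roughly as follows, assuming that explicit config always wins and that the custom branch caches only when the user configured it. resolveRetentionSketch is a hypothetical stand-in for resolveCacheRetention, not its actual signature.

```typescript
type CacheRetention = "none" | "short" | "long";
type CacheRetentionFamily = "anthropic-direct" | "custom-anthropic-api";

// Hypothetical stand-in for resolveCacheRetention.
function resolveRetentionSketch(
  family: CacheRetentionFamily,
  explicit: CacheRetention | undefined,
): CacheRetention | undefined {
  // Explicit user config always wins, including "none".
  if (explicit !== undefined) return explicit;
  // First-class Anthropic providers default to the 5-minute ("short") cache.
  if (family === "anthropic-direct") return "short";
  // The custom branch only caches when hasExplicitCacheConfig is true,
  // so without config there is no retention at all.
  return undefined;
}

// Before the fix, anthropic-vertex landed in the custom branch:
console.log(resolveRetentionSketch("custom-anthropic-api", undefined)); // undefined
// After the fix, it is classified as anthropic-direct:
console.log(resolveRetentionSketch("anthropic-direct", undefined)); // short
```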

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extra-params.cache-retention-default.test.ts, anthropic-payload-policy.test.ts
  • Scenario the test should lock in: (1) anthropic-vertex defaults to "short" without config; (2) anthropic-vertex honors explicit "long" and "none"; (3) Vertex AI URLs get ttl: "1h" with long retention; (4) Vertex AI URLs get plain ephemeral with short retention.
  • Why this is the smallest reliable guardrail: Direct unit tests on the two resolution functions that had the gap. No runtime/integration test needed because the transport layer already has coverage.
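As an illustration of what the four scenarios lock in, the sketch below composes both fixed steps and checks them against a scenario table. Everything here is hypothetical: the function names, the table shape, and the host rules (which approximate the hostname-based check from the final commit). The actual tests live in extra-params.cache-retention-default.test.ts and anthropic-payload-policy.test.ts and will look different.

```typescript
type CacheRetention = "none" | "short" | "long";
type CacheControl = { type: "ephemeral"; ttl?: "1h" };

// Step 1 (hypothetical): explicit config wins; first-class providers default to "short".
function resolveRetention(provider: string, explicit?: CacheRetention): CacheRetention | undefined {
  if (explicit !== undefined) return explicit;
  return provider === "anthropic" || provider === "anthropic-vertex" ? "short" : undefined;
}

// Step 2 (hypothetical): no retention or "none" disables caching;
// "long" gets the 1-hour TTL only on eligible hosts.
function resolveCacheControl(baseUrl: string, retention?: CacheRetention): CacheControl | undefined {
  if (retention === undefined || retention === "none") return undefined;
  const host = new URL(baseUrl).hostname;
  const longTtlHost =
    host === "api.anthropic.com" ||
    host === "aiplatform.googleapis.com" ||
    host.endsWith("-aiplatform.googleapis.com");
  return retention === "long" && longTtlHost ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
}

type Scenario = {
  name: string;
  baseUrl: string;
  explicit?: CacheRetention;
  expectRetention?: CacheRetention;
  expectControl?: CacheControl;
};

const REGIONAL = "https://us-east5-aiplatform.googleapis.com";
const GLOBAL = "https://aiplatform.googleapis.com";

const scenarios: Scenario[] = [
  { name: "(1) defaults to short", baseUrl: REGIONAL, expectRetention: "short", expectControl: { type: "ephemeral" } },
  { name: "(2) honors explicit long", baseUrl: REGIONAL, explicit: "long", expectRetention: "long", expectControl: { type: "ephemeral", ttl: "1h" } },
  { name: "(2) honors explicit none", baseUrl: REGIONAL, explicit: "none", expectRetention: "none" },
  { name: "(3) Vertex URL + long gets 1h", baseUrl: GLOBAL, explicit: "long", expectRetention: "long", expectControl: { type: "ephemeral", ttl: "1h" } },
  { name: "(4) Vertex URL + short stays plain", baseUrl: GLOBAL, explicit: "short", expectRetention: "short", expectControl: { type: "ephemeral" } },
];

// Collect the names of scenarios whose retention or cache_control differs from the expectation.
const failures = scenarios
  .filter((s) => {
    const retention = resolveRetention("anthropic-vertex", s.explicit);
    const control = resolveCacheControl(s.baseUrl, retention);
    return retention !== s.expectRetention || JSON.stringify(control) !== JSON.stringify(s.expectControl);
  })
  .map((s) => s.name);

console.log(failures.length === 0 ? "all scenarios pass" : `failing: ${failures.join(", ")}`);
```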

User-visible / Behavior Changes

  • Vertex AI users with cacheRetention: "long" (or PI_CACHE_RETENTION=long) now correctly get the 1-hour cache TTL instead of silently falling back to 5 minutes.
  • Vertex AI cache retention now resolves through the canonical path ("short" by default), making cache diagnostics and observability accurate.

Diagram (if applicable)

Before:
[anthropic-vertex + cacheRetention=long] -> resolveAnthropicCacheRetentionFamily -> "custom-anthropic-api"
  -> resolveAnthropicEphemeralCacheControl(baseUrl=*.aiplatform.googleapis.com, retention=long)
  -> URL check fails (not api.anthropic.com) -> { type: "ephemeral" } (5m, NOT 1h)

After:
[anthropic-vertex + cacheRetention=long] -> resolveAnthropicCacheRetentionFamily -> "anthropic-direct"
  -> resolveAnthropicEphemeralCacheControl(baseUrl=*.aiplatform.googleapis.com, retention=long)
  -> URL check passes (aiplatform.googleapis.com) -> { type: "ephemeral", ttl: "1h" } ✓
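The TTL gate in the diagram can be sketched as follows. This mirrors the substring (String.includes) check of the initially reviewed revision; per the commit list, a later commit switched to hostname parsing. longTtlAllowedBefore, longTtlAllowedAfter and ephemeralCacheControl are hypothetical names, not the code in anthropic-payload-policy.ts.

```typescript
type CacheRetention = "none" | "short" | "long";
type CacheControl = { type: "ephemeral"; ttl?: "1h" };

// Before: only api.anthropic.com unlocked the 1-hour TTL.
function longTtlAllowedBefore(baseUrl: string): boolean {
  return baseUrl.includes("api.anthropic.com");
}

// After (initial revision): Vertex AI endpoints are allowed as well.
function longTtlAllowedAfter(baseUrl: string): boolean {
  return baseUrl.includes("api.anthropic.com") || baseUrl.includes("aiplatform.googleapis.com");
}

// Maps the resolved retention to the cache_control block sent in the payload.
function ephemeralCacheControl(retention: CacheRetention, longTtlAllowed: boolean): CacheControl | undefined {
  if (retention === "none") return undefined;
  return retention === "long" && longTtlAllowed ? { type: "ephemeral", ttl: "1h" } : { type: "ephemeral" };
}

const vertexUrl = "https://us-east5-aiplatform.googleapis.com";
console.log(ephemeralCacheControl("long", longTtlAllowedBefore(vertexUrl))); // plain ephemeral (5m)
console.log(ephemeralCacheControl("long", longTtlAllowedAfter(vertexUrl))); // ephemeral with ttl "1h"
```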

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS
  • Runtime: Node 22
  • Model/provider: Claude Sonnet 4.6 via Anthropic Vertex AI
  • Relevant config: provider: anthropic-vertex, cacheRetention: "long" in extra params

Steps

  1. Configure OpenClaw with the anthropic-vertex provider and cacheRetention: "long" in the model params
  2. Send a multi-turn conversation
  3. Inspect the API payload sent to Vertex AI

Expected

  • cache_control: { type: "ephemeral", ttl: "1h" } on the system block and the last user message block
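For reference, the expected payload shape can be produced by a helper along these lines. This is a sketch under assumptions: the types are a trimmed-down subset of the Anthropic Messages request body, and applyCacheBreakpoints is a hypothetical name, not OpenClaw's payload shaping code (which this PR does not change).

```typescript
// Hypothetical, trimmed-down request types: just enough to show where cache_control lands.
type CacheControl = { type: "ephemeral"; ttl?: "1h" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: "user" | "assistant"; content: TextBlock[] };
type Payload = { system: TextBlock[]; messages: Message[] };

// Marks the last system block and the last block of the last user message.
function applyCacheBreakpoints(payload: Payload, cc: CacheControl): Payload {
  const markLast = (blocks: TextBlock[]): TextBlock[] =>
    blocks.map((b, i) => (i === blocks.length - 1 ? { ...b, cache_control: cc } : b));

  let lastUser = -1;
  payload.messages.forEach((m, i) => {
    if (m.role === "user") lastUser = i;
  });

  return {
    system: markLast(payload.system),
    messages: payload.messages.map((m, i) => (i === lastUser ? { ...m, content: markLast(m.content) } : m)),
  };
}

const marked = applyCacheBreakpoints(
  {
    system: [{ type: "text", text: "You are a helpful agent." }],
    messages: [
      { role: "user", content: [{ type: "text", text: "first question" }] },
      { role: "assistant", content: [{ type: "text", text: "first answer" }] },
      { role: "user", content: [{ type: "text", text: "follow-up" }] },
    ],
  },
  { type: "ephemeral", ttl: "1h" },
);
console.log(JSON.stringify(marked.messages[2].content[0].cache_control)); // {"type":"ephemeral","ttl":"1h"}
```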

Actual (before fix)

  • cache_control: { type: "ephemeral" } — 5-minute TTL silently used despite requesting "long"

Evidence

  • Tested the patch in our OpenClaw instance and analyzed the outgoing payloads
  • Failing test/log before + passing after
    • 5 new tests pass: 3 in extra-params.cache-retention-default.test.ts (vertex default/long/none), 2 in anthropic-payload-policy.test.ts (vertex 1h TTL, vertex 5m TTL)
    • All 36 tests pass across the 3 touched test files
    • All 11 existing anthropic-vertex-stream.test.ts tests continue to pass

Human Verification (required)

  • Verified scenarios: Ran all 3 affected test files (36 tests total). Verified the resolveAnthropicCacheRetentionFamily change returns "anthropic-direct" for "anthropic-vertex". Verified the URL check matches both api.anthropic.com and aiplatform.googleapis.com (global and regional patterns).
  • Edge cases checked: Global Vertex AI endpoint (aiplatform.googleapis.com without region prefix), regional endpoint (us-east5-aiplatform.googleapis.com), custom proxy URL (still excluded from 1h TTL). Explicit "none" disables caching for Vertex AI. Existing Bedrock, OpenRouter, and custom provider behavior unchanged.
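The endpoint edge cases above can be expressed as a small table over a hostname-based eligibility check, in the spirit of the later "use hostname parsing for long-TTL endpoint eligibility" commit. The exact host rules below (exact match for the global host, a -aiplatform.googleapis.com suffix for regional hosts) are assumptions, not a copy of the merged code; isLongTtlEndpoint is a hypothetical name.

```typescript
// Returns the parsed hostname, or undefined for an unparseable base URL.
function hostnameOf(baseUrl: string): string | undefined {
  try {
    return new URL(baseUrl).hostname;
  } catch {
    return undefined;
  }
}

// Hypothetical eligibility rule for the 1-hour TTL.
function isLongTtlEndpoint(baseUrl: string): boolean {
  const host = hostnameOf(baseUrl);
  if (host === undefined) return false;
  return (
    host === "api.anthropic.com" ||
    host === "aiplatform.googleapis.com" || // global Vertex AI endpoint
    host.endsWith("-aiplatform.googleapis.com") // regional, e.g. us-east5-aiplatform.googleapis.com
  );
}

const cases: Array<[string, boolean]> = [
  ["https://api.anthropic.com", true],
  ["https://aiplatform.googleapis.com", true],
  ["https://us-east5-aiplatform.googleapis.com", true],
  ["https://llm-proxy.example.com/v1", false], // custom proxy
  ["https://llm-proxy.example.com/aiplatform.googleapis.com", false], // match only in the path
];

for (const [url, expected] of cases) {
  console.log(url, isLongTtlEndpoint(url) === expected ? "ok" : "MISMATCH");
}
```

Parsing the hostname, rather than substring-matching the whole URL, is what keeps a proxy URL that merely mentions aiplatform.googleapis.com in its path out of the 1-hour tier.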

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No — existing cacheRetention / PI_CACHE_RETENTION config now works correctly for Vertex AI
  • Migration needed? No
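Since no config changes are needed, the existing knobs simply flow into the fixed resolution path. A minimal sketch of reading them, assuming (not confirmed by this PR) that an explicit cacheRetention param takes precedence over PI_CACHE_RETENTION and that unrecognized values are ignored. explicitRetention is a hypothetical helper, and the environment is passed in as a plain record so the sketch does not depend on Node typings.

```typescript
type CacheRetention = "none" | "short" | "long";

// Accepts only the three retention values used in this PR.
function parseRetention(value: string | undefined): CacheRetention | undefined {
  return value === "none" || value === "short" || value === "long" ? value : undefined;
}

// Hypothetical precedence: model param first, then the environment variable.
function explicitRetention(
  params: { cacheRetention?: string },
  env: Record<string, string | undefined>,
): CacheRetention | undefined {
  return parseRetention(params.cacheRetention) ?? parseRetention(env["PI_CACHE_RETENTION"]);
}

console.log(explicitRetention({ cacheRetention: "long" }, {})); // long
console.log(explicitRetention({}, { PI_CACHE_RETENTION: "long" })); // long
console.log(explicitRetention({}, {})); // undefined
```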

Risks and Mitigations

  • Risk: Sending ttl: "1h" to Vertex AI endpoints serving older Claude models (Claude 3.7 Sonnet, Claude 3.5 Sonnet), which do not support the 1-hour TTL according to Google's docs (the same limitation as on the direct Anthropic API)
    • Mitigation: The Vertex AI provider catalog (extensions/anthropic-vertex/provider-catalog.ts) only offers current models (Claude Opus 4.6, Claude Sonnet 4.6) which support 1-hour TTL. Older models are not in the catalog. If a user manually configures an unsupported model, the Vertex AI API would reject the request — same behavior as the direct Anthropic API.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Apr 4, 2026
@affsantos affsantos marked this pull request as ready for review April 4, 2026 13:52

greptile-apps Bot commented Apr 4, 2026

Greptile Summary

This PR fixes two related gaps in the Anthropic Vertex AI prompt caching path: resolveAnthropicCacheRetentionFamily now returns "anthropic-direct" for "anthropic-vertex" (matching its behavior in isAnthropicFamilyCacheTtlEligible, which already included Vertex AI), and resolveAnthropicEphemeralCacheControl now allows the "1h" TTL for aiplatform.googleapis.com base URLs alongside api.anthropic.com. Five new targeted unit tests lock in the corrected behavior.

Confidence Score: 5/5

This PR is safe to merge — it makes two small, targeted bug fixes with five new unit tests and no changes to transport, auth, or payload structure.

Both changes are additive and consistent with existing patterns: the anthropic-vertex family classification aligns with the pre-existing isAnthropicFamilyCacheTtlEligible check, and the URL substring guard follows the same String.includes style already used for api.anthropic.com. All remaining observations are P2 style-level notes. No blocking issues.

No files require special attention.

Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/vertex-ai-p..."


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bd8e12269

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/anthropic-payload-policy.ts Outdated

affsantos commented Apr 4, 2026

Addressed the Greptile finding by adding "anthropic-vertex" to isAnthropicFamilyCacheTtlEligible in 789ac78. Both functions in the file now consistently recognize anthropic-vertex as a first-class Anthropic provider.
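One way to keep the two checks from drifting apart again is to derive both from a single provider list. The sketch below is hypothetical: the commit added "anthropic-vertex" to each function individually, and the real isAnthropicFamilyCacheTtlEligible may consider more than the provider id.

```typescript
type CacheRetentionFamily = "anthropic-direct" | "custom-anthropic-api";

// Hypothetical single source of truth for first-class Anthropic providers.
const FIRST_CLASS_ANTHROPIC_PROVIDERS: ReadonlySet<string> = new Set(["anthropic", "anthropic-vertex"]);

function resolveFamily(provider: string): CacheRetentionFamily {
  return FIRST_CLASS_ANTHROPIC_PROVIDERS.has(provider) ? "anthropic-direct" : "custom-anthropic-api";
}

// Hypothetical simplification: eligibility keyed on the provider id only.
function isCacheTtlEligible(provider: string): boolean {
  return FIRST_CLASS_ANTHROPIC_PROVIDERS.has(provider);
}

for (const provider of ["anthropic", "anthropic-vertex", "some-custom-provider"]) {
  console.log(provider, resolveFamily(provider), isCacheTtlEligible(provider));
}
```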

@affsantos

@greptile-apps review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d631ba8fc2


Comment thread src/agents/anthropic-payload-policy.ts Outdated
@affsantos affsantos force-pushed the fix/vertex-ai-prompt-cache-retention branch from d631ba8 to 86aea6c Compare April 4, 2026 15:51
@vincentkoc vincentkoc force-pushed the fix/vertex-ai-prompt-cache-retention branch from 3b8c691 to d844348 Compare April 5, 2026 06:46
@vincentkoc vincentkoc merged commit eb0f367 into openclaw:main Apr 5, 2026
7 checks passed
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
…enclaw#60888)

* fix(cache): enable prompt cache retention for Anthropic Vertex AI

* fix(cache): add anthropic-vertex to isAnthropicFamilyCacheTtlEligible

* fix(cache): use hostname parsing for long-TTL endpoint eligibility

* docs(changelog): note anthropic vertex cache ttl fix

---------

Co-authored-by: affsantos <andreffsantos91@gmail.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026

2 participants