fix(cache): enable prompt cache retention for Anthropic Vertex AI #60888
Greptile Summary
This PR fixes two related gaps in the Anthropic Vertex AI prompt caching path:
Confidence Score: 5/5
This PR is safe to merge: it makes two small, targeted bug fixes with five new unit tests and no changes to transport, auth, or payload structure.
Both changes are additive and consistent with existing patterns: the `anthropic-vertex` family classification aligns with the pre-existing `isAnthropicFamilyCacheTtlEligible` check, and the URL substring guard follows the same `String.includes` style already used for `api.anthropic.com`. All remaining observations are P2 style-level notes. No blocking issues. No files require special attention.
Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/vertex-ai-p..."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bd8e12269
Addressed the Greptile finding by adding
@greptile-apps review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d631ba8fc2
d631ba8 to 86aea6c
3b8c691 to d844348
…enclaw#60888)
* fix(cache): enable prompt cache retention for Anthropic Vertex AI
* fix(cache): add anthropic-vertex to isAnthropicFamilyCacheTtlEligible
* fix(cache): use hostname parsing for long-TTL endpoint eligibility
* docs(changelog): note anthropic vertex cache ttl fix
---------
Co-authored-by: affsantos <andreffsantos91@gmail.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Summary
- `resolveAnthropicCacheRetentionFamily` does not recognize `anthropic-vertex` as a first-class Anthropic provider. Without explicit user config, `resolveCacheRetention` returns `undefined` instead of defaulting to `"short"`, unlike the direct `anthropic` provider, which defaults automatically. Additionally, `resolveAnthropicEphemeralCacheControl` gates the 1-hour TTL (`"1h"`) behind a URL check for `api.anthropic.com`, silently blocking the 1-hour cache for Vertex AI even when it is explicitly requested via `cacheRetention: "long"`.
- Vertex AI users who set `cacheRetention: "long"` silently get the 5-minute TTL instead of the expected 1-hour TTL. The cache diagnostics (PR feat(agents): add prompt cache break diagnostics #60707) also cannot correctly observe retention for Vertex AI because `cacheRetention` resolves to `undefined` through the canonical path. Both Anthropic and Google Vertex AI docs confirm that 1-hour TTL is supported on current Claude models on Vertex AI.
- (1) Added `"anthropic-vertex"` to `resolveAnthropicCacheRetentionFamily` so it returns `"anthropic-direct"`, matching the direct Anthropic provider behavior. This gives Vertex AI a default `cacheRetention: "short"` through the canonical path. (2) Expanded the URL check in `resolveAnthropicEphemeralCacheControl` to also allow `aiplatform.googleapis.com` endpoints for the 1-hour TTL.
- No changes to the `enableCacheControl: true` hardcoding, payload shaping, system prompt boundary splitting, or any other provider transport. The 5-minute cache was already working for Vertex AI via a fallback in `resolveAnthropicEphemeralCacheControl`; this fix makes the canonical path consistent and unblocks the 1-hour TTL.
🤖 AI-assisted
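The default-retention part of the summary can be illustrated with a minimal sketch. The function names below (`sketchResolveFamily`, `sketchResolveCacheRetention`) are hypothetical stand-ins, not the repository's actual code, and the sketch assumes every provider other than `anthropic` and `anthropic-vertex` lands in the `"custom-anthropic-api"` branch, which is a simplification.

```typescript
// Hypothetical, simplified stand-ins for the functions named in the PR.
// Only the branches described in the summary are modelled here.
type CacheRetention = "none" | "short" | "long";
type RetentionFamily = "anthropic-direct" | "custom-anthropic-api";

function sketchResolveFamily(provider: string): RetentionFamily {
  // Before the fix, only "anthropic" matched; "anthropic-vertex" fell through.
  if (provider === "anthropic" || provider === "anthropic-vertex") {
    return "anthropic-direct";
  }
  // Assumption: all other providers are treated as custom Anthropic-compatible APIs.
  return "custom-anthropic-api";
}

function sketchResolveCacheRetention(
  provider: string,
  explicit?: CacheRetention,
): CacheRetention | undefined {
  // Explicit user config always wins, including "none".
  if (explicit !== undefined) return explicit;
  // Only the direct family gets an automatic default.
  return sketchResolveFamily(provider) === "anthropic-direct" ? "short" : undefined;
}
```

Under these assumptions, `sketchResolveCacheRetention("anthropic-vertex")` yields `"short"`, while an unrecognized provider without explicit config still yields `undefined`.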
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
- `resolveAnthropicCacheRetentionFamily` only checks for `provider === "anthropic"` for the `"anthropic-direct"` family. The `"anthropic-vertex"` provider falls through to the `"custom-anthropic-api"` branch, which requires `hasExplicitCacheConfig: true`, so without explicit user config, retention is `undefined`. Separately, `resolveAnthropicEphemeralCacheControl` only allows the `"1h"` TTL when the base URL contains `api.anthropic.com`, which excludes the `aiplatform.googleapis.com` Vertex AI endpoints.
- No test coverage for `anthropic-vertex` in the cache retention resolution path. No test for Vertex AI URLs in the payload policy TTL gate.
- The transport (`anthropic-vertex-stream.ts`) correctly sets `enableCacheControl: true` and passes through `cacheRetention` from stream options, so the transport itself is fine. The gap is in the resolution layer that feeds the transport.
Regression Test Plan (if applicable)
- `extra-params.cache-retention-default.test.ts`, `anthropic-payload-policy.test.ts`
- (1) `anthropic-vertex` defaults to `"short"` without config; (2) `anthropic-vertex` honors explicit `"long"` and `"none"`; (3) Vertex AI URLs get `ttl: "1h"` with long retention; (4) Vertex AI URLs get plain `ephemeral` with short retention.
User-visible / Behavior Changes
- Vertex AI users with `cacheRetention: "long"` (or `PI_CACHE_RETENTION=long`) now correctly get the 1-hour cache TTL instead of silently falling back to 5 minutes.
- Vertex AI cache retention now resolves through the canonical path (`"short"` by default), making cache diagnostics and observability accurate.
Diagram (if applicable)
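The behavior change can be sketched as a small function that maps a resolved retention value to the `cache_control` object sent in the payload. This is an illustrative sketch only: `sketchCacheControl` and its `endpointAllowsLongTtl` parameter are hypothetical names, and the real `resolveAnthropicEphemeralCacheControl` may take different inputs.

```typescript
// Hypothetical sketch of how retention maps to the cache_control payload field.
type CacheRetention = "none" | "short" | "long";

interface EphemeralCacheControl {
  type: "ephemeral";
  ttl?: "1h";
}

function sketchCacheControl(
  retention: CacheRetention,
  endpointAllowsLongTtl: boolean, // assumption: result of the endpoint URL check
): EphemeralCacheControl | undefined {
  // Explicit "none" disables caching entirely.
  if (retention === "none") return undefined;
  // The 1-hour TTL is only emitted for "long" on an eligible endpoint.
  if (retention === "long" && endpointAllowsLongTtl) {
    return { type: "ephemeral", ttl: "1h" };
  }
  // Everything else gets the plain ephemeral marker (5-minute TTL).
  return { type: "ephemeral" };
}
```

In this model, the pre-fix Vertex AI behavior corresponds to `endpointAllowsLongTtl` being `false` for `aiplatform.googleapis.com` URLs, so `"long"` degraded to the plain ephemeral marker.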
Security Impact (required)
- No
- No
- No
- No
- No
Repro + Verification
Environment
- `provider: anthropic-vertex`, `cacheRetention: "long"` in extra params
Steps
- `anthropic-vertex` provider and `cacheRetention: "long"` in model params
Expected
- `cache_control: { type: "ephemeral", ttl: "1h" }` on system and last user message blocks
Actual (before fix)
- `cache_control: { type: "ephemeral" }`: the 5-minute TTL is silently used despite requesting `"long"`
Evidence
- 5 new tests: 3 in `extra-params.cache-retention-default.test.ts` (vertex default/long/none), 2 in `anthropic-payload-policy.test.ts` (vertex 1h TTL, vertex 5m TTL)
- `anthropic-vertex-stream.test.ts` tests continue to pass
Human Verification (required)
- The `resolveAnthropicCacheRetentionFamily` change returns `"anthropic-direct"` for `"anthropic-vertex"`. Verified the URL check matches both `api.anthropic.com` and `aiplatform.googleapis.com` (global and regional patterns).
- Global endpoint (`aiplatform.googleapis.com` without region prefix), regional endpoint (`us-east5-aiplatform.googleapis.com`), custom proxy URL (still excluded from 1h TTL). Explicit `"none"` disables caching for Vertex AI. Existing Bedrock, OpenRouter, and custom provider behavior unchanged.
Review Conversations
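The endpoint cases listed under Human Verification (global, regional, custom proxy) can be expressed as a hostname check. The sketch below is an assumption about one way to do it, in the spirit of the "use hostname parsing for long-TTL endpoint eligibility" commit; `sketchIsLongTtlEligible` is a hypothetical name and the exact matching rules in the repository may differ.

```typescript
// Hypothetical sketch: decide 1-hour TTL eligibility from the parsed hostname
// rather than from a substring of the whole URL.
function sketchIsLongTtlEligible(baseUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(baseUrl).hostname.toLowerCase();
  } catch {
    // Unparsable base URL: fall back to the default (short) TTL.
    return false;
  }
  return (
    hostname === "api.anthropic.com" ||
    // Global Vertex AI endpoint, no region prefix.
    hostname === "aiplatform.googleapis.com" ||
    // Regional Vertex AI endpoints such as us-east5-aiplatform.googleapis.com.
    hostname.endsWith("-aiplatform.googleapis.com")
  );
}
```

Parsing the hostname means a custom proxy whose path merely contains `aiplatform.googleapis.com` is not treated as eligible, which a plain substring match on the full URL would not guarantee.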
Compatibility / Migration
- Yes
- No: existing `cacheRetention` / `PI_CACHE_RETENTION` config now works correctly for Vertex AI
- No
Risks and Mitigations
- Sending `ttl: "1h"` to Vertex AI endpoints that run older Claude models (3.7 Sonnet, 3.5 Sonnet), which do not support the 1-hour TTL per Google docs (the direct Anthropic API behaves the same way).
- The provider catalog (`extensions/anthropic-vertex/provider-catalog.ts`) only offers current models (Claude Opus 4.6, Claude Sonnet 4.6), which support the 1-hour TTL. Older models are not in the catalog. If a user manually configures an unsupported model, the Vertex AI API would reject the request, the same behavior as the direct Anthropic API.