fix(anthropic-vertex): stop re-marking cache_control on transport-budgeted payloads #92387
Codex review: passed. Reviewed June 12, 2026, 8:58 AM ET / 12:58 UTC.
Summary
PR surface: Source -35, Tests -11. Total -46 across 2 files.
Reproducibility: yes, at source level. Current main reapplies cache policy to a finalized, fully budgeted payload, and the linked production logs show the corresponding five-marker rejection; this review did not run a live post-fix GCP request.
Review metrics: none identified.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Keep the shared Anthropic request builder as the sole cache-control budget owner, with the Vertex adapter limited to client adaptation and transparent forwarding of finalized-payload hooks.
Do we have a high-confidence way to reproduce the issue? Yes, at source level. Current main reapplies cache policy to a finalized, fully budgeted payload, and the linked production logs show the corresponding five-marker rejection; this review did not run a live post-fix GCP request.
Is this the best way to solve the issue? Yes. Removing the obsolete duplicate policy owner is narrower and less drift-prone than adding finalized-payload detection or another marker cap inside the Vertex plugin.
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against 1bd04ac98389.
Label changes:
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
🦞👀 Command router queued. I will update this comment with the next step.
@clawsweeper automerge
🦞✅ Source: What merged:
Automerge notes:
The automerge loop is complete. Automerge progress:
Summary
anthropic-vertex requests are intermittently rejected with FailoverError: LLM request rejected: A maximum of 4 blocks with cache_control may be provided. Found 5. In the reporter's production deployment the rejection fires whenever active-memory recall injects context into the turn, forcing a model fallback on every hit; the only workaround is disabling active-memory.

The shared transport (src/llm/providers/anthropic.ts) builds the payload with full marker budgeting: it splits the system prompt at the cache boundary into a stable prefix (with cache_control) and an intentionally uncached dynamic suffix, marks the tool array, and gives the message pass only the remaining budget (4 - system - tools), so legitimate payloads can carry exactly four markers. The vertex plugin then ran applyAnthropicPayloadPolicyToParams again on that finished payload inside its onPayload hook (createAnthropicVertexOnPayload in extensions/anthropic-vertex/stream-runtime.ts). That helper is written for raw payloads (the agents-side anthropic-transport-stream.ts builds its system blocks with no markers and boundary text intact): any system text block with no boundary marker and no cache_control gets a marker added unconditionally. Post-transport, the dynamic suffix is exactly such a block, so the shim adds a fifth marker and Anthropic rejects the request. Turns that don't fill the budget stay within four markers and pass, which is why the failure tracks active-memory hits rather than every request.

The shim dates from 1a13c34f5b ("close cache boundary transport gaps"), when the then-external transport did not understand the OpenClaw cache boundary and the plugin-level policy pass was the only splitter. eef24d452f ("preserve provider prompt cache boundaries", shipped in v2026.6.2-beta.1) moved boundary splitting and marker budgeting natively into the shared transport, which turned the shim from a gap-closer into a double-application. The reporter's first failures on 2026.6.5 line up with that release.

The fix removes the shim and forwards the caller's onPayload hook unchanged. The shared transport is the single owner of cache-control budgeting for this API family, which is already how the plain anthropic provider behaves (its wrapper applies the payload policy only for service_tier, without enableCacheControl).

extensions/anthropic-vertex/stream-runtime.ts removes createAnthropicVertexOnPayload and its policy imports and passes options?.onPayload straight through (net −35 lines); the stale tests that encoded the pre-split payload shape are replaced with regression tests for the budgeted-payload invariant.

[...] docs/reference/config). Plugin surface unchanged (no exported signature, manifest, api.ts, or SDK changes; createAnthropicVertexStreamFn keeps its signature). The shared transport, the agents-side anthropic transport (which builds raw payloads and correctly applies the policy once), and the anthropic plugin's service-tier wrapper are untouched.

Reproduction
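At source level, the double application can be illustrated with a small sketch. This is not the repository's code: the Block, Message, and Payload types, the "tool" block type, and remarkSystemBlocks are hypothetical simplifications, and remarkSystemBlocks only imitates the one behavior the Summary attributes to the raw-payload policy (marking any unmarked system text block).

```typescript
// Hypothetical, simplified payload shapes; the real request has more fields.
type CacheControl = { type: "ephemeral" };
type Block = { type: string; text?: string; cache_control?: CacheControl };
type Message = { role: "user" | "assistant"; content: Block[] };
type Payload = { system: Block[]; tools: Block[]; messages: Message[] };

const MAX_MARKERS = 4; // cap quoted in the rejection message
const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Collect every block that could carry a cache_control marker.
function allBlocks(p: Payload): Block[] {
  const blocks: Block[] = [...p.system, ...p.tools];
  for (const m of p.messages) blocks.push(...m.content);
  return blocks;
}

function countMarkers(p: Payload): number {
  return allBlocks(p).filter((b) => b.cache_control !== undefined).length;
}

// Budget left for the message pass: 4 - system - tools.
function remainingMessageBudget(p: Payload): number {
  const used = [...p.system, ...p.tools].filter((b) => b.cache_control !== undefined).length;
  return MAX_MARKERS - used;
}

// Stand-in for a policy written for raw payloads (assumption): it marks any
// system text block that has no cache_control yet.
function remarkSystemBlocks(p: Payload): Payload {
  return {
    ...p,
    system: p.system.map((b): Block =>
      b.type === "text" && b.cache_control === undefined ? { ...b, cache_control: EPHEMERAL } : b,
    ),
  };
}

// A payload shaped like the budgeted output described above.
const budgeted: Payload = {
  system: [
    { type: "text", text: "stable prefix", cache_control: EPHEMERAL },
    { type: "text", text: "dynamic suffix" }, // intentionally uncached
  ],
  tools: [{ type: "tool", cache_control: EPHEMERAL }],
  messages: [
    { role: "user", content: [{ type: "text", text: "question", cache_control: EPHEMERAL }] },
    { role: "user", content: [{ type: "tool_result", cache_control: EPHEMERAL }] },
  ],
};

console.log(countMarkers(budgeted)); // 4
console.log(countMarkers(remarkSystemBlocks(budgeted))); // 5
```

With the system prefix and tools marked, the message pass has a budget of two, and the payload sits exactly at the cap. Re-marking the uncached suffix is what pushes the count to five.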
Configure anthropic-vertex as the primary provider with prompt caching at its defaults, and enable active-memory for the agent.

[...] tool_result fallback (the reporter hits this whenever a memory recall injects context).

The onPayload shim re-runs the payload policy on the finished payload, adds cache_control to the dynamic suffix, and Vertex StreamRawPredict returns 400 A maximum of 4 blocks with cache_control may be provided. Found 5., triggering a model fallback.

Real behavior proof
Behavior addressed (#91982): an anthropic-vertex payload that already carries the transport's full cache-control budget is no longer re-marked on its way out, so requests that previously breached Anthropic's four-marker cap with Found 5 now keep exactly the budgeted markers and the dynamic system suffix stays uncached.

Real environment tested (Linux x64, Node v22.22.3, Vitest against the production vertex stream runtime): createAnthropicVertexStreamFn from the shipped plugin module, exercised through its injectable transport seam; the regression payload mirrors the shared transport's real output shape (split system prefix/suffix, marked tools, marked user text, marked trailing tool_result).

Exact steps or command run after this patch: pnpm test extensions/anthropic-vertex/stream-runtime.test.ts; pnpm tsgo:extensions && pnpm tsgo:extensions:test; node scripts/run-oxlint.mjs and format check on the changed files; .agents/skills/autoreview/scripts/autoreview --engine claude --thinking claude=max.

Evidence after fix (Vitest output for the touched test file):
Observed result after fix: the transport options forward the caller's payload hook unchanged; a payload carrying the transport's four budgeted markers keeps exactly four after the hook path runs, with the dynamic suffix block still uncached; when the caller supplies no hook, the transport sets none.
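The hook-forwarding part of that result can be pictured with a minimal sketch. It is an illustration only: OnPayload, StreamOptions, TransportOptions, and buildTransportOptions are hypothetical names, not the plugin's actual types or functions. The only behavior taken from the PR text is that options?.onPayload is passed straight through and that no hook is set when the caller supplies none.

```typescript
// Hypothetical option shapes (assumptions for illustration).
type OnPayload = (payload: unknown) => unknown;
type StreamOptions = { onPayload?: OnPayload };
type TransportOptions = { onPayload?: OnPayload };

// Forward the caller's hook by reference and add no policy pass of our own.
// When the caller supplies no hook, the transport options carry none.
function buildTransportOptions(options?: StreamOptions): TransportOptions {
  return options?.onPayload ? { onPayload: options.onPayload } : {};
}

// Usage: the same function object comes out the other side.
const callerHook: OnPayload = (payload) => payload;
const forwarded = buildTransportOptions({ onPayload: callerHook });
console.log(forwarded.onPayload === callerHook); // true
console.log("onPayload" in buildTransportOptions()); // false
```

Because the hook is forwarded by reference rather than wrapped, nothing between the caller and the transport can add or move cache_control markers.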
What was not tested: a live Vertex AI StreamRawPredict round-trip against GCP (no Vertex credentials in this environment). The reporter's GCP Cloud Logging captures in #91982 document the live failure mode this removes.

Repro confirmation: both new tests fail on the unpatched tree (the marker count comes back as expected 5 to be 4, the same five-marker breach the reporter logged, and a transport-injected hook appears where none was supplied) and pass with the fix, so the tests cover the production change.

Risk / Mitigation
[...] onPayload runs, and the vertex stream function routes all requests through it: the same single-owner arrangement the plain anthropic provider already ships with.

[...] onPayload consumer relied on the shim re-shaping its returned payload. Mitigation: every runtime supplier was audited (anthropic payload logging, command delivery logging, diagnostics byte accounting, and the before_provider_request plugin hook); all of them inspect or patch the finished payload and none construct boundary-marked system arrays, and the plain anthropic path likewise sends hook results without re-policing.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Fixes #91982