Environment
- OpenClaw version: 2026.6.5
- Plugin: @openclaw/anthropic-vertex-provider (same version)
- Platform: GCE, Debian 12, Node 24
- Provider: anthropic-vertex/claude-haiku-4-5 (primary), claude-sonnet-4-6 (fallback)
Summary
When active-memory is enabled for an agent using anthropic-vertex-provider,
the gateway intermittently rejects requests with:
FailoverError: LLM request rejected:
A maximum of 4 blocks with cache_control may be provided. Found 5.
The error is intermittent: it occurs only when active-memory finds relevant
memories and injects them. When no memories are found, the request succeeds.
Root cause (traced in source)
applyAnthropicCacheControlToSystem in src/agents/anthropic-payload-policy.ts
iterates over the system block array and adds cache_control: ephemeral to any
block that has no cache_control and no OPENCLAW_CACHE_BOUNDARY marker in its
text.
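The pass described above can be sketched as follows. This is a reconstruction from the description, not the actual source: the block shape, the function name `applyCacheControlAsDescribed`, and the assumption that the marker is matched as a literal substring are all hypothetical. The sample input is a system array that already has one cached block and one uncached block.

```typescript
// Hypothetical, simplified block shape; the real OpenClaw types may differ.
type CacheControl = { type: "ephemeral" };
type SystemBlock = { type: "text"; text: string; cache_control?: CacheControl };

// Assumption: the marker is matched as a literal substring of the block text.
const BOUNDARY_MARKER = "OPENCLAW_CACHE_BOUNDARY";

// Reconstruction of the described behavior, not the actual implementation.
function applyCacheControlAsDescribed(blocks: SystemBlock[]): SystemBlock[] {
  return blocks.map((block): SystemBlock => {
    const skip =
      block.cache_control !== undefined || block.text.includes(BOUNDARY_MARKER);
    return skip ? { ...block } : { ...block, cache_control: { type: "ephemeral" } };
  });
}

// A system array with one cached block and one deliberately uncached block.
const alreadySplit: SystemBlock[] = [
  { type: "text", text: "You are a helpful agent.", cache_control: { type: "ephemeral" } },
  { type: "text", text: "Current date: 2026-06-10" },
];

const result = applyCacheControlAsDescribed(alreadySplit);
const markedCount = result.filter((b) => b.cache_control !== undefined).length;
console.log(markedCount); // 2: the uncached block was marked as well
```

Under these assumptions, a block that was left without cache_control on purpose is indistinguishable from one that has not been processed yet, so the pass marks it.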
The native Anthropic provider (buildAnthropicSystemBlocks) correctly splits
the system prompt into a stable prefix with cache_control (CC) and a dynamic
date/context suffix without CC. active-memory then injects recalled memories as
prependContext into the user message, producing a second user content block
that also carries cache_control.
When the vertex plugin runs applyAnthropicCacheControlToSystem on the already-
split system array, it sees the dynamic suffix block (no boundary marker, no CC)
and adds CC to it. This produces:
| # | Source | Block |
|---|--------|-------|
| 1 | system | stable prefix (CC, correct) |
| 2 | system | dynamic suffix (CC, added erroneously by vertex plugin) |
| 3 | tools | last tool definition (CC) |
| 4 | messages | prependContext block in user message (CC) |
| 5 | messages | actual user message block (CC) |
Total: 5 → rejected by Anthropic API.
Without active-memory (no prependContext → single user message block → 1 CC
from messages), the erroneous system CC still exists but total stays at 4 and
requests succeed.
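The arithmetic can be checked with a small counter over a request payload. This is a minimal sketch assuming a simplified payload shape: the types, the helper `countCacheControlBlocks`, and the placeholder texts and tool names are hypothetical and only mirror the rows of the table above.

```typescript
// Hypothetical, simplified request shape; real payloads carry more fields.
type CacheControl = { type: "ephemeral" };
type Cacheable = { cache_control?: CacheControl };
type TextBlock = Cacheable & { type: "text"; text: string };
type Tool = Cacheable & { name: string };
type Message = { role: "user" | "assistant"; content: TextBlock[] };
type Payload = { system: TextBlock[]; tools: Tool[]; messages: Message[] };

const MAX_CACHE_CONTROL_BLOCKS = 4; // limit quoted in the API error message

const countMarked = (items: Cacheable[]): number =>
  items.filter((item) => item.cache_control !== undefined).length;

// Sums cache_control markers across system, tools, and message content.
function countCacheControlBlocks(payload: Payload): number {
  const inMessages = payload.messages.reduce(
    (sum, message) => sum + countMarked(message.content),
    0,
  );
  return countMarked(payload.system) + countMarked(payload.tools) + inMessages;
}

const cc: CacheControl = { type: "ephemeral" };
const memoryBlock: TextBlock = { type: "text", text: "recalled memories", cache_control: cc };
const userBlock: TextBlock = { type: "text", text: "actual user message", cache_control: cc };

// Mirrors the five rows of the table (placeholder text and tool names).
const withMemoryHit: Payload = {
  system: [
    { type: "text", text: "stable prefix", cache_control: cc },
    { type: "text", text: "dynamic suffix", cache_control: cc },
  ],
  tools: [{ name: "first_tool" }, { name: "last_tool", cache_control: cc }],
  messages: [{ role: "user", content: [memoryBlock, userBlock] }],
};

// Same request without a recall hit: the prependContext block is absent.
const withoutMemoryHit: Payload = {
  ...withMemoryHit,
  messages: [{ role: "user", content: [userBlock] }],
};

const hitCount = countCacheControlBlocks(withMemoryHit);
const missCount = countCacheControlBlocks(withoutMemoryHit);
console.log(hitCount, hitCount > MAX_CACHE_CONTROL_BLOCKS); // 5 true
console.log(missCount, missCount > MAX_CACHE_CONTROL_BLOCKS); // 4 false
```

A counter like this could also serve as a pre-flight assertion before the request is sent, so the overflow surfaces locally instead of as a 400 from the API.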
Steps to reproduce
- Configure anthropic-vertex-provider as the primary model for an agent
- Enable active-memory for that agent in openclaw.json
- Ensure the agent has stored memories that match the user's query
- Send a message that triggers a memory recall hit (summaryChars > 0)
- The StreamRawPredict call to Vertex AI returns 400:
{"type":"error","error":{"type":"invalid_request_error","message":"A maximum of 4 blocks with cache_control may be provided. Found 5."}}
Without a memory hit (empty recall), the same message succeeds.
GCP Cloud Logging evidence (real production errors)
2026-06-10T14:37:25Z StreamRawPredict claude-haiku-4-5
{"type":"error","error":{"type":"invalid_request_error","message":"A maximum of 4 blocks with cache_control may be provided. Found 5."},"request_id":"req_vrtx_011CbukPQbsjAGaKJVqrQhsv"}
2026-06-10T14:37:41Z model_fallback_decision candidate_succeeded
fallbackStepFromModel: anthropic-vertex/claude-haiku-4-5
fallbackStepToModel: anthropic-vertex/claude-sonnet-4-6
4 occurrences in a 3-hour window (13:07, 13:52, 14:05, 14:37 UTC 2026-06-10).
Suggested fix
applyAnthropicCacheControlToSystem should check whether the system array was
already processed by buildAnthropicSystemBlocks (i.e., whether any block
already carries cache_control) and skip adding CC to remaining blocks.
Alternatively, cap CC additions so the system array contributes at most 1
cache_control block total.
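The first option might look roughly like the sketch below. It is illustrative only: the function name `applyCacheControlToSystemGuarded`, the block shape, and the substring check for the marker are assumptions, and the real function in src/agents/anthropic-payload-policy.ts may have a different signature.

```typescript
// Hypothetical, simplified block shape; the real OpenClaw types may differ.
type CacheControl = { type: "ephemeral" };
type SystemBlock = { type: "text"; text: string; cache_control?: CacheControl };

// Assumption: the marker is matched as a literal substring of the block text.
const CACHE_BOUNDARY_MARKER = "OPENCLAW_CACHE_BOUNDARY";

// Sketch of the proposed guard: if any block already carries cache_control,
// treat the array as already processed and add nothing.
function applyCacheControlToSystemGuarded(blocks: SystemBlock[]): SystemBlock[] {
  const alreadyProcessed = blocks.some((b) => b.cache_control !== undefined);
  if (alreadyProcessed) {
    return blocks.map((b) => ({ ...b }));
  }
  return blocks.map((b): SystemBlock =>
    b.text.includes(CACHE_BOUNDARY_MARKER)
      ? { ...b }
      : { ...b, cache_control: { type: "ephemeral" } },
  );
}

const countMarkedBlocks = (blocks: SystemBlock[]): number =>
  blocks.filter((b) => b.cache_control !== undefined).length;

// Already split upstream: the dynamic suffix must stay unmarked.
const splitSystem: SystemBlock[] = [
  { type: "text", text: "stable prefix", cache_control: { type: "ephemeral" } },
  { type: "text", text: "dynamic suffix" },
];

// Not yet processed: the existing marking behavior is preserved.
const rawSystem: SystemBlock[] = [{ type: "text", text: "whole system prompt" }];

const splitMarked = countMarkedBlocks(applyCacheControlToSystemGuarded(splitSystem));
const rawMarked = countMarkedBlocks(applyCacheControlToSystemGuarded(rawSystem));
console.log(splitMarked, rawMarked); // 1 1
```

With this guard the system array from the failing scenario contributes one cache_control block instead of two, which brings the memory-hit request back to 4. The alternative of capping the system contribution at one block would yield the same count for this case.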
Workaround
None that preserves active-memory functionality. Disabling active-memory
eliminates the extra block and the error disappears.
Related