Description
Summary
Allow per-agent control over Anthropic prompt caching behavior via a `cacheRetention` key in the agent config within `openclaw.json`.
Problem to solve
Multi-agent deployments have agents with very different traffic patterns. High-traffic agents (e.g., a core orchestrator receiving every message) benefit from Anthropic's 5-minute ephemeral prompt caching — the cache write cost is amortized over many cache reads. But low-traffic agents (e.g., specialized agents that receive 1-2 messages per day) waste money on cache writes that expire before they're ever read.
With Anthropic Haiku 4.5 pricing:
- Cache write (5m TTL): $1.25/MTok
- Cache read: $0.10/MTok
- Base input (no caching): $1.00/MTok
For a low-traffic agent with ~5,500 tokens of system prompt that gets 1-2 messages/day, the math clearly favors disabling caching entirely — paying $1.00/MTok base input instead of $1.25/MTok for a cache write that expires unused.
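The break-even arithmetic above can be spelled out in a few lines (a sketch using the prices from the table; the 5,500-token prompt size is from our deployment):

```typescript
// Anthropic Haiku 4.5 prices in $/MTok (from the table above)
const CACHE_WRITE = 1.25;
const CACHE_READ = 0.10;
const BASE_INPUT = 1.00;

const promptTokens = 5_500; // low-traffic agent's system prompt
const mtok = promptTokens / 1_000_000;

const writeCost = mtok * CACHE_WRITE; // per-turn cost if we cache-write
const baseCost = mtok * BASE_INPUT;   // per-turn cost with caching off
const readCost = mtok * CACHE_READ;   // per-turn cost on a cache hit

// With 1-2 messages/day, the 5-minute cache almost never gets a hit,
// so each turn overpays by the write premium:
const wastePerTurn = writeCost - baseCost; // ~$0.0014 per turn

// Caching only pays off when reads amortize the premium; here a single
// hit within the TTL already covers it, so the tradeoff is purely
// about whether a follow-up message arrives within 5 minutes:
const savingsPerHit = baseCost - readCost; // ~$0.0050 per hit
```

For high-traffic agents the hit rate makes caching a clear win; for the 1-2-message-per-day agents the premium is paid on nearly every turn.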
Currently there is no way to configure this. All agents use the same caching behavior.
Proposed solution
Add an optional `cacheRetention` key at the agent level in `openclaw.json`:

```json
{
  "agents": {
    "list": [
      {
        "id": "my-low-traffic-agent",
        "name": "Specialist",
        "workspace": "~/.openclaw/workspace-specialist",
        "cacheRetention": "none"
      }
    ]
  }
}
```

Possible values:
- `"none"` — do not send `cache_control` breakpoints to Anthropic; pay the base input rate
- `"short"` — use ephemeral 5-minute caching (current default behavior)
- `"long"` — if/when Anthropic supports longer TTLs, opt in
This could also be set in `agents.defaults` for deployments that want to disable caching globally.
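For deployments that prefer the global route, the defaults-plus-override shape might look like this (a sketch; I'm assuming `agents.defaults` accepts the same keys as `agents.list` entries, with per-agent values winning):

```json
{
  "agents": {
    "defaults": {
      "cacheRetention": "none"
    },
    "list": [
      {
        "id": "orchestrator",
        "cacheRetention": "short"
      },
      {
        "id": "my-low-traffic-agent"
      }
    ]
  }
}
```

Here the high-traffic orchestrator opts back in to caching while every other agent inherits `"none"`.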
When `cacheRetention` is `"none"`, the gateway should omit `cache_control: { "type": "ephemeral" }` from the system prompt blocks sent to the Anthropic API.
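A minimal sketch of that request-building check (the function and type names below are hypothetical, not OpenClaw's actual internals):

```typescript
type CacheRetention = "none" | "short" | "long";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: build the system prompt blocks for one agent,
// attaching a cache_control breakpoint only when caching is enabled.
function buildSystemBlocks(
  systemPrompt: string,
  retention: CacheRetention = "short",
): SystemBlock[] {
  const block: SystemBlock = { type: "text", text: systemPrompt };
  if (retention !== "none") {
    // Current default behavior: 5-minute ephemeral caching
    block.cache_control = { type: "ephemeral" };
  }
  return [block];
}
```

An absent key falls through to `"short"`, which preserves today's behavior for existing configs.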
Alternatives considered
- Reducing system prompt size — we already trimmed skills from 79→23 lines and 158→35 lines each. But even trimmed prompts still get cache-written on every conversation turn for low-traffic agents, wasting money.
- Using a cheaper model — doesn't address the architectural issue; caching overhead scales with any model.
- Grouping agents into high/low traffic tiers with different config files — OpenClaw doesn't support multiple config files per gateway instance.
Impact
- Affected: Any multi-agent deployment with mixed traffic patterns (common in supervisor/worker architectures)
- Severity: Medium — costs add up but don't block functionality
- Frequency: Every API call to every agent, continuously
- Consequence: In our 8-agent deployment, cache writes for 3 low-traffic agents account for ~25% of daily API spend ($0.50-0.75/day) with near-zero cache hits. Annualized: ~$200-275/year wasted on unused cache writes.
Evidence/examples
Real cost data from our deployment (Anthropic Admin API cost_report):
- Daily spend: ~$2/day on Haiku 4.5
- Cache write tokens: 82% of total cost
- 3 of 8 agents receive <5 messages/day but still incur full cache write costs on every turn
Config that was attempted and rejected by the gateway (v2026.2.13):
```json
{
  "id": "gmu-maha",
  "name": "Maha — GMU PhD Advisor",
  "workspace": "~/.openclaw/workspace-gmu-maha",
  "cacheRetention": "none"
}
```

Gateway error:

```
Config invalid
File: ~/.openclaw/openclaw.json
Problem:
- agents.list.5: Unrecognized key: "cacheRetention"
```
Additional information
- This should be backward-compatible — the default behavior (caching enabled) remains unchanged when the key is absent.
- The implementation is likely straightforward: when building the Anthropic API request, check the agent's `cacheRetention` setting and conditionally omit `cache_control` breakpoints.
- Related: #9600 ("Add OpenRouter cache_control support for provider-side prompt caching") addresses a similar concern from the provider routing side.
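To illustrate the config-validation side, here is a dependency-free sketch of how the gateway might accept the new key while still rejecting typos as strictly as it rejects unrecognized keys today (names are hypothetical):

```typescript
const CACHE_RETENTION_VALUES = ["none", "short", "long"] as const;
type CacheRetention = (typeof CACHE_RETENTION_VALUES)[number];

// Hypothetical validation step: an absent key keeps today's behavior
// ("short" ephemeral caching); unknown values fail fast with an error.
function parseCacheRetention(value: unknown): CacheRetention {
  if (value === undefined) return "short";
  if (CACHE_RETENTION_VALUES.includes(value as CacheRetention)) {
    return value as CacheRetention;
  }
  throw new Error(`Invalid cacheRetention: ${JSON.stringify(value)}`);
}
```

Defaulting to `"short"` when the key is absent is what makes the change backward-compatible for existing configs.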