[Feature]: Per-agent prompt cache control (cacheRetention config key) #17112

@osmanwa

Description

Summary

Allow per-agent control over Anthropic prompt caching behavior via a cacheRetention key in the agent config within openclaw.json.

Problem to solve

Multi-agent deployments have agents with very different traffic patterns. High-traffic agents (e.g., a core orchestrator receiving every message) benefit from Anthropic's 5-minute ephemeral prompt caching — the cache write cost is amortized over many cache reads. But low-traffic agents (e.g., specialized agents that receive 1-2 messages per day) waste money on cache writes that expire before they're ever read.

With Anthropic Haiku 4.5 pricing:

  • Cache write (5m TTL): $1.25/MTok
  • Cache read: $0.10/MTok
  • Base input (no caching): $1.00/MTok

For a low-traffic agent with a ~5,500-token system prompt that receives 1-2 messages/day, the 5-minute cache almost always expires before the next request, so every cache write is wasted. Disabling caching entirely means paying the $1.00/MTok base input rate instead of $1.25/MTok for a cache write that is never read.
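To make the trade-off concrete, here is a small TypeScript sketch that prices a ~5,500-token system prompt under the Haiku 4.5 rates listed above. It assumes every request in one 5-minute window sends an identical prefix, and it ignores output tokens and the per-turn message body. With a single request the cached path costs 25% more (1.25 vs 1.00 per MTok of prefix); one cache read inside the window already flips the result (1.25 + 0.10 = 1.35 vs 2.00).

```typescript
// Haiku 4.5 list prices quoted in this issue, in USD per million tokens (MTok).
const BASE_INPUT = 1.0;
const CACHE_WRITE_5M = 1.25;
const CACHE_READ = 0.1;

/**
 * Input cost (USD) of sending the same prompt prefix `requests` times
 * within a single 5-minute cache window.
 * Cached: the first request writes the cache, later requests read it.
 * Uncached: every request pays the base input rate.
 */
function prefixCost(tokens: number, requests: number, cached: boolean): number {
  const mtok = tokens / 1_000_000;
  if (!cached) return requests * mtok * BASE_INPUT;
  return mtok * CACHE_WRITE_5M + (requests - 1) * mtok * CACHE_READ;
}

const PROMPT_TOKENS = 5_500;
for (const n of [1, 2, 5]) {
  const withCache = prefixCost(PROMPT_TOKENS, n, true);
  const withoutCache = prefixCost(PROMPT_TOKENS, n, false);
  console.log(
    `${n} request(s) in window: cached $${withCache.toFixed(6)} vs uncached $${withoutCache.toFixed(6)}`,
  );
}
```

For an agent that sees 1-2 messages per day, nearly every window contains exactly one request, which is the case where caching loses.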

Currently there is no way to configure this. All agents use the same caching behavior.

Proposed solution

Add an optional cacheRetention key at the agent level in openclaw.json:

{
  "agents": {
    "list": [
      {
        "id": "my-low-traffic-agent",
        "name": "Specialist",
        "workspace": "~/.openclaw/workspace-specialist",
        "cacheRetention": "none"
      }
    ]
  }
}

Possible values:

  • "none": do not send cache_control breakpoints to Anthropic; pay the base input rate
  • "short": use ephemeral 5-minute caching (the current default behavior)
  • "long": opt in to a longer cache TTL where the provider supports one (Anthropic offers a 1-hour option)

This could also be set in agents.defaults for deployments that want to disable caching globally.
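One way the gateway could model and resolve the key, sketched in TypeScript. The type names, the `isCacheRetention` and `resolveCacheRetention` helpers, and the precedence order (per-agent value, then `agents.defaults`, then `"short"` as today's behavior) are hypothetical and not taken from OpenClaw's code.

```typescript
// Hypothetical config types; OpenClaw's real schema may be named and structured differently.
type CacheRetention = "none" | "short" | "long";

const CACHE_RETENTION_VALUES: readonly CacheRetention[] = ["none", "short", "long"];

interface AgentConfig {
  id: string;
  name?: string;
  workspace?: string;
  cacheRetention?: CacheRetention;
}

interface AgentsConfig {
  defaults?: { cacheRetention?: CacheRetention };
  list: AgentConfig[];
}

// Validation hook: accept only the three proposed values instead of rejecting the key outright.
function isCacheRetention(value: unknown): value is CacheRetention {
  return (
    typeof value === "string" &&
    (CACHE_RETENTION_VALUES as readonly string[]).includes(value)
  );
}

// Assumed precedence: per-agent value, then agents.defaults, then "short" (current behavior).
function resolveCacheRetention(agents: AgentsConfig, agentId: string): CacheRetention {
  const agent = agents.list.find((a) => a.id === agentId);
  return agent?.cacheRetention ?? agents.defaults?.cacheRetention ?? "short";
}

// Usage: the example agent from this issue next to a hypothetical high-traffic orchestrator.
const agents: AgentsConfig = {
  list: [
    { id: "orchestrator" },
    { id: "my-low-traffic-agent", name: "Specialist", cacheRetention: "none" },
  ],
};
console.log(resolveCacheRetention(agents, "orchestrator")); // short
console.log(resolveCacheRetention(agents, "my-low-traffic-agent")); // none
```

Falling back to `"short"` when the key is absent keeps existing configs behaving exactly as they do now.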

When cacheRetention is "none", the gateway should omit cache_control: { type: "ephemeral" } from the system prompt blocks sent to the Anthropic API.
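A minimal sketch of the request-building side, assuming the system prompt is sent to the Anthropic Messages API as an array of text blocks and that the gateway places a single breakpoint on the last block. `buildSystemBlocks` is a hypothetical helper, not an existing OpenClaw function. Mapping `"long"` to `ttl: "1h"` assumes Anthropic's extended 1-hour cache TTL option; check the current prompt caching documentation before relying on it.

```typescript
type CacheRetention = "none" | "short" | "long";

// Shape of a text block in the Messages API `system` array.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

function buildSystemBlocks(promptParts: string[], retention: CacheRetention): SystemTextBlock[] {
  const blocks = promptParts.map((text): SystemTextBlock => ({ type: "text", text }));
  if (retention === "none" || blocks.length === 0) {
    // No breakpoints: the whole system prompt is billed at the base input rate.
    return blocks;
  }
  // One breakpoint on the last block caches the entire system prefix.
  const last = blocks[blocks.length - 1];
  last.cache_control =
    retention === "long"
      ? { type: "ephemeral", ttl: "1h" } // assumption: extended TTL option
      : { type: "ephemeral" }; // default 5-minute TTL
  return blocks;
}

console.log(JSON.stringify(buildSystemBlocks(["Specialist system prompt"], "none")));
// [{"type":"text","text":"Specialist system prompt"}]
console.log(JSON.stringify(buildSystemBlocks(["Specialist system prompt"], "short")));
// [{"type":"text","text":"Specialist system prompt","cache_control":{"type":"ephemeral"}}]
```

Leaving `cache_control` off entirely (rather than sending some "disabled" value) matches the proposal: with no breakpoint, Anthropic neither writes nor reads the cache for that prefix.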

Alternatives considered

  1. Reducing system prompt size: we already trimmed our skill files (one from 79 to 23 lines, another from 158 to 35). Even trimmed prompts are still cache-written on every conversation turn for low-traffic agents, and those writes are never read.
  2. Using a cheaper model: this doesn't address the architectural issue, because the cache-write overhead scales with the input price of whichever model is used.
  3. Grouping agents into high/low traffic tiers with different config files: OpenClaw doesn't support multiple config files per gateway instance.

Impact

  • Affected: Any multi-agent deployment with mixed traffic patterns (common in supervisor/worker architectures)
  • Severity: Medium — costs add up but don't block functionality
  • Frequency: Every API call to every agent, continuously
  • Consequence: In our 8-agent deployment, cache writes for the 3 low-traffic agents account for ~25% of daily API spend ($0.50-0.75/day) with near-zero cache hits. Annualized, that is roughly $180-275/year spent on cache writes that are never read.
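The annualized range follows directly from the daily figures reported above; this short TypeScript snippet just multiplies them out, assuming the daily waste stays constant over 365 days.

```typescript
// Daily spend on unused cache writes for the 3 low-traffic agents (USD/day), from this issue.
const dailyWasteLow = 0.5;
const dailyWasteHigh = 0.75;
const DAYS_PER_YEAR = 365; // assumption: flat daily rate, no leap-year adjustment

const annualLow = dailyWasteLow * DAYS_PER_YEAR;
const annualHigh = dailyWasteHigh * DAYS_PER_YEAR;
console.log(`Annualized: $${annualLow.toFixed(2)}-$${annualHigh.toFixed(2)}/year`);
// Annualized: $182.50-$273.75/year
```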

Evidence/examples

Real cost data from our deployment (Anthropic Admin API cost_report):

  • Daily spend: ~$2/day on Haiku 4.5
  • Cache write tokens: 82% of total cost
  • 3 of 8 agents receive <5 messages/day but still incur full cache write costs on every turn

Config that was attempted and rejected by the gateway (v2026.2.13):

{
  "id": "gmu-maha",
  "name": "Maha — GMU PhD Advisor",
  "workspace": "~/.openclaw/workspace-gmu-maha",
  "cacheRetention": "none"
}

Gateway error:

Config invalid
File: ~/.openclaw/openclaw.json
Problem:
  - agents.list.5: Unrecognized key: "cacheRetention"

Additional information

  • This should be backward-compatible — the default behavior (caching enabled) remains unchanged when the key is absent.
  • The implementation is likely straightforward: when building the Anthropic API request, check the agent's cacheRetention setting and conditionally omit cache_control breakpoints.
  • Related: #9600 (Add OpenRouter cache_control support for provider-side prompt caching) addresses a similar concern from the provider-routing side.

Labels: stale (marked as stale due to inactivity)
