Description
Summary
Allow per-agent control over Anthropic prompt caching behavior via a `cacheRetention` key in the agent config within `openclaw.json`.
Problem to solve
Multi-agent deployments have agents with very different traffic patterns. High-traffic agents (e.g., a core orchestrator receiving every message) benefit from Anthropic's 5-minute ephemeral prompt caching — the cache write cost is amortized over many cache reads. But low-traffic agents (e.g., specialized agents that receive 1-2 messages per day) waste money on cache writes that expire before they're ever read.
With Anthropic Haiku 4.5 pricing:
- Cache write (5m TTL): $1.25/MTok
- Cache read: $0.10/MTok
- Base input (no caching): $1.00/MTok
For a low-traffic agent with ~5,500 tokens of system prompt that gets 1-2 messages/day, the math clearly favors disabling caching entirely — paying $1.00/MTok base input instead of $1.25/MTok for a cache write that expires unused.
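The break-even arithmetic above can be spelled out in a few lines (a sketch using the prices from the table; the 5,500-token prompt size is from our deployment):

```typescript
// Anthropic Haiku 4.5 prices in $/MTok (from the table above)
const CACHE_WRITE = 1.25;
const CACHE_READ = 0.10;
const BASE_INPUT = 1.00;

const promptTokens = 5_500; // low-traffic agent's system prompt
const mtok = promptTokens / 1_000_000;

const writeCost = mtok * CACHE_WRITE; // per-turn cost if we cache-write
const baseCost = mtok * BASE_INPUT;   // per-turn cost with caching off
const readCost = mtok * CACHE_READ;   // per-turn cost on a cache hit

// With 1-2 messages/day, the 5-minute cache almost never gets a hit,
// so each turn overpays by the write premium:
const wastePerTurn = writeCost - baseCost; // ~$0.0014 per turn

// Caching only pays off when reads amortize the premium; here a single
// hit within the TTL already covers it, so the tradeoff is purely
// about whether a follow-up message arrives within 5 minutes:
const savingsPerHit = baseCost - readCost; // ~$0.0050 per hit
```

For high-traffic agents the hit rate makes caching a clear win; for the 1-2-message-per-day agents the premium is paid on nearly every turn.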
Currently there is no way to configure this. All agents use the same caching behavior.
Proposed solution
Add an optional `cacheRetention` key at the agent level in `openclaw.json`:

```json
{
  "agents": {
    "list": [
      {
        "id": "my-low-traffic-agent",
        "name": "Specialist",
        "workspace": "~/.openclaw/workspace-specialist",
        "cacheRetention": "none"
      }
    ]
  }
}
```

Possible values:
- `"none"` — do not send `cache_control` breakpoints to Anthropic; pay the base input rate
- `"short"` — use ephemeral 5-minute caching (current default behavior)
- `"long"` — if/when Anthropic supports longer TTLs, opt in
This could also be set in `agents.defaults` for deployments that want to disable caching globally.
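For deployments that prefer the global route, the defaults-plus-override shape might look like this (a sketch; I'm assuming `agents.defaults` accepts the same keys as `agents.list` entries, with per-agent values winning):

```json
{
  "agents": {
    "defaults": {
      "cacheRetention": "none"
    },
    "list": [
      {
        "id": "orchestrator",
        "cacheRetention": "short"
      },
      {
        "id": "my-low-traffic-agent"
      }
    ]
  }
}
```

Here the high-traffic orchestrator opts back in to caching while every other agent inherits `"none"`.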
When `cacheRetention` is `"none"`, the gateway should omit `cache_control: { "type": "ephemeral" }` from the system prompt blocks sent to the Anthropic API.
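A minimal sketch of that request-building check (the function and type names below are hypothetical, not OpenClaw's actual internals):

```typescript
type CacheRetention = "none" | "short" | "long";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: build the system prompt blocks for one agent,
// attaching a cache_control breakpoint only when caching is enabled.
function buildSystemBlocks(
  systemPrompt: string,
  retention: CacheRetention = "short",
): SystemBlock[] {
  const block: SystemBlock = { type: "text", text: systemPrompt };
  if (retention !== "none") {
    // Current default behavior: 5-minute ephemeral caching
    block.cache_control = { type: "ephemeral" };
  }
  return [block];
}
```

An absent key falls through to `"short"`, which preserves today's behavior for existing configs.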
Alternatives considered
- Reducing system prompt size — we already trimmed skills from 79→23 lines and 158→35 lines each. But even trimmed prompts still get cache-written on every conversation turn for low-traffic agents, wasting money.
- Using a cheaper model — doesn't address the architectural issue; caching overhead scales with any model.
- Grouping agents into high/low traffic tiers with different config files — OpenClaw doesn't support multiple config files per gateway instance.
Impact
- Affected: Any multi-agent deployment with mixed traffic patterns (common in supervisor/worker architectures)
- Severity: Medium — costs add up but don't block functionality
- Frequency: Every API call to every agent, continuously
- Consequence: In our 8-agent deployment, cache writes for 3 low-traffic agents account for ~25% of daily API spend ($0.50-0.75/day) with near-zero cache hits. Annualized: ~$200-275/year wasted on unused cache writes.
Evidence/examples
Real cost data from our deployment (Anthropic Admin API cost_report):
- Daily spend: ~$2/day on Haiku 4.5
- Cache write tokens: 82% of total cost
- 3 of 8 agents receive <5 messages/day but still incur full cache write costs on every turn
Config that was attempted and rejected by the gateway (v2026.2.13):
```json
{
  "id": "gmu-maha",
  "name": "Maha — GMU PhD Advisor",
  "workspace": "~/.openclaw/workspace-gmu-maha",
  "cacheRetention": "none"
}
```

Gateway error:

```
Config invalid
File: ~/.openclaw/openclaw.json
Problem:
- agents.list.5: Unrecognized key: "cacheRetention"
```
Additional information
- This should be backward-compatible — the default behavior (caching enabled) remains unchanged when the key is absent.
- The implementation is likely straightforward: when building the Anthropic API request, check the agent's `cacheRetention` setting and conditionally omit `cache_control` breakpoints.
- Related: #9600 ("Add OpenRouter cache_control support for provider-side prompt caching") addresses a similar concern from the provider routing side.
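To illustrate the config-validation side, here is a dependency-free sketch of how the gateway might accept the new key while still rejecting typos as strictly as it rejects unrecognized keys today (names are hypothetical):

```typescript
const CACHE_RETENTION_VALUES = ["none", "short", "long"] as const;
type CacheRetention = (typeof CACHE_RETENTION_VALUES)[number];

// Hypothetical validation step: an absent key keeps today's behavior
// ("short" ephemeral caching); unknown values fail fast with an error.
function parseCacheRetention(value: unknown): CacheRetention {
  if (value === undefined) return "short";
  if (CACHE_RETENTION_VALUES.includes(value as CacheRetention)) {
    return value as CacheRetention;
  }
  throw new Error(`Invalid cacheRetention: ${JSON.stringify(value)}`);
}
```

Defaulting to `"short"` when the key is absent is what makes the change backward-compatible for existing configs.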