Feature Request: Anthropic Prompt Caching
What
Add support for Anthropic's prompt caching (cache_control: {"type": "ephemeral"}) on system prompts and long context blocks.
Why
Currently, every request re-sends the full context (system prompt + workspace files + conversation history) to the API without caching. With prompt caching enabled:
- 90% token cost reduction on cached portions (cache read = 10% of input cost)
- Faster response times on cache hits
- Significant savings for long sessions with large system prompts (MEMORY.md, AGENTS.md, SOUL.md, etc.)
How it works (Anthropic API)
Anthropic supports ephemeral cache points via the cache_control field:
{
"type": "text",
"text": "<long system prompt>",
"cache_control": { "type": "ephemeral" }
}
Cache TTL: 5 minutes. Cache hit pricing: ~10% of normal input token cost.
Suggested Implementation
- Add
promptCaching: true/false option to agent config
- When enabled, attach
cache_control breakpoints to:
- End of system prompt block
- End of injected workspace files (MEMORY.md, AGENTS.md, etc.)
- Optionally: at a configurable position in conversation history
- Only apply when provider is
anthropic (not proxies like OpenRouter/LaoZhang)
References
Impact
Users running long sessions with large workspace context (like me with MEMORY.md + AGENTS.md + SOUL.md injected every request) would see immediate cost reduction.
Feature Request: Anthropic Prompt Caching
What
Add support for Anthropic's prompt caching (
cache_control: {"type": "ephemeral"}) on system prompts and long context blocks.Why
Currently, every request re-sends the full context (system prompt + workspace files + conversation history) to the API without caching. With prompt caching enabled:
How it works (Anthropic API)
Anthropic supports ephemeral cache points via the
cache_controlfield:{ "type": "text", "text": "<long system prompt>", "cache_control": { "type": "ephemeral" } }Cache TTL: 5 minutes. Cache hit pricing: ~10% of normal input token cost.
Suggested Implementation
promptCaching: true/falseoption to agent configcache_controlbreakpoints to:anthropic(not proxies like OpenRouter/LaoZhang)References
Impact
Users running long sessions with large workspace context (like me with MEMORY.md + AGENTS.md + SOUL.md injected every request) would see immediate cost reduction.