Skip to content

Feature Request: Prompt Caching support for Anthropic API #29441

@xptmxm888-a11y

Description

@xptmxm888-a11y

Feature Request: Anthropic Prompt Caching

What

Add support for Anthropic's prompt caching (cache_control: {"type": "ephemeral"}) on system prompts and long context blocks.

Why

Currently, every request re-sends the full context (system prompt + workspace files + conversation history) to the API without caching. With prompt caching enabled:

  • 90% token cost reduction on cached portions (cache read = 10% of input cost)
  • Faster response times on cache hits
  • Significant savings for long sessions with large system prompts (MEMORY.md, AGENTS.md, SOUL.md, etc.)

How it works (Anthropic API)

Anthropic supports ephemeral cache points via the cache_control field:

{
  "type": "text",
  "text": "<long system prompt>",
  "cache_control": { "type": "ephemeral" }
}

Cache TTL: 5 minutes. Cache hit pricing: ~10% of normal input token cost.

Suggested Implementation

  1. Add promptCaching: true/false option to agent config
  2. When enabled, attach cache_control breakpoints to:
    • End of system prompt block
    • End of injected workspace files (MEMORY.md, AGENTS.md, etc.)
    • Optionally: at a configurable position in conversation history
  3. Only apply when provider is anthropic (not proxies like OpenRouter/LaoZhang)

References

Impact

Users running long sessions with large workspace context (like me with MEMORY.md + AGENTS.md + SOUL.md injected every request) would see immediate cost reduction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    staleMarked as stale due to inactivity

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions