Skip to content

feat(providers): add Anthropic cache_control ephemeral markers for explicit prompt caching #621

@Aaronontheweb

Description

@Aaronontheweb

Summary

Netclaw's Anthropic provider plugin currently sends no cache_control directives. Anthropic's API supports explicit {"type": "ephemeral"} markers on message content blocks that unlock their prompt cache with much more generous semantics than implicit prefix matching: 5-minute TTL, cheaper cached tokens vs uncached, and works even across sessions with matching prefixes.

This is pure additive — provider-specific code in the Anthropic plugin, zero impact on other providers (OpenAI, OpenRouter, openai-compatible/vLLM/llama.cpp). Pays off immediately for any user running Netclaw against Claude.

Context

This is a client-side performance follow-up from the cache-prefix-stability work (#608, PR #618) which made the system prompt prefix stable across turns at the structural level. That fix relies on implicit prefix matching — good for all providers, but leaves money on the table on Anthropic where explicit cache markers are available.

With PR #618 merged, Netclaw now produces a message list shaped like:

[0]      System  persisted prompt (SOUL/AGENTS/TOOLING)
[1]      System  static dynamic context (OnceAtStart layers + [session] + [attachments])
[2..N]   User/Assistant  conversation history
[last]   User  volatile tail

This is exactly the shape Anthropic's cache_control wants. We just need to emit markers at the right positions.

What changes

Inside the Anthropic provider plugin's request builder:

  1. Add cache_control: {type: "ephemeral"} to the LAST content block of the persisted system prompt message (index [0]). This marks everything up through and including that message as cacheable.
  2. Add a second cache_control marker at the end of the static dynamic context message (index [1]) — anything the static block contains (tool index, skill index, [session], [attachments]) gets cached alongside the persisted prompt.
  3. Anthropic supports up to 4 cache_control breakpoints per request — this implementation uses 2, leaving headroom.
  4. The volatile tail message (memory recall, current time, etc.) gets NO marker — it's per-turn by design.

Expected impact

For multi-turn sessions on Anthropic:

  • Turn 2+ prompt tokens that match the cached prefix are billed at the cached rate (~10% of normal input cost on Claude 3.5/4 Sonnet).
  • Reduced latency on turn 2+ because Anthropic doesn't re-process the cached prefix.
  • 5-minute TTL sliding window means a session with a quiet period up to 5 minutes still gets the cache hit on resumption.

Files to touch

  • src/Netclaw.Providers/Anthropic/AnthropicProviderPlugin.cs (or the underlying chat client it creates)
  • Whatever the request-body builder is in the Anthropic provider — needs to emit the cache_control markers on system message content blocks
  • Tests: new test verifying the markers appear at the expected positions when talking to Anthropic (fake HTTP capture handler)

Verification

  1. Unit test: capture the HTTP request body sent to Anthropic, assert cache_control markers are present on messages [0] and [1] and absent on the volatile tail.
  2. Live test against Anthropic: use their usage response fields (cache_creation_input_tokens, cache_read_input_tokens) to confirm cache hits grow turn-over-turn.
  3. The Netclaw eval suite's llama.cpp-specific timings parser (PR feat(providers): parse llama.cpp timings for cache + performance metrics #615) doesn't currently extract Anthropic's cache fields — might need a small adaptation to capture them for the [working-context] performance metrics.

Out of scope

  • Non-Anthropic providers (OpenAI and OpenRouter have their own caching semantics; Claude Code / llama.cpp / vLLM rely on implicit prefix matching, which PR feat(sessions): cache-stable message assembly with volatile User-role tail #618 already handles).
  • Tuning where to put the cache breakpoints — 2 fixed markers at [0] and [1] is the starting point; optimization can come later based on measured usage.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions