You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Netclaw's Anthropic provider plugin currently sends no cache_control directives. Anthropic's API supports explicit {"type": "ephemeral"} markers on message content blocks that unlock their prompt cache with much more generous semantics than implicit prefix matching: 5-minute TTL, cheaper cached tokens vs uncached, and works even across sessions with matching prefixes.
This is pure additive — provider-specific code in the Anthropic plugin, zero impact on other providers (OpenAI, OpenRouter, openai-compatible/vLLM/llama.cpp). Pays off immediately for any user running Netclaw against Claude.
Context
This is a client-side performance follow-up from the cache-prefix-stability work (#608, PR #618) which made the system prompt prefix stable across turns at the structural level. That fix relies on implicit prefix matching — good for all providers, but leaves money on the table on Anthropic where explicit cache markers are available.
With PR #618 merged, Netclaw now produces a message list shaped like:
[0] System persisted prompt (SOUL/AGENTS/TOOLING)
[1] System static dynamic context (OnceAtStart layers + [session] + [attachments])
[2..N] User/Assistant conversation history
[last] User volatile tail
This is exactly the shape Anthropic's cache_control wants. We just need to emit markers at the right positions.
What changes
Inside the Anthropic provider plugin's request builder:
Add cache_control: {type: "ephemeral"} to the LAST content block of the persisted system prompt message (index [0]). This marks everything up through and including that message as cacheable.
Add a second cache_control marker at the end of the static dynamic context message (index [1]) — anything the static block contains (tool index, skill index, [session], [attachments]) gets cached alongside the persisted prompt.
Anthropic supports up to 4 cache_control breakpoints per request — this implementation uses 2, leaving headroom.
The volatile tail message (memory recall, current time, etc.) gets NO marker — it's per-turn by design.
Expected impact
For multi-turn sessions on Anthropic:
Turn 2+ prompt tokens that match the cached prefix are billed at the cached rate (~10% of normal input cost on Claude 3.5/4 Sonnet).
Reduced latency on turn 2+ because Anthropic doesn't re-process the cached prefix.
5-minute TTL sliding window means a session with a quiet period up to 5 minutes still gets the cache hit on resumption.
Files to touch
src/Netclaw.Providers/Anthropic/AnthropicProviderPlugin.cs (or the underlying chat client it creates)
Whatever the request-body builder is in the Anthropic provider — needs to emit the cache_control markers on system message content blocks
Tests: new test verifying the markers appear at the expected positions when talking to Anthropic (fake HTTP capture handler)
Verification
Unit test: capture the HTTP request body sent to Anthropic, assert cache_control markers are present on messages [0] and [1] and absent on the volatile tail.
Live test against Anthropic: use their usage response fields (cache_creation_input_tokens, cache_read_input_tokens) to confirm cache hits grow turn-over-turn.
Summary
Netclaw's Anthropic provider plugin currently sends no
cache_controldirectives. Anthropic's API supports explicit{"type": "ephemeral"}markers on message content blocks that unlock their prompt cache with much more generous semantics than implicit prefix matching: 5-minute TTL, cheaper cached tokens vs uncached, and works even across sessions with matching prefixes.This is pure additive — provider-specific code in the Anthropic plugin, zero impact on other providers (OpenAI, OpenRouter, openai-compatible/vLLM/llama.cpp). Pays off immediately for any user running Netclaw against Claude.
Context
This is a client-side performance follow-up from the cache-prefix-stability work (#608, PR #618) which made the system prompt prefix stable across turns at the structural level. That fix relies on implicit prefix matching — good for all providers, but leaves money on the table on Anthropic where explicit cache markers are available.
With PR #618 merged, Netclaw now produces a message list shaped like:
This is exactly the shape Anthropic's cache_control wants. We just need to emit markers at the right positions.
What changes
Inside the Anthropic provider plugin's request builder:
cache_control: {type: "ephemeral"}to the LAST content block of the persisted system prompt message (index [0]). This marks everything up through and including that message as cacheable.Expected impact
For multi-turn sessions on Anthropic:
Files to touch
src/Netclaw.Providers/Anthropic/AnthropicProviderPlugin.cs(or the underlying chat client it creates)Verification
cache_controlmarkers are present on messages [0] and [1] and absent on the volatile tail.cache_creation_input_tokens,cache_read_input_tokens) to confirm cache hits grow turn-over-turn.[working-context]performance metrics.Out of scope
References