feat(providers): add Anthropic cache_control ephemeral markers for explicit prompt caching

## Summary

Netclaw's Anthropic provider plugin currently sends no `cache_control` directives. Anthropic's API supports explicit `{"type": "ephemeral"}` markers on message content blocks that unlock their prompt cache with much more generous semantics than implicit prefix matching: **5-minute TTL**, cheaper cached tokens vs uncached, and works even across sessions with matching prefixes.

This is pure additive — provider-specific code in the Anthropic plugin, zero impact on other providers (OpenAI, OpenRouter, openai-compatible/vLLM/llama.cpp). Pays off immediately for any user running Netclaw against Claude.

## Context

This is a client-side performance follow-up from the cache-prefix-stability work (#608, PR #618) which made the system prompt prefix stable across turns at the structural level. That fix relies on **implicit** prefix matching — good for all providers, but leaves money on the table on Anthropic where **explicit** cache markers are available.

With PR #618 merged, Netclaw now produces a message list shaped like:
```
[0]      System  persisted prompt (SOUL/AGENTS/TOOLING)
[1]      System  static dynamic context (OnceAtStart layers + [session] + [attachments])
[2..N]   User/Assistant  conversation history
[last]   User  volatile tail
```

This is **exactly** the shape Anthropic's cache_control wants. We just need to emit markers at the right positions.

## What changes

Inside the Anthropic provider plugin's request builder:

1. **Add `cache_control: {type: "ephemeral"}` to the LAST content block of the persisted system prompt message** (index [0]). This marks everything up through and including that message as cacheable.
2. **Add a second cache_control marker at the end of the static dynamic context message** (index [1]) — anything the static block contains (tool index, skill index, [session], [attachments]) gets cached alongside the persisted prompt.
3. Anthropic supports up to **4 cache_control breakpoints** per request — this implementation uses 2, leaving headroom.
4. The volatile tail message (memory recall, current time, etc.) gets NO marker — it's per-turn by design.

## Expected impact

For multi-turn sessions on Anthropic:

- Turn 2+ prompt tokens that match the cached prefix are billed at the cached rate (~10% of normal input cost on Claude 3.5/4 Sonnet).
- Reduced latency on turn 2+ because Anthropic doesn't re-process the cached prefix.
- 5-minute TTL sliding window means a session with a quiet period up to 5 minutes still gets the cache hit on resumption.

## Files to touch

- `src/Netclaw.Providers/Anthropic/AnthropicProviderPlugin.cs` (or the underlying chat client it creates)
- Whatever the request-body builder is in the Anthropic provider — needs to emit the cache_control markers on system message content blocks
- Tests: new test verifying the markers appear at the expected positions when talking to Anthropic (fake HTTP capture handler)

## Verification

1. Unit test: capture the HTTP request body sent to Anthropic, assert `cache_control` markers are present on messages [0] and [1] and absent on the volatile tail.
2. Live test against Anthropic: use their usage response fields (`cache_creation_input_tokens`, `cache_read_input_tokens`) to confirm cache hits grow turn-over-turn.
3. The Netclaw eval suite's llama.cpp-specific timings parser (PR #615) doesn't currently extract Anthropic's cache fields — might need a small adaptation to capture them for the `[working-context]` performance metrics.

## Out of scope

- Non-Anthropic providers (OpenAI and OpenRouter have their own caching semantics; Claude Code / llama.cpp / vLLM rely on implicit prefix matching, which PR #618 already handles).
- Tuning where to put the cache breakpoints — 2 fixed markers at [0] and [1] is the starting point; optimization can come later based on measured usage.

## References

- [Anthropic docs: prompt caching](https://docs.claude.com/en/docs/build-with-claude/prompt-caching)
- #608 (closed by PR #618) — the structural cache-prefix work this builds on
- #609 (closed by PR #610) — session-sticky routing for self-hosted

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(providers): add Anthropic cache_control ephemeral markers for explicit prompt caching #621

Summary

Context

What changes

Expected impact

Files to touch

Verification

Out of scope

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat(providers): add Anthropic cache_control ephemeral markers for explicit prompt caching #621

Description

Summary

Context

What changes

Expected impact

Files to touch

Verification

Out of scope

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions