feat(caching): multi-block system prompt with tiered TTLs (v2)#5713
Open
Deland78 wants to merge 2 commits into
Open
feat(caching): multi-block system prompt with tiered TTLs (v2)#5713Deland78 wants to merge 2 commits into
Deland78 wants to merge 2 commits into
Conversation
Refactor prompt caching to use structured SystemPromptBlocks with
per-block cache_control markers instead of a single monolithic system
prompt. This maximizes Anthropic prompt cache hits by isolating volatile
content (timestamps, platform hints) from stable content (identity,
skills, memory).
Architecture:
- static block (1h TTL): identity, tool guidance, skills, model-specific
guidance — cross-session stable
- session block (5m TTL): memory, context files, custom system_message —
session-stable
- ephemeral block (none): timestamp, platform hints, alibaba workaround —
changes per-turn
New public API in agent/prompt_caching.py:
- SystemPromptBlock, CacheMetrics, AggregatedCacheMetrics dataclasses
- build_system_content_blocks() — convert blocks to Anthropic format
- apply_anthropic_cache_control_v2() — multi-block + tool caching
- extract_cache_metrics() — per-call cache extraction (native + OpenRouter)
- aggregate_cache_metrics() — cross-turn aggregation
In run_agent.py:
- _build_system_prompt_blocks() assembles the three tiered blocks and
caches them on self._cached_system_blocks
- At API call time, blocks are converted to content blocks with
cache_control markers and sent as the system message
- Falls back to flat-string path for non-caching models
- Plugin context stays in user messages (unchanged from v1)
Test coverage:
- tests/agent/test_prompt_caching.py — 46 unit tests covering all v2
functions (data structures, marker building, content block conversion,
pre-structured detection, breakpoint budgeting, metrics)
- tests/agent/test_prompt_caching_v2.py — 38 additional tests for v2
integration (tool caching, budget interaction, backward compat)
- tests/test_prompt_caching_integration.py — 10 integration tests against
run_agent.py block assembly (tier structure, cache invalidation,
backward compat with v1 code paths)
Verified: 317 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks
|
Linking this into the #17459 direction. The overall cache architecture here may still be useful, but please keep it aligned with the simpler rule from #17459/#17476: stable cached prompt/cacheable prefix, volatile current time in ephemeral runtime/user-message/tool context. This PR should not be required as a prerequisite for fixing the immediate duplicate-tool cache bug (#17335), and it should not introduce hidden quiet-hours/control-plane policy. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Refactor Anthropic prompt caching to use a structured multi-block system prompt with per-block
cache_controlmarkers instead of a single monolithic system message. This maximizes cache hits by isolating volatile content (timestamps, platform hints) from stable content (identity, skills, memory).Architecture
The system prompt is now assembled as three
SystemPromptBlockinstances with different cache TTLs:system_message, memory store blocks (memory + user), external memory provider block, context files (AGENTS.md/CLAUDE.md/etc.)At API call time, blocks are converted to Anthropic content block format (`[{type: text, text: ..., cache_control: ...}, ...]`) and sent as the system message. Non-caching models fall through to the flat-string path unchanged.
New public API in `agent/prompt_caching.py`
The v1 `apply_anthropic_cache_control` function and `_apply_cache_marker` helper are preserved unchanged for backward compatibility.
Integration in `run_agent.py`
Test coverage
Verified: 317 tests passing (all of the above plus `tests/test_run_agent.py` regression suite).
Test plan
Platforms tested
Linux (WSL2, Ubuntu 22.04), Python 3.11
🤖 Generated with Claude Code