feat(caching): multi-block system prompt with tiered TTLs (v2) by Deland78 · Pull Request #5713 · NousResearch/hermes-agent

Deland78 · 2026-04-07T03:27:00Z

Summary

Refactor Anthropic prompt caching to use a structured multi-block system prompt with per-block cache_control markers instead of a single monolithic system message. This maximizes cache hits by isolating volatile content (timestamps, platform hints) from stable content (identity, skills, memory).

Architecture

The system prompt is now assembled as three SystemPromptBlock instances with different cache TTLs:

Block	TTL	Contents
static	1h	Soul.md / default identity, tool-aware guidance (memory, session_search, skills), Nous subscription prompt, tool-use enforcement, model-specific operational guidance (Google/OpenAI), skills system prompt
session	5m	Custom `system_message`, memory store blocks (memory + user), external memory provider block, context files (AGENTS.md/CLAUDE.md/etc.)
ephemeral	none	Timestamp + session/model/provider line, Alibaba identity workaround, platform hints

At API call time, blocks are converted to Anthropic content block format (`[{type: text, text: ..., cache_control: ...}, ...]`) and sent as the system message. Non-caching models fall through to the flat-string path unchanged.

New public API in `agent/prompt_caching.py`

`SystemPromptBlock`, `CacheMetrics`, `AggregatedCacheMetrics` dataclasses
`build_system_content_blocks(blocks)` — convert blocks to Anthropic format
`apply_anthropic_cache_control_v2(messages, tools, cache_ttl, native_anthropic)` — multi-block + tool caching with budget management (max 4 breakpoints across tools + system + messages)
`extract_cache_metrics(usage, api_mode)` — per-call cache extraction supporting both native Anthropic (`cache_read_input_tokens`, `cache_creation_input_tokens`) and OpenRouter (`prompt_tokens_details.cached_tokens`) response formats
`aggregate_cache_metrics(metrics_list)` — cross-turn aggregation

The v1 `apply_anthropic_cache_control` function and `_apply_cache_marker` helper are preserved unchanged for backward compatibility.

Integration in `run_agent.py`

New `_build_system_prompt_blocks()` method assembles the three tiered blocks and caches them on `self._cached_system_blocks`
The existing `_build_system_prompt()` method still returns a flat string (for backward compatibility with code paths that expect one) but now delegates to the block builder
Cached blocks are invalidated on context compression (`_cached_system_blocks = None` alongside `_cached_system_prompt = None`)
At API call time, when `_use_prompt_caching` is enabled and `_cached_system_blocks` is populated, a multi-block path builds `{role: system, content: [...]}` with cache_control markers already set per block
Plugin turn context (`_plugin_turn_context`) remains reserved for future system-level plugin instructions; plugin context from pre_llm_call hooks still goes into user messages (unchanged)
Fallback flat-string path handles non-caching models and pre-structured content correctly

Test coverage

`tests/agent/test_prompt_caching.py` — 46 unit tests covering v1 (preserved) and v2 functions: data structures, cache markers, content block conversion, pre-structured detection, breakpoint budgeting, metrics extraction and aggregation
`tests/agent/test_prompt_caching_v2.py` — 38 additional integration tests for v2 behavior (tool caching interaction with system blocks, budget with pre-structured content, backward compatibility with v1 code paths)
`tests/test_prompt_caching_integration.py` — 10 integration tests against `run_agent.py` block assembly (three-block structure, tier TTLs, timestamp in ephemeral block only, cache invalidation, backward-compat string return, non-caching models unaffected)

Verified: 317 tests passing (all of the above plus `tests/test_run_agent.py` regression suite).

Test plan

All new v2 unit tests pass (`pytest tests/agent/test_prompt_caching.py tests/agent/test_prompt_caching_v2.py`)
Integration tests against `run_agent.py` block assembly pass (`pytest tests/test_prompt_caching_integration.py`)
Full run_agent.py regression suite passes (`pytest tests/test_run_agent.py`)
`run_agent` imports cleanly
Manual: verify cache hit rate improves on a multi-turn conversation with stable context files (reviewer action)
Manual: verify non-caching models (e.g. local Ollama) still work via flat-string fallback (reviewer action)

Platforms tested

Linux (WSL2, Ubuntu 22.04), Python 3.11

🤖 Generated with Claude Code

Refactor prompt caching to use structured SystemPromptBlocks with per-block cache_control markers instead of a single monolithic system prompt. This maximizes Anthropic prompt cache hits by isolating volatile content (timestamps, platform hints) from stable content (identity, skills, memory). Architecture: - static block (1h TTL): identity, tool guidance, skills, model-specific guidance — cross-session stable - session block (5m TTL): memory, context files, custom system_message — session-stable - ephemeral block (none): timestamp, platform hints, alibaba workaround — changes per-turn New public API in agent/prompt_caching.py: - SystemPromptBlock, CacheMetrics, AggregatedCacheMetrics dataclasses - build_system_content_blocks() — convert blocks to Anthropic format - apply_anthropic_cache_control_v2() — multi-block + tool caching - extract_cache_metrics() — per-call cache extraction (native + OpenRouter) - aggregate_cache_metrics() — cross-turn aggregation In run_agent.py: - _build_system_prompt_blocks() assembles the three tiered blocks and caches them on self._cached_system_blocks - At API call time, blocks are converted to content blocks with cache_control markers and sent as the system message - Falls back to flat-string path for non-caching models - Plugin context stays in user messages (unchanged from v1) Test coverage: - tests/agent/test_prompt_caching.py — 46 unit tests covering all v2 functions (data structures, marker building, content block conversion, pre-structured detection, breakpoint budgeting, metrics) - tests/agent/test_prompt_caching_v2.py — 38 additional tests for v2 integration (tool caching, budget interaction, backward compat) - tests/test_prompt_caching_integration.py — 10 integration tests against run_agent.py block assembly (tier structure, cache invalidation, backward compat with v1 code paths) Verified: 317 tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

markojak · 2026-04-29T14:10:00Z

Linking this into the #17459 direction.

The overall cache architecture here may still be useful, but please keep it aligned with the simpler rule from #17459/#17476: stable cached prompt/cacheable prefix, volatile current time in ephemeral runtime/user-message/tool context.

This PR should not be required as a prerequisite for fixing the immediate duplicate-tool cache bug (#17335), and it should not introduce hidden quiet-hours/control-plane policy.

Deland78 and others added 2 commits April 6, 2026 21:50

Merge branch 'main' into feat/prompt-caching-v2

4bcbe5d

This was referenced Apr 23, 2026

feat: configurable prompt_caching.cache_ttl for Anthropic TTL tier #12659

Closed

Question: does the minute-precision timestamp in _build_system_prompt invalidate prompt caching for upstream inference servers? #15866

Closed

markojak mentioned this pull request Apr 29, 2026

Rework quiet-hours/time awareness: surface time to agent/tools, don't enforce in control plane #17459

Open

7 tasks

alt-glitch added type/perf Performance improvement or optimization P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/anthropic Anthropic native Messages API labels Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(caching): multi-block system prompt with tiered TTLs (v2)#5713

feat(caching): multi-block system prompt with tiered TTLs (v2)#5713
Deland78 wants to merge 2 commits into
NousResearch:mainfrom
Deland78:feat/prompt-caching-v2

Deland78 commented Apr 7, 2026 •

edited

Loading

Uh oh!

markojak commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Deland78 commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

New public API in `agent/prompt_caching.py`

Integration in `run_agent.py`

Test coverage

Test plan

Platforms tested

Uh oh!

markojak commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Deland78 commented Apr 7, 2026 •

edited

Loading