fix(memory): review fork rebuilds system prompt, breaking prefix cache (~26% per-cycle / ~92% per-fork savings) #17089
The forked review agent currently rebuilds its system prompt from scratch, producing a different 'Conversation started: ...' minute-precision timestamp than the parent's cached prompt. This invalidates the Anthropic prefix cache for the entire messages_snapshot, causing each background review to re-pay the full input-token cost.

Empirically (Sonnet 4.5, ~4300-token prefix):
- Without this fix: cache_create=4316, cache_read=0
- With this fix: cache_create=14, cache_read=4302

~92% per-fork input-token cost reduction; savings scale O(N^2) with conversation length (each fork rereads cumulative history).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
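The timestamp drift described in this commit message can be illustrated with a minimal sketch. It assumes a hypothetical `build_system_prompt` stand-in and a hypothetical `%Y-%m-%d %H:%M` timestamp format; the real `_build_system_prompt` and its exact output are not shown in this thread.

```python
from datetime import datetime, timedelta


def build_system_prompt(now: datetime) -> str:
    # Hypothetical stand-in for _build_system_prompt. The body text and the
    # timestamp format are assumptions; only the minute precision matters here.
    return (
        "You are a helpful agent.\n"
        "Conversation started: " + now.strftime("%Y-%m-%d %H:%M")
    )


def diff_positions(a: str, b: str) -> list[int]:
    # Indices at which two equal-length strings differ.
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


session_start = datetime(2025, 1, 15, 14, 3, 10)      # illustrative time
review_fire = session_start + timedelta(seconds=65)   # review fires 65 s later

parent_prompt = build_system_prompt(session_start)
rebuilt_prompt = build_system_prompt(review_fire)     # behaviour before the fix
inherited_prompt = parent_prompt                      # behaviour after the fix

changed = diff_positions(parent_prompt, rebuilt_prompt)
print(len(changed), parent_prompt[changed[0]], rebuilt_prompt[changed[0]])
print(inherited_prompt == parent_prompt)
```

With these illustrative times the two prompts differ only in the last digit of the minute, which is enough to break a byte-exact prefix match, while the inherited prompt is identical to the parent's.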
Self-closing: the fix doesn't actually deliver cache hits in the real spawn path. End-to-end on this branch with

Root cause I missed: review agent is built with

My earlier "E2E" scripts manually built the review fork without passing

If I find a clean way to address the tools dimension, I'll open a fresh PR.
Follow-up filed: #17276, which addresses both the system-prompt drift (this PR's original scope) and the tools-schema mismatch I missed here. Real E2E shows
Background review fork is supposed to hit Anthropic's prefix cache on the parent's messages_snapshot, but currently doesn't (cache_read=0 on every fork). Two root causes, fixed in this commit:

1. System prompt is rebuilt at fork time. _cached_system_prompt starts as None, so run_conversation calls _build_system_prompt, which embeds a minute-precision "Conversation started: ..." timestamp. Reviews fire 10+ turns after session start, so the minute differs from main's, producing a 1-character diff that invalidates the byte-exact cache key. Fix: inherit the parent's _cached_system_prompt directly (same idea as #17089, which was self-closed for only fixing this half).
2. Tools schema was narrowed via enabled_toolsets=["memory","skills"] for safety. Anthropic's cache key includes `tools`, which sits before `system` in the cache hierarchy, so even byte-identical `system` won't hit when `tools` differs from main's full set. Fix: drop the schema-level restriction so `tools` matches main, and deny non-whitelisted tools at runtime via the existing get_pre_tool_call_block_message gate (hermes_cli/plugins.py:1085, already called at all three dispatch sites). Install/clear a thread-local whitelist (added in the previous commit) on the daemon thread. Append a soft constraint to the review prompt so the model knows.

Real E2E on Sonnet 4.5 (12-tool task + auto-triggered review):
- Per review-call cost: $0.331 → $0.035 (~89% reduction)
- End-to-end per run: $0.848 → $0.629 (~26% reduction)
- Review fork cache_create / cache_read: 88,385 / 0 → 1,234 / 94,404

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
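The two fixes in this commit message can be sketched roughly as follows. This is not the project's code: `Agent`, `spawn_review`, `install_whitelist`, `clear_whitelist`, `block_message`, and the tool names are all hypothetical stand-ins, and the real gate is `get_pre_tool_call_block_message`, whose signature is not shown here. The sketch only illustrates the shape of the idea: the fork keeps the parent's prompt and full tool list, and a thread-local whitelist denies other tools at call time.

```python
import threading
from typing import Iterable, Optional

# Hypothetical thread-local holder for the review fork's allowed tools.
_review_ctx = threading.local()


def install_whitelist(names: Iterable[str]) -> None:
    _review_ctx.allowed = frozenset(names)


def clear_whitelist() -> None:
    _review_ctx.allowed = None


def block_message(tool_name: str) -> Optional[str]:
    # Returns a denial message when the current thread has a whitelist that
    # excludes the tool; returns None when the call may proceed.
    allowed = getattr(_review_ctx, "allowed", None)
    if allowed is not None and tool_name not in allowed:
        return f"Tool '{tool_name}' is not permitted in the review fork."
    return None


class Agent:
    # Hypothetical minimal agent: only the fields relevant to the cache key.
    def __init__(self, tools, cached_system_prompt=None):
        self.tools = list(tools)
        self._cached_system_prompt = cached_system_prompt


def spawn_review(parent: Agent, allowed: Iterable[str], calls: Iterable[str]):
    # Fix 1: inherit the parent's prompt instead of rebuilding it.
    # Fix 2: keep the full tool schema; restrict at runtime instead.
    review = Agent(parent.tools, parent._cached_system_prompt)
    results = {}

    def run():
        install_whitelist(allowed)
        try:
            for name in calls:
                results[name] = block_message(name)
        finally:
            clear_whitelist()  # always clear, even if the review raises

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join()
    return review, results


parent = Agent(["memory_write", "skill_save", "shell"], "SYSTEM @ 14:03")
review, results = spawn_review(
    parent, {"memory_write", "skill_save"}, ["memory_write", "shell"]
)
```

Because the whitelist lives in thread-local storage, the main thread's tool calls are unaffected while the review thread is running.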
~26% reduction on full cycle cost (10 main turns + 1 review fork) and ~92% on the review fork itself, in a typical 10-turn session against Sonnet 4.5. One-line fix: review fork now inherits parent's `_cached_system_prompt` instead of rebuilding it with a drifted timestamp that invalidates the entire prefix cache.

`_spawn_background_review` rebuilds the review fork's system prompt via `_build_system_prompt`, which calls `hermes_time.now()` at minute precision. The 1-minute drift between session start and review fire-time invalidates the Anthropic prefix cache for the entire `messages_snapshot`. This silently violates the project's own caching policy (see `AGENTS.md`):

Empirical cache-mechanism verification (real Anthropic API + real hermes flow)
Driving `AIAgent.run_conversation()` against the live Anthropic API (Sonnet 4.5), captured via `Messages.stream` interception. One main turn primes the cache, then review fires with a 10-turn `messages_snapshot`. This isolates the system-prompt-drift effect: the only variable across the two review calls is whether the fix is applied.

[Table: `cache_create` and `cache_read` for the review call, without and with the fix]

Pricing (Sonnet 4.5, per 1M tokens): input $3, output $15, cache_write $3.75, cache_read $0.30.
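As a rough check of the per-fork figures, the cache-related cost can be recomputed from the prices above and the `cache_create` / `cache_read` counts quoted in the commit messages in this thread. The helper names below are hypothetical, and the sketch deliberately ignores uncached input and output tokens, so it only approximates the billed totals.

```python
# Sonnet 4.5 prices in USD per 1M tokens, as listed in the PR description.
CACHE_WRITE_PER_MTOK = 3.75
CACHE_READ_PER_MTOK = 0.30


def cache_cost(cache_create: int, cache_read: int) -> float:
    # Cost of the cached portion of one request, in USD.
    return (
        cache_create * CACHE_WRITE_PER_MTOK + cache_read * CACHE_READ_PER_MTOK
    ) / 1_000_000


def reduction(before: float, after: float) -> float:
    # Fractional saving going from `before` to `after`.
    return 1 - after / before


# ~4300-token prefix from the first commit message.
small_before = cache_cost(4316, 0)
small_after = cache_cost(14, 4302)

# Real E2E run from the follow-up commit message.
large_before = cache_cost(88_385, 0)
large_after = cache_cost(1_234, 94_404)

print(f"small prefix: {reduction(small_before, small_after):.1%}")
print(f"E2E run:      {reduction(large_before, large_after):.1%}")
```

Under these assumptions the small-prefix case lands near the quoted ~92%, and the E2E case lands near 90%, slightly above the reported ~89% because the reported dollar figures also include tokens this sketch leaves out.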
The `cache_read` jumping from 0 to 14,522 confirms the fix restores prefix-cache hits exactly as predicted; the 14,522 cached tokens correspond to the system prompt that main has already cached. (This anchor is single-turn-prime, so main hadn't cached the messages yet; review still pays for those.)

Projected to a realistic 10-turn cycle
In real sessions main runs ~10 turns before the review fires (default `nudge_interval=10`), so the messages also enter main's cache. Review WITH fix then hits the whole prefix, not just system. Building the per-turn ledger from the empirical anchor, assuming each main turn adds 200 input + 500 output tokens of substantive work:

Per-fork savings reaches ~92% because by turn 10 main has pre-cached system + the entire message history; review with the fix only pays for `REVIEW_PROMPT` itself. Cycle savings sits at ~26%.

Sensitivity to per-turn output verbosity:
Triggers fire every 10/20/30/... turns; review cost scales O(N²) with conversation length while main scales O(N), so cycle savings ratio rises rather than dilutes over long sessions.
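The quadratic growth of uncached review cost can be seen in a small sketch. It assumes each turn adds 700 tokens (the 200 input + 500 output assumed above), that a review fires every 10 turns, and that every uncached review rereads the full history so far; `review_tokens_reread` is a hypothetical helper, and the system prompt is left out for simplicity.

```python
def review_tokens_reread(
    total_turns: int, tokens_per_turn: int = 700, interval: int = 10
) -> int:
    # Total history tokens paid for by all review forks in a session when
    # nothing is served from cache: the review at turn k rereads k turns.
    return sum(
        turn * tokens_per_turn
        for turn in range(interval, total_turns + 1, interval)
    )


for turns in (50, 100, 200):
    print(turns, review_tokens_reread(turns))

# Doubling the session length roughly quadruples the reread volume.
ratio = review_tokens_reread(200) / review_tokens_reread(100)
print(f"200 vs 100 turns: x{ratio:.2f}")
```

The ratio approaches 4 as sessions get longer, which is the O(N²) behaviour the paragraph above refers to.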
The divergence is one character
Two `AIAgent` instances 65 s apart calling `_build_system_prompt()`:

That single character is the entire bug surface. The fix makes review's `_cached_system_prompt` byte-identical to main's, restoring prefix cache hits.

Why safe

- `_cached_system_prompt` is a `str`: no shared mutable state
- `_memory_store`, `_memory_enabled`, `_user_profile_enabled` already inherited at the same site

End-to-end repro script (~$0.05 to run)
Tested on macOS 14 (Darwin 24.6), Python 3.12.