fix(memory): hit prefix cache in background review fork (salvage #17276 + #25427)#25434
Merged
Conversation
Adds set_thread_tool_whitelist / clear_thread_tool_whitelist to hermes_cli/plugins.py. When set on the current thread, restricts which tools can pass through get_pre_tool_call_block_message; non-whitelisted tools are blocked with a configurable deny message. Mirrors the per-thread approval-callback pattern already used by set_approval_callback (tools/terminal_tool.py:190). Used by _spawn_background_review to deny non-memory/non-skill tools at runtime while inheriting the parent agent's full tools schema for prefix-cache parity (see follow-up commit). Tests cover allow / deny / clear / cross-thread isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Background review fork is supposed to hit Anthropic's prefix cache on the parent's messages_snapshot, but currently doesn't (cache_read=0 on every fork). Two root causes, fixed in this commit: 1. System prompt is rebuilt at fork time. _cached_system_prompt starts as None, so run_conversation calls _build_system_prompt, which embeds a minute-precision "Conversation started: ..." timestamp. Reviews fire 10+ turns after session start, so the minute differs from main's, producing a 1-character diff that invalidates the byte-exact cache key. Fix: inherit the parent's _cached_system_prompt directly (same idea as #17089, which was self-closed for only fixing this half). 2. Tools schema was narrowed via enabled_toolsets=["memory","skills"] for safety. Anthropic's cache key includes `tools`, which sits before `system` in the cache hierarchy, so even byte-identical `system` won't hit when `tools` differs from main's full set. Fix: drop the schema-level restriction so `tools` matches main, and deny non-whitelisted tools at runtime via the existing get_pre_tool_call_block_message gate (hermes_cli/plugins.py:1085, already called at all three dispatch sites). Install/clear a thread- local whitelist (added in the previous commit) on the daemon thread. Append a soft constraint to the review prompt so the model knows. Real E2E on Sonnet 4.5 (12-tool task + auto-triggered review): - Per review-call cost: $0.331 → $0.035 (~89% reduction) - End-to-end per run: $0.848 → $0.629 (~26% reduction) - Review fork cache_create / cache_read: 88,385 / 0 → 1,234 / 94,404 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Belt-and-suspenders complement to the cached-system-prompt inheritance: pin session_start and session_id to the parent's so any code path that re-renders parts of the system prompt (compression, plugin hooks) still produces byte-identical output. The cached-prompt assignment already short-circuits the normal rebuild path, but these pins guarantee parity even if a future code path bypasses the cache. Idea from simpolism's reference PR #25427 for #25322. Co-Authored-By: simpolism <32201324+simpolism@users.noreply.github.com>
…view fork - test_background_review_does_not_narrow_toolset_schema: review fork must NOT pass enabled_toolsets to AIAgent (full parent schema = matching Anthropic cache key on the 'tools' field). - test_background_review_installs_thread_local_whitelist: the runtime whitelist that replaces schema-level narrowing must contain memory + skills tools and exclude terminal / send_message / delegate_task / web_search / execute_code. - test_review_fork_inherits_parent_cached_system_prompt: new test for PR #17276's first root cause — the fork's _cached_system_prompt must equal the parent's byte-for-byte. - test_review_fork_pins_session_start_and_session_id: defensive belt-and- suspenders for the cached-prompt inheritance. Inverted the original test_background_review_agent_uses_restricted_toolsets (which asserted the schema-level narrowing) — that narrowing was the direct cause of #25322's cache miss, and the runtime whitelist replaces its safety claim without breaking cache parity. Refs #25322, #15204, PR #17276.
Contributor
🔎 Lint report:
|
| Rule | Count |
|---|---|
invalid-argument-type |
7 |
unsupported-operator |
3 |
unresolved-attribute |
3 |
First entries
tests/run_agent/test_provider_attribution_headers.py:155: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache"]` and `Unknown | str | dict[str, str] | ... omitted 3 union elements`
run_agent.py:13663: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str & ~AlwaysFalsy`, `int & ~AlwaysFalsy` in union `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:9038: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/run_agent/test_provider_attribution_headers.py:156: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache-TTL"]` and `Unknown | str | dict[str, str] | ... omitted 3 union elements`
run_agent.py:12417: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | dict[Unknown, Unknown]`
run_agent.py:8955: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | dict[str, str]`
run_agent.py:7449: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:7278: [invalid-argument-type] invalid-argument-type: Argument to function `_codex_cloudflare_headers` is incorrect: Expected `str`, found `Unknown | str | dict[str, str] | ... omitted 3 union elements`
tests/run_agent/test_provider_attribution_headers.py:90: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | dict[str, str]`
tests/agent/test_codex_cloudflare_headers.py:181: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["originator"]` and `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:13660: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
✅ Fixed issues (17):
| Rule | Count |
|---|---|
invalid-argument-type |
10 |
unresolved-attribute |
4 |
unsupported-operator |
3 |
First entries
run_agent.py:8908: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `Divergent | Divergent | str | ... omitted 4 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str & ~AlwaysFalsy`, `int & ~AlwaysFalsy` in union `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 4 union elements`
run_agent.py:8911: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:12370: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | Divergent | Divergent | ... omitted 4 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | Divergent | dict[str, str]`
tests/agent/test_codex_cloudflare_headers.py:181: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["originator"]` and `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 4 union elements`
run_agent.py:8637: [unresolved-attribute] unresolved-attribute: Attribute `strip` is not defined on `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown, Unknown] & ~AlwaysFalsy` in union `Divergent | Divergent | (str & ~AlwaysFalsy) | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:90: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | Divergent | dict[str, str]`
run_agent.py:13613: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 5 union elements`
run_agent.py:8992: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:8992: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str | None`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:13616: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:156: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache-TTL"]` and `Unknown | str | dict[str, str] | ... omitted 4 union elements`
run_agent.py:7402: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:155: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache"]` and `Unknown | str | dict[str, str] | ... omitted 4 union elements`
run_agent.py:8991: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:7231: [invalid-argument-type] invalid-argument-type: Argument to function `_codex_cloudflare_headers` is incorrect: Expected `str`, found `Unknown | str | dict[str, str] | ... omitted 4 union elements`
Unchanged: 4363 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
This was referenced May 14, 2026
Closed
Collaborator
1 task
This was referenced May 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Background self-improvement review fork now produces byte-identical
tools+systemcache-key elements as the parent's main turn, restoring Anthropic/OpenRouter prefix-cache hits across the parent → review boundary. Measured by @WorldWriter: ~89% reduction in per-review-call cost, ~26% end-to-end on Sonnet 4.5.Closes #25322. Salvages #17276 (@WorldWriter, primary implementation) and incorporates the defensive pinning idea from #25427 (@simpolism).
Root cause
Three independent system-prompt bytes-difference sources between parent and review-fork:
Conversation started:timestamp regenerated from_hermes_now()per spawn (minute precision)session_idUUID per spawnAnthropic's prefix cache keys on
tools+system(in that order). The historicalenabled_toolsets=["memory", "skills"]narrowing alone was enough to bust thetoolscache key, even before the system-prompt divergences mattered.Changes
hermes_cli/plugins.py: newset_thread_tool_whitelist/clear_thread_tool_whiteliston the existingget_pre_tool_call_block_messagegate. Mirrors the per-thread approval-callback pattern.run_agent.py_spawn_background_review:enabled_toolsets=["memory", "skills"]from the AIAgent constructor → review's outboundtoolsschema matches parent's verbatim._cached_system_promptfrom parent → review's outboundsystembytes match parent's verbatim.session_start+session_idto parent's (belt-and-suspenders for any future code path that re-renders parts of the system prompt).{memory, skills_list, skill_view, skill_manage}on the bg-review daemon thread → non-memory/skill tools are denied at dispatch time, preserving the Background skill-review agent can perform non-skill side effects after creating a skill #15204 safety contract mechanically (not via prompt-trust).scripts/release.py: AUTHOR_MAP entries for @WorldWriter + @simpolism.test_background_review_agent_uses_restricted_toolsets(the schema-level narrowing it asserted was the cause of [Bug]: Background self-improvement review's fresh AIAgent generates a system prompt that bytes-differs from the parent, busting prompt cache + preprocessing tree-dedup #25322's miss); added 4 new tests coveringtoolsparity,_cached_system_promptinheritance,session_start/session_idpinning, and the runtime whitelist allow/deny pattern.Why we did NOT take simpolism's Option A or pure Option B
toolsbeforesystem, so byte-identical system bytes can't help when thetoolsarray differs.codex_app_server→codex_responsesdowngrade — that downgrade is load-bearing for review forks oncodex_app_serverruntime (which bypasses Hermes' own dispatch). This PR keeps it.Validation
E2E bytes-equality (run on this branch):
toolsschema (sha256)14b725bf...)systembytes (sha256)1dd9e294..., 3327 bytes)toolsfieldsystemRuntime whitelist (live, on this branch):
clear_thread_tool_whitelist()Tests:
Cost numbers from @WorldWriter's original E2E (real Sonnet 4.5 run with auto-triggered review):
cache_create/cache_read: 88,385 / 0 → 1,234 / 94,404Credit