
fix(memory): hit prefix cache in background review fork (salvage #17276 + #25427)#25434

Merged
teknium1 merged 5 commits into
mainfrom
hermes/hermes-2c8b79f0
May 14, 2026


@teknium1

Contributor

Summary

The background self-improvement review fork now produces tools and system cache-key elements that are byte-identical to the parent's main turn, restoring Anthropic/OpenRouter prefix-cache hits across the parent → review boundary. Measured by @WorldWriter: ~89% reduction in per-review-call cost and ~26% end-to-end on Sonnet 4.5.

Closes #25322. Salvages #17276 (@WorldWriter, primary implementation) and incorporates the defensive pinning idea from #25427 (@simpolism).

Root cause

Three independent sources of byte-level differences in the system prompt between the parent and the review fork:

  1. The "Conversation started:" timestamp is regenerated from _hermes_now() on every spawn (minute precision).
  2. A fresh session_id UUID is generated on every spawn.
  3. skills_prompt and the tool-aware guidance vary with the (now-removed) narrow toolset.

Anthropic's prefix cache keys on tools + system (in that order). The historical enabled_toolsets=["memory", "skills"] narrowing alone was enough to bust the tools cache key, even before the system-prompt divergences mattered.
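To make the ordering argument concrete, here is a toy model of a prefix cache key. It is an illustrative sketch only, assuming a SHA-256 over the serialized tools followed by the system text; it is not Anthropic's actual key derivation. The tool schemas and the system string are hypothetical placeholders; only the tool names come from this PR.

```python
import hashlib
import json


def prefix_digest(tools: list, system: str) -> str:
    """Toy cache key: hash the request prefix in cache order, tools first, then system."""
    h = hashlib.sha256()
    # No sort_keys: a prefix cache needs an exact match, so serialization order matters.
    h.update(json.dumps(tools, separators=(",", ":")).encode("utf-8"))
    h.update(b"\x00")  # separator between the tools and system segments
    h.update(system.encode("utf-8"))
    return h.hexdigest()


# Placeholder schemas: real tool definitions carry descriptions and input schemas.
ALL_TOOLS = [{"name": n} for n in (
    "memory", "skills_list", "skill_view", "skill_manage",
    "terminal", "send_message", "delegate_task", "web_search", "execute_code",
)]
REVIEW_ONLY = {"memory", "skills_list", "skill_view", "skill_manage"}
SYSTEM = "You are Hermes.\nConversation started: 2026-05-13 21:56\n"  # hypothetical

parent = prefix_digest(ALL_TOOLS, SYSTEM)
narrowed = prefix_digest([t for t in ALL_TOOLS if t["name"] in REVIEW_ONLY], SYSTEM)
full = prefix_digest(list(ALL_TOOLS), SYSTEM)

print(parent == narrowed)  # False: system is identical, but tools differ
print(parent == full)      # True
```

Because tools are hashed ahead of system in this model, narrowing the toolset changes the key even when the system text is byte-identical, which is the miss described above.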

Changes

  • hermes_cli/plugins.py: new set_thread_tool_whitelist / clear_thread_tool_whitelist helpers that plug into the existing get_pre_tool_call_block_message gate. Mirrors the per-thread approval-callback pattern.
  • run_agent.py _spawn_background_review:
    • Drop enabled_toolsets=["memory", "skills"] from the AIAgent constructor → review's outbound tools schema matches parent's verbatim.
    • Inherit _cached_system_prompt from parent → review's outbound system bytes match parent's verbatim.
    • Defensively pin session_start + session_id to parent's (belt-and-suspenders for any future code path that re-renders parts of the system prompt).
    • Install a thread-local whitelist of {memory, skills_list, skill_view, skill_manage} on the bg-review daemon thread → non-memory/skill tools are denied at dispatch time, preserving the safety contract from #15204 ("Background skill-review agent can perform non-skill side effects after creating a skill") mechanically rather than via prompt trust.
    • Append a soft prompt instruction so the model knows about the restriction up front.
  • scripts/release.py: AUTHOR_MAP entries for @WorldWriter + @simpolism.
  • Tests: inverted test_background_review_agent_uses_restricted_toolsets, because the schema-level narrowing it asserted was the cause of the cache miss in #25322 ("[Bug]: Background self-improvement review's fresh AIAgent generates a system prompt that bytes-differs from the parent, busting prompt cache + preprocessing tree-dedup"); added 4 new tests covering tools parity, _cached_system_prompt inheritance, session_start/session_id pinning, and the runtime whitelist allow/deny pattern.
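The whitelist gate can be pictured with a small sketch built on threading.local. This is a hypothetical reimplementation, not the code in hermes_cli/plugins.py: the deny-message format and the one-argument get_pre_tool_call_block_message signature are assumptions made for illustration.

```python
import threading
from typing import Iterable, Optional

# Hypothetical reimplementation; the real helpers in hermes_cli/plugins.py
# may use different signatures and messages.
_state = threading.local()
_DEFAULT_DENY = "Tool '{name}' is blocked on this thread."


def set_thread_tool_whitelist(names: Iterable[str], deny_message: str = _DEFAULT_DENY) -> None:
    """Allow only `names` on the calling thread."""
    _state.whitelist = frozenset(names)
    _state.deny_message = deny_message


def clear_thread_tool_whitelist() -> None:
    """Drop the restriction; the gate goes back to being a no-op."""
    _state.whitelist = None


def get_pre_tool_call_block_message(tool_name: str) -> Optional[str]:
    """Return a deny message if `tool_name` is blocked on this thread, else None."""
    whitelist = getattr(_state, "whitelist", None)
    if whitelist is None or tool_name in whitelist:
        return None
    return _state.deny_message.format(name=tool_name)


set_thread_tool_whitelist({"memory", "skills_list", "skill_view", "skill_manage"})
print(get_pre_tool_call_block_message("skill_view"))  # None
print(get_pre_tool_call_block_message("terminal"))    # Tool 'terminal' is blocked on this thread.

# A different thread never installed a whitelist, so it is unaffected.
seen = []
t = threading.Thread(target=lambda: seen.append(get_pre_tool_call_block_message("terminal")))
t.start()
t.join()
print(seen)  # [None]

clear_thread_tool_whitelist()
print(get_pre_tool_call_block_message("terminal"))    # None
```

Keeping the state on a threading.local is what lets the restriction apply only to the review daemon thread while the parent's thread keeps dispatching normally.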

Why we did NOT take simpolism's Option A or pure Option B

Validation

E2E bytes-equality (run on this branch):

| Check | Before | After |
|---|---|---|
| Parent vs review tools schema (sha256) | mismatched (30 vs 4 tools) | identical (sha 14b725bf...) |
| Parent vs review system bytes (sha256) | mismatched (timestamp + session_id + skills_prompt) | identical (sha 1dd9e294..., 3327 bytes) |
| Anthropic prefix cache key | mismatch on tools field | match through end of system |

Runtime whitelist (live, on this branch):

| Tool / condition | Result |
|---|---|
| memory, skill_view, skills_list, skill_manage | allowed (in whitelist) |
| terminal, send_message, delegate_task, web_search, execute_code | denied at dispatch |
| Other threads | unaffected (thread-local) |
| After clear_thread_tool_whitelist() | gate returns to no-op |

Tests:

scripts/run_tests.sh \
  tests/run_agent/test_background_review.py \
  tests/run_agent/test_background_review_summary.py \
  tests/run_agent/test_background_review_toolset_restriction.py \
  tests/run_agent/test_background_review_cache_parity.py \
  tests/run_agent/test_codex_app_server_integration.py \
  tests/hermes_cli/test_plugins.py
============================== 97 passed in 3.56s ==============================

Cost numbers from @WorldWriter's original E2E (real Sonnet 4.5 run with auto-triggered review):

  • Per review-call cost: $0.331 → $0.035 (~89% reduction)
  • End-to-end per run: $0.848 → $0.629 (~26% reduction)
  • Review fork cache_create / cache_read (tokens): 88,385 / 0 → 1,234 / 94,404
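The percentages above follow directly from the quoted dollar figures; a quick check is below. The cache-read share is derived here rather than reported in the original run, and it only covers tokens that touched the cache (uncached input tokens are not in the quoted numbers).

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


print(f"per review call: {pct_reduction(0.331, 0.035):.1f}%")  # 89.4%
print(f"end to end: {pct_reduction(0.848, 0.629):.1f}%")       # 25.8%

# After the fix: of the tokens that touched the cache, the share served as reads.
cache_create, cache_read = 1_234, 94_404
print(f"cache-read share: {cache_read / (cache_create + cache_read):.1%}")  # 98.7%
```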

Credit

WorldWriter and others added 5 commits May 13, 2026 21:56
Adds set_thread_tool_whitelist / clear_thread_tool_whitelist to
hermes_cli/plugins.py. When set on the current thread, restricts which
tools can pass through get_pre_tool_call_block_message; non-whitelisted
tools are blocked with a configurable deny message.

Mirrors the per-thread approval-callback pattern already used by
set_approval_callback (tools/terminal_tool.py:190). Used by
_spawn_background_review to deny non-memory/non-skill tools at runtime
while inheriting the parent agent's full tools schema for prefix-cache
parity (see follow-up commit).

Tests cover allow / deny / clear / cross-thread isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Background review fork is supposed to hit Anthropic's prefix cache on the
parent's messages_snapshot, but currently doesn't (cache_read=0 on every
fork). Two root causes, fixed in this commit:

1. System prompt is rebuilt at fork time. _cached_system_prompt starts as
   None, so run_conversation calls _build_system_prompt, which embeds a
   minute-precision "Conversation started: ..." timestamp. Reviews fire
   10+ turns after session start, so the minute differs from main's,
   producing a 1-character diff that invalidates the byte-exact cache key.
   Fix: inherit the parent's _cached_system_prompt directly (same idea as
   #17089, which was self-closed for only fixing this half).

2. Tools schema was narrowed via enabled_toolsets=["memory","skills"] for
   safety. Anthropic's cache key includes `tools`, which sits before
   `system` in the cache hierarchy, so even byte-identical `system` won't
   hit when `tools` differs from main's full set.
   Fix: drop the schema-level restriction so `tools` matches main, and
   deny non-whitelisted tools at runtime via the existing
   get_pre_tool_call_block_message gate (hermes_cli/plugins.py:1085,
   already called at all three dispatch sites). Install/clear a thread-
   local whitelist (added in the previous commit) on the daemon thread.
   Append a soft constraint to the review prompt so the model knows.
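A sketch of the install/clear lifecycle on the daemon thread, assuming a hypothetical dispatch_tool helper that consults the thread-local whitelist before running a tool. The names set_thread_tool_whitelist and clear_thread_tool_whitelist come from this PR; their bodies and the dispatch helper here are illustrative stand-ins, not the real dispatch sites.

```python
import threading
from typing import Callable, Iterable, Optional

# Hypothetical stand-ins: names mirror the PR, bodies and signatures are assumed.
_local = threading.local()
REVIEW_WHITELIST = frozenset({"memory", "skills_list", "skill_view", "skill_manage"})


def set_thread_tool_whitelist(names: Iterable[str]) -> None:
    _local.whitelist = frozenset(names)


def clear_thread_tool_whitelist() -> None:
    _local.whitelist = None


def dispatch_tool(name: str, run: Callable[[], str]) -> str:
    """Hypothetical dispatch site: ask the gate before executing the tool."""
    allowed: Optional[frozenset] = getattr(_local, "whitelist", None)
    if allowed is not None and name not in allowed:
        return f"Blocked: '{name}' is not available to the background review."
    return run()


def background_review(results: list) -> None:
    # Install on this daemon thread only, and always clear on exit.
    set_thread_tool_whitelist(REVIEW_WHITELIST)
    try:
        results.append(dispatch_tool("skill_manage", lambda: "skill saved"))
        results.append(dispatch_tool("terminal", lambda: "command ran"))
    finally:
        clear_thread_tool_whitelist()


results: list = []
worker = threading.Thread(target=background_review, args=(results,), daemon=True)
worker.start()
worker.join()
print(results)
# ['skill saved', "Blocked: 'terminal' is not available to the background review."]
```

The try/finally clears the whitelist even if the review raises, and because the state lives on the worker thread, tool calls made from the main thread are never blocked. The full tools schema can still be sent to the model; only execution is gated.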

Real E2E on Sonnet 4.5 (12-tool task + auto-triggered review):
- Per review-call cost: $0.331 → $0.035 (~89% reduction)
- End-to-end per run:   $0.848 → $0.629 (~26% reduction)
- Review fork cache_create / cache_read: 88,385 / 0  →  1,234 / 94,404

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Belt-and-suspenders complement to the cached-system-prompt inheritance:
pin session_start and session_id to the parent's so any code path that
re-renders parts of the system prompt (compression, plugin hooks)
still produces byte-identical output. The cached-prompt assignment
already short-circuits the normal rebuild path, but these pins
guarantee parity even if a future code path bypasses the cache.
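The following sketch shows why both the inherited cached prompt and the pinned fields matter. AgentState and render_system_prompt are hypothetical stand-ins for the agent's fields and _build_system_prompt, and the session date is made up; only the minute-precision timestamp and the session id are modeled.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical subset of agent state that feeds the system prompt."""
    session_start: datetime
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    cached_system_prompt: Optional[str] = None


def render_system_prompt(state: AgentState) -> str:
    # Stand-in for the real prompt builder: minute-precision timestamp + session id.
    return (
        "You are Hermes.\n"
        f"Conversation started: {state.session_start:%Y-%m-%d %H:%M}\n"
        f"Session: {state.session_id}\n"
    )


parent = AgentState(session_start=datetime(2026, 5, 13, 21, 56, 12))
parent.cached_system_prompt = render_system_prompt(parent)

# Old behaviour: a later clock reading and a fresh UUID at spawn time.
old_fork = AgentState(session_start=parent.session_start + timedelta(minutes=2))

# New behaviour: inherit the cached string and pin both fields.
new_fork = AgentState(
    session_start=parent.session_start,
    session_id=parent.session_id,
    cached_system_prompt=parent.cached_system_prompt,
)

print(render_system_prompt(old_fork) == parent.cached_system_prompt)  # False
print(new_fork.cached_system_prompt == parent.cached_system_prompt)   # True
# Even if a future path re-renders instead of using the cache:
print(render_system_prompt(new_fork) == parent.cached_system_prompt)  # True
```

With only the timestamp differing (21:56 vs 21:58), the rendered prompts have the same length and differ in a single character, which is enough to break an exact-match prefix cache.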

Idea from simpolism's reference PR #25427 for #25322.

Co-Authored-By: simpolism <32201324+simpolism@users.noreply.github.com>
…view fork

- test_background_review_does_not_narrow_toolset_schema: review fork must
  NOT pass enabled_toolsets to AIAgent (full parent schema = matching
  Anthropic cache key on the 'tools' field).
- test_background_review_installs_thread_local_whitelist: the runtime
  whitelist that replaces schema-level narrowing must contain memory +
  skills tools and exclude terminal / send_message / delegate_task /
  web_search / execute_code.
- test_review_fork_inherits_parent_cached_system_prompt: new test for
  PR #17276's first root cause — the fork's _cached_system_prompt must
  equal the parent's byte-for-byte.
- test_review_fork_pins_session_start_and_session_id: defensive belt-and-
  suspenders for the cached-prompt inheritance.

Inverted the original test_background_review_agent_uses_restricted_toolsets
(which asserted the schema-level narrowing) — that narrowing was the
direct cause of #25322's cache miss, and the runtime whitelist replaces
its safety claim without breaking cache parity.

Refs #25322, #15204, PR #17276.
@github-actions

Contributor

🔎 Lint report: hermes/hermes-2c8b79f0 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8299 on HEAD, 8299 on base (➖ 0)

🆕 New issues (13):

| Rule | Count |
|---|---|
| invalid-argument-type | 7 |
| unsupported-operator | 3 |
| unresolved-attribute | 3 |
First entries
tests/run_agent/test_provider_attribution_headers.py:155: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache"]` and `Unknown | str | dict[str, str] | ... omitted 3 union elements`
run_agent.py:13663: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str & ~AlwaysFalsy`, `int & ~AlwaysFalsy` in union `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:9038: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/run_agent/test_provider_attribution_headers.py:156: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache-TTL"]` and `Unknown | str | dict[str, str] | ... omitted 3 union elements`
run_agent.py:12417: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | dict[Unknown, Unknown]`
run_agent.py:8955: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | dict[str, str]`
run_agent.py:7449: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:7278: [invalid-argument-type] invalid-argument-type: Argument to function `_codex_cloudflare_headers` is incorrect: Expected `str`, found `Unknown | str | dict[str, str] | ... omitted 3 union elements`
tests/run_agent/test_provider_attribution_headers.py:90: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | dict[str, str]`
tests/agent/test_codex_cloudflare_headers.py:181: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["originator"]` and `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:13660: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`

✅ Fixed issues (17):

| Rule | Count |
|---|---|
| invalid-argument-type | 10 |
| unresolved-attribute | 4 |
| unsupported-operator | 3 |
First entries
run_agent.py:8908: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `Divergent | Divergent | str | ... omitted 4 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str & ~AlwaysFalsy`, `int & ~AlwaysFalsy` in union `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 4 union elements`
run_agent.py:8911: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:12370: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | Divergent | Divergent | ... omitted 4 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | Divergent | dict[str, str]`
tests/agent/test_codex_cloudflare_headers.py:181: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["originator"]` and `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 4 union elements`
run_agent.py:8637: [unresolved-attribute] unresolved-attribute: Attribute `strip` is not defined on `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown, Unknown] & ~AlwaysFalsy` in union `Divergent | Divergent | (str & ~AlwaysFalsy) | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:90: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | Divergent | dict[str, str]`
run_agent.py:13613: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 5 union elements`
run_agent.py:8992: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:8992: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str | None`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:13616: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:156: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache-TTL"]` and `Unknown | str | dict[str, str] | ... omitted 4 union elements`
run_agent.py:7402: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:155: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache"]` and `Unknown | str | dict[str, str] | ... omitted 4 union elements`
run_agent.py:8991: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:7231: [invalid-argument-type] invalid-argument-type: Argument to function `_codex_cloudflare_headers` is incorrect: Expected `str`, found `Unknown | str | dict[str, str] | ... omitted 4 union elements`

Unchanged: 4363 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 teknium1 merged commit 8c6b0c9 into main May 14, 2026
15 of 16 checks passed
@teknium1 teknium1 deleted the hermes/hermes-2c8b79f0 branch May 14, 2026 05:12
@alt-glitch alt-glitch added type/perf Performance improvement or optimization P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder comp/plugins Plugin system and bundled plugins tool/memory Memory tool and memory providers tool/skills Skills system (list, view, manage) labels May 14, 2026
@alt-glitch

Collaborator

Salvages #17276 + #25427 into a unified fix for #25322. This is the comprehensive version, with the thread-local tool whitelist (in plugins.py) preserving both cache parity and mechanical safety.


Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/plugins Plugin system and bundled plugins P2 Medium — degraded but workaround exists tool/memory Memory tool and memory providers tool/skills Skills system (list, view, manage) type/perf Performance improvement or optimization


3 participants