fix(memory): hit prefix cache in background review fork (salvage #17276 + #25427) by teknium1 · Pull Request #25434 · NousResearch/hermes-agent

teknium1 · 2026-05-14T05:09:31Z

Summary

The background self-improvement review fork now produces tools and system cache-key elements that are byte-identical to those of the parent's main turn, restoring Anthropic/OpenRouter prefix-cache hits across the parent → review boundary. Measured by @WorldWriter: ~89% reduction in per-review-call cost and ~26% reduction end-to-end on Sonnet 4.5.

Closes #25322. Salvages #17276 (@WorldWriter, primary implementation) and incorporates the defensive pinning idea from #25427 (@simpolism).

Root cause

Three independent sources of byte differences between the parent's system prompt and the review fork's:

The "Conversation started:" timestamp is regenerated from _hermes_now() on every spawn (minute precision)
A fresh session_id UUID is generated on every spawn
skills_prompt and the tool-aware guidance vary with the (now-removed) narrow toolset

Anthropic's prefix cache keys on tools + system (in that order). The historical enabled_toolsets=["memory", "skills"] narrowing alone was enough to bust the tools cache key, even before the system-prompt divergences mattered.
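
To illustrate why the ordering matters, here is a toy model of an ordered prefix key. It is only a sketch: `prefix_key`, the JSON serialization, and the sample tool and prompt values are assumptions made for illustration, not Anthropic's actual hashing scheme or Hermes code. It shows that a different `tools` array changes the key before `system` is even considered.

```python
import hashlib
import json


def prefix_key(tools: list[dict], system: str) -> tuple[str, str]:
    """Return (hash after tools, hash after tools + system).

    Hypothetical model: a running hash over the request prefix,
    with `tools` consumed before `system`.
    """
    h = hashlib.sha256()
    h.update(json.dumps(tools, sort_keys=True).encode("utf-8"))
    after_tools = h.hexdigest()
    h.update(system.encode("utf-8"))
    after_system = h.hexdigest()
    return after_tools, after_system


# Illustrative values only; the real schema has ~30 tools.
full_tools = [{"name": "memory"}, {"name": "terminal"}, {"name": "web_search"}]
narrow_tools = [{"name": "memory"}]
system = "You are Hermes.\nConversation started: 2026-05-14 05:09"

parent = prefix_key(full_tools, system)
narrowed_review = prefix_key(narrow_tools, system)   # old behaviour
matched_review = prefix_key(full_tools, system)      # behaviour after this PR
drifted_review = prefix_key(full_tools, system.replace("05:09", "05:21"))

print("narrowed tools, same system -> match:", parent == narrowed_review)
print("same tools, same system     -> match:", parent == matched_review)
print("same tools, drifted minute  -> tools match:", parent[0] == drifted_review[0],
      "| system match:", parent[1] == drifted_review[1])
```

In this model a narrowed toolset misses at the first element of the key, while a one-minute timestamp drift misses only at the second, which is why both divergences have to be removed to get a hit through the end of `system`.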

Changes

hermes_cli/plugins.py: new set_thread_tool_whitelist / clear_thread_tool_whitelist on the existing get_pre_tool_call_block_message gate. Mirrors the per-thread approval-callback pattern.
run_agent.py _spawn_background_review:
- Drop enabled_toolsets=["memory", "skills"] from the AIAgent constructor → review's outbound tools schema matches parent's verbatim.
- Inherit _cached_system_prompt from parent → review's outbound system bytes match parent's verbatim.
- Defensively pin session_start + session_id to parent's (belt-and-suspenders for any future code path that re-renders parts of the system prompt).
- Install thread-local whitelist of {memory, skills_list, skill_view, skill_manage} on the bg-review daemon thread → non-memory/skill tools are denied at dispatch time, preserving the safety contract from #15204 ("Background skill-review agent can perform non-skill side effects after creating a skill") mechanically (not via prompt-trust).
- Append a soft instruction to the review prompt so the model knows about the restriction up front.
scripts/release.py: AUTHOR_MAP entries for @WorldWriter + @simpolism.
Tests: inverted test_background_review_agent_uses_restricted_toolsets (the schema-level narrowing it asserted was the cause of the cache miss reported in #25322, "[Bug]: Background self-improvement review's fresh AIAgent generates a system prompt that bytes-differs from the parent, busting prompt cache + preprocessing tree-dedup"); added 4 new tests covering tools parity, _cached_system_prompt inheritance, session_start/session_id pinning, and the runtime whitelist allow/deny pattern.
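
The runtime gate described above can be sketched with `threading.local`. This is a minimal illustration, not the code in hermes_cli/plugins.py: the function names come from this PR, but the signatures, the `deny_message` parameter, the module-level `_thread_state`, and the `_review_worker` helper are all assumptions.

```python
import threading
from typing import Iterable, Optional

# Hypothetical per-thread storage; each thread sees its own attributes.
_thread_state = threading.local()


def set_thread_tool_whitelist(
    tools: Iterable[str],
    deny_message: str = "Tool not allowed in background review",
) -> None:
    """Restrict the current thread to the given tool names."""
    _thread_state.whitelist = frozenset(tools)
    _thread_state.deny_message = deny_message


def clear_thread_tool_whitelist() -> None:
    """Remove the restriction so the gate becomes a no-op again."""
    _thread_state.whitelist = None


def get_pre_tool_call_block_message(tool_name: str) -> Optional[str]:
    """Return a deny message if the tool is blocked on this thread, else None."""
    whitelist = getattr(_thread_state, "whitelist", None)
    if whitelist is None or tool_name in whitelist:
        return None
    return f"{_thread_state.deny_message}: {tool_name}"


def _review_worker(results: dict) -> None:
    # Stand-in for the bg-review daemon thread.
    set_thread_tool_whitelist({"memory", "skills_list", "skill_view", "skill_manage"})
    try:
        results["memory"] = get_pre_tool_call_block_message("memory")
        results["terminal"] = get_pre_tool_call_block_message("terminal")
    finally:
        clear_thread_tool_whitelist()
    results["after_clear"] = get_pre_tool_call_block_message("terminal")


results: dict = {}
worker = threading.Thread(target=_review_worker, args=(results,), daemon=True)
worker.start()
worker.join()

# The main thread never installed a whitelist, so nothing is blocked here.
main_thread_result = get_pre_tool_call_block_message("terminal")
print(results, main_thread_result)
```

Because the state lives in `threading.local`, the full tools schema can still be sent to the model for cache parity while only the review thread's dispatches are filtered.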

Why we did NOT take simpolism's Option A or pure Option B

Option A (inherit cached prompt, keep narrow toolset): does NOT fix the cache miss. Anthropic's cache hashes tools before system, so byte-identical system bytes can't help when the tools array differs.
Pure Option B (inherit cached prompt + parent's enabled_toolsets, rely purely on prompt-layer instruction): cache hits the same as this PR, but loses the mechanical safety claim of #15204 ("Background skill-review agent can perform non-skill side effects after creating a skill"). This PR keeps the schema broadening (for cache parity) AND adds the runtime gate (for safety), so we get both.
@simpolism's PR #25427 ("fix(agent): preserve prompt cache + tree-dedup across background review (reference impl for #25322 Option B)") also dropped the codex_app_server → codex_responses downgrade. That downgrade is load-bearing for review forks on the codex_app_server runtime (which bypasses Hermes' own dispatch). This PR keeps it.

Validation

E2E bytes-equality (run on this branch):

| | Before | After |
| --- | --- | --- |
| Parent vs review `tools` schema (sha256) | mismatched (30 vs 4 tools) | identical (sha `14b725bf...`) |
| Parent vs review `system` bytes (sha256) | mismatched (timestamp + session_id + skills_prompt) | identical (sha `1dd9e294...`, 3327 bytes) |
| Anthropic prefix cache key | mismatch on `tools` field | match through end of `system` |

Runtime whitelist (live, on this branch):

| Tool | Result |
| --- | --- |
| memory, skill_view, skills_list, skill_manage | allowed (in whitelist) |
| terminal, send_message, delegate_task, web_search, execute_code | denied at dispatch |
| Other threads | unaffected (thread-local) |
| After `clear_thread_tool_whitelist()` | gate returns to no-op |

Tests:

scripts/run_tests.sh \
  tests/run_agent/test_background_review.py \
  tests/run_agent/test_background_review_summary.py \
  tests/run_agent/test_background_review_toolset_restriction.py \
  tests/run_agent/test_background_review_cache_parity.py \
  tests/run_agent/test_codex_app_server_integration.py \
  tests/hermes_cli/test_plugins.py
============================== 97 passed in 3.56s ==============================

Cost numbers from @WorldWriter's original E2E (real Sonnet 4.5 run with auto-triggered review):

Per review-call cost: $0.331 → $0.035 (~89% reduction)
End-to-end per run: $0.848 → $0.629 (~26% reduction)
Review fork cache_create / cache_read: 88,385 / 0 → 1,234 / 94,404
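
As a quick check of the arithmetic behind these figures, the percentages can be recomputed from the dollar amounts and token counts quoted above. The helper name `pct_reduction` is just for illustration; the inputs are the numbers from this PR.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


per_review = pct_reduction(0.331, 0.035)   # per review-call cost
end_to_end = pct_reduction(0.848, 0.629)   # end-to-end cost per run

# Share of the review fork's cached-prefix tokens that were read from cache
# rather than newly written, after the fix.
cache_create_after, cache_read_after = 1_234, 94_404
read_share = cache_read_after / (cache_create_after + cache_read_after) * 100.0

print(f"per review call: {per_review:.1f}% reduction")
print(f"end to end:      {end_to_end:.1f}% reduction")
print(f"cache read share after fix: {read_share:.1f}%")
```

The first two values round to the ~89% and ~26% quoted above; before the fix the read share was 0% (88,385 created, 0 read).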

Credit

@WorldWriter: original implementation (#17276, "fix(memory): restore prefix cache hits in background review fork (~26% token saving per run)"), runtime-whitelist design, E2E measurement
@simpolism: bug filing + diagnosis (#25322, "[Bug]: Background self-improvement review's fresh AIAgent generates a system prompt that bytes-differs from the parent, busting prompt cache + preprocessing tree-dedup"), reference Option-B impl (#25427, "fix(agent): preserve prompt cache + tree-dedup across background review (reference impl for #25322 Option B)"), session_start/session_id pinning idea

Adds set_thread_tool_whitelist / clear_thread_tool_whitelist to hermes_cli/plugins.py. When set on the current thread, restricts which tools can pass through get_pre_tool_call_block_message; non-whitelisted tools are blocked with a configurable deny message. Mirrors the per-thread approval-callback pattern already used by set_approval_callback (tools/terminal_tool.py:190).

Used by _spawn_background_review to deny non-memory/non-skill tools at runtime while inheriting the parent agent's full tools schema for prefix-cache parity (see follow-up commit).

Tests cover allow / deny / clear / cross-thread isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Background review fork is supposed to hit Anthropic's prefix cache on the parent's messages_snapshot, but currently doesn't (cache_read=0 on every fork). Two root causes, fixed in this commit:

1. System prompt is rebuilt at fork time. _cached_system_prompt starts as None, so run_conversation calls _build_system_prompt, which embeds a minute-precision "Conversation started: ..." timestamp. Reviews fire 10+ turns after session start, so the minute differs from main's, producing a 1-character diff that invalidates the byte-exact cache key. Fix: inherit the parent's _cached_system_prompt directly (same idea as #17089, which was self-closed for only fixing this half).

2. Tools schema was narrowed via enabled_toolsets=["memory","skills"] for safety. Anthropic's cache key includes `tools`, which sits before `system` in the cache hierarchy, so even byte-identical `system` won't hit when `tools` differs from main's full set. Fix: drop the schema-level restriction so `tools` matches main, and deny non-whitelisted tools at runtime via the existing get_pre_tool_call_block_message gate (hermes_cli/plugins.py:1085, already called at all three dispatch sites). Install/clear a thread-local whitelist (added in the previous commit) on the daemon thread. Append a soft constraint to the review prompt so the model knows.

Real E2E on Sonnet 4.5 (12-tool task + auto-triggered review):
- Per review-call cost: $0.331 → $0.035 (~89% reduction)
- End-to-end per run: $0.848 → $0.629 (~26% reduction)
- Review fork cache_create / cache_read: 88,385 / 0 → 1,234 / 94,404

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Belt-and-suspenders complement to the cached-system-prompt inheritance: pin session_start and session_id to the parent's so any code path that re-renders parts of the system prompt (compression, plugin hooks) still produces byte-identical output. The cached-prompt assignment already short-circuits the normal rebuild path, but these pins guarantee parity even if a future code path bypasses the cache.

Idea from simpolism's reference PR #25427 for #25322.

Co-Authored-By: simpolism <32201324+simpolism@users.noreply.github.com>
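
The effect of the pinning can be shown with a small stand-in. Everything here is hypothetical: `FakeAgent`, `render_prompt_header`, and the header format are invented for illustration and do not reflect how run_agent.py actually renders the system prompt. The point is only that a re-render from a fresh session_start and session_id differs, while a re-render from pinned values does not.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FakeAgent:
    """Hypothetical stand-in for the fields that leak into the system prompt."""
    session_start: datetime
    session_id: str

    def render_prompt_header(self) -> str:
        # Minute precision, mirroring the "Conversation started: ..." line.
        return (
            f"Conversation started: {self.session_start:%Y-%m-%d %H:%M}\n"
            f"Session: {self.session_id}"
        )


parent = FakeAgent(datetime(2026, 5, 14, 5, 9), str(uuid.uuid4()))

# Unpinned fork: spawned 12 minutes later with a fresh UUID.
unpinned_fork = FakeAgent(
    parent.session_start + timedelta(minutes=12), str(uuid.uuid4())
)

# Pinned fork: copies both values from the parent.
pinned_fork = FakeAgent(parent.session_start, parent.session_id)

print("unpinned re-render matches parent:",
      unpinned_fork.render_prompt_header() == parent.render_prompt_header())
print("pinned re-render matches parent:  ",
      pinned_fork.render_prompt_header() == parent.render_prompt_header())
```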

…view fork

- test_background_review_does_not_narrow_toolset_schema: review fork must NOT pass enabled_toolsets to AIAgent (full parent schema = matching Anthropic cache key on the 'tools' field).
- test_background_review_installs_thread_local_whitelist: the runtime whitelist that replaces schema-level narrowing must contain memory + skills tools and exclude terminal / send_message / delegate_task / web_search / execute_code.
- test_review_fork_inherits_parent_cached_system_prompt: new test for PR #17276's first root cause; the fork's _cached_system_prompt must equal the parent's byte-for-byte.
- test_review_fork_pins_session_start_and_session_id: defensive belt-and-suspenders for the cached-prompt inheritance.

Inverted the original test_background_review_agent_uses_restricted_toolsets (which asserted the schema-level narrowing). That narrowing was the direct cause of #25322's cache miss, and the runtime whitelist replaces its safety claim without breaking cache parity.

Refs #25322, #15204, PR #17276.

github-actions · 2026-05-14T05:10:44Z

🔎 Lint report: `hermes/hermes-2c8b79f0` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8299 on HEAD, 8299 on base (➖ 0)

🆕 New issues (13):

| Rule | Count |
| --- | --- |
| `invalid-argument-type` | 7 |
| `unsupported-operator` | 3 |
| `unresolved-attribute` | 3 |

First entries

tests/run_agent/test_provider_attribution_headers.py:155: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache"]` and `Unknown | str | dict[str, str] | ... omitted 3 union elements`
run_agent.py:13663: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str & ~AlwaysFalsy`, `int & ~AlwaysFalsy` in union `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:9038: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/run_agent/test_provider_attribution_headers.py:156: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache-TTL"]` and `Unknown | str | dict[str, str] | ... omitted 3 union elements`
run_agent.py:12417: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | dict[Unknown, Unknown]`
run_agent.py:8955: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `str | Unknown | dict[Unknown | str, Unknown | str | dict[str, str]] | int | dict[Unknown, Unknown]`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | dict[str, str]`
run_agent.py:7449: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:7278: [invalid-argument-type] invalid-argument-type: Argument to function `_codex_cloudflare_headers` is incorrect: Expected `str`, found `Unknown | str | dict[str, str] | ... omitted 3 union elements`
tests/run_agent/test_provider_attribution_headers.py:90: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | dict[str, str]`
tests/agent/test_codex_cloudflare_headers.py:181: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["originator"]` and `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:13660: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`

✅ Fixed issues (17):

| Rule | Count |
| --- | --- |
| `invalid-argument-type` | 10 |
| `unresolved-attribute` | 4 |
| `unsupported-operator` | 3 |

First entries

run_agent.py:8908: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `int`, found `Divergent | Divergent | str | ... omitted 4 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str & ~AlwaysFalsy`, `int & ~AlwaysFalsy` in union `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 4 union elements`
run_agent.py:8911: [invalid-argument-type] invalid-argument-type: Argument to bound method `ContextCompressor.update_model` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:12370: [invalid-argument-type] invalid-argument-type: Argument to function `apply_anthropic_cache_control` is incorrect: Expected `bool`, found `int | Divergent | Divergent | ... omitted 4 union elements`
tests/agent/test_codex_cloudflare_headers.py:163: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | Divergent | dict[str, str]`
tests/agent/test_codex_cloudflare_headers.py:181: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["originator"]` and `(Unknown & ~AlwaysFalsy) | (str & ~AlwaysFalsy) | (dict[str, str] & ~AlwaysFalsy) | ... omitted 4 union elements`
run_agent.py:8637: [unresolved-attribute] unresolved-attribute: Attribute `strip` is not defined on `dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy`, `int & ~AlwaysFalsy`, `dict[Unknown, Unknown] & ~AlwaysFalsy` in union `Divergent | Divergent | (str & ~AlwaysFalsy) | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:90: [unresolved-attribute] unresolved-attribute: Attribute `startswith` is not defined on `dict[str, str]` in union `Unknown | str | Divergent | dict[str, str]`
run_agent.py:13613: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 5 union elements`
run_agent.py:8992: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:8992: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider_request_timeout` is incorrect: Expected `str | None`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:13616: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:156: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache-TTL"]` and `Unknown | str | dict[str, str] | ... omitted 4 union elements`
run_agent.py:7402: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 5 union elements`
tests/run_agent/test_provider_attribution_headers.py:155: [unsupported-operator] unsupported-operator: Operator `not in` is not supported between objects of type `Literal["X-OpenRouter-Cache"]` and `Unknown | str | dict[str, str] | ... omitted 4 union elements`
run_agent.py:8991: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `Divergent | Divergent | str | ... omitted 4 union elements`
run_agent.py:7231: [invalid-argument-type] invalid-argument-type: Argument to function `_codex_cloudflare_headers` is incorrect: Expected `str`, found `Unknown | str | dict[str, str] | ... omitted 4 union elements`

Unchanged: 4363 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

alt-glitch · 2026-05-14T05:14:30Z

Salvages #17276 + #25427 into a unified fix for #25322. This is the comprehensive version with thread-local tool whitelist (from plugins.py) preserving both cache parity and mechanical safety.

WorldWriter and others added 5 commits May 13, 2026 21:56

chore(release): map WorldWriter for PR #17276 salvage

45ec89a

teknium1 merged commit 8c6b0c9 into main May 14, 2026
15 of 16 checks passed

teknium1 deleted the hermes/hermes-2c8b79f0 branch May 14, 2026 05:12

BrewTestBot mentioned this pull request May 16, 2026

hermes-agent 2026.5.16 Homebrew/homebrew-core#283141

Merged

1 task

github-actions Bot mentioned this pull request May 17, 2026

chore: bump NousResearch/hermes-agent version from v2026.5.7 to v2026.5.16 Docker-Hub-sirmark/docker-hermes-agent#6

Merged

alt-glitch mentioned this pull request May 21, 2026

background_review fork sends wider tools[] than parent, fragments Anthropic prefix cache (~50% wasted cache-write on long sessions) #29567

Closed

daimon-nous Bot mentioned this pull request May 21, 2026

fix(background_review): inherit parent's toolset config to keep tools[] cache-stable (~50% fewer cache-write tokens on long sessions) #29568

Closed

alt-glitch mentioned this pull request May 21, 2026

fix(background_review): inherit parent toolset config for tools[] cache parity (salvage #29568) #29704

Merged

JimStenstrom mentioned this pull request Jun 5, 2026

[Bug]: background-review fork advertises the full tool schema to LOCAL endpoints, making weak local models thrash the deny-wall #39996

Open
