fix(background_review): inherit parent's toolset config to keep tools[] cache-stable (~50% fewer cache-write tokens on long sessions)#29568
Conversation
…[] cache-stable ## Summary The background skill/memory-review fork constructed a child `AIAgent` without propagating `enabled_toolsets` / `disabled_toolsets` from the parent. When the parent narrowed its toolset (via `hermes tools disable` or `config.yaml`), the fork's default `enabled_toolsets=None` expanded to "all registered tools" — and the fork's outbound request body sent a wider `tools[]` array than the parent's main-turn request. Anthropic's prompt-cache key includes the `tools[]` array byte-for-byte, so this divergence forked the cache lineage on every nudge and forced a full prefix rewrite. On a captured ~4 hour Claude-via-Hermes session this cost roughly 4.3 M cache-write tokens — about half of those attributable to the per-nudge alternation between the main turn's narrowed `tools[]` and the review fork's wider `tools[]`. ## Goal Extend the byte-stability invariant established by PR NousResearch#17276 (which fixed `system`) to the `tools[]` slot of the request body, so the review fork's outbound request hits the parent's warmed Anthropic prefix cache regardless of how the parent's toolset is configured. ## Implementation Two-line change in `agent/background_review.py`: pass `enabled_toolsets=getattr(agent, "enabled_toolsets", None)` and the matching `disabled_toolsets` kwarg into the `AIAgent(...)` call inside `_spawn_background_review`. Adds an explanatory block comment that calls out the cache-key dependency and the relationship to PR NousResearch#17276. The post-construction runtime whitelist (`set_thread_tool_whitelist({memory, skills})`) is untouched — it still gates which tools the model is allowed to *dispatch*. This change aligns only what the request body *transmits*, not what the review is allowed to do, so the safety contract from issue NousResearch#15204 remains intact. ## Testing - `tests/run_agent/test_background_review_cache_parity.py`: new `test_review_fork_inherits_parent_toolset_config` asserts the parent's `enabled_toolsets` and `disabled_toolsets` reach the review-fork constructor as kwargs. - `tests/run_agent/test_background_review_toolset_restriction.py`: the existing `test_background_review_does_not_narrow_toolset_schema` was inverted (its old "must NOT pass enabled_toolsets" rule was built on the assumption that the parent always ran with the registry default — wrong in practice when the parent is narrowed). Renamed to `test_background_review_matches_parent_toolset_config` and updated to assert the parent's value propagates verbatim. - Verified the new positive test fails without the fix and passes with it. - Full suite for `test_background_review*`: ``` $ python -m pytest tests/run_agent/test_background_review.py \ tests/run_agent/test_background_review_summary.py \ tests/run_agent/test_background_review_toolset_restriction.py \ tests/run_agent/test_background_review_cache_parity.py -q 18 passed in 1.85s ``` ## Scope - `agent/background_review.py`: 2 added kwargs + explanatory comment. - Two test files: one new positive test, one inverted existing test. - No production code paths outside the review fork; no schema changes; no public-API changes. Refs: #1 (root-cause analysis with wire-level cache-write measurements). Extends PR NousResearch#17276's `system`-bytes invariant to the `tools[]` slot.
|
@daimon-nous reproduce and review pls |
Review: PR #29568Verdict: Approve — clean, correctly-scoped fix. Natural extension of the AnalysisBug confirmed on Fix is correct:
Tests are sound:
One edge case (non-blocking)If the parent explicitly disables Scope
All 18 background review tests pass locally. |
- Replace 18-line comment block with 3-line invariant statement - Trim test docstrings from multi-paragraph to single-line summaries - Trim assertion messages from 4-line to 2-line mismatch reports - Replace 5-line WHAT comments in stubs with 1-line WHY comments - Add ziliangdotme@gmail.com -> ziliangpeng to AUTHOR_MAP
e4f130c to
a7cbf3e
Compare
|
Merged via #29704. Your commit was cherry-picked onto current main with your authorship preserved in git log (rebase-merge). We trimmed the comment verbosity and added your email to AUTHOR_MAP in a follow-up commit. Thanks for the thorough analysis + fix @ziliangpeng! |
fix(background_review): inherit parent's toolset config to keep
tools[]cache-stableSummary
Fixes #29567. The background skill/memory-review fork (
agent/background_review.py) constructs its childAIAgentwithout propagatingenabled_toolsets/disabled_toolsetsfrom the parent. When the parent's toolset is narrowed from the registry default (e.g. viahermes tools disableorconfig.yaml), the fork's defaultenabled_toolsets=Noneexpands to "all registered tools" and the fork's outboundtools[]becomes a strict superset of the parent's. Anthropic's prompt-cache key includes thetools[]array byte-for-byte, so the divergence forks the cache lineage and forces a full prefix rewrite on every review nudge.This PR adds the two missing kwargs and aligns the review fork's
tools[]with the parent's, extending PR #17276'ssystem-bytes invariant one slot up the cache hierarchy.Goal
Eliminate cache-write overhead on the background review path. On a captured ~5 hour Sonnet-class session, review-fork requests account for ~51 % of the session's total
cache_creation_input_tokenseven though they sharemessages[0..N]andsystemwith the parent byte-for-byte — the divergence intools[]alone is enough to push every fork req into a cache miss. After the fix, fork requests read from the parent's warmed cache instead of rewriting.The safety contract from #15204 (the review fork must not dispatch terminal / send_message / delegate_task) is preserved: it is enforced by the post-construction
set_thread_tool_whitelist({memory, skills, …})call a few lines below, which gates dispatch, not what the request body transmits.Implementation
agent/background_review.py— two added kwargs in theAIAgent(...)call inside_spawn_background_review:Plus an explanatory block comment that calls out the cache-key dependency and the relationship to PR #17276.
Symmetric inheritance: when the parent's value is
None, the fork's is alsoNoneand both expand to the registry default identically; when the parent narrows, the fork inherits the narrowed set verbatim.Testing
Two test files updated:
tests/run_agent/test_background_review_cache_parity.py— new positive assertiontest_review_fork_inherits_parent_toolset_configconfirms both kwargs reach the constructor with the parent's values.tests/run_agent/test_background_review_toolset_restriction.py— the existingtest_background_review_does_not_narrow_toolset_schemawas renamed totest_background_review_matches_parent_toolset_configand inverted. Its prior invariant ("the fork must NOT passenabled_toolsets") was built on the implicit assumption that the parent always runs with the registry default; when the parent narrows, leaving the fork atNoneis what causes divergence. The corrected invariant is symmetric inheritance.Sanity check: the new positive test fails without the fix and passes with it.
python -m pytest tests/run_agent/test_background_review.py \ tests/run_agent/test_background_review_summary.py \ tests/run_agent/test_background_review_toolset_restriction.py \ tests/run_agent/test_background_review_cache_parity.py -qOutput:
End-to-end verification (capturing real
/v1/messagestraffic via a local HTTP proxy):tools[]hashes per convThe post-fix fork retains its other identifiers (the
_SKILL_REVIEW_PROMPTtext on the last user message, the absence ofoutput_config/thinkingat the top level — those flow through the secondary completion adapter), but itstools[]array now hashes identically to the parent's.Scope
agent/background_review.py: two added kwargs + explanatory comment.Related: extends PR #17276 / #25322 (which fixed
system-bytes inheritance) to thetools[]slot.