Skip to content

fix(background_review): inherit parent toolset config for tools[] cache parity (salvage #29568)#29704

Merged
alt-glitch merged 2 commits into
mainfrom
pr-29568
May 21, 2026
Merged

fix(background_review): inherit parent toolset config for tools[] cache parity (salvage #29568)#29704
alt-glitch merged 2 commits into
mainfrom
pr-29568

Conversation

@alt-glitch

Copy link
Copy Markdown
Collaborator

Summary

Salvage of #29568 by @ziliangpeng. The background review fork's AIAgent() constructor was missing enabled_toolsets/disabled_toolsets kwargs — when the parent narrowed its toolset, the fork defaulted to all registered tools, diverging the tools[] array and breaking Anthropic's prefix cache.

Two-line fix: propagate both kwargs from the parent. Extends the byte-stability invariant from PR #25434 (system slot) to the tools[] slot.

Changes

  • agent/background_review.py: add enabled_toolsets/disabled_toolsets kwargs + 3-line comment
  • tests/run_agent/test_background_review_cache_parity.py: new positive test, updated stub
  • tests/run_agent/test_background_review_toolset_restriction.py: inverted test to match new invariant, updated stub
  • scripts/release.py: add AUTHOR_MAP entry for contributor email

Follow-up (our commit)

Trimmed ~65 lines of verbose comments/docstrings from the original PR. Added AUTHOR_MAP entry for ziliangdotme@gmail.com.

Validation

Scenario Parent tools Fork tools Match
Narrowed (main) 6 30 ❌ (bug)
Narrowed (this PR) 6 6 ✅ (fix)
Default/None (this PR) 29 29 ✅ (symmetric)

18/18 background review tests pass.

Fixes #29567. Cherry-picked from #29568 with @ziliangpeng's authorship preserved.

ziliangpeng and others added 2 commits May 21, 2026 07:18
…[] cache-stable

## Summary

The background skill/memory-review fork constructed a child `AIAgent`
without propagating `enabled_toolsets` / `disabled_toolsets` from the
parent. When the parent narrowed its toolset (via `hermes tools
disable` or `config.yaml`), the fork's default `enabled_toolsets=None`
expanded to "all registered tools" — and the fork's outbound request
body sent a wider `tools[]` array than the parent's main-turn request.

Anthropic's prompt-cache key includes the `tools[]` array byte-for-byte,
so this divergence forked the cache lineage on every nudge and forced a
full prefix rewrite. On a captured ~4 hour Claude-via-Hermes session
this cost roughly 4.3 M cache-write tokens — about half of those
attributable to the per-nudge alternation between the main turn's
narrowed `tools[]` and the review fork's wider `tools[]`.

## Goal

Extend the byte-stability invariant established by PR #17276 (which
fixed `system`) to the `tools[]` slot of the request body, so the
review fork's outbound request hits the parent's warmed Anthropic
prefix cache regardless of how the parent's toolset is configured.

## Implementation

Two-line change in `agent/background_review.py`: pass
`enabled_toolsets=getattr(agent, "enabled_toolsets", None)` and the
matching `disabled_toolsets` kwarg into the `AIAgent(...)` call inside
`_spawn_background_review`. Adds an explanatory block comment that
calls out the cache-key dependency and the relationship to PR #17276.

The post-construction runtime whitelist
(`set_thread_tool_whitelist({memory, skills})`) is untouched — it
still gates which tools the model is allowed to *dispatch*. This
change aligns only what the request body *transmits*, not what the
review is allowed to do, so the safety contract from issue #15204
remains intact.

## Testing

- `tests/run_agent/test_background_review_cache_parity.py`: new
  `test_review_fork_inherits_parent_toolset_config` asserts the
  parent's `enabled_toolsets` and `disabled_toolsets` reach the
  review-fork constructor as kwargs.
- `tests/run_agent/test_background_review_toolset_restriction.py`:
  the existing `test_background_review_does_not_narrow_toolset_schema`
  was inverted (its old "must NOT pass enabled_toolsets" rule was
  built on the assumption that the parent always ran with the
  registry default — wrong in practice when the parent is narrowed).
  Renamed to `test_background_review_matches_parent_toolset_config`
  and updated to assert the parent's value propagates verbatim.
- Verified the new positive test fails without the fix and passes
  with it.
- Full suite for `test_background_review*`:

  ```
  $ python -m pytest tests/run_agent/test_background_review.py \
                     tests/run_agent/test_background_review_summary.py \
                     tests/run_agent/test_background_review_toolset_restriction.py \
                     tests/run_agent/test_background_review_cache_parity.py -q
  18 passed in 1.85s
  ```

## Scope

- `agent/background_review.py`: 2 added kwargs + explanatory comment.
- Two test files: one new positive test, one inverted existing test.
- No production code paths outside the review fork; no schema changes;
  no public-API changes.

Refs: ziliangpeng#1 (root-cause analysis with wire-level
cache-write measurements). Extends PR #17276's `system`-bytes
invariant to the `tools[]` slot.
- Replace 18-line comment block with 3-line invariant statement
- Trim test docstrings from multi-paragraph to single-line summaries
- Trim assertion messages from 4-line to 2-line mismatch reports
- Replace 5-line WHAT comments in stubs with 1-line WHY comments
- Add ziliangdotme@gmail.com -> ziliangpeng to AUTHOR_MAP
@alt-glitch alt-glitch merged commit 87d9239 into main May 21, 2026
16 of 17 checks passed
@alt-glitch alt-glitch deleted the pr-29568 branch May 21, 2026 07:19
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: pr-29568 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8994 on HEAD, 8992 on base (🆕 +2)

🆕 New issues (1):

Rule Count
invalid-argument-type 1
First entries
agent/background_review.py:408: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `list[str]`, found `Any | None`

✅ Fixed issues: none

Unchanged: 4743 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

background_review fork sends wider tools[] than parent, fragments Anthropic prefix cache (~50% wasted cache-write on long sessions)

2 participants