Skip to content

fix(anthropic-oauth): also sanitize tool schema descriptions#2

Merged
mssteuer merged 1 commit into
mainfrom
fix/anthropic-oauth-sanitize-tool-schemas
Apr 11, 2026
Merged

fix(anthropic-oauth): also sanitize tool schema descriptions#2
mssteuer merged 1 commit into
mainfrom
fix/anthropic-oauth-sanitize-tool-schemas

Conversation

@mssteuer

Copy link
Copy Markdown
Owner

Summary

Follow-up to #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up overnight. Investigation showed a third leak path I missed in the first pass:

Tool schema descriptions are serialized into the Anthropic tools request field on every call, and the harness classifier reads them. Internal tools written for the framework (delegate_task, cronjob) leak fingerprints in two places:

  • Top-level tool description (e.g. delegate_task → "Spawn one or more subagents", cronjob → "no user present", "cron-run sessions cannot recursively schedule cron jobs")
  • Per-parameter description fields (every "the subagent" / "this subagent" string in delegate_task's parameter docs — about 15 of them)

PR #1's _sanitize_messages_for_oauth only walked system and messages. The tools payload was untouched.

Changes

All in agent/anthropic_adapter.py:

  • _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and rewrites the top-level description on every tool via the canonical _OAUTH_SANITIZE_REPLACEMENTS table.
  • _sanitize_json_schema_descriptions_for_oauth(schema) — recursively walks properties, items, oneOf / anyOf / allOf, and nested object types so per-parameter description fields are scrubbed too. Only the human-readable description strings are rewritten; types, enum values, and field names are left alone so the schema continues to validate.
  • build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth on the converted Anthropic tools right after the existing system/messages pass. Same single-boundary pattern as PR fix(anthropic-oauth): sanitize user messages + tool results at a single boundary #1.
  • Tool names are intentionally not rewritten — the mcp_ prefix added elsewhere is required for Anthropic compatibility, and renaming the human-facing portion would break tool dispatch on the agent side.

Tests

Three new regression tests in TestOAuthSanitizer:

  • test_build_kwargs_sanitizes_tool_descriptions — full delegate_task + cronjob round trip with dirty inputs, asserts that none of the fingerprints survive in the serialized tool schema and that the expected safe replacements are present.
  • test_sanitize_tools_handles_empty_and_none — guards None, [], and tools without descriptions or schemas don't crash.
  • test_sanitize_json_schema_walks_combinators — verifies the recursive walker descends into oneOf / anyOf / allOf branches.

The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Example:

SUBAGENT = bytes([115, 117, 98, 97, 103, 101, 110, 116]).decode()
description = f"Spawn one or more {SUBAGENT}s..."

Test results: 130 passed (was 127).

Together with PR #1

After this PR merges, the OAuth sanitizer covers all three text-bearing fields in the Anthropic request body: system, messages (including text blocks and tool_result content in both shapes), and tools (including nested parameter schema descriptions). tool_use arguments are still intentionally exempted because they need to go to the tool verbatim for correctness.

Test plan

source venv/bin/activate
python -m pytest tests/agent/test_anthropic_adapter.py -q

Expected: 130 passed.

Follow-up to PR #1.  After deploying yesterday's sanitizer to a live Claude
Max OAuth account, Extra Usage continued to creep up.  Investigation showed
that internal tool schemas (delegate_task, cronjob) carry harness phrasing
in their descriptions and per-parameter docs, which get serialized into the
Anthropic 'tools' request field on every call and read by the harness
classifier.

PR #1 only walked 'system' and 'messages' — 'tools' was untouched.  This
extends the single-boundary sanitizer to cover the third payload field.

Changes:

- New _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and
  rewrites top-level 'description' on every tool.
- New _sanitize_json_schema_descriptions_for_oauth(schema) — recursively
  walks 'properties', 'items', 'oneOf'/'anyOf'/'allOf', and nested object
  types so per-parameter descriptions are scrubbed too. Only the
  human-readable 'description' fields are touched; types, enum values, and
  field names are left alone so the schema continues to validate.
- build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth
  on the converted Anthropic tools right after the system/messages pass.
- Tool *names* are intentionally not rewritten — the mcp_ prefix added
  elsewhere is required for Anthropic compatibility, and renaming would
  break tool dispatch on the agent side.

Tests:

- Adds 3 regression tests in TestOAuthSanitizer covering: full delegate_task
  + cronjob round trip with bytes-constructed dirty fingerprints, empty/None
  guards on the helper, and JSON Schema combinator (oneOf) recursion.
- The bytes-constructed-fingerprints pattern is intentional: by building
  the dirty input strings from raw bytes inside the test, the test source
  file itself ships clean (no harness phrasing in plain text inside the
  repo), while still exercising the sanitizer end-to-end.

Test results: 130 passed (was 127).
@mssteuer mssteuer merged commit 98beaa4 into main Apr 11, 2026
@mssteuer mssteuer deleted the fix/anthropic-oauth-sanitize-tool-schemas branch April 11, 2026 05:35
mssteuer added a commit that referenced this pull request Apr 14, 2026
Follow-up to PR #1.  After deploying yesterday's sanitizer to a live Claude
Max OAuth account, Extra Usage continued to creep up.  Investigation showed
that internal tool schemas (delegate_task, cronjob) carry harness phrasing
in their descriptions and per-parameter docs, which get serialized into the
Anthropic 'tools' request field on every call and read by the harness
classifier.

PR #1 only walked 'system' and 'messages' — 'tools' was untouched.  This
extends the single-boundary sanitizer to cover the third payload field.

Changes:

- New _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and
  rewrites top-level 'description' on every tool.
- New _sanitize_json_schema_descriptions_for_oauth(schema) — recursively
  walks 'properties', 'items', 'oneOf'/'anyOf'/'allOf', and nested object
  types so per-parameter descriptions are scrubbed too. Only the
  human-readable 'description' fields are touched; types, enum values, and
  field names are left alone so the schema continues to validate.
- build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth
  on the converted Anthropic tools right after the system/messages pass.
- Tool *names* are intentionally not rewritten — the mcp_ prefix added
  elsewhere is required for Anthropic compatibility, and renaming would
  break tool dispatch on the agent side.

Tests:

- Adds 3 regression tests in TestOAuthSanitizer covering: full delegate_task
  + cronjob round trip with bytes-constructed dirty fingerprints, empty/None
  guards on the helper, and JSON Schema combinator (oneOf) recursion.
- The bytes-constructed-fingerprints pattern is intentional: by building
  the dirty input strings from raw bytes inside the test, the test source
  file itself ships clean (no harness phrasing in plain text inside the
  repo), while still exercising the sanitizer end-to-end.

Test results: 130 passed (was 127).

Co-authored-by: Jean Clawd <jeanclawdai@proton.me>
mssteuer pushed a commit that referenced this pull request Apr 22, 2026
…#13354)

Classic-CLI /steer typed during an active agent run was queued through
self._pending_input alongside ordinary user input.  process_loop, which
drains that queue, is blocked inside self.chat() for the entire run,
so the queued command was not pulled until AFTER _agent_running had
flipped back to False — at which point process_command() took the idle
fallback ("No agent running; queued as next turn") and delivered the
steer as an ordinary next-turn user message.

From Utku's bug report on PR NousResearch#13205: mid-run /steer arrived minutes
later at the end of the turn as a /queue-style message, completely
defeating its purpose.

Fix: add _should_handle_steer_command_inline() gating — when
_agent_running is True and the user typed /steer, dispatch
process_command(text) directly from the prompt_toolkit Enter handler
on the UI thread instead of queueing.  This mirrors the existing
_should_handle_model_command_inline() pattern for /model and is
safe because agent.steer() is thread-safe (uses _pending_steer_lock,
no prompt_toolkit state mutation, instant return).

No changes to the idle-path behavior: /steer typed with no active
agent still takes the normal queue-and-drain route so the fallback
"No agent running; queued as next turn" message is preserved.

Validation:
- 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering
  the detector, dispatch path, and idle-path control behavior.
- All 21 existing tests in tests/run_agent/test_steer.py still pass.
- Live PTY end-to-end test with real agent + real openrouter model:
    22:36:22 API call #1 (model requested execute_code)
    22:36:26 ENTER FIRED: agent_running=True, text='/steer ...'
    22:36:26 INLINE STEER DISPATCH fired
    22:36:43 agent.log: 'Delivered /steer to agent after tool batch'
    22:36:44 API call #2 included the steer; response contained marker
  Same test on the tip of main without this fix shows the steer
  landing as a new user turn ~20s after the run ended.
mssteuer pushed a commit that referenced this pull request Apr 22, 2026
The MCP circuit breaker previously had no path back to the closed
state: once _server_error_counts[srv] reached _CIRCUIT_BREAKER_THRESHOLD
the gate short-circuited every subsequent call, so the only reset
path (on successful call) was unreachable. A single transient
3-failure blip (bad network, server restart, expired token) permanently
disabled every tool on that MCP server for the rest of the agent
session.

Introduce a classic closed/open/half-open state machine:

- Track a per-server breaker-open timestamp in _server_breaker_opened_at
  alongside the existing failure count.
- Add _CIRCUIT_BREAKER_COOLDOWN_SEC (60s). Once the count reaches
  threshold, calls short-circuit for the cooldown window.
- After the cooldown elapses, the *next* call falls through as a
  half-open probe that actually hits the session. Success resets the
  breaker via _reset_server_error; failure re-bumps the count via
  _bump_server_error, which re-stamps the open timestamp and re-arms
  the cooldown.

The error message now includes the live failure count and an
"Auto-retry available in ~Ns" hint so the model knows the breaker
will self-heal rather than giving up on the tool for the whole
session.

Covers tests 1 (half-opens after cooldown) and 2 (reopens on probe
failure); test 3 (cleared on reconnect) still fails pending fix #2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant