fix(anthropic-oauth): also sanitize tool schema descriptions#2
Merged
Merged
Conversation
Follow-up to PR #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up. Investigation showed that internal tool schemas (delegate_task, cronjob) carry harness phrasing in their descriptions and per-parameter docs, which get serialized into the Anthropic 'tools' request field on every call and read by the harness classifier. PR #1 only walked 'system' and 'messages' — 'tools' was untouched. This extends the single-boundary sanitizer to cover the third payload field. Changes: - New _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and rewrites top-level 'description' on every tool. - New _sanitize_json_schema_descriptions_for_oauth(schema) — recursively walks 'properties', 'items', 'oneOf'/'anyOf'/'allOf', and nested object types so per-parameter descriptions are scrubbed too. Only the human-readable 'description' fields are touched; types, enum values, and field names are left alone so the schema continues to validate. - build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth on the converted Anthropic tools right after the system/messages pass. - Tool *names* are intentionally not rewritten — the mcp_ prefix added elsewhere is required for Anthropic compatibility, and renaming would break tool dispatch on the agent side. Tests: - Adds 3 regression tests in TestOAuthSanitizer covering: full delegate_task + cronjob round trip with bytes-constructed dirty fingerprints, empty/None guards on the helper, and JSON Schema combinator (oneOf) recursion. - The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Test results: 130 passed (was 127).
mssteuer
added a commit
that referenced
this pull request
Apr 14, 2026
Follow-up to PR #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up. Investigation showed that internal tool schemas (delegate_task, cronjob) carry harness phrasing in their descriptions and per-parameter docs, which get serialized into the Anthropic 'tools' request field on every call and read by the harness classifier. PR #1 only walked 'system' and 'messages' — 'tools' was untouched. This extends the single-boundary sanitizer to cover the third payload field. Changes: - New _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and rewrites top-level 'description' on every tool. - New _sanitize_json_schema_descriptions_for_oauth(schema) — recursively walks 'properties', 'items', 'oneOf'/'anyOf'/'allOf', and nested object types so per-parameter descriptions are scrubbed too. Only the human-readable 'description' fields are touched; types, enum values, and field names are left alone so the schema continues to validate. - build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth on the converted Anthropic tools right after the system/messages pass. - Tool *names* are intentionally not rewritten — the mcp_ prefix added elsewhere is required for Anthropic compatibility, and renaming would break tool dispatch on the agent side. Tests: - Adds 3 regression tests in TestOAuthSanitizer covering: full delegate_task + cronjob round trip with bytes-constructed dirty fingerprints, empty/None guards on the helper, and JSON Schema combinator (oneOf) recursion. - The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Test results: 130 passed (was 127). Co-authored-by: Jean Clawd <jeanclawdai@proton.me>
mssteuer
pushed a commit
that referenced
this pull request
Apr 22, 2026
…#13354) Classic-CLI /steer typed during an active agent run was queued through self._pending_input alongside ordinary user input. process_loop, which drains that queue, is blocked inside self.chat() for the entire run, so the queued command was not pulled until AFTER _agent_running had flipped back to False — at which point process_command() took the idle fallback ("No agent running; queued as next turn") and delivered the steer as an ordinary next-turn user message. From Utku's bug report on PR NousResearch#13205: mid-run /steer arrived minutes later at the end of the turn as a /queue-style message, completely defeating its purpose. Fix: add _should_handle_steer_command_inline() gating — when _agent_running is True and the user typed /steer, dispatch process_command(text) directly from the prompt_toolkit Enter handler on the UI thread instead of queueing. This mirrors the existing _should_handle_model_command_inline() pattern for /model and is safe because agent.steer() is thread-safe (uses _pending_steer_lock, no prompt_toolkit state mutation, instant return). No changes to the idle-path behavior: /steer typed with no active agent still takes the normal queue-and-drain route so the fallback "No agent running; queued as next turn" message is preserved. Validation: - 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering the detector, dispatch path, and idle-path control behavior. - All 21 existing tests in tests/run_agent/test_steer.py still pass. - Live PTY end-to-end test with real agent + real openrouter model: 22:36:22 API call #1 (model requested execute_code) 22:36:26 ENTER FIRED: agent_running=True, text='/steer ...' 22:36:26 INLINE STEER DISPATCH fired 22:36:43 agent.log: 'Delivered /steer to agent after tool batch' 22:36:44 API call #2 included the steer; response contained marker Same test on the tip of main without this fix shows the steer landing as a new user turn ~20s after the run ended.
mssteuer
pushed a commit
that referenced
this pull request
Apr 22, 2026
The MCP circuit breaker previously had no path back to the closed state: once _server_error_counts[srv] reached _CIRCUIT_BREAKER_THRESHOLD the gate short-circuited every subsequent call, so the only reset path (on successful call) was unreachable. A single transient 3-failure blip (bad network, server restart, expired token) permanently disabled every tool on that MCP server for the rest of the agent session. Introduce a classic closed/open/half-open state machine: - Track a per-server breaker-open timestamp in _server_breaker_opened_at alongside the existing failure count. - Add _CIRCUIT_BREAKER_COOLDOWN_SEC (60s). Once the count reaches threshold, calls short-circuit for the cooldown window. - After the cooldown elapses, the *next* call falls through as a half-open probe that actually hits the session. Success resets the breaker via _reset_server_error; failure re-bumps the count via _bump_server_error, which re-stamps the open timestamp and re-arms the cooldown. The error message now includes the live failure count and an "Auto-retry available in ~Ns" hint so the model knows the breaker will self-heal rather than giving up on the tool for the whole session. Covers tests 1 (half-opens after cooldown) and 2 (reopens on probe failure); test 3 (cleared on reconnect) still fails pending fix #2.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up overnight. Investigation showed a third leak path I missed in the first pass:
Tool schema descriptions are serialized into the Anthropic
toolsrequest field on every call, and the harness classifier reads them. Internal tools written for the framework (delegate_task,cronjob) leak fingerprints in two places:delegate_task→ "Spawn one or more subagents",cronjob→ "no user present", "cron-run sessions cannot recursively schedule cron jobs")delegate_task's parameter docs — about 15 of them)PR #1's
_sanitize_messages_for_oauthonly walkedsystemandmessages. Thetoolspayload was untouched.Changes
All in
agent/anthropic_adapter.py:_sanitize_tools_for_oauth(tools)— walks the Anthropic tools list and rewrites the top-leveldescriptionon every tool via the canonical_OAUTH_SANITIZE_REPLACEMENTStable._sanitize_json_schema_descriptions_for_oauth(schema)— recursively walksproperties,items,oneOf/anyOf/allOf, and nested object types so per-parameterdescriptionfields are scrubbed too. Only the human-readabledescriptionstrings are rewritten; types, enum values, and field names are left alone so the schema continues to validate.build_anthropic_kwargs(is_oauth=True)now calls_sanitize_tools_for_oauthon the converted Anthropic tools right after the existing system/messages pass. Same single-boundary pattern as PR fix(anthropic-oauth): sanitize user messages + tool results at a single boundary #1.mcp_prefix added elsewhere is required for Anthropic compatibility, and renaming the human-facing portion would break tool dispatch on the agent side.Tests
Three new regression tests in
TestOAuthSanitizer:test_build_kwargs_sanitizes_tool_descriptions— fulldelegate_task+cronjobround trip with dirty inputs, asserts that none of the fingerprints survive in the serialized tool schema and that the expected safe replacements are present.test_sanitize_tools_handles_empty_and_none— guardsNone,[], and tools without descriptions or schemas don't crash.test_sanitize_json_schema_walks_combinators— verifies the recursive walker descends intooneOf/anyOf/allOfbranches.The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Example:
Test results: 130 passed (was 127).
Together with PR #1
After this PR merges, the OAuth sanitizer covers all three text-bearing fields in the Anthropic request body:
system,messages(including text blocks andtool_resultcontent in both shapes), andtools(including nested parameter schema descriptions).tool_usearguments are still intentionally exempted because they need to go to the tool verbatim for correctness.Test plan
Expected:
130 passed.