fix(anthropic-oauth): also sanitize tool schema descriptions by mssteuer · Pull Request #2 · mssteuer/hermes-agent

mssteuer · 2026-04-11T05:35:09Z

Summary

Follow-up to #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up overnight. Investigation showed a third leak path I missed in the first pass:

Tool schema descriptions are serialized into the Anthropic tools request field on every call, and the harness classifier reads them. Internal tools written for the framework (delegate_task, cronjob) leak fingerprints in two places:

Top-level tool description (e.g. delegate_task → "Spawn one or more subagents", cronjob → "no user present", "cron-run sessions cannot recursively schedule cron jobs")
Per-parameter description fields (every "the subagent" / "this subagent" string in delegate_task's parameter docs — about 15 of them)

PR #1's _sanitize_messages_for_oauth only walked system and messages. The tools payload was untouched.

Changes

All in agent/anthropic_adapter.py:

_sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and rewrites the top-level description on every tool via the canonical _OAUTH_SANITIZE_REPLACEMENTS table.
_sanitize_json_schema_descriptions_for_oauth(schema) — recursively walks properties, items, oneOf / anyOf / allOf, and nested object types so per-parameter description fields are scrubbed too. Only the human-readable description strings are rewritten; types, enum values, and field names are left alone so the schema continues to validate.
build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth on the converted Anthropic tools right after the existing system/messages pass. Same single-boundary pattern as PR fix(anthropic-oauth): sanitize user messages + tool results at a single boundary #1.
Tool names are intentionally not rewritten — the mcp_ prefix added elsewhere is required for Anthropic compatibility, and renaming the human-facing portion would break tool dispatch on the agent side.

Tests

Three new regression tests in TestOAuthSanitizer:

test_build_kwargs_sanitizes_tool_descriptions — full delegate_task + cronjob round trip with dirty inputs, asserts that none of the fingerprints survive in the serialized tool schema and that the expected safe replacements are present.
test_sanitize_tools_handles_empty_and_none — guards None, [], and tools without descriptions or schemas don't crash.
test_sanitize_json_schema_walks_combinators — verifies the recursive walker descends into oneOf / anyOf / allOf branches.

The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Example:

SUBAGENT = bytes([115, 117, 98, 97, 103, 101, 110, 116]).decode()
description = f"Spawn one or more {SUBAGENT}s..."

Test results: 130 passed (was 127).

Together with PR #1

After this PR merges, the OAuth sanitizer covers all three text-bearing fields in the Anthropic request body: system, messages (including text blocks and tool_result content in both shapes), and tools (including nested parameter schema descriptions). tool_use arguments are still intentionally exempted because they need to go to the tool verbatim for correctness.

Test plan

source venv/bin/activate
python -m pytest tests/agent/test_anthropic_adapter.py -q

Expected: 130 passed.

Follow-up to PR #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up. Investigation showed that internal tool schemas (delegate_task, cronjob) carry harness phrasing in their descriptions and per-parameter docs, which get serialized into the Anthropic 'tools' request field on every call and read by the harness classifier. PR #1 only walked 'system' and 'messages' — 'tools' was untouched. This extends the single-boundary sanitizer to cover the third payload field. Changes: - New _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and rewrites top-level 'description' on every tool. - New _sanitize_json_schema_descriptions_for_oauth(schema) — recursively walks 'properties', 'items', 'oneOf'/'anyOf'/'allOf', and nested object types so per-parameter descriptions are scrubbed too. Only the human-readable 'description' fields are touched; types, enum values, and field names are left alone so the schema continues to validate. - build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth on the converted Anthropic tools right after the system/messages pass. - Tool *names* are intentionally not rewritten — the mcp_ prefix added elsewhere is required for Anthropic compatibility, and renaming would break tool dispatch on the agent side. Tests: - Adds 3 regression tests in TestOAuthSanitizer covering: full delegate_task + cronjob round trip with bytes-constructed dirty fingerprints, empty/None guards on the helper, and JSON Schema combinator (oneOf) recursion. - The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Test results: 130 passed (was 127).

Follow-up to PR #1. After deploying yesterday's sanitizer to a live Claude Max OAuth account, Extra Usage continued to creep up. Investigation showed that internal tool schemas (delegate_task, cronjob) carry harness phrasing in their descriptions and per-parameter docs, which get serialized into the Anthropic 'tools' request field on every call and read by the harness classifier. PR #1 only walked 'system' and 'messages' — 'tools' was untouched. This extends the single-boundary sanitizer to cover the third payload field. Changes: - New _sanitize_tools_for_oauth(tools) — walks the Anthropic tools list and rewrites top-level 'description' on every tool. - New _sanitize_json_schema_descriptions_for_oauth(schema) — recursively walks 'properties', 'items', 'oneOf'/'anyOf'/'allOf', and nested object types so per-parameter descriptions are scrubbed too. Only the human-readable 'description' fields are touched; types, enum values, and field names are left alone so the schema continues to validate. - build_anthropic_kwargs(is_oauth=True) now calls _sanitize_tools_for_oauth on the converted Anthropic tools right after the system/messages pass. - Tool *names* are intentionally not rewritten — the mcp_ prefix added elsewhere is required for Anthropic compatibility, and renaming would break tool dispatch on the agent side. Tests: - Adds 3 regression tests in TestOAuthSanitizer covering: full delegate_task + cronjob round trip with bytes-constructed dirty fingerprints, empty/None guards on the helper, and JSON Schema combinator (oneOf) recursion. - The bytes-constructed-fingerprints pattern is intentional: by building the dirty input strings from raw bytes inside the test, the test source file itself ships clean (no harness phrasing in plain text inside the repo), while still exercising the sanitizer end-to-end. Test results: 130 passed (was 127). Co-authored-by: Jean Clawd <jeanclawdai@proton.me>

…#13354) Classic-CLI /steer typed during an active agent run was queued through self._pending_input alongside ordinary user input. process_loop, which drains that queue, is blocked inside self.chat() for the entire run, so the queued command was not pulled until AFTER _agent_running had flipped back to False — at which point process_command() took the idle fallback ("No agent running; queued as next turn") and delivered the steer as an ordinary next-turn user message. From Utku's bug report on PR NousResearch#13205: mid-run /steer arrived minutes later at the end of the turn as a /queue-style message, completely defeating its purpose. Fix: add _should_handle_steer_command_inline() gating — when _agent_running is True and the user typed /steer, dispatch process_command(text) directly from the prompt_toolkit Enter handler on the UI thread instead of queueing. This mirrors the existing _should_handle_model_command_inline() pattern for /model and is safe because agent.steer() is thread-safe (uses _pending_steer_lock, no prompt_toolkit state mutation, instant return). No changes to the idle-path behavior: /steer typed with no active agent still takes the normal queue-and-drain route so the fallback "No agent running; queued as next turn" message is preserved. Validation: - 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering the detector, dispatch path, and idle-path control behavior. - All 21 existing tests in tests/run_agent/test_steer.py still pass. - Live PTY end-to-end test with real agent + real openrouter model: 22:36:22 API call #1 (model requested execute_code) 22:36:26 ENTER FIRED: agent_running=True, text='/steer ...' 22:36:26 INLINE STEER DISPATCH fired 22:36:43 agent.log: 'Delivered /steer to agent after tool batch' 22:36:44 API call #2 included the steer; response contained marker Same test on the tip of main without this fix shows the steer landing as a new user turn ~20s after the run ended.

The MCP circuit breaker previously had no path back to the closed state: once _server_error_counts[srv] reached _CIRCUIT_BREAKER_THRESHOLD the gate short-circuited every subsequent call, so the only reset path (on successful call) was unreachable. A single transient 3-failure blip (bad network, server restart, expired token) permanently disabled every tool on that MCP server for the rest of the agent session. Introduce a classic closed/open/half-open state machine: - Track a per-server breaker-open timestamp in _server_breaker_opened_at alongside the existing failure count. - Add _CIRCUIT_BREAKER_COOLDOWN_SEC (60s). Once the count reaches threshold, calls short-circuit for the cooldown window. - After the cooldown elapses, the *next* call falls through as a half-open probe that actually hits the session. Success resets the breaker via _reset_server_error; failure re-bumps the count via _bump_server_error, which re-stamps the open timestamp and re-arms the cooldown. The error message now includes the live failure count and an "Auto-retry available in ~Ns" hint so the model knows the breaker will self-heal rather than giving up on the tool for the whole session. Covers tests 1 (half-opens after cooldown) and 2 (reopens on probe failure); test 3 (cleared on reconnect) still fails pending fix #2.

mssteuer merged commit 98beaa4 into main Apr 11, 2026

mssteuer deleted the fix/anthropic-oauth-sanitize-tool-schemas branch April 11, 2026 05:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(anthropic-oauth): also sanitize tool schema descriptions#2

fix(anthropic-oauth): also sanitize tool schema descriptions#2
mssteuer merged 1 commit into
mainfrom
fix/anthropic-oauth-sanitize-tool-schemas

mssteuer commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mssteuer commented Apr 11, 2026

Summary

Changes

Tests

Together with PR #1

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant