Skip to content

fix(tools): strip slash-containing enum values for xAI Responses#27907

Closed
gbarany wants to merge 5 commits into
NousResearch:mainfrom
gbarany:fix/xai-tool-schema-slash-enum
Closed

fix(tools): strip slash-containing enum values for xAI Responses#27907
gbarany wants to merge 5 commits into
NousResearch:mainfrom
gbarany:fix/xai-tool-schema-slash-enum

Conversation

@gbarany

@gbarany gbarany commented May 18, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes a 100%-broken state for any user on the xAI provider (xai/xai-oauth) who also has the Brave Search MCP enabled. Brave's brave_llm_context and brave_place_search tools advertise an HTTP-header accept parameter as {"type": "string", "enum": ["application/json", "*/*"]}. xAI's /v1/responses schema validator rejects any string enum value containing /, replies with HTTP 200 plus a single SSE event: error frame whose message is the opaque "Invalid arguments passed to the model.", then closes the stream. Hermes retries three times and aborts the turn.

The fix is a new strip_xai_incompatible_enum_values() sanitizer in tools/schema_sanitizer.py that drops any enum constraint containing a slash-bearing string value. It's wired into agent/chat_completion_helpers.build_api_kwargs alongside the existing strip_pattern_and_format call (same xAI gate, same deep-copy-then-strip pattern). Other Responses-compatible backends are not touched.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • Tests (adding or improving test coverage)

Changes Made

  • tools/schema_sanitizer.py — new strip_xai_incompatible_enum_values(tools) helper. Walks both OpenAI-format and Responses-format tool entries. If any string enum value contains /, drops the entire enum constraint (rather than narrowing the list, which would silently change the model-visible value space). Logs malformed (non-list) enums at DEBUG. Emits a sibling DEBUG log when the strip fires.
  • agent/chat_completion_helpers.py — adds the new sanitizer to the existing xAI tool-schema transform pass, on a deep-copied tool list so the shared registry stays valid for non-xAI turns served by the same process.
  • tests/tools/test_schema_sanitizer.py — 13 new unit tests (OpenAI format, Responses format, drops whole enum on mixed values, nested schemas, no-op when clean, empty list, array items enum, anyOf union, mixed non-string members, all-stripped with sibling default, malformed enum, caller-deepcopy contract, returned list-reference identity).
  • tests/run_agent/test_run_agent_codex_responses.py — 3 integration tests verifying _build_api_kwargs strips for xAI agents, leaves slash enums untouched for non-xAI, and that preflight remains provider-neutral.
  • tests/agent/transports/test_codex_transport.py — renamed test class to TestCodexPreflightProviderNeutral, pinning that preflight does not apply provider policy and does not mutate caller-owned schemas.

How to Test

Reproduce against xAI directly without Hermes:

from openai import OpenAI
client = OpenAI(api_key=XAI_OAUTH_TOKEN, base_url="https://api.x.ai/v1")
gen = client.responses.create(
    model="grok-4.3",
    input=[{"role":"user","content":"hi"}],
    instructions="You are Hermes.",
    store=False, stream=True,
    tools=[{"type":"function","name":"x","description":"x","strict":False,
            "parameters":{"type":"object","properties":{
                "accept":{"type":"string","enum":["application/json","*/*"]}}}}],
)
next(iter(gen))  # first SSE frame is event: error, "Invalid arguments passed to the model."

Drop */* (or any value containing /) from the enum and the same call streams response.completed.

In Hermes itself: enable the Brave Search MCP and pick xai-oauth + grok-4.3 — every turn fails on main before this PR and succeeds after.

Confirmed empirically (any string enum value containing / triggers the rejection):

Enum value xAI accepts?
"application/json" rejected
"*/*" rejected
"text/plain" rejected
"text/*" rejected
"a/b" rejected
"foo*bar" accepted
"foo", "bar" accepted

Test runs

pytest tests/tools/test_schema_sanitizer.py -v       # all pass (13 new)
pytest tests/run_agent/test_run_agent_codex_responses.py -k "preflight or xai or build_api_kwargs"
                                                      # all pass (3 new)
pytest tests/tools/test_schema_sanitizer.py \
       tests/run_agent/test_run_agent_codex_responses.py \
       tests/agent/transports/test_codex_transport.py
                                                      # all pass

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(tools):, refactor(tools):, test(tools):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run the suite on every touched module and all preflight/xAI tests
  • I've added tests for my changes
  • I've tested on my platform: macOS 15 (Darwin 25.5.0), Python 3.11, nousresearch/hermes-agent:main

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (the new sanitizer has a thorough docstring; behavior change is internal)
  • I've updated cli-config.yaml.example — N/A (no new config keys)
  • I've updated CONTRIBUTING.md or AGENTS.md — N/A (no workflow/architecture change)
  • I've considered cross-platform impact — pure Python string manipulation, no platform-specific paths
  • I've updated tool descriptions/schemas — N/A (no tool added or modified)

Notes for reviewers

  • The xAI tool-schema policy now lives in one place (chat_completion_helpers.build_api_kwargs) alongside the existing strip_pattern_and_format call — same gate, same deep-copy, same try/except. Preflight remains provider-neutral validation.
  • On mixed enums (e.g. ["fast", "fast/precise", "precise"]), the entire enum constraint is dropped rather than narrowed to ["fast", "precise"]. Narrowing would silently change the accepted value space for that property in ways a user couldn't see from their schema — dropping the constraint keeps type: string typing, so free-form values still validate, but doesn't pretend the model never knew about the missing values.
  • Follow-up worth considering separately: surface xAI's event: error frame's message field in the user-visible error along with the offending tool name when available. Today the openai SDK collapses the frame into the generic _StreamErrorEvent summary so neither Hermes nor the user can tell which tool is bad.

Screenshots / Logs

Before (on nousresearch/hermes-agent:main, fresh hello):

API call failed (attempt N/3) error_type=_StreamErrorEvent
provider=xai-oauth base_url=https://api.x.ai/v1 model=grok-4.3
summary=Invalid arguments passed to the model.

Captured wire response (first SSE frame from xAI):

event: error
data: {"sequence_number":0,"type":"error","code":null,
       "message":"Invalid arguments passed to the model.","param":null}

After the patch — same request body succeeds, with the sanitizer logging:

schema_sanitizer: stripped 2 slash-containing enum value(s) from tool schemas (xAI Responses compatibility)

Gabor Barany added 2 commits May 18, 2026 10:40
xAI's /v1/responses schema validator rejects any tool-parameter string
enum value containing a / character (e.g. "application/json", "*/*",
"text/plain"). It returns HTTP 200 with a single SSE "event: error"
frame whose message is the opaque "Invalid arguments passed to the
model.", then closes the stream, aborting every agent turn.

The Brave Search MCP triggers this out of the box: its brave_llm_context
and brave_place_search tools advertise an HTTP-header "accept" parameter
with enum ["application/json", "*/*"].

Other Responses-compatible backends (OpenAI Codex) accept these enums, so
the strip is gated by a new is_xai_responses parameter plumbed from the
existing detection in conversation_loop and codex_runtime through the
transport into _preflight_codex_api_kwargs. The new
strip_xai_incompatible_enum_values helper lives next to
strip_pattern_and_format in tools/schema_sanitizer.py and emits a parallel
INFO log line when it fires.
… gating

- 6 unit tests for the new sanitizer covering OpenAI and Responses tool
  formats, mixed safe+unsafe enums, nested schemas, and idempotence on
  clean input.
- 2 integration tests on _preflight_codex_api_kwargs verifying the strip
  runs only when is_xai_responses=True (and is a no-op for openai-codex
  and other Responses-compatible backends).
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder comp/tools Tool registry, model_tools, toolsets provider/xai xAI (Grok) P3 Low — cosmetic, nice to have labels May 18, 2026
Gabor Barany added 3 commits May 18, 2026 11:13
Round of fixes from the multi-agent PR review on NousResearch#27907:

* Deep-copy normalized_tools before stripping in the xAI branch of
  _preflight_codex_api_kwargs so the in-place mutation can't leak back
  through the aliased `parameters` reference into the caller's upstream
  tool registry (would have affected non-xAI turns served by the same
  Python process).
* Wrap the strip call in try/except + logger.warning so a malformed
  schema can't crash the request mid-preflight — the request falls back
  to sending the original tools and xAI's real error surfaces instead
  of a Hermes-side traceback.
* Extract `agent_uses_xai_responses(agent)` in
  agent.codex_responses_adapter so the predicate stays in sync across
  conversation_loop.py and codex_runtime.py. Future provider keys land
  in one place.
* Drop the `getattr(agent, '_base_url_hostname', None)` softening — a
  missing attribute is a real initialization bug and silencing it
  would hide the very mis-route this PR is meant to fix.
* Log non-list (malformed) `enum` values at DEBUG instead of silently
  ignoring them, and demote the stripped-count log from INFO to DEBUG
  (fires every turn on affected setups; not a notable event).
* Tighten docstring: pin the mutation/return contract and replace the
  brittle Brave tool-name references with the underlying header-enum
  shape that triggers the rejection.

Tests:
* 7 new unit tests covering caller-deepcopy contract, list-reference
  return, array `items` with enum, anyOf union, mixed-type enum
  values, all-stripped enum with sibling `default` survival, and the
  malformed (non-list) enum path.
* 4 new helper tests covering `agent_uses_xai_responses` (provider
  match, hostname-only match, negative providers, missing-attrs
  tolerance).
* 3 transport-wrapper tests pinning that
  ResponsesApiTransport.preflight_kwargs forwards `is_xai_responses`
  and that the caller's tool registry is preserved across the call.

All previously-passing tests on the touched modules still pass (563 of
577; the 14 unrelated failures in tests/agent/test_anthropic_adapter.py
also fail on upstream main from this checkout — they read ambient
Claude Code OAuth tokens from the host keychain).
Per the silent-failure-hunter review: getattr fallbacks would silently
route a mis-initialized agent (one that skipped agent_init and so never
got provider / _base_url_hostname set) as non-xAI, re-triggering the
exact opaque-SSE-error bug this PR exists to prevent. Bare attribute
access surfaces AttributeError loudly so the real init bug bubbles up.

Matches the inline-expression pattern already used in
agent/chat_completion_helpers.py:287.

Test now asserts pytest.raises(AttributeError) on a Bare() agent stub
instead of the previous silent-False behavior.
…boundary

Move strip_xai_incompatible_enum_values out of the preflight path and into
agent/chat_completion_helpers.build_api_kwargs alongside the existing
strip_pattern_and_format call. Both xAI-specific tool-schema transforms now
live in the same is_xai_responses block that already existed there, on a
deep-copied tool list so the shared registry stays valid for non-xAI turns.

Removes the plumbing introduced for the earlier review round:
* agent_uses_xai_responses helper in agent/codex_responses_adapter.py
* is_xai_responses kwarg on ResponsesApiTransport.preflight_kwargs
* is_xai_responses kwarg on _preflight_codex_api_kwargs
* derivation + pass-through in conversation_loop.py and codex_runtime.py

Preflight is once again provider-neutral validation. xAI compatibility is
now owned end-to-end by the request builder, which is the only place that
also knows whether the call is bound for xAI.

Strip semantics: if any string value in an enum contains "/", drop the
entire enum constraint rather than narrowing the list. Preserving the
non-slash subset would silently change the model-visible value space for
that property; dropping the constraint keeps the type-string typing so
free-form values still validate.

Tests now exercise the behavior end-to-end through agent._build_api_kwargs
instead of probing preflight. The transport-level test class is renamed to
TestCodexPreflightProviderNeutral and pins that preflight does not apply
provider policy.
teknium1 pushed a commit that referenced this pull request May 29, 2026
…27907)

The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format)
mutate their input in place — that's their documented contract. The two
call sites (chat_completion_helpers.build_api_kwargs and the auxiliary
client) were passing agent.tools straight through, so the first xAI
request would permanently strip slash-containing enum constraints and
pattern/format keywords from the per-agent tool registry.

Effect: any subsequent non-xAI call from the same agent (auxiliary task
routed to Anthropic, OpenRouter fallback, mid-session model switch) saw
the already-stripped schema with no way for the user to notice from
their config.

Fix: deepcopy tools_for_api before sanitizing at both call sites.

The slash-enum bug itself (xAI 400ing on enums with '/') was fixed
earlier by #32443 (Nami4D) — that PR landed the strip but used the
sanitizers directly without copying. This salvages #27907's correctness
contribution (the deepcopy) while skipping its redundant parallel
sanitizer (strip_xai_incompatible_enum_values is functionally
equivalent to the existing strip_slash_enum) and its preflight-
neutrality argument (we chose model-gated preflight in #32443).

3 new tests in tests/run_agent/test_run_agent_codex_responses.py:

- strips_slash_enum_from_outgoing_request — outgoing kwargs has no
  slash-containing enum values (functional contract preserved).
- does_not_mutate_agent_tools — headline #27907 regression. Snapshot
  agent.tools before build_api_kwargs, assert it survives intact
  after. Pre-fix this assertion would have caught the mutation.
- is_idempotent_across_repeated_calls — three xAI requests in a row
  each strip cleanly AND don't progressively erode the source schema.

344/344 across tests/agent/test_auxiliary_client.py,
tests/agent/transports/test_codex_transport.py,
tests/run_agent/test_run_agent_codex_responses.py, and
tests/tools/test_schema_sanitizer.py.

Co-authored-by: Gabor Barany <barany.gabor@gmail.com>
@teknium1

Copy link
Copy Markdown
Contributor

Closing — the deepcopy correctness fix from your PR was salvaged into #34416 (#34416) with your authorship preserved (Co-authored-by: trailer on the merge commit 1386a7e4).

What landed:

  • _copy.deepcopy(tools_for_api) before the sanitizer calls in chat_completion_helpers.build_api_kwargs AND the symmetric site in auxiliary_client.py (which used list(tools) — only a shallow copy — and had the same mutation issue).
  • 3 new tests, including the headline does_not_mutate_agent_tools regression that asserts agent.tools survives intact after a build_api_kwargs call.

What didn't land, and why:

Thank you for the careful empirical work mapping which xAI enum values get rejected (application/json rejected, text/* rejected, foo*bar accepted) and for identifying the mutation as a separate bug from the rejection itself. Without your PR the mutation bug would have stayed on main even post-#32443.

@teknium1 teknium1 closed this May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
…ousResearch#27907)

The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format)
mutate their input in place — that's their documented contract. The two
call sites (chat_completion_helpers.build_api_kwargs and the auxiliary
client) were passing agent.tools straight through, so the first xAI
request would permanently strip slash-containing enum constraints and
pattern/format keywords from the per-agent tool registry.

Effect: any subsequent non-xAI call from the same agent (auxiliary task
routed to Anthropic, OpenRouter fallback, mid-session model switch) saw
the already-stripped schema with no way for the user to notice from
their config.

Fix: deepcopy tools_for_api before sanitizing at both call sites.

The slash-enum bug itself (xAI 400ing on enums with '/') was fixed
earlier by NousResearch#32443 (Nami4D) — that PR landed the strip but used the
sanitizers directly without copying. This salvages NousResearch#27907's correctness
contribution (the deepcopy) while skipping its redundant parallel
sanitizer (strip_xai_incompatible_enum_values is functionally
equivalent to the existing strip_slash_enum) and its preflight-
neutrality argument (we chose model-gated preflight in NousResearch#32443).

3 new tests in tests/run_agent/test_run_agent_codex_responses.py:

- strips_slash_enum_from_outgoing_request — outgoing kwargs has no
  slash-containing enum values (functional contract preserved).
- does_not_mutate_agent_tools — headline NousResearch#27907 regression. Snapshot
  agent.tools before build_api_kwargs, assert it survives intact
  after. Pre-fix this assertion would have caught the mutation.
- is_idempotent_across_repeated_calls — three xAI requests in a row
  each strip cleanly AND don't progressively erode the source schema.

344/344 across tests/agent/test_auxiliary_client.py,
tests/agent/transports/test_codex_transport.py,
tests/run_agent/test_run_agent_codex_responses.py, and
tests/tools/test_schema_sanitizer.py.

Co-authored-by: Gabor Barany <barany.gabor@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/tools Tool registry, model_tools, toolsets P3 Low — cosmetic, nice to have provider/xai xAI (Grok) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants