feat(xai): upgrade to Responses API, add TTS provider #10783
Cherry-picked from PR #10600 by Jaaneek: the media/search tool additions, separated from the core provider upgrade (PR #10783).

NOTE: Depends on PR #10783 being merged first (for xai_http.py, the codex_responses transport, and the XAI_API_KEY env var).

- Add video generation tool (generate, edit, extend) with async polling
- Add xAI image generation/editing backend alongside FAL
- Add X search tool backed by xAI Responses API
- Add x_search and video_gen toolset definitions
- Add CONFIGURABLE_TOOLSETS entries for tools_config UI
- Wire into safe and api-server toolsets
- Add test coverage for all new tools

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
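The video generation tool is described as using async polling. In that pattern, the tool checks a job's status repeatedly until the job reaches a terminal state or a deadline passes. The sketch below is illustrative only. `poll_until_done`, the `fetch_status` callable, and the `"done"`/`"failed"` status strings are hypothetical; none of them come from `tools/video_generation_tool.py` or from xAI's API. `sleep` and `clock` are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll a job until it finishes, fails, or the timeout elapses.

    Hypothetical helper: `fetch_status` is assumed to return a dict with a
    "status" key; the status names used here are illustrative, not xAI's.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "done":
            return status  # terminal success: hand back the full payload
        if state == "failed":
            raise RuntimeError(f"video job failed: {status.get('error', 'unknown error')}")
        if clock() >= deadline:
            raise TimeoutError(f"video job still {state!r} after {timeout}s")
        sleep(interval)  # still pending: wait before the next check
```

Keeping the deadline check after the status check means a job that completes on the final poll is still returned rather than reported as a timeout.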
Cherry-picked and trimmed from PR #10600 by Jaaneek.

- Switch xAI transport from openai_chat to codex_responses (Responses API)
- Add codex_responses detection for xAI in all runtime_provider resolution paths
- Add xAI api_mode detection in AIAgent.__init__ (provider name + URL auto-detect)
- Add extra_headers passthrough for codex_responses requests
- Add x-grok-conv-id session header for xAI prompt caching
- Add xAI reasoning support (encrypted_content include, no effort param)
- Move x-grok-conv-id from chat_completions path to codex_responses path
- Add xAI TTS provider (dedicated /v1/tts endpoint with Opus conversion)
- Add xAI provider aliases (grok, x-ai, x.ai) across auth, models, providers, auxiliary
- Trim xAI model list to agentic models (grok-4.20-reasoning, grok-4-1-fast-reasoning)
- Add XAI_API_KEY/XAI_BASE_URL to OPTIONAL_ENV_VARS
- Add xAI TTS config section, setup wizard entry, tools_config provider option
- Add shared xai_http.py helper for User-Agent string

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
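The commit adds xAI api_mode detection from either the provider name or the base URL, plus the `grok`, `x-ai` and `x.ai` aliases. Below is a minimal sketch of that resolution logic under several assumptions. The alias table and the helper names (`normalize_provider`, `detect_api_mode`) are hypothetical, and `chat_completions` is assumed as the fallback mode name. The real logic lives in `run_agent.py` and `hermes_cli/runtime_provider.py` and may differ.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical alias table mirroring the aliases named in the commit message.
_PROVIDER_ALIASES = {"grok": "xai", "x-ai": "xai", "x.ai": "xai"}


def normalize_provider(name: str) -> str:
    """Map user-facing provider aliases onto one canonical provider id."""
    key = name.strip().lower()
    return _PROVIDER_ALIASES.get(key, key)


def detect_api_mode(provider: Optional[str], base_url: Optional[str]) -> str:
    """Pick the transport: provider name first, then base-URL auto-detect."""
    if provider and normalize_provider(provider) == "xai":
        return "codex_responses"
    if base_url:
        # Compare the parsed hostname, not a raw substring of the URL.
        host = (urlparse(base_url).hostname or "").lower()
        if host == "x.ai" or host.endswith(".x.ai"):
            return "codex_responses"
    return "chat_completions"  # assumed default mode for other providers
```

Matching on the parsed hostname rather than a substring of the URL prevents a host such as `evilx.ai` from being mistaken for xAI.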
d90ab1f to 635ee3a
Fills documentation gaps that accumulated as features merged ahead of their docs updates. All additions are verified against code and the originating PRs.

Providers:
- Ollama Cloud (#10782): new provider section, env vars, quickstart/fallback rows
- xAI Grok Responses API + TTS (#10783): provider note, TTS table + config
- Google Gemini CLI OAuth (#11270): quickstart/fallback/cli-commands entries
- NVIDIA NIM (#11774): NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference
- HERMES_INFERENCE_PROVIDER enum updated

Messaging:
- DISCORD_ALLOWED_ROLES (#11608): env-vars, discord.md access control section
- DingTalk QR device-flow (#11574): wizard path in Option A + openClaw disclosure
- Feishu document comment intelligent reply (#11898): full section + 3-tier access control + CLI

Skills / commands:
- concept-diagrams skill (#11363): optional-skills-catalog entry
- /gquota (#11270): slash-commands reference

Build: docusaurus build passes, ascii-guard lint 0 errors.
Extracted from Jaaneek's PR NousResearch#10600 / Teknium's split PR NousResearch#10786 and made standalone. The x_search tool is the smallest, most self-contained piece of that work and doesn't depend on the image/video generation changes, so it ships cleanly on its own while NousResearch#10786 rebases.

## What

New tool `x_search` backed by xAI's built-in `x_search` Responses API tool. Searches X (Twitter) posts with configurable model, timeout, retry count, handle filtering, and citation extraction.

## Why split

PR NousResearch#10786 bundles x_search + video_generation + the image_generation xAI backend in ~2k lines across 8 files. Tests on that branch currently regress (mostly unrelated flakes from the Discord/Telegram suites on CI, plus xai_media asserts that have drifted from the evolving main). Shipping x_search alone gets 351 LOC of production code + 207 LOC of tests into main while the heavier media pieces are rebased. Co-authored credit preserved.

## Scope

- tools/x_search_tool.py: tool implementation (351 LOC)
- tests/tools/test_x_search_tool.py: unit tests (207 LOC, 6 tests)
- toolsets.py: add x_search to _HERMES_CORE_TOOLS + new TOOLSETS entry
- hermes_cli/tools_config.py: add x_search to CONFIGURABLE_TOOLSETS

Deliberately **not** changing: image_generation_tool.py, video_generation_tool.py, browser_cdp (preserved), xAI TTS wiring (already on main via NousResearch#10783).

## Tests

- tests/tools/test_x_search_tool.py: 6 passed
- tests/ -k toolset: 126 passed, 4 skipped
- tests/ -k tools_config: 91 passed, 4 skipped
- Registry smoke: x_search registered, browser_cdp preserved

## Config

Reads an optional `x_search` section from user config:

```yaml
x_search:
  model: grok-4.20-reasoning   # default
  timeout_seconds: 180          # default
  retries: 2                    # default
```

## Requirements

Gated on `XAI_API_KEY` (already wired by PR NousResearch#10783).

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
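The next sketch shows one way the optional `x_search` section could be merged over the documented defaults, with the `retries` value driving a simple retry loop. Both helper names are hypothetical and not taken from `tools/x_search_tool.py`. The sketch assumes that `retries: 2` means two extra attempts after the first, for three calls in total; the PR does not state this.

```python
from typing import Any, Callable, Dict, Mapping, Optional, TypeVar

T = TypeVar("T")

# Defaults as listed in the PR description's Config section.
X_SEARCH_DEFAULTS: Dict[str, Any] = {
    "model": "grok-4.20-reasoning",
    "timeout_seconds": 180,
    "retries": 2,
}


def load_x_search_config(user_config: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Overlay the user's optional `x_search` section on the defaults."""
    section = (user_config or {}).get("x_search") or {}
    merged = dict(X_SEARCH_DEFAULTS)
    for key, value in section.items():
        # Ignore unknown keys and explicit nulls so defaults still apply.
        if key in X_SEARCH_DEFAULTS and value is not None:
            merged[key] = value
    return merged


def call_with_retries(fn: Callable[[], T], retries: int) -> T:
    """Call `fn`, retrying up to `retries` extra times if it raises."""
    attempts = max(retries, 0) + 1
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # a real tool would likely catch narrower network errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
    raise AssertionError("unreachable")
```

Usage would look like `cfg = load_x_search_config(user_config)` followed by `call_with_retries(lambda: search(query, cfg), cfg["retries"])`, where `search` stands in for whatever function issues the actual request.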
Summary
Salvaged from PR #10600 by @Jaaneek: cherry-picked the core xAI provider upgrade and TTS, and split the new tool additions (video gen, image gen, X search) out into a separate follow-up PR.
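The Responses API upgrade changes how xAI requests are assembled in three ways. Extra headers are passed through, an `x-grok-conv-id` session header is attached for prompt caching, and reasoning is requested via the `reasoning.encrypted_content` include with no effort parameter. The sketch below shows that shape as a plain kwargs builder. `build_responses_kwargs` is a hypothetical name, and the `reasoning={"effort": ...}` branch for other providers is illustrative only. The resulting dict is intended for a call such as the OpenAI Python SDK's `client.responses.create(**kwargs)`, which accepts `extra_headers` as a per-request option.

```python
from typing import Any, Dict, List, Optional


def build_responses_kwargs(
    model: str,
    input_items: List[Dict[str, Any]],
    provider: str,
    session_id: Optional[str] = None,
    extra_headers: Optional[Dict[str, str]] = None,
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble Responses API kwargs (hypothetical helper, not run_agent.py)."""
    headers = dict(extra_headers or {})  # copy so the caller's dict is untouched
    kwargs: Dict[str, Any] = {"model": model, "input": input_items}
    if provider == "xai":
        # xAI reasons automatically: ask for encrypted reasoning content back,
        # but send no effort parameter.
        kwargs["include"] = ["reasoning.encrypted_content"]
        if session_id:
            # Session header used for xAI prompt caching.
            headers["x-grok-conv-id"] = session_id
    elif reasoning_effort:
        # Illustrative only: other providers may accept an explicit effort.
        kwargs["reasoning"] = {"effort": reasoning_effort}
    if headers:
        kwargs["extra_headers"] = headers  # passthrough for codex_responses
    return kwargs
```

Reusing the same session id across turns is what lets the header do its job. A fresh id per request would defeat the prompt caching it exists for.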
What changed
xAI Responses API upgrade:
- Switch xAI transport from `openai_chat` to `codex_responses` (Responses API)
- `codex_responses` detection for xAI across all 3 runtime_provider resolution paths + `AIAgent.__init__`
- `extra_headers` passthrough support for codex_responses requests
- `x-grok-conv-id` session header for xAI prompt caching (moved from chat_completions to codex_responses path)
- xAI reasoning support (`reasoning.encrypted_content` include, no effort param; xAI reasons automatically)

xAI TTS provider:
- `_generate_xai_tts()` using xAI's dedicated `/v1/tts` endpoint
- TTS config section (`tts.xai`), setup wizard entry, and tools_config provider option

Provider cleanup:
- `grok` alias across auth, models, providers, auxiliary_client
- Model list trimmed to agentic models: `grok-4.20-reasoning`, `grok-4-1-fast-reasoning`
- `XAI_API_KEY`/`XAI_BASE_URL` added to OPTIONAL_ENV_VARS
- Shared `tools/xai_http.py` helper (User-Agent string)

What was NOT included (follow-up PR)
- Video generation tool (`tools/video_generation_tool.py`)
- xAI image generation backend (`image_generation_tool.py` changes)
- X search tool (`tools/x_search_tool.py`)
- `_API_KEY_PROVIDER_AUX_MODELS` entry (main-model-first design handles xAI automatically)

Files changed (14 files, +189/-24)
- `run_agent.py`: api_mode detection, extra_headers, reasoning, x-grok-conv-id
- `hermes_cli/providers.py`: transport change + grok alias
- `hermes_cli/runtime_provider.py`: codex_responses in 3 resolution paths + URL detection
- `hermes_cli/auth.py`: grok alias
- `hermes_cli/models.py`: trimmed model list
- `hermes_cli/main.py`: xai in --provider choices
- `hermes_cli/config.py`: TTS config section + env vars
- `hermes_cli/setup.py`: xAI TTS setup handler
- `hermes_cli/tools_config.py`: xAI TTS provider option
- `hermes_cli/nous_subscription.py`: TTS label
- `agent/auxiliary_client.py`: provider aliases
- `tools/tts_tool.py`: xAI TTS implementation
- `tools/xai_http.py`: shared helper (new)
- `toolsets.py`: TTS description update

Test plan
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
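The xAI TTS provider in this PR converts the `/v1/tts` output to Opus, but the PR text does not say how. One common approach is to shell out to ffmpeg with the `libopus` encoder, and the sketch below assumes that approach. `opus_convert_cmd` and `convert_to_opus` are hypothetical helpers. The 64k bitrate and the `.ogg` output container are illustrative choices, not values from `tools/tts_tool.py`.

```python
import shutil
import subprocess
from typing import List


def opus_convert_cmd(src: str, dst: str, bitrate: str = "64k") -> List[str]:
    """Build an ffmpeg argv that re-encodes `src` as Opus audio in `dst`."""
    return [
        "ffmpeg",
        "-y",               # overwrite dst if it already exists
        "-i", src,          # input file, e.g. audio returned by the TTS endpoint
        "-c:a", "libopus",  # encode audio with the libopus encoder
        "-b:a", bitrate,    # target audio bitrate
        dst,                # output path; an .ogg container suits Opus
    ]


def convert_to_opus(src: str, dst: str) -> None:
    """Run the conversion, failing loudly if ffmpeg is missing or errors."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(opus_convert_cmd(src, dst), check=True, capture_output=True)
```

Passing an argument list rather than a shell string avoids quoting problems with file paths that contain spaces.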