feat(xai): upgrade to Responses API, add TTS provider#10783

Merged
teknium1 merged 1 commit into main from hermes/hermes-1397f187 on Apr 16, 2026

@teknium1

Summary

Salvaged from PR #10600 by @Jaaneek — cherry-picked the core xAI provider upgrade and TTS, stripped the new tool additions (video gen, image gen, X search) into a separate follow-up PR.

What changed

xAI Responses API upgrade:

  • Switch xAI transport from openai_chat to codex_responses (Responses API)
  • Add codex_responses detection for xAI across all 3 runtime_provider resolution paths + AIAgent.init
  • Add extra_headers passthrough support for codex_responses requests
  • Add x-grok-conv-id session header for xAI prompt caching (moved from chat_completions to codex_responses path)
  • Add xAI reasoning support (reasoning.encrypted_content include, no effort param — xAI reasons automatically)
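
The detection and header changes listed above can be pictured with a minimal sketch. The function names, the hostname check against `api.x.ai`, the `chat_completions` fallback, and the returned dict shape are assumptions for illustration, not the code in run_agent.py or runtime_provider.py; only the `codex_responses` mode name, the `x-grok-conv-id` header, and the `reasoning.encrypted_content` include come from this PR.

```python
from urllib.parse import urlparse


def detect_api_mode(provider: str, base_url: str = "") -> str:
    """Pick an api_mode from the provider name, then from the URL (hypothetical helper)."""
    if provider.strip().lower() == "xai":
        return "codex_responses"
    # Assumption: an xAI endpoint can be recognised by its hostname.
    host = urlparse(base_url).hostname or ""
    if host == "api.x.ai" or host.endswith(".x.ai"):
        return "codex_responses"
    # Assumption: everything else stays on the chat-completions path.
    return "chat_completions"


def build_xai_request_extras(session_id: str) -> dict:
    """Extra headers and body fields for an xAI Responses request (illustrative shape)."""
    return {
        # A stable per-session header lets xAI reuse its prompt cache.
        "extra_headers": {"x-grok-conv-id": session_id},
        # Request encrypted reasoning content; no effort parameter is sent
        # because xAI reasons automatically.
        "include": ["reasoning.encrypted_content"],
    }
```

Keeping both helpers free of network calls makes the name-based and URL-based detection paths easy to unit test in isolation.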

xAI TTS provider:

  • Add _generate_xai_tts() using xAI's dedicated /v1/tts endpoint
  • Wire into provider dispatch, Opus conversion (Telegram voice bubbles), and requirements check
  • Add config section (tts.xai), setup wizard entry, and tools_config provider option
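
As a rough sketch of what "provider dispatch" plus a "requirements check" could look like: the stubs below stand in for `_generate_xai_tts()` and make no network call, and the table names, the `fallback` provider, and the function signatures are all hypothetical rather than taken from tools/tts_tool.py.

```python
from typing import Callable, Dict, Mapping, Tuple


def _xai_tts_stub(text: str) -> bytes:
    """Stand-in for _generate_xai_tts(); the real one calls xAI's /v1/tts endpoint."""
    return f"xai:{text}".encode()


def _fallback_tts_stub(text: str) -> bytes:
    """Stand-in for any other configured provider."""
    return f"fallback:{text}".encode()


# Hypothetical dispatch table keyed by the provider name from the tts config.
TTS_PROVIDERS: Dict[str, Callable[[str], bytes]] = {
    "xai": _xai_tts_stub,
    "fallback": _fallback_tts_stub,
}

# Environment variables each provider needs before it can be selected.
TTS_REQUIREMENTS: Dict[str, Tuple[str, ...]] = {
    "xai": ("XAI_API_KEY",),
    "fallback": (),
}


def tts_requirements_met(provider: str, env: Mapping[str, str]) -> bool:
    """True when every variable the provider needs is set and non-empty."""
    return all(env.get(name) for name in TTS_REQUIREMENTS.get(provider, ()))


def synthesize(text: str, provider: str, env: Mapping[str, str]) -> bytes:
    """Route a TTS request to the selected provider after checking requirements."""
    if provider not in TTS_PROVIDERS:
        raise ValueError(f"unknown TTS provider: {provider}")
    if not tts_requirements_met(provider, env):
        raise RuntimeError(f"missing credentials for TTS provider: {provider}")
    return TTS_PROVIDERS[provider](text)
```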

Provider cleanup:

  • Add grok alias across auth, models, providers, auxiliary_client
  • Trim xAI model list to agentic models: grok-4.20-reasoning, grok-4-1-fast-reasoning
  • Add XAI_API_KEY/XAI_BASE_URL to OPTIONAL_ENV_VARS
  • Add shared tools/xai_http.py helper (User-Agent string)
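
A minimal sketch of alias normalization and optional env-var resolution, assuming the aliases map onto a canonical `xai` name. The helper names and the default base URL are assumptions for illustration; the alias list (`grok`, `x-ai`, `x.ai`) and the `XAI_API_KEY`/`XAI_BASE_URL` variable names are the ones named in this PR's commit message.

```python
import os
from typing import Mapping, Optional

# Aliases named in the commit message, mapped to one canonical provider id.
PROVIDER_ALIASES = {"grok": "xai", "x-ai": "xai", "x.ai": "xai"}

# Assumption: used only when XAI_BASE_URL is unset.
DEFAULT_XAI_BASE_URL = "https://api.x.ai/v1"


def canonical_provider(name: str) -> str:
    """Map an alias to its canonical provider name; unknown names pass through."""
    key = name.strip().lower()
    return PROVIDER_ALIASES.get(key, key)


def resolve_xai_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the optional xAI variables, falling back to the assumed default URL."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("XAI_API_KEY"),  # None when the key is not configured
        "base_url": env.get("XAI_BASE_URL") or DEFAULT_XAI_BASE_URL,
    }
```

Resolving aliases in one place means auth, models, providers, and the auxiliary client can all compare against a single canonical name.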

What was NOT included (follow-up PR)

  • Video generation tool (tools/video_generation_tool.py)
  • xAI image generation/editing backend (image_generation_tool.py changes)
  • X search tool (tools/x_search_tool.py)
  • _API_KEY_PROVIDER_AUX_MODELS entry (main-model-first design handles xAI automatically)

Files changed (14 files, +189/-24)

  • run_agent.py — api_mode detection, extra_headers, reasoning, x-grok-conv-id
  • hermes_cli/providers.py — transport change + grok alias
  • hermes_cli/runtime_provider.py — codex_responses in 3 resolution paths + URL detection
  • hermes_cli/auth.py — grok alias
  • hermes_cli/models.py — trimmed model list
  • hermes_cli/main.py — xai in --provider choices
  • hermes_cli/config.py — TTS config section + env vars
  • hermes_cli/setup.py — xAI TTS setup handler
  • hermes_cli/tools_config.py — xAI TTS provider option
  • hermes_cli/nous_subscription.py — TTS label
  • agent/auxiliary_client.py — provider aliases
  • tools/tts_tool.py — xAI TTS implementation
  • tools/xai_http.py — shared helper (new)
  • toolsets.py — TTS description update

Test plan

  • 287 targeted tests pass (runtime_provider, api_key_providers, config, codex_responses)
  • 785 hermes_cli tests pass
  • 44 codex_responses tests pass

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

@github-actions

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

4005:+    response = requests.post(

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

⚠️ WARNING: CI/CD workflow files modified

Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.

Files:

.github/workflows/deploy-site.yml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

teknium1 added a commit that referenced this pull request Apr 16, 2026
Cherry-picked from PR #10600 by Jaaneek — the media/search tool additions,
separated from the core provider upgrade (PR #10783).

NOTE: Depends on PR #10783 being merged first (for xai_http.py, codex_responses
transport, and XAI_API_KEY env var).

- Add video generation tool (generate, edit, extend) with async polling
- Add xAI image generation/editing backend alongside FAL
- Add X search tool backed by xAI Responses API
- Add x_search and video_gen toolset definitions
- Add CONFIGURABLE_TOOLSETS entries for tools_config UI
- Wire into safe and api-server toolsets
- Add test coverage for all new tools

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Cherry-picked and trimmed from PR #10600 by Jaaneek.

- Switch xAI transport from openai_chat to codex_responses (Responses API)
- Add codex_responses detection for xAI in all runtime_provider resolution paths
- Add xAI api_mode detection in AIAgent.__init__ (provider name + URL auto-detect)
- Add extra_headers passthrough for codex_responses requests
- Add x-grok-conv-id session header for xAI prompt caching
- Add xAI reasoning support (encrypted_content include, no effort param)
- Move x-grok-conv-id from chat_completions path to codex_responses path
- Add xAI TTS provider (dedicated /v1/tts endpoint with Opus conversion)
- Add xAI provider aliases (grok, x-ai, x.ai) across auth, models, providers, auxiliary
- Trim xAI model list to agentic models (grok-4.20-reasoning, grok-4-1-fast-reasoning)
- Add XAI_API_KEY/XAI_BASE_URL to OPTIONAL_ENV_VARS
- Add xAI TTS config section, setup wizard entry, tools_config provider option
- Add shared xai_http.py helper for User-Agent string

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
@teknium1 teknium1 force-pushed the hermes/hermes-1397f187 branch from d90ab1f to 635ee3a Compare April 16, 2026 09:23
@github-actions

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

424:+    response = requests.post(

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@teknium1 teknium1 merged commit 0c1217d into main Apr 16, 2026
6 of 7 checks passed
@teknium1 teknium1 deleted the hermes/hermes-1397f187 branch April 16, 2026 09:24
teknium1 added a commit that referenced this pull request Apr 18, 2026
Fills documentation gaps that accumulated as features merged ahead of their
docs updates. All additions are verified against code and the originating PRs.

Providers:
- Ollama Cloud (#10782) — new provider section, env vars, quickstart/fallback rows
- xAI Grok Responses API + TTS (#10783) — provider note, TTS table + config
- Google Gemini CLI OAuth (#11270) — quickstart/fallback/cli-commands entries
- NVIDIA NIM (#11774) — NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference
- HERMES_INFERENCE_PROVIDER enum updated

Messaging:
- DISCORD_ALLOWED_ROLES (#11608) — env-vars, discord.md access control section
- DingTalk QR device-flow (#11574) — wizard path in Option A + openClaw disclosure
- Feishu document comment intelligent reply (#11898) — full section + 3-tier access control + CLI

Skills / commands:
- concept-diagrams skill (#11363) — optional-skills-catalog entry
- /gquota (#11270) — slash-commands reference

Build: docusaurus build passes, ascii-guard lint 0 errors.
Julientalbot pushed a commit to Julientalbot/hermes-agent that referenced this pull request Apr 19, 2026
Extracted from Jaaneek's PR NousResearch#10600 / Teknium's split PR NousResearch#10786
and made standalone. The x_search tool is the smallest, most self-contained piece
of that work and doesn't depend on the image/video generation changes, so it
ships cleanly on its own while NousResearch#10786 rebases.

## What

New tool `x_search` backed by xAI's built-in `x_search` Responses API
tool. Searches X (Twitter) posts with configurable model, timeout, retry
count, handle filtering, and citation extraction.
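
The configurable retry count could work along these lines. This is a generic sketch, not the logic in tools/x_search_tool.py: the helper name, the linear backoff, and the reading of `retries` as "extra attempts after the first" are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 2,
    backoff_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then up to `retries` more times if it raises."""
    last_error: Optional[Exception] = None
    for attempt in range(max(retries, 0) + 1):
        try:
            return fn()
        except Exception as exc:  # a real tool would catch narrower error types
            last_error = exc
            if attempt < retries:
                # Linear backoff between attempts; injectable for tests.
                sleep(backoff_seconds * (attempt + 1))
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the helper testable without real delays.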

## Why split

PR NousResearch#10786 bundles x_search + video_generation + image_generation xAI
backend in ~2k lines across 8 files. Tests on that branch currently
fail (mostly unrelated flakiness in the Discord/Telegram suites on CI, plus
xai_media assertions that have drifted from an evolving main). Shipping x_search alone
gets 351 LOC of production code + 207 LOC of tests into main while the heavier
media pieces are rebased. Co-authored credit is preserved.

## Scope

- tools/x_search_tool.py — tool implementation (351 LOC)
- tests/tools/test_x_search_tool.py — unit tests (207 LOC, 6 tests)
- toolsets.py — add x_search to _HERMES_CORE_TOOLS + new TOOLSETS entry
- hermes_cli/tools_config.py — add x_search to CONFIGURABLE_TOOLSETS

Deliberately **not** changing: image_generation_tool.py, video_generation_tool.py,
browser_cdp (preserved), xAI TTS wiring (already on main via NousResearch#10783).

## Tests

- tests/tools/test_x_search_tool.py — 6 passed
- tests/ -k toolset — 126 passed, 4 skipped
- tests/ -k tools_config — 91 passed, 4 skipped
- Registry smoke: x_search registered, browser_cdp preserved

## Config

Reads optional `x_search` section from user config:
```yaml
x_search:
  model: grok-4.20-reasoning    # default
  timeout_seconds: 180          # default
  retries: 2                    # default
```
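
One way to read that optional section with the defaults shown above is sketched here. The `XSearchConfig` dataclass and `load_x_search_config` helper are hypothetical names; only the three keys and their default values are taken from the config snippet.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class XSearchConfig:
    # Defaults mirror the values marked "# default" in the YAML snippet.
    model: str = "grok-4.20-reasoning"
    timeout_seconds: int = 180
    retries: int = 2


def load_x_search_config(user_config: Mapping[str, Any]) -> XSearchConfig:
    """Build the config from an already-parsed user config mapping."""
    # The section is optional, so a missing or empty value falls back to defaults.
    section = user_config.get("x_search") or {}
    defaults = XSearchConfig()
    return XSearchConfig(
        model=str(section.get("model", defaults.model)),
        timeout_seconds=int(section.get("timeout_seconds", defaults.timeout_seconds)),
        retries=int(section.get("retries", defaults.retries)),
    )
```

For example, `load_x_search_config({"x_search": {"retries": 5}})` overrides only the retry count and leaves the model and timeout at their defaults.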

## Requirements

Gated on `XAI_API_KEY` (already wired by PR NousResearch#10783).

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
Julientalbot pushed a commit to Julientalbot/hermes-agent that referenced this pull request May 10, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026