feat(stt): add xAI Grok STT provider by Julientalbot · Pull Request #12120 · NousResearch/hermes-agent

Julientalbot · 2026-04-18T11:10:18Z

Summary

Add xAI as a sixth STT provider using the POST /v1/stt endpoint with multipart/form-data.

Features

Inverse Text Normalization (ITN) via format=true (default on)
Optional diarization via stt.xai.diarize config
Language configuration (default: fr, overridable via config or HERMES_LOCAL_STT_LANGUAGE env)
Custom base URL (XAI_STT_BASE_URL env or stt.xai.base_url config)
Full provider integration: explicit config + auto-detect fallback chain
Consistent error handling matching existing provider patterns

Auto-detect priority

local → groq → openai → mistral → xai → none
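The priority chain above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/transcription_tools.py: the function name `pick_stt_provider`, the `local_available` flag, and the GROQ/OPENAI/MISTRAL environment variable names are assumptions; only XAI_API_KEY is named in this PR.

```python
import os
from typing import Mapping, Optional

# Assumed env var names; only XAI_API_KEY is confirmed by this PR.
CLOUD_PROVIDER_KEYS = (
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("xai", "XAI_API_KEY"),
)


def pick_stt_provider(
    local_available: bool,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first usable provider: local, then cloud keys in order."""
    env = os.environ if env is None else env
    if local_available:
        return "local"
    for name, key_var in CLOUD_PROVIDER_KEYS:
        # A key that is missing or only whitespace counts as "not configured".
        if env.get(key_var, "").strip():
            return name
    return "none"
```

With only XAI_API_KEY set and no local backend, this sketch selects xai; adding a Mistral key makes mistral win, which matches the "mistral preferred over xai" test case listed in the commit message.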

Configuration

stt:
  provider: xai
  xai:
    language: fr
    format: true        # Inverse Text Normalization
    diarize: false      # Speaker diarization
    base_url: https://api.x.ai/v1   # optional override

Testing

17 new unit tests covering: transcription, error handling, provider selection, dispatch
All 89 tests passing (existing + new)

Files changed

tools/transcription_tools.py — xAI provider implementation (+120 lines)
tests/tools/test_transcription_tools.py — unit tests (+256 lines)

xAI STT API reference

Endpoint: POST https://api.x.ai/v1/stt
Auth: Bearer token via XAI_API_KEY
Input: multipart/form-data (file + optional language, format, diarize)
Output: {"text": "...", "language": "fr", "duration": 3.2}
21 languages supported, ~5% WER (best-in-class entity recognition)
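For readers unfamiliar with the request shape, here is a hedged sketch of how the pieces listed above might be assembled. The helper names `build_stt_request` and `extract_transcript` are hypothetical, and sending booleans as the strings "true"/"false" is an assumption about the form encoding, not something the PR states.

```python
from typing import Any, Dict, Tuple

DEFAULT_BASE_URL = "https://api.x.ai/v1"


def build_stt_request(
    api_key: str,
    language: str,
    use_format: bool = True,
    diarize: bool = False,
    base_url: str = DEFAULT_BASE_URL,
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, form_fields) for a POST to the STT endpoint."""
    url = base_url.rstrip("/") + "/stt"
    # The key travels in the Authorization header, never in the URL.
    headers = {"Authorization": f"Bearer {api_key}"}
    # Assumption: boolean options are sent as lowercase strings.
    fields = {"language": language, "format": "true" if use_format else "false"}
    if diarize:
        fields["diarize"] = "true"
    return url, headers, fields


def extract_transcript(payload: Dict[str, Any]) -> str:
    """Strip whitespace from the 'text' field and reject empty transcripts."""
    text = str(payload.get("text") or "").strip()
    if not text:
        raise ValueError("xAI STT returned an empty transcript")
    return text
```

With the third-party `requests` library, the tuple could then be sent as `requests.post(url, headers=headers, data=fields, files={"file": fh}, timeout=120)`, which produces a multipart/form-data body; the 120-second timeout is the value mentioned in the review.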

Add xAI as a sixth STT provider using the POST /v1/stt endpoint.

Features:
- Multipart/form-data upload to api.x.ai/v1/stt
- Inverse Text Normalization (ITN) via format=true (default)
- Optional diarization via config (stt.xai.diarize)
- Language configuration (default: fr, overridable via config or env)
- Custom base_url support (XAI_STT_BASE_URL env or stt.xai.base_url)
- Full provider integration: explicit config + auto-detect fallback chain
- Consistent error handling matching existing provider patterns

Config (config.yaml):

stt:
  provider: xai
  xai:
    language: fr
    format: true
    diarize: false
    base_url: https://api.x.ai/v1   # optional override

Auto-detect priority: local > groq > openai > mistral > xai > none

Covers:
- _transcribe_xai: no key, successful transcription, whitespace stripping, API error (HTTP 400), empty transcript, permission error, network error, language/format params sent, custom base_url, diarize config
- _get_provider xAI: key set, no key, auto-detect after mistral, mistral preferred over xai, no key returns none
- transcribe_audio xAI dispatch: dispatch, default model (grok-stt), model override

Julientalbot · 2026-04-18T11:51:55Z

CI failures on this PR are unrelated to its scope — both are pre-existing regressions on main:

1. test_no_single_field_categories fails because hermes_cli/config.py:775 defines a code_execution category with a single field (mode), violating the assertion count >= 2. Fix options: merge code_execution into another category via the web_server merge map, add a second field, or relax the test.

2. test_config_version_matches_current_schema fails because hermes_cli/config.py:805 bumped _config_version to 19 but tests/tools/test_browser_camofox_state.py:67 still hardcodes == 18. Fix: bump the test assertion to 19.

Both regressions pre-date this PR and affect any currently open PR against main. This PR only touches tools/transcription_tools.py and its test file — scope is fully independent.

Opening a separate focused PR to fix these two so main goes green again. Happy to rebase this one once that merges.

cetej

Solid PR — clean implementation following the existing provider pattern (mirrors _transcribe_mistral well), thorough test coverage (17 tests covering happy path, error handling, env/config fallback, dispatch), and correct multipart upload semantics. Verified tools/xai_http.hermes_xai_user_agent() exists upstream so the import resolves.

CI failures (test_no_single_field_categories, test_config_version_matches_current_schema) are pre-existing on main and orthogonal to this change — your separate fix PR plan is the right call.

A few small nits, none blocking:

Hardcoded language: "fr" default in _transcribe_xai (tools/transcription_tools.py)
The module already exports DEFAULT_LOCAL_STT_LANGUAGE = "en". The "fr" literal looks like a locale leak — consider:
```
language = str(
    xai_config.get("language")
    or os.getenv("HERMES_LOCAL_STT_LANGUAGE")
    or DEFAULT_LOCAL_STT_LANGUAGE
).strip()
```
Redundant default=True on the format flag:
```
use_format = is_truthy_value(xai_config.get("format", True), default=True)
```
.get("format", True) already returns True when the key is missing, so default=True is unreachable. Either drop the dict default or drop default=True — pick one source of truth.
Stale comment in _get_provider — the auto-detect comment still reads "local > groq > openai > mistral"; worth appending > xai to match the new behavior.

Optional cleanups:

_transcribe_xai(file_path, model_name) accepts model_name but never references it; the dispatch comment says "pass through for logging" but logger.info doesn't include it. Either log it or drop the parameter.
Minor doc inconsistency: docstring says "26 languages", PR description says "21 languages".
Error masking is asymmetric vs. Mistral (which only returns type(e).__name__); your version exposes the full exception, which is actually better for debugging — just flagging the inconsistency.
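To make the last point concrete, the two error-reporting styles being compared might look like this. It is only an illustrative sketch: the function names and the "Transcription failed" wording are hypothetical, not the actual messages in either provider.

```python
def masked_error(exc: Exception) -> str:
    """Mistral-style reporting: only the exception class name is surfaced."""
    return f"Transcription failed: {type(exc).__name__}"


def detailed_error(exc: Exception) -> str:
    """Reporting that also includes the exception message, as in this PR."""
    return f"Transcription failed: {type(exc).__name__}: {exc}"
```

The masked form avoids echoing anything sensitive that a library might place in an exception message, while the detailed form is easier to debug; that trade-off is the inconsistency being flagged.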

Security check is clean: API key from env, Bearer in header (not URL), no secret leakage in logs (only lang/duration/char count), reasonable 120s timeout.

Approving — happy to see this land once the three one-line nits are addressed (or even as-is, your call).

- Replace hardcoded 'fr' default with DEFAULT_LOCAL_STT_LANGUAGE ('en'); removes locale leak, matches other providers
- Drop redundant default=True on is_truthy_value (dict .get already defaults)
- Update auto-detect comment to include 'xai' in the chain
- Fix docstring: 21 languages (match PR body + actual xAI API)
- Update test_sends_language_and_format to set HERMES_LOCAL_STT_LANGUAGE=fr explicitly, since default is no longer 'fr'

All 18 xAI STT tests pass locally.

Julientalbot · 2026-04-18T19:56:19Z

Thanks for the thorough review @cetej! Pushed bd40bac addressing all three nits:

1. Hardcoded fr default → replaced with DEFAULT_LOCAL_STT_LANGUAGE (en). Locale leak fixed, now matches the pattern used by _transcribe_local_command. Updated test_sends_language_and_format to explicitly set HERMES_LOCAL_STT_LANGUAGE=fr via monkeypatch (so the test exercises the override chain rather than depending on a locale default).

2. Redundant default=True → dropped. .get("format", True) is now the single source of truth; is_truthy_value just normalizes config strings ("false"/"no"/etc).

3. Stale auto-detect comment → updated to local > groq > openai > mistral > xai.
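A small sketch of the resolution logic described in points 1 and 2, for reference. The `is_truthy_value` below is a hypothetical stand-in written for this example (the project's real helper may accept different strings), and `resolve_xai_settings` is an illustrative name, not a function from the PR.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULT_LOCAL_STT_LANGUAGE = "en"  # value quoted in the review


def is_truthy_value(value: Any) -> bool:
    """Hypothetical stand-in: normalize config strings such as "false"/"no"."""
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


def resolve_xai_settings(
    xai_config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Resolve language (config, then env, then default) and the format flag."""
    env = os.environ if env is None else env
    language = str(
        xai_config.get("language")
        or env.get("HERMES_LOCAL_STT_LANGUAGE")
        or DEFAULT_LOCAL_STT_LANGUAGE
    ).strip()
    # The dict default is the single source of truth for a missing key.
    use_format = is_truthy_value(xai_config.get("format", True))
    return {"language": language, "format": use_format}
```

Passing the environment as a plain mapping keeps the override chain easy to exercise in a test, in the same spirit as setting HERMES_LOCAL_STT_LANGUAGE=fr via monkeypatch.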

Bonus: fixed the docstring inconsistency (21 languages, matching the PR body and actual xAI API).

All 18 xAI STT tests pass locally. I kept the model_name parameter for signature consistency with the other _transcribe_* functions (they all follow (file_path, model_name), even where a provider leaves it unused); I can drop it if you prefer.

The pre-existing CI failures should clear once #12139 merges.

Julientalbot · 2026-04-19T12:30:59Z

Closing — xAI media provider work is being consolidated through @Jaaneek's #10600 line (TTS in #10783, video/image/x_search in #10786). An STT entry is a natural follow-up to that track rather than a separate PR. Happy to revisit once the provider upgrades settle.

Julientalbot · 2026-04-23T05:11:10Z

Hi @cetej and NousResearch team,

Reopening this PR. Background: I closed it a few days ago anticipating consolidation through @Jaaneek's broader xAI media provider track (#10600), but Jaaneek is currently on leave until mid-May.

Rather than letting the xAI STT contribution sit idle for another month, I'd like to land this independently so the Hermes community can benefit from native Grok STT support now. The code is review-clean (your nits addressed in bd40bac), tests pass, and the implementation follows existing provider patterns without core file modifications.

I'm also in active discussion with the xAI team about deeper Hermes-Grok integration — having STT merged upstream strengthens that relationship and gives users immediate value while the broader media track matures.

Happy to address any fresh feedback promptly. Thanks for considering.

— @Julientalbot

julientalbot-ergonomia added 2 commits April 18, 2026 15:07

Julientalbot mentioned this pull request Apr 18, 2026

fix(tests): merge code_execution category and sync hardcoded config version #12139

Closed

2 tasks

cetej approved these changes Apr 18, 2026

Julientalbot closed this Apr 19, 2026

Julientalbot reopened this Apr 23, 2026

teknium1 mentioned this pull request Apr 23, 2026

feat(stt): add xAI Grok STT provider #14473

Merged

teknium1 closed this in #14473 Apr 23, 2026

Julientalbot wants to merge 3 commits into NousResearch:main from Julientalbot:feat/xai-stt-provider


3 participants
