feat(stt): add xAI Grok STT provider #12120
Add xAI as a sixth STT provider using the POST /v1/stt endpoint.
Features:
- Multipart/form-data upload to api.x.ai/v1/stt
- Inverse Text Normalization (ITN) via format=true (default)
- Optional diarization via config (stt.xai.diarize)
- Language configuration (default: fr, overridable via config or env)
- Custom base_url support (XAI_STT_BASE_URL env or stt.xai.base_url)
- Full provider integration: explicit config + auto-detect fallback chain
- Consistent error handling matching existing provider patterns
Config (config.yaml):
  stt:
    provider: xai
    xai:
      language: fr
      format: true
      diarize: false
      base_url: https://api.x.ai/v1  # optional override
Auto-detect priority: local > groq > openai > mistral > xai > none
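The priority chain above can be sketched as a first-match lookup. This is a minimal illustration, not the code in this PR: `autodetect_provider` and the `local_available` flag are hypothetical, and only `XAI_API_KEY` is named in the PR; the other environment variable names are assumptions.

```python
from typing import Mapping

# Cloud providers in the auto-detect order stated in the PR.
# XAI_API_KEY comes from the PR; the other variable names are assumptions.
_CLOUD_KEYS = (
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("xai", "XAI_API_KEY"),
)


def autodetect_provider(env: Mapping[str, str], local_available: bool) -> str:
    """Return the first usable provider in priority order, or "none"."""
    if local_available:
        return "local"
    for name, key_var in _CLOUD_KEYS:
        # A provider counts as usable when its key is set and non-blank.
        if env.get(key_var, "").strip():
            return name
    return "none"
```

With this ordering, xAI is only selected when no local backend and no Groq, OpenAI, or Mistral key is present, which is what the "mistral preferred over xai" test case describes.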
Covers:
- _transcribe_xai: no key, successful transcription, whitespace stripping, API error (HTTP 400), empty transcript, permission error, network error, language/format params sent, custom base_url, diarize config
- _get_provider xAI: key set, no key, auto-detect after mistral, mistral preferred over xai, no key returns none
- transcribe_audio xAI dispatch: dispatch, default model (grok-stt), model override
CI failures on this PR are unrelated to its scope: both are pre-existing regressions on 1. 2. Both regressions pre-date this PR and affect any currently open PR against Opening a separate focused PR to fix these two so main goes green again. Happy to rebase this one once that merges.
cetej
left a comment
Solid PR — clean implementation following the existing provider pattern (mirrors _transcribe_mistral well), thorough test coverage (17 tests covering happy path, error handling, env/config fallback, dispatch), and correct multipart upload semantics. Verified tools/xai_http.hermes_xai_user_agent() exists upstream so the import resolves.
CI failures (test_no_single_field_categories, test_config_version_matches_current_schema) are pre-existing on main and orthogonal to this change — your separate fix PR plan is the right call.
A few small nits, none blocking:
- Hardcoded language: "fr" default in _transcribe_xai (tools/transcription_tools.py)
  The module already exports DEFAULT_LOCAL_STT_LANGUAGE = "en". The "fr" literal looks like a locale leak; consider:
      language = str(
          xai_config.get("language")
          or os.getenv("HERMES_LOCAL_STT_LANGUAGE")
          or DEFAULT_LOCAL_STT_LANGUAGE
      ).strip()
- Redundant default=True on the format flag:
      use_format = is_truthy_value(xai_config.get("format", True), default=True)
  .get("format", True) already returns True when the key is missing, so default=True is unreachable. Either drop the dict default or drop default=True; pick one source of truth.
- Stale comment in _get_provider: the auto-detect comment still reads "local > groq > openai > mistral"; worth appending > xai to match the new behavior.
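To make the first two nits concrete, here is a small standalone sketch of the suggested resolution logic. It assumes the config section is a plain dict; `resolve_language` and `resolve_format` are hypothetical names, and the truthiness parsing is a stand-in for the repo's `is_truthy_value`, whose exact behavior is not shown in this thread.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_LOCAL_STT_LANGUAGE = "en"  # value quoted in the review above


def resolve_language(xai_config: Mapping[str, Any],
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Config value first, then the env var, then the module default."""
    env = os.environ if env is None else env
    return str(
        xai_config.get("language")
        or env.get("HERMES_LOCAL_STT_LANGUAGE")
        or DEFAULT_LOCAL_STT_LANGUAGE
    ).strip()


def resolve_format(xai_config: Mapping[str, Any]) -> bool:
    """ITN flag with a single default (True), used when the key is absent."""
    value = xai_config.get("format", True)
    if isinstance(value, str):
        # Stand-in parsing for string values; not the repo's is_truthy_value.
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)
```

Keeping the default in one place (here, the dict lookup) means a later change to the default cannot silently disagree with a second copy of it.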
Optional cleanups:
- _transcribe_xai(file_path, model_name) accepts model_name but never references it; the dispatch comment says "pass through for logging" but logger.info doesn't include it. Either log it or drop the parameter.
- Minor doc inconsistency: docstring says "26 languages", PR description says "21 languages".
- Error masking is asymmetric vs. Mistral (which only returns type(e).__name__); your version exposes the full exception, which is actually better for debugging; just flagging the inconsistency.
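The asymmetry in the last point comes down to how the exception is rendered into the error string. A tiny illustration, with hypothetical function names and message wording (neither is taken from the repo):

```python
def masked_error(exc: Exception) -> str:
    """Mistral-style: report only the exception class name."""
    return f"Transcription failed: {type(exc).__name__}"


def detailed_error(exc: Exception) -> str:
    """xAI-style in this PR: include the exception's own message."""
    return f"Transcription failed: {exc}"
```

The detailed form is easier to debug, but it should only be used where the exception text cannot carry secrets such as request headers.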
Security check is clean: API key from env, Bearer in header (not URL), no secret leakage in logs (only lang/duration/char count), reasonable 120s timeout.
Approving — happy to see this land once the three one-line nits are addressed (or even as-is, your call).
- Replace hardcoded 'fr' default with DEFAULT_LOCAL_STT_LANGUAGE ('en')
— removes locale leak, matches other providers
- Drop redundant default=True on is_truthy_value (dict .get already defaults)
- Update auto-detect comment to include 'xai' in the chain
- Fix docstring: 21 languages (match PR body + actual xAI API)
- Update test_sends_language_and_format to set HERMES_LOCAL_STT_LANGUAGE=fr
explicitly, since default is no longer 'fr'
All 18 xAI STT tests pass locally.
Thanks for the thorough review @cetej! Pushed bd40bac addressing all three nits: 1. Hardcoded 2. Redundant 3. Stale auto-detect comment → updated to Bonus: fixed the docstring inconsistency (21 languages, matching the PR body and actual xAI API). All 18 xAI STT tests pass locally. The The pre-existing CI failures should clear once #12139 merges.
Hi @cetej and NousResearch team,

Reopening this PR. Background: I closed it a few weeks ago anticipating consolidation through @Jaaneek's broader xAI media provider track (#10600), but Jaaneek is on leave until mid-May. Rather than letting the xAI STT contribution sit idle for another month, I'd like to land this independently so the Hermes community can benefit from native Grok STT support now.

The code is review-clean (your nits addressed in bd40bac), tests pass, and the implementation follows existing provider patterns without core file modifications.

I'm also in active discussion with the xAI team about deeper Hermes-Grok integration. Having STT merged upstream strengthens that relationship and gives users immediate value while the broader media track matures.

Happy to address any fresh feedback promptly. Thanks for considering.
Summary
Add xAI as a sixth STT provider using the POST /v1/stt endpoint with multipart/form-data.
Features
- Inverse Text Normalization (ITN) via format=true (default on)
- Optional diarization via stt.xai.diarize config
- Language configuration (default: fr, overridable via config or HERMES_LOCAL_STT_LANGUAGE env)
- Custom base_url support (XAI_STT_BASE_URL env or stt.xai.base_url config)
Auto-detect priority
local → groq → openai → mistral → xai → none
Configuration
Testing
Files changed
- tools/transcription_tools.py: xAI provider implementation (+120 lines)
- tests/tools/test_transcription_tools.py: unit tests (+256 lines)
xAI STT API reference
- POST https://api.x.ai/v1/stt
- Bearer token via XAI_API_KEY
- multipart/form-data (file + optional language, format, diarize)
- Response: {"text": "...", "language": "fr", "duration": 3.2}
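For readers who want to try the endpoint outside Hermes, the request described above could be assembled roughly as follows. This is a sketch under assumptions, not the PR's `_transcribe_xai`: the helper names are hypothetical, booleans are assumed to be sent as the strings "true"/"false", and raising on an empty transcript is one possible handling choice.

```python
from typing import Any, Dict, Mapping


def build_stt_request(api_key: str, language: str = "en",
                      use_format: bool = True, diarize: bool = False,
                      base_url: str = "https://api.x.ai/v1") -> Dict[str, Any]:
    """Assemble URL, headers and form fields for POST /v1/stt."""
    return {
        "url": base_url.rstrip("/") + "/stt",
        # The key travels in the Authorization header, never in the URL.
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {
            "language": language,
            "format": "true" if use_format else "false",   # assumed encoding
            "diarize": "true" if diarize else "false",     # assumed encoding
        },
    }


def extract_text(payload: Mapping[str, Any]) -> str:
    """Return the stripped transcript, raising if it is empty."""
    text = str(payload.get("text") or "").strip()
    if not text:
        raise ValueError("empty transcript")
    return text


# Sending it (not executed here; needs the third-party `requests` package
# and `import os`):
#   req = build_stt_request(os.environ["XAI_API_KEY"], language="fr")
#   with open("clip.ogg", "rb") as fh:
#       resp = requests.post(req["url"], headers=req["headers"],
#                            data=req["data"], files={"file": fh}, timeout=120)
#   print(extract_text(resp.json()))
```

Splitting request construction from the network call keeps the part that carries the config logic testable without mocking HTTP.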