[Feature]: First-class Soniox STT provider support #22428

@materemias

Problem or Use Case

Hermes currently supports Whisper-family STT backends (OpenAI Whisper, Groq Whisper, local faster-whisper) plus the generic HERMES_LOCAL_STT_COMMAND hook. For multilingual voice workflows (Telegram/Discord/CLI) and low-latency streaming, Soniox offers a stronger feature set:

  • real-time WebSocket STT with partial results
  • automatic language ID + code-switching across 60+ languages in a single stream (no need to pre-declare locale)
  • speaker diarization
  • built-in PII redaction
  • async batch endpoint as well

There is no first-class config path today. Users would have to wrap Soniox behind HERMES_LOCAL_STT_COMMAND, which loses streaming/partials and the OpenAI-shaped response surface that downstream code expects.

Proposed Solution

Add soniox as a provider option alongside openai / groq / local. Config sketch:

stt:
  enabled: true
  provider: soniox
  soniox:
    api_key: ${SONIOX_API_KEY}
    model: stt-rt-preview        # or stt-async-preview for batch
    language_hints: [en, hu]     # optional; auto-detect when omitted
    enable_speaker_diarization: false
    enable_endpoint_detection: true
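
A minimal sketch of how the `stt` block above might be loaded once it has been parsed into a dict. The `SonioxConfig` dataclass, the `load_soniox_config` and `expand_env` helpers, and the default values are all hypothetical names chosen for illustration; none of them are existing Hermes code, and the `${VAR}` expansion is an assumption about how the config loader treats environment references.

```python
import os
import re
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple

# Matches ${NAME} references such as ${SONIOX_API_KEY}.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


@dataclass(frozen=True)
class SonioxConfig:
    """Hypothetical typed view of the proposed `stt.soniox` section."""
    api_key: str
    model: str = "stt-rt-preview"           # assumed default, taken from the sketch
    language_hints: Tuple[str, ...] = ()    # empty means auto-detect
    enable_speaker_diarization: bool = False
    enable_endpoint_detection: bool = True


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} with env[NAME]; fail loudly if a variable is missing."""
    def _sub(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            raise ValueError(f"environment variable {name} is not set")
        return env[name]
    return _ENV_REF.sub(_sub, value)


def load_soniox_config(
    stt: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[SonioxConfig]:
    """Return a SonioxConfig, or None when STT is disabled or another provider is selected."""
    env = os.environ if env is None else env
    if not stt.get("enabled") or stt.get("provider") != "soniox":
        return None
    raw = stt.get("soniox") or {}
    if "api_key" not in raw:
        raise ValueError("stt.soniox.api_key is required")
    return SonioxConfig(
        api_key=expand_env(str(raw["api_key"]), env),
        model=raw.get("model", "stt-rt-preview"),
        language_hints=tuple(raw.get("language_hints") or ()),
        enable_speaker_diarization=bool(raw.get("enable_speaker_diarization", False)),
        enable_endpoint_detection=bool(raw.get("enable_endpoint_detection", True)),
    )
```

Returning `None` for a disabled or non-Soniox config keeps the existing openai / groq / local paths untouched, while a missing key or unset environment variable fails at load time rather than on the first transcription.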

Endpoints:

  • realtime: wss://stt-rt.soniox.com/transcribe-websocket
  • async: https://api.soniox.com/v1/transcriptions
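
One way the adapter could choose between the two endpoints is by model name. This is only a sketch: the `endpoint_for_model` helper is hypothetical, and the prefix convention (`stt-rt` for realtime, `stt-async` for batch) is inferred from the two model names in the config sketch rather than taken from Soniox documentation.

```python
REALTIME_URL = "wss://stt-rt.soniox.com/transcribe-websocket"
ASYNC_URL = "https://api.soniox.com/v1/transcriptions"


def endpoint_for_model(model: str) -> str:
    """Map a configured model name to the endpoint listed above.

    Assumption: realtime models start with "stt-rt" and batch models
    start with "stt-async", as in the config sketch.
    """
    if model.startswith("stt-rt"):
        return REALTIME_URL
    if model.startswith("stt-async"):
        return ASYNC_URL
    raise ValueError(f"cannot infer Soniox endpoint for model {model!r}")
```

Raising on an unrecognized name avoids silently sending batch audio to the WebSocket endpoint (or the reverse) if the model naming changes.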

Adapter would normalize Soniox tokens[] (with per-token language + speaker) into the existing transcript shape Hermes consumes.
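
A rough sketch of that normalization step follows. The token field names used here (`text`, `language`, `speaker`, `start_ms`, `end_ms`, `is_final`) and the output shape (`text`, `language`, `segments`) are assumptions for illustration; the real Soniox payload and the transcript shape Hermes consumes should be checked before relying on any of them. `normalize_tokens` is a hypothetical helper name.

```python
from itertools import groupby
from typing import Any, Dict, List


def normalize_tokens(tokens: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collapse per-token results into a transcript with per-segment metadata.

    Assumptions: token text already carries its own leading whitespace, so
    tokens are joined without a separator; non-final (partial) tokens are
    dropped; consecutive tokens sharing speaker and language form a segment.
    """
    final = [t for t in tokens if t.get("is_final", True)]

    segments = []
    for (speaker, language), group in groupby(
        final, key=lambda t: (t.get("speaker"), t.get("language"))
    ):
        run = list(group)
        segments.append({
            "text": "".join(t["text"] for t in run).strip(),
            "speaker": speaker,
            "language": language,
            # Convert milliseconds to seconds.
            "start": run[0].get("start_ms", 0) / 1000.0,
            "end": run[-1].get("end_ms", 0) / 1000.0,
        })

    languages = [s["language"] for s in segments if s["language"]]
    return {
        "text": "".join(t["text"] for t in final).strip(),
        # Top-level language is the first one detected; code-switched audio
        # keeps its per-segment languages in `segments`.
        "language": languages[0] if languages else None,
        "segments": segments,
    }
```

Grouping on the (speaker, language) pair means a code-switched utterance yields one segment per language run, so downstream code that only reads `text` keeps working while callers that care about diarization or language can walk `segments`.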

Alternatives Considered

  1. Generic HERMES_LOCAL_STT_COMMAND wrapper — works for batch, but drops streaming/partials and forces a custom JSON-shape translation per user.
  2. Stick with Whisper — fine for English, weaker for code-switched speech and higher latency than Soniox realtime.
  3. Deepgram (already on the wishlist in #1166, "Follow-up: Pluggable STT providers and auto-disable when no API key"), which offers comparable capability; this request is additive, not a replacement.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Labels: P3 Low (cosmetic, nice to have); area/config (Config system, migrations, profiles); type/feature (New feature or request)
