Background
This is a follow-up to #1100 and PR #1110. The initial fix made the gateway honor stt.enabled: false in the config, but further improvements are needed for a more robust STT (Speech-to-Text) configuration experience.
Current State
PR #1110 added:
- Basic support for stt.enabled: false to skip transcription entirely
- Default behavior unchanged (STT enabled by default)
Proposed Improvements
1. Pluggable STT Providers
Currently, the gateway hardcodes OpenAI Whisper as the STT provider. Users should be able to choose their preferred provider:
Desired providers:
- OpenAI Whisper (current default)
- Deepgram
- Local Whisper (run locally without external API calls)
Proposed config format:
stt:
  enabled: true
  provider: deepgram  # Options: openai, deepgram, local
  openai:
    api_key: ${OPENAI_API_KEY}
    model: whisper-1
  deepgram:
    api_key: ${DEEPGRAM_API_KEY}
    model: nova-2
  local:
    model_path: /path/to/whisper/model
    device: cuda  # or cpu
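One way the provider field could map onto code is a small registry keyed by the provider value. The sketch below is illustrative only: `SttProvider`, `EchoProvider`, `PROVIDERS`, and `build_provider` are hypothetical names, not part of the gateway, and the stand-in backend does not call any real API. Python is used here purely for illustration and may not match the gateway's implementation language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


class SttProvider(Protocol):
    """Hypothetical interface that every STT backend would implement."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in backend for illustration; a real one would call its API."""

    name: str
    model: str

    def transcribe(self, audio: bytes) -> str:
        return f"[{self.name}:{self.model}] {len(audio)} bytes"


# Registry mapping the `stt.provider` value to a factory that receives
# that provider's own config sub-section.
PROVIDERS: Dict[str, Callable[[dict], SttProvider]] = {
    "openai": lambda cfg: EchoProvider("openai", cfg.get("model", "whisper-1")),
    "deepgram": lambda cfg: EchoProvider("deepgram", cfg.get("model", "nova-2")),
    "local": lambda cfg: EchoProvider("local", cfg.get("model_path", "")),
}


def build_provider(stt_cfg: dict) -> Optional[SttProvider]:
    """Return the configured provider, or None when STT is disabled."""
    if not stt_cfg.get("enabled", True):  # STT stays enabled by default
        return None
    name = stt_cfg.get("provider", "openai")  # OpenAI is the current default
    if name not in PROVIDERS:
        raise ValueError(
            f"unknown STT provider {name!r}; expected one of {sorted(PROVIDERS)}"
        )
    return PROVIDERS[name](stt_cfg.get(name, {}))
```

With this shape, adding a provider would mean registering one more factory, and the rest of the gateway would only depend on `transcribe`.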
2. Auto-disable When No API Key is Present
The gateway should gracefully handle missing API keys by:
- Detecting when STT is enabled but no valid API key is configured for the selected provider
- Logging a warning message explaining the situation
- Auto-disabling STT for that session (or permanently until fixed)
- Continuing to operate normally for text-based messages
This prevents the 401 errors described in #1100 when users don't have API keys configured.
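The check described above could look roughly like the following. This is a minimal sketch under assumptions: `effective_stt_enabled` and `KEY_ENV` are hypothetical names, the environment-variable names are taken from the proposed config, and treating an unexpanded `${...}` placeholder as a missing key is a design guess rather than confirmed gateway behavior.

```python
import logging
import os
from typing import Mapping, Optional

log = logging.getLogger("stt")

# Providers that need an API key (assumption based on the proposed config).
# Providers not listed here, such as `local`, are treated as needing no key.
KEY_ENV = {"openai": "OPENAI_API_KEY", "deepgram": "DEEPGRAM_API_KEY"}


def effective_stt_enabled(
    stt_cfg: Mapping, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Decide whether STT should actually run for this session."""
    env = os.environ if env is None else env
    if not stt_cfg.get("enabled", True):
        return False
    provider = stt_cfg.get("provider", "openai")
    env_name = KEY_ENV.get(provider)
    if env_name is None:
        return True  # no key required for this provider
    configured = str(stt_cfg.get(provider, {}).get("api_key") or "")
    if configured.startswith("${"):
        configured = ""  # placeholder was never expanded, so ignore it
    key = configured or env.get(env_name, "")
    if not key.strip():
        log.warning(
            "STT is enabled but no API key is configured for provider %r; "
            "disabling STT for this session. Set %s or stt.enabled: false.",
            provider,
            env_name,
        )
        return False
    return True
```

Running this once at startup, before any audio message arrives, would turn the 401 into a single warning while text messages continue to be handled normally.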
3. Runtime Provider Selection (Optional Future)
Consider allowing per-message provider selection via a command, e.g.:
/stt use deepgram
/stt use local
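If this idea is pursued, parsing the command could be as small as the sketch below. `parse_stt_command` and `KNOWN_PROVIDERS` are hypothetical names, and the exact command grammar is an assumption based on the two examples above.

```python
from typing import Optional

KNOWN_PROVIDERS = ("openai", "deepgram", "local")


def parse_stt_command(text: str) -> Optional[str]:
    """Return the requested provider, or None if text is not `/stt use <name>`."""
    parts = text.strip().split()
    if len(parts) != 3 or parts[0] != "/stt" or parts[1] != "use":
        return None  # not an STT command; handle as a normal message
    provider = parts[2].lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(
            f"unknown STT provider {provider!r}; "
            f"expected one of {', '.join(KNOWN_PROVIDERS)}"
        )
    return provider
```

Returning None for non-commands keeps the parser safe to call on every incoming message, while an unknown provider name raises so the user gets explicit feedback.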
Acceptance Criteria
- stt.provider field with multiple provider options
Related