
feat(voice): add Gemini STT and WhatsApp PTT support #21540

Closed
mrlufepines wants to merge 1 commit into NousResearch:main from mrlufepines:feat/gemini-voice-whatsapp


mrlufepines commented May 7, 2026


Add Gemini STT, Gemini TTS options, and WhatsApp PTT delivery

Problem

Hermes has WhatsApp and voice-related surfaces, but the end-to-end path for Gemini speech-to-text, configurable Gemini text-to-speech, and native WhatsApp push-to-talk delivery is incomplete.

Fix

Add three coordinated pieces:

  1. tools/transcription_tools.py: add Gemini as a first-class STT provider using the Google Generative Language API.
  2. tools/tts_tool.py: expose Gemini TTS model, voice, and output codec configuration, including OGG Opus output.
  3. gateway/platforms/whatsapp.py: add a send_voice override that can deliver supported audio as WhatsApp PTT and fall back to the generic audio path when needed.
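The fallback described in item 3 could look roughly like the sketch below. It is not the code from this PR: the function signature, the `PTT_SUFFIXES` set, and the two injected sender callables are hypothetical names chosen for illustration, and the assumption that OGG/Opus files are the PTT-capable ones is inferred from the OGG Opus output mentioned in item 2.

```python
from pathlib import Path
from typing import Callable

# Assumption: only these containers are treated as PTT-capable.
PTT_SUFFIXES = {".ogg", ".opus"}


def send_voice(
    path: str,
    send_ptt: Callable[[str], bool],
    send_audio: Callable[[str], bool],
) -> str:
    """Try PTT delivery for supported audio, else use the generic audio path.

    Returns "ptt" or "audio" so callers can see which path was taken.
    """
    if Path(path).suffix.lower() in PTT_SUFFIXES:
        try:
            if send_ptt(path):
                return "ptt"
        except Exception:
            # Any PTT failure falls through to the generic path.
            pass
    send_audio(path)
    return "audio"
```

Injecting the senders as callables keeps the decision logic testable without a live WhatsApp bridge, which would also suit the adapter test suggested under Verification.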

The change is opt-in. Existing STT, TTS, and send_audio behavior is unchanged when Gemini configuration is absent.
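One way to read the opt-in rule, as a minimal sketch: Gemini is selected only when its API key is set, and otherwise whatever provider was already in use is kept. The helper name `resolve_stt_provider` and the `"existing"` placeholder are hypothetical; only the `GEMINI_STT_API_KEY` variable name comes from this PR.

```python
import os
from typing import Mapping


def resolve_stt_provider(
    env: Mapping[str, str] = os.environ,
    default: str = "existing",
) -> str:
    """Return "gemini" only when a non-empty GEMINI_STT_API_KEY is present."""
    if env.get("GEMINI_STT_API_KEY", "").strip():
        return "gemini"
    # No Gemini configuration: leave the current provider untouched.
    return default
```

Treating a blank or whitespace-only key as absent avoids switching providers on an empty placeholder line in an env file.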

Configuration

GEMINI_STT_API_KEY=...
GEMINI_STT_MODEL=gemini-3-flash
GEMINI_TTS_API_KEY=...
GEMINI_TTS_MODEL=gemini-3.1-flash-tts-preview
GEMINI_TTS_VOICE=Kore
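For orientation, here is a sketch of how the STT values above might be turned into a Generative Language API `generateContent` request with inline audio. The helper name, the prompt text, and the choice of `urllib` are assumptions for illustration and are not taken from tools/transcription_tools.py. The function only builds the request; it does not send it.

```python
import base64
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_stt_request(
    audio: bytes, mime_type: str, api_key: str, model: str
) -> urllib.request.Request:
    """Build (but do not send) a generateContent request carrying inline audio."""
    payload = {
        "contents": [
            {
                "parts": [
                    # Hypothetical prompt; the PR does not state its wording.
                    {"text": "Transcribe this audio verbatim."},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(audio).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
```

A caller would pass the values of `GEMINI_STT_API_KEY` and `GEMINI_STT_MODEL`, then hand the result to `urllib.request.urlopen` and read the transcript from the response candidates.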

Files touched

  • tools/transcription_tools.py
  • tools/tts_tool.py
  • gateway/platforms/whatsapp.py

Verification

  • Syntax checked with python3 -m py_compile on the three touched files.
  • Suggested follow-up: add fixture-based STT and TTS tests plus a WhatsApp adapter test for send_voice.

alt-glitch added the type/feature, platform/whatsapp, tool/tts, and P2 labels May 7, 2026
Add Gemini as a first-class STT provider, expose configurable Gemini TTS options, and add a WhatsApp send_voice override for native PTT delivery.

The changes are opt-in through environment variables and preserve existing STT, TTS, and send_audio behavior when Gemini configuration is absent.
mrlufepines force-pushed the feat/gemini-voice-whatsapp branch from 0dc8879 to 7460a90 May 30, 2026 15:34
mrlufepines changed the title from "feat(voice): Gemini STT + TTS + WhatsApp PTT delivery" to "feat(voice): add Gemini STT and WhatsApp PTT support" May 30, 2026
mrlufepines deleted the feat/gemini-voice-whatsapp branch May 30, 2026 15:37

Labels

  • P2 (Medium: degraded but workaround exists)
  • platform/whatsapp (WhatsApp Business adapter)
  • tool/tts (Text-to-speech and transcription)
  • type/feature (New feature or request)

2 participants