feat(voice): add Gemini STT and WhatsApp PTT support #21540
Closed
mrlufepines wants to merge 1 commit into
7 tasks
Add Gemini as a first-class STT provider, expose configurable Gemini TTS options, and add a WhatsApp send_voice override for native PTT delivery. The changes are opt-in through environment variables and preserve existing STT, TTS, and send_audio behavior when Gemini configuration is absent.
0dc8879 to 7460a90
Add Gemini STT, Gemini TTS options, and WhatsApp PTT delivery
Problem
Hermes has WhatsApp and voice-related surfaces, but the end-to-end path for Gemini speech-to-text, configurable Gemini text-to-speech, and native WhatsApp push-to-talk delivery is incomplete.
Fix
Add three coordinated pieces:
tools/transcription_tools.py: add Gemini as a first-class STT provider using the Google Generative Language API.
tools/tts_tool.py: expose Gemini TTS model, voice, and output codec configuration, including OGG Opus output.
gateway/platforms/whatsapp.py: add a send_voice override that can deliver supported audio as WhatsApp PTT and fall back to the generic audio path when needed.
The change is opt-in. Existing STT, TTS, and send_audio behavior is unchanged when Gemini configuration is absent.
Configuration
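The opt-in behavior described above could look roughly like the sketch below. It is not the code from this PR: the environment variable names (GEMINI_API_KEY, GEMINI_TTS_MODEL, GEMINI_TTS_VOICE, GEMINI_TTS_CODEC), the GeminiVoiceConfig type, and both helper functions are hypothetical, and treating a missing API key as "Gemini configuration is absent" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GeminiVoiceConfig:
    """Hypothetical container for the Gemini voice settings."""
    api_key: str
    tts_model: Optional[str]
    tts_voice: Optional[str]
    tts_codec: Optional[str]


def load_gemini_voice_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[GeminiVoiceConfig]:
    """Return Gemini settings, or None when Gemini is not configured.

    All variable names below are illustrative assumptions.
    """
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # Nothing configured: callers keep their existing behavior.
        return None
    codec = env.get("GEMINI_TTS_CODEC")
    return GeminiVoiceConfig(
        api_key=api_key,
        tts_model=env.get("GEMINI_TTS_MODEL") or None,
        tts_voice=env.get("GEMINI_TTS_VOICE") or None,
        tts_codec=codec.lower() if codec else None,
    )


def pick_stt_provider(env: Mapping[str, str], default: str) -> str:
    """Select Gemini only when it is configured, else the existing default."""
    return "gemini" if load_gemini_voice_config(env) else default
```

With an empty environment, load_gemini_voice_config returns None and pick_stt_provider returns whatever default the caller passed in, which is the "unchanged when absent" property the description claims.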
Files touched
tools/transcription_tools.py
tools/tts_tool.py
gateway/platforms/whatsapp.py
Verification
python3 -m py_compile on the three touched files.
send_voice.
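For reviewers who want to exercise the send_voice path beyond a compile check, the PTT-then-fallback behavior described under Fix can be sketched as below. This is an illustration under stated assumptions, not the adapter code from this PR: send_voice_with_fallback, the Sender callback type, and the PTT_SUFFIXES set are hypothetical, and deciding "supported audio" by file suffix is an assumption.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable

# Assumption: audio is treated as PTT-capable when it is an OGG/Opus file.
PTT_SUFFIXES = {".ogg", ".opus"}

# Hypothetical callback shape: (chat_id, audio_path) -> success flag.
Sender = Callable[[str, str], Awaitable[bool]]


async def send_voice_with_fallback(
    chat_id: str,
    audio_path: str,
    send_ptt: Sender,
    send_audio: Sender,
) -> str:
    """Try native PTT delivery first, then fall back to generic audio.

    Returns "ptt" or "audio" to indicate which path delivered the file.
    """
    if Path(audio_path).suffix.lower() in PTT_SUFFIXES:
        try:
            if await send_ptt(chat_id, audio_path):
                return "ptt"
        except Exception:
            # A PTT failure should not drop the message; use the generic path.
            pass
    await send_audio(chat_id, audio_path)
    return "audio"
```

Passing the two senders in as callbacks keeps the sketch testable with simple async stubs, so the fallback branch can be checked without a live WhatsApp session.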