feat(voice): add Gemini STT and WhatsApp PTT support by mrlufepines · Pull Request #21540 · NousResearch/hermes-agent

mrlufepines · 2026-05-07T22:25:12Z

Add Gemini STT, Gemini TTS options, and WhatsApp PTT delivery

Problem

Hermes has WhatsApp and voice-related surfaces, but the end-to-end path for Gemini speech-to-text, configurable Gemini text-to-speech, and native WhatsApp push-to-talk delivery is incomplete.

Fix

Add three coordinated pieces:

tools/transcription_tools.py: add Gemini as a first-class STT provider using the Google Generative Language API.
tools/tts_tool.py: expose Gemini TTS model, voice, and output codec configuration, including OGG Opus output.
gateway/platforms/whatsapp.py: add a send_voice override that can deliver supported audio as WhatsApp PTT and fall back to the generic audio path when needed.
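The PTT-with-fallback behavior described for send_voice could look roughly like the sketch below. It is only an illustration: the class name, the `_send_ptt` helper, the method signatures, and the set of PTT-capable extensions are assumptions, not code taken from the PR.

```python
from pathlib import Path

# Assumption: which formats count as "supported" for PTT; the PR does not list them.
PTT_EXTENSIONS = {".ogg", ".opus"}


class WhatsAppAdapterSketch:
    """Hypothetical stand-in for the adapter in gateway/platforms/whatsapp.py."""

    def __init__(self):
        self.sent = []  # records (kind, chat_id, path) so the flow can be inspected

    def send_audio(self, chat_id, path):
        # Generic audio path: delivered as a regular audio attachment.
        self.sent.append(("audio", chat_id, path))
        return "audio"

    def _send_ptt(self, chat_id, path):
        # Placeholder for the native voice-note (PTT) upload.
        self.sent.append(("ptt", chat_id, path))
        return "ptt"

    def send_voice(self, chat_id, path):
        # Unsupported formats go straight to the generic audio path.
        if Path(path).suffix.lower() not in PTT_EXTENSIONS:
            return self.send_audio(chat_id, path)
        try:
            return self._send_ptt(chat_id, path)
        except Exception:
            # Fall back rather than drop the message if PTT delivery fails.
            return self.send_audio(chat_id, path)
```

With this shape, callers always use send_voice and never need to know which delivery path was taken.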

The change is opt-in. Existing STT, TTS, and send_audio behavior is unchanged when Gemini configuration is absent.
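One way to read the opt-in guarantee: Gemini is selected only when its key is present, and everything else keeps whatever provider was already in use. The function name and the `"existing"` placeholder below are hypothetical; only the GEMINI_STT_API_KEY variable comes from the PR.

```python
import os


def resolve_stt_provider(env=None, default="existing"):
    """Return "gemini" only when a Gemini STT key is configured.

    `default` stands in for whatever provider Hermes would otherwise use;
    the real selection logic in the PR may differ.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_STT_API_KEY", "").strip()
    return "gemini" if key else default
```

An empty or whitespace-only key is treated the same as an absent one, so a blank entry in an env file does not switch providers by accident.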

Configuration

GEMINI_STT_API_KEY=...
GEMINI_STT_MODEL=gemini-3-flash
GEMINI_TTS_API_KEY=...
GEMINI_TTS_MODEL=gemini-3.1-flash-tts-preview
GEMINI_TTS_VOICE=Kore
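As a rough illustration of how the STT variables might feed a Generative Language API call, the sketch below only builds the request and sends nothing. The function name, the prompt text, and the choice to require both variables are assumptions; the PR's actual request construction is not shown in this description.

```python
import base64
import os

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_stt_request(audio_bytes, mime_type="audio/ogg", env=None):
    """Build (url, headers, body) for a generateContent transcription call."""
    env = os.environ if env is None else env
    # Both variables are required here; no defaults are assumed.
    model = env["GEMINI_STT_MODEL"]
    api_key = env["GEMINI_STT_API_KEY"]

    url = f"{API_ROOT}/models/{model}:generateContent"
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    body = {
        "contents": [{
            "parts": [
                # Hypothetical prompt; the wording used by the PR is unknown.
                {"text": "Transcribe this audio verbatim."},
                # Audio is sent inline as base64 alongside its MIME type.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return url, headers, body
```

The returned triple can be passed to any HTTP client as a JSON POST, which keeps the payload logic testable without network access.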

Files touched

tools/transcription_tools.py
tools/tts_tool.py
gateway/platforms/whatsapp.py

Verification

Syntax checked with python3 -m py_compile on the three touched files.
Suggested follow-up: add fixture-based STT and TTS tests plus a WhatsApp adapter test for send_voice.
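The syntax check can also be run from Python, which makes it easy to wire into a test or pre-commit step. This is a minimal sketch using the standard library's `py_compile.compile` with `doraise=True`; the `syntax_errors` helper is hypothetical and not part of the PR.

```python
import py_compile

TOUCHED_FILES = [
    "tools/transcription_tools.py",
    "tools/tts_tool.py",
    "gateway/platforms/whatsapp.py",
]


def syntax_errors(paths):
    """Return {path: message} for every file that fails to compile."""
    failures = {}
    for path in paths:
        try:
            # doraise=True raises instead of printing the error to stderr.
            py_compile.compile(path, doraise=True)
        except (py_compile.PyCompileError, OSError) as exc:
            # OSError covers missing or unreadable files.
            failures[path] = str(exc)
    return failures
```

Calling `syntax_errors(TOUCHED_FILES)` from the repository root returns an empty dict when all three files compile. Like the CLI form, this catches syntax errors only and says nothing about runtime behavior, which is why the fixture-based tests are still worth adding.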

Add Gemini as a first-class STT provider, expose configurable Gemini TTS options, and add a WhatsApp send_voice override for native PTT delivery. The changes are opt-in through environment variables and preserve existing STT, TTS, and send_audio behavior when Gemini configuration is absent.

alt-glitch added the labels type/feature (New feature or request), platform/whatsapp (WhatsApp Business adapter), tool/tts (Text-to-speech and transcription), and P2 (Medium: degraded but workaround exists) on May 7, 2026

kshitijk4poor mentioned this pull request May 22, 2026

feat(stt): add register_transcription_provider() plugin hook #30493

Closed

7 tasks

teknium1 mentioned this pull request May 25, 2026

feat(stt): add register_transcription_provider() hook + stt.providers command-provider registry (salvage of #30493) #31907

Merged

mrlufepines force-pushed the feat/gemini-voice-whatsapp branch from 0dc8879 to 7460a90 on May 30, 2026 15:34

mrlufepines changed the title ~~feat(voice): Gemini STT + TTS + WhatsApp PTT delivery~~ feat(voice): add Gemini STT and WhatsApp PTT support May 30, 2026

mrlufepines closed this May 30, 2026

mrlufepines deleted the feat/gemini-voice-whatsapp branch May 30, 2026 15:37

mrlufepines wants to merge 1 commit into NousResearch:main from mrlufepines:feat/gemini-voice-whatsapp
