Bug Description
When connected to a Discord voice channel, the TTS pipeline completes successfully — [Discord] Playing TTS in voice channel is logged, ffmpeg exits with code 0, and the audio file is valid (ElevenLabs confirms generation). However, the user hears nothing in the voice channel. The bot avatar shows no green speaking ring during playback, suggesting Discord is rejecting or not routing the audio stream.
This is already documented as Bug #3 in the discord-voice-troubleshooting skill.
Environment
- Hermes Agent: current upstream main as of 2026-04-27
- Platform: Discord gateway
- TTS Provider: ElevenLabs
- STT: faster-whisper (local)
- session_reset.mode: none
- OS: Ubuntu, host "Neuromancer"
Steps to Reproduce
- Connect bot to a Discord voice channel (/voice channel or /voice join)
- Speak to the bot in VC — STT transcribes successfully (confirmed in logs)
- Bot generates text response → gateway auto-TTS triggers play_in_voice_channel()
- Observe: ffmpeg runs and exits 0, but user hears nothing
Expected Behavior
TTS audio should be audible in the voice channel. Discord should show a green speaking ring around the bot avatar during playback.
Actual Behavior
- [Discord] Playing TTS in voice channel logged ✓
- ffmpeg process ... successfully terminated with return code of 0 logged ✓
- ElevenLabs confirms audio generation ✓
- User hears silence ✗
- No green speaking ring on bot avatar ✗
Log Evidence
2026-04-27 12:31:34,865 INFO gateway.platforms.discord: [Discord] Playing TTS in voice channel (guild=1487436095632441475)
2026-04-27 12:31:51,969 INFO discord.player: ffmpeg process 1273260 successfully terminated with return code of 0.
2026-04-27 12:36:40,324 INFO gateway.platforms.discord: [Discord] Playing TTS in voice channel (guild=1487436095632441475)
2026-04-27 12:36:43,688 INFO discord.player: ffmpeg process 1288621 successfully terminated with return code of 0.
Both instances produced silence for the user.
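As a rough sanity check on the log evidence, the gap between each "Playing TTS" line and the matching ffmpeg exit can be computed. This is a minimal sketch using only the four lines quoted above. The helper names (parse_timestamp, playback_durations) are made up for illustration, and it assumes the gap roughly reflects how long the player was consuming audio, which is not confirmed.

```python
from datetime import datetime

# The four log lines quoted in the Log Evidence section, copied verbatim.
LOG_LINES = [
    "2026-04-27 12:31:34,865 INFO gateway.platforms.discord: [Discord] Playing TTS in voice channel (guild=1487436095632441475)",
    "2026-04-27 12:31:51,969 INFO discord.player: ffmpeg process 1273260 successfully terminated with return code of 0.",
    "2026-04-27 12:36:40,324 INFO gateway.platforms.discord: [Discord] Playing TTS in voice channel (guild=1487436095632441475)",
    "2026-04-27 12:36:43,688 INFO discord.player: ffmpeg process 1288621 successfully terminated with return code of 0.",
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23  # length of "YYYY-MM-DD HH:MM:SS,mmm"


def parse_timestamp(line):
    """Parse the leading timestamp of a log line (hypothetical helper)."""
    return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)


def playback_durations(lines):
    """Pair each 'Playing TTS' line with the next ffmpeg exit line.

    Returns the elapsed seconds for each pair.
    """
    durations = []
    started_at = None
    for line in lines:
        if "Playing TTS in voice channel" in line:
            started_at = parse_timestamp(line)
        elif "ffmpeg process" in line and started_at is not None:
            elapsed = parse_timestamp(line) - started_at
            durations.append(elapsed.total_seconds())
            started_at = None
    return durations


if __name__ == "__main__":
    for seconds in playback_durations(LOG_LINES):
        print(f"ffmpeg exited {seconds:.3f}s after playback was logged")
```

The two gaps come out at roughly 17.1 s and 3.4 s. If the assumption holds, ffmpeg was being read for a plausible clip length rather than failing immediately, but this says nothing about whether any packets reached Discord.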
Suspected Areas
The pipeline from ffmpeg output → Discord voice socket appears to be the gap:
- Discord SPEAKING opcode (op 5): the bot may not be sending the speaking indicator before streaming audio, which could cause Discord to ignore the stream.
- Channel-level SPEAK permission override: even with correct server invite permissions (274881432640), individual voice channels can override them. However, the missing green ring suggests Discord never receives the speaking signal at all, rather than the bot being muted by a permission.
- Bot volume: the Discord client sometimes defaults bot volume to 0. However, the missing green ring points to a problem upstream of client-side volume.
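To rule the permission angle in or out, the invite permission integer can be decoded by hand. The sketch below uses the CONNECT (1 << 20) and SPEAK (1 << 21) bit positions from Discord's permission documentation and the documented overwrite order (remove denied bits, then add allowed bits). The function names and the example overwrite are hypothetical and do not come from this server's actual channel settings.

```python
# Bit positions taken from Discord's documented permission flags.
CONNECT = 1 << 20
SPEAK = 1 << 21

# Invite permission integer quoted in this report.
INVITE_PERMISSIONS = 274881432640


def has_permission(bitfield, flag):
    """Return True if every bit of `flag` is set in `bitfield`."""
    return (bitfield & flag) == flag


def apply_overwrite(base, allow=0, deny=0):
    """Apply one channel overwrite: clear denied bits, then set allowed bits.

    Simplified illustration: real resolution layers @everyone, role, and
    member overwrites in sequence.
    """
    return (base & ~deny) | allow


if __name__ == "__main__":
    print("CONNECT granted by invite:", has_permission(INVITE_PERMISSIONS, CONNECT))
    print("SPEAK granted by invite:", has_permission(INVITE_PERMISSIONS, SPEAK))

    # Hypothetical overwrite: a voice channel that denies SPEAK for the bot's role.
    effective = apply_overwrite(INVITE_PERMISSIONS, deny=SPEAK)
    print("SPEAK after hypothetical channel deny:", has_permission(effective, SPEAK))
```

The invite integer does include both CONNECT and SPEAK, so a permission problem would have to come from a channel-level overwrite like the hypothetical one shown, which can be checked in the channel's permission settings.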
Related
- discord-voice-troubleshooting skill documents this exact scenario as Known Bug #3
Question
Is the SPEAKING opcode handshake properly implemented in the play_in_voice_channel path? The absence of the green ring strongly suggests Discord never receives the speaking indicator before/during the audio stream.
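For reference, this is the shape of the voice gateway message the question is about. Discord's voice documentation describes an opcode 5 Speaking payload carrying a speaking bitfield, a delay, and the SSRC from the voice Ready payload, and states that at least one must be sent before voice data. The sketch only builds the JSON. build_speaking_payload and the example SSRC are hypothetical, and nothing here claims how Hermes or the underlying Discord library actually sends it.

```python
import json

# Voice gateway opcode for Speaking, per Discord's voice documentation.
OP_SPEAKING = 5

# Speaking bitfield flags, per Discord's voice documentation.
MICROPHONE = 1 << 0
SOUNDSHARE = 1 << 1
PRIORITY = 1 << 2


def build_speaking_payload(ssrc, flags=MICROPHONE, delay=0):
    """Build an opcode 5 Speaking payload (hypothetical helper).

    `ssrc` is the value the voice server assigned in its Ready payload.
    Passing flags=0 signals that the client has stopped speaking.
    """
    return {
        "op": OP_SPEAKING,
        "d": {"speaking": flags, "delay": delay, "ssrc": ssrc},
    }


if __name__ == "__main__":
    example_ssrc = 1  # placeholder; the real value comes from the voice server
    print(json.dumps(build_speaking_payload(example_ssrc)))
    print(json.dumps(build_speaking_payload(example_ssrc, flags=0)))
```

If debug logging of the voice websocket is available, a message of this shape with a non-zero speaking value would be expected shortly before playback starts. Its absence would support the suspicion above.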