Description
Bug / Feature Gap
Environment: OpenClaw latest (npm), macOS, Discord channel with bot API
Audio config:
{
"provider": "openai",
"model": "gpt-4o-mini-transcribe"
}
With whisper-cli fallback. Works perfectly for Telegram voice notes.
Problem: Discord voice messages are received by the agent but the audio transcription pipeline is never triggered. Zero audio/transcription log entries appear. The agent receives the message but cannot access or transcribe the audio content — it only sees an attachment reference.
Expected: Discord voice messages (OGG/Opus attachments on a message whose flags field has bit 1 << 13, aka IS_VOICE_MESSAGE, set) should be detected as audio and routed through tools.media.audio.models for transcription, same as Telegram voice notes.
Evidence from logs:
- Telegram voice message at same time: transcribed successfully via OpenAI, 🎤 Heard: prefix in message
- Discord voice message at same time: no transcription logs, agent responds "Can't hear audio"
- Audio file saved to ~/.openclaw/media/inbound/ for Telegram but not for Discord
Workaround: Send voice messages via Telegram instead of Discord, or type messages out.
Discord voice message format: Since 2023, Discord has supported voice messages as message attachments with content_type audio/ogg. The message carries the IS_VOICE_MESSAGE flag (bit 13, 1 << 13) in its flags field, and the attachment includes a waveform field and duration_secs.
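A rough sketch of what the detection step could look like, in case it helps. This is not OpenClaw code: the DiscordMessage and DiscordAttachment shapes and the findVoiceAttachment helper are hypothetical, and only model the fields mentioned in this issue. It assumes the IS_VOICE_MESSAGE bit is read from the message-level flags field, and falls back to the waveform and duration_secs fields on the attachment if the flag is missing.

```typescript
// Hypothetical minimal shapes: only the fields named in this issue are modeled.
interface DiscordAttachment {
  filename: string;
  content_type?: string;
  duration_secs?: number;
  waveform?: string;
}

interface DiscordMessage {
  flags?: number;
  attachments: DiscordAttachment[];
}

// Bit 13 of the message flags bitfield (1 << 13 === 8192).
const IS_VOICE_MESSAGE = 1 << 13;

// Returns the attachment that looks like a Discord voice message, if any.
// Assumption: the flag lives on the message; the waveform/duration_secs
// check is a fallback for payloads where flags is absent.
function findVoiceAttachment(
  msg: DiscordMessage,
): DiscordAttachment | undefined {
  const flagged = ((msg.flags ?? 0) & IS_VOICE_MESSAGE) !== 0;
  return msg.attachments.find((a) => {
    const isOgg = (a.content_type ?? "").toLowerCase().startsWith("audio/ogg");
    const hasVoiceFields =
      a.waveform !== undefined && a.duration_secs !== undefined;
    return isOgg && (flagged || hasVoiceFields);
  });
}
```

If a helper like this returned an attachment, the Discord channel handler could download it and hand it to the same audio pipeline the Telegram path already uses. A plain audio/ogg file upload without the flag or the voice fields is deliberately not matched here; whether ordinary audio attachments should also be transcribed is a separate question.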