Discord voice messages not transcribed (audio pipeline not triggered) #30034

@xandorklein

Bug / Feature Gap

Environment: OpenClaw latest (npm), macOS, Discord channel with bot API

Audio config:

{
  "provider": "openai",
  "model": "gpt-4o-mini-transcribe"
}

whisper-cli is configured as a fallback. This setup works perfectly for Telegram voice notes.

Problem: Discord voice messages are received by the agent, but the audio transcription pipeline is never triggered. Zero audio/transcription log entries appear. The agent receives the message but cannot access or transcribe the audio content; it only sees an attachment reference.

Expected: Discord voice messages (OGG/Opus attachments on a message whose flags include IS_VOICE_MESSAGE, i.e. flags & (1 << 13)) should be detected as audio and routed through tools.media.audio.models for transcription, the same as Telegram voice notes.

Evidence from logs:

  • Telegram voice message sent at the same time: transcribed successfully via OpenAI, with the 🎤 Heard: prefix in the message
  • Discord voice message sent at the same time: no transcription logs, agent responds "Can't hear audio"
  • Audio file is saved to ~/.openclaw/media/inbound/ for Telegram but not for Discord

Workaround: Send voice messages via Telegram instead of Discord, or type messages out.

Discord voice message format: Since 2023, Discord has supported voice messages as message attachments with content_type audio/ogg. The message carries the IS_VOICE_MESSAGE flag (bit 13, i.e. 1 << 13) in its flags field, and the attachment includes the waveform and duration_secs fields.
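
For illustration, here is a minimal sketch of how such a message could be detected before handing the attachment to the audio pipeline. It is not OpenClaw's actual code: the DiscordMessage and DiscordAttachment interfaces and the findVoiceAttachment helper are hypothetical names that model only the fields mentioned above. It assumes the raw payload shape from Discord's API, where the IS_VOICE_MESSAGE bit lives in the message's flags field and duration_secs and waveform are attachment fields.

```typescript
// Hypothetical minimal shapes; only the fields relevant to detection are modeled.
interface DiscordAttachment {
  filename: string;
  url: string;
  content_type?: string;
  duration_secs?: number; // present on voice message attachments
  waveform?: string;      // present on voice message attachments
}

interface DiscordMessage {
  flags?: number;
  attachments: DiscordAttachment[];
}

// Message flag bit 13 (value 8192) marks a voice message.
const IS_VOICE_MESSAGE = 1 << 13;

// Returns the audio attachment of a voice message, or undefined if the
// message is not flagged as a voice message or has no audio attachment.
function findVoiceAttachment(
  msg: DiscordMessage,
): DiscordAttachment | undefined {
  const flagged = ((msg.flags ?? 0) & IS_VOICE_MESSAGE) !== 0;
  if (!flagged) return undefined;
  return msg.attachments.find((a) =>
    (a.content_type ?? "").startsWith("audio/"),
  );
}
```

A Discord channel handler could call findVoiceAttachment on each inbound message and, when it returns an attachment, download its url and pass the file to the same transcription path Telegram voice notes already use. How that hand-off is wired up inside OpenClaw is left open here.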
