Telegram voice messages not auto-transcribed despite tools.media.audio.enabled config (Windows) #22554

@ModalityLabs

Bug

Voice messages (OGG/Opus) sent via Telegram arrive as raw audio attachments and are NOT auto-transcribed, even though tools.media.audio is properly configured.

Config

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "openai", "model": "gpt-4o-mini-transcribe" }
        ]
      }
    }
  }
}

Verified via openclaw config get tools.media — config is loaded correctly.
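config get confirms the values are loaded, but a type slip such as enabled saved as the string "true" instead of a boolean is easy to miss there. Below is a minimal sketch of an independent shape check on the JSON above. check_audio_config is a hypothetical helper, not part of OpenClaw. It covers only the fields used in this issue, not the full config schema.

```python
import json
from typing import Any


def check_audio_config(cfg: dict[str, Any]) -> list[str]:
    """Return problems found under tools.media.audio; an empty list means the shape looks OK.

    Hypothetical helper: it checks only the fields used in this issue,
    not OpenClaw's full config schema.
    """
    # Walk tools -> media -> audio without raising if a level is missing or not an object.
    node: Any = cfg
    for key in ("tools", "media", "audio"):
        node = node.get(key) if isinstance(node, dict) else None
    if not isinstance(node, dict):
        return ["tools.media.audio is missing or not an object"]

    problems: list[str] = []
    # enabled must be the JSON boolean true, not e.g. the string "true".
    if node.get("enabled") is not True:
        problems.append(f"enabled is {node.get('enabled')!r}, expected true")

    models = node.get("models")
    if not isinstance(models, list) or not models:
        problems.append("models must be a non-empty list")
        return problems
    for i, entry in enumerate(models):
        if not isinstance(entry, dict):
            problems.append(f"models[{i}] is not an object")
            continue
        for field in ("provider", "model"):
            value = entry.get(field)
            if not isinstance(value, str) or not value:
                problems.append(f"models[{i}].{field} is missing or empty")
    return problems


# The config from this issue, verbatim.
ISSUE_CONFIG = """
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "openai", "model": "gpt-4o-mini-transcribe" }
        ]
      }
    }
  }
}
"""

print(check_audio_config(json.loads(ISSUE_CONFIG)))  # → []
```

If this prints an empty list, the config shape is probably not the culprit. That would point back at the inbound media path, as suspected in #7899.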

Environment

  • OS: Windows 10 (10.0.19045) x64
  • OpenClaw: 2026.2.19-2
  • Node: v24.13.0
  • Channel: Telegram (bot, working correctly for text messages)
  • OpenAI API key: Valid and working (manual Whisper API transcription succeeds with the same key)

Steps to Reproduce

  1. Set tools.media.audio.enabled: true with an OpenAI provider model (config above)
  2. Restart the gateway (openclaw gateway restart)
  3. Send a voice message via Telegram to the bot
  4. The voice message arrives as <media:audio> with the raw OGG file attached
  5. No transcription occurs; the message Body is not replaced with the transcript

Expected Behavior

The voice message should be auto-transcribed, and the transcript should replace the message body, as described in the docs at https://docs.openclaw.ai/nodes/audio

Workaround

Manual transcription via ffmpeg + OpenAI Whisper API works fine:

ffmpeg -y -i input.ogg output.wav
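# Then POST output.wav to the transcription endpoint with model whisper-1 (bash syntax):
curl https://api.openai.com/v1/audio/transcriptions -H "Authorization: Bearer $OPENAI_API_KEY" -F file=@output.wav -F model=whisper-1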
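Environment-variable syntax differs between cmd, PowerShell and bash on Windows. For that case, a minimal sketch of the same two steps using only the Python standard library is below. It assumes ffmpeg is on PATH and OPENAI_API_KEY is set. The function names build_multipart and transcribe are hypothetical. The endpoint and the whisper-1 model are the ones named in this workaround.

```python
import json
import os
import subprocess
import urllib.request
import uuid
from pathlib import Path

# OpenAI's transcription endpoint, as used in the manual workaround.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    file_bytes: bytes, boundary: str, content_type: str) -> bytes:
    """Encode plain form fields plus one file as a multipart/form-data body."""
    sep = b"--" + boundary.encode()
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [sep, f'Content-Disposition: form-data; name="{name}"'.encode(), b"", value.encode()]
    lines += [
        sep,
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        sep + b"--",  # closing boundary
        b"",  # trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(lines)


def transcribe(ogg_path: str, model: str = "whisper-1") -> str:
    """Convert an OGG voice note to WAV with ffmpeg, then return OpenAI's transcript text."""
    wav_path = Path(ogg_path).with_suffix(".wav")
    # Same conversion as the ffmpeg command above; -y overwrites an existing WAV.
    subprocess.run(["ffmpeg", "-y", "-i", ogg_path, str(wav_path)], check=True)

    boundary = uuid.uuid4().hex
    body = build_multipart({"model": model}, "file", wav_path.name,
                           wav_path.read_bytes(), boundary, "audio/wav")
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    # The default response format is JSON of the form {"text": "..."}.
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Usage: print(transcribe("input.ogg")) writes input.wav next to the original file and prints the transcript text.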

Related

Possibly related to #7899 (Telegram voice messages not transcribed - applyMediaUnderstanding not called)
