WhatsApp Voice Transcription Not Triggering (2026.3.1) #32200

@tomdailey55

Bug Report: WhatsApp Voice Message Transcription Not Triggering

Summary

WhatsApp voice messages arrive, but the transcription pipeline never triggers. Audio files are received and stored, yet either the tools.media.audio scope rules are not being evaluated correctly or the audio is not being routed to transcription at all.

Version Information

  • OpenClaw Version: 2026.3.1 (updated morning of 2026-03-02)
  • Previous Version: Unknown (working before update)
  • OS: Linux Mint 22.3 "Zena" (Ubuntu Noble based)
  • Node: v22.22.0

Timeline

  1. Before update: Voice transcription working via WhatsApp with whisper.cpp
  2. Morning of 2026-03-02: Updated OpenClaw to 2026.3.1
  3. Post-update: Voice worked briefly
  4. ~1:16 PM EST: System experienced compaction lockup (~90 minutes unresponsive)
  5. After recovery: Voice transcription broken
  6. Post-recovery: Config reverted to backup, scope rules re-applied
  7. Current state: Audio arrives, transcription never triggers

Configuration

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "scope": {
          "default": "allow",
          "rules": [
            { "action": "allow", "match": { "chatType": "dm" } },
            { "action": "allow", "match": { "channel": "whatsapp" } }
          ]
        },
        "maxBytes": 20971520,
        "models": [
          {
            "type": "cli",
            "command": "/home/tom/bin/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin",
              "-f",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 120
          }
        ]
      }
    }
  }
}

Config validated: ✅ JSON syntax valid
Config loaded: openclaw config get tools.media.audio shows correct config
Gateway restarted: ✅ Multiple times with clean restarts
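
As a sanity check on the scope block above, here is a minimal sketch of how such rules might be evaluated. It assumes first-match-wins semantics with a fallback to "default"; that is a guess at the intended behaviour, not OpenClaw's actual implementation, and the function name evaluate_scope and the message field names are hypothetical.

```python
from typing import Any


def evaluate_scope(scope: dict[str, Any], message: dict[str, str]) -> str:
    """Return the action of the first rule whose match keys all equal the
    message's values; fall back to the scope default (assumed semantics)."""
    for rule in scope.get("rules", []):
        match = rule.get("match", {})
        if all(message.get(key) == value for key, value in match.items()):
            return rule["action"]
    return scope.get("default", "deny")


# Scope block copied from the configuration in this report.
scope = {
    "default": "allow",
    "rules": [
        {"action": "allow", "match": {"chatType": "dm"}},
        {"action": "allow", "match": {"channel": "whatsapp"}},
    ],
}

# Hypothetical message attributes for a WhatsApp direct message.
message = {"chatType": "dm", "channel": "whatsapp"}
print(evaluate_scope(scope, message))  # allow
```

Under these assumed semantics every path through this scope yields "allow" (both rules and the default), so a correct evaluator should never block the audio based on this config alone.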

Evidence

1. Audio Files Arrive

WhatsApp voice messages are received and stored:

/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg

MIME type: audio/ogg; codecs=opus

2. whisper.cpp Works Manually

Testing whisper.cpp directly on the same audio files:

$ /home/tom/bin/whisper.cpp/build/bin/whisper-cli \
  -m /home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin \
  -f /home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg

# Output: "i sure hope this fixes it"

✅ Transcription works perfectly (~10 seconds)
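
The manual command above should be what the configured CLI model expands to once {{MediaPath}} is filled in. The sketch below shows that expansion under the assumption that the placeholder is replaced by plain string substitution; build_command is a hypothetical helper, not an OpenClaw function.

```python
import shlex


def build_command(model: dict, media_path: str) -> list[str]:
    """Substitute the {{MediaPath}} placeholder in each arg (assumed behaviour)."""
    args = [arg.replace("{{MediaPath}}", media_path) for arg in model["args"]]
    return [model["command"], *args]


# CLI model entry copied from the configuration in this report.
model = {
    "type": "cli",
    "command": "/home/tom/bin/whisper.cpp/build/bin/whisper-cli",
    "args": [
        "-m",
        "/home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin",
        "-f",
        "{{MediaPath}}",
    ],
    "timeoutSeconds": 120,
}

media_path = "/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg"
command = build_command(model, media_path)
print(shlex.join(command))
```

The printed command line matches the manual invocation, which suggests the model entry itself is fine and the failure is upstream of the CLI call.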

3. Gateway Logs Show Audio Arrives

From openclaw logs:

2026-03-02T21:11:23.086Z info web-inbound {"from":"+19412844426","body":"<media:audio>","mediaPath":"/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg","mediaType":"audio/ogg; codecs=opus"}
2026-03-02T21:11:23.090Z info gateway/channels/whatsapp/inbound Inbound message +19412844426 -> +19412844426 (direct, audio/ogg; codecs=opus, 77 chars)

4. Zero Evidence of Transcription Attempt

Critical: Despite correct config and working whisper.cpp, logs show:

  • ❌ No whisper.cpp execution
  • ❌ No "transcription started" log entries
  • ❌ No audio pipeline activity beyond file receipt
  • ❌ No errors — transcription simply never attempted
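
The check behind the list above can be repeated with a small script. This is a sketch only: the line layout (timestamp, level, source, payload) is inferred from the two log lines quoted in this report, and the keywords "whisper" and "transcri" are assumptions about what a transcription attempt would log.

```python
import json
import re
from collections.abc import Iterable, Iterator

# Layout inferred from the log excerpt: "<timestamp> <level> <source> <rest>".
LOG_LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>\S+)\s+(?P<source>\S+)\s+(?P<rest>.*)$")
# Assumed keywords; the real log wording for transcription is unknown.
KEYWORDS = ("whisper", "transcri")


def audio_arrivals(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (timestamp, mediaPath) for web-inbound entries carrying audio."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["source"] != "web-inbound":
            continue
        try:
            payload = json.loads(m["rest"])
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and str(payload.get("mediaType", "")).startswith("audio/"):
            yield m["ts"], payload.get("mediaPath", "")


def transcription_mentions(lines: Iterable[str]) -> list[str]:
    """Return every line that mentions one of the assumed keywords."""
    return [line for line in lines if any(k in line.lower() for k in KEYWORDS)]


# The two log lines quoted in this report.
SAMPLE = [
    '2026-03-02T21:11:23.086Z info web-inbound {"from":"+19412844426","body":"<media:audio>","mediaPath":"/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg","mediaType":"audio/ogg; codecs=opus"}',
    "2026-03-02T21:11:23.090Z info gateway/channels/whatsapp/inbound Inbound message +19412844426 -> +19412844426 (direct, audio/ogg; codecs=opus, 77 chars)",
]

print(list(audio_arrivals(SAMPLE)))
print(transcription_mentions(SAMPLE))
```

On the quoted excerpt this finds one audio arrival and no transcription-related lines, which is the pattern described above.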

Expected Behavior

When a WhatsApp voice message arrives:

  1. OpenClaw detects audio attachment
  2. Scope rules evaluated (should match: chatType: "dm", channel: "whatsapp")
  3. whisper.cpp called with audio file path
  4. Transcription replaces message body (or adds {{Transcript}})
  5. Command parsing runs on transcript

Actual Behavior

  1. ✅ Audio file received and stored
  2. ❌ Transcription pipeline never invoked
  3. ❌ Message processed as <media:audio> without transcript
  4. ❌ Agent receives audio but no transcription

Reproduction Steps

  1. Configure OpenClaw 2026.3.1 with WhatsApp channel
  2. Add tools.media.audio config with whisper.cpp CLI model
  3. Add scope rules for DM + WhatsApp channel
  4. Send voice message via WhatsApp
  5. Observe: Audio file arrives, transcription never triggers
  6. Check logs: No evidence of transcription attempt

Workarounds Attempted

  1. Config validation: JSON syntax correct, loaded by gateway
  2. Scope rules: Tried chatType: "dm" (not "private")
  3. Gateway restart: Multiple clean restarts
  4. Config revert: Restored to known-good backup from before incident
  5. Manual transcription test: whisper.cpp works perfectly on audio files
  6. Result: None of these fix the issue

Hypothesis

Either post-compaction gateway state corruption or a bug in the audio routing logic.

Possibilities:

  1. OGG/Opus MIME type (audio/ogg; codecs=opus) not recognized by media detection
  2. Scope rule evaluation bug (rules not matching despite correct syntax)
  3. WhatsApp extension not passing audio to media pipeline
  4. Gateway internal state corrupted during compaction incident (persisted across restarts)
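
To illustrate possibility 1: if media detection compares the full MIME string against a list of bare types, the "; codecs=opus" parameter would make the comparison fail. The sketch below is purely illustrative; the SUPPORTED set and base_mime_type are hypothetical and nothing here is taken from OpenClaw's code.

```python
def base_mime_type(value: str) -> str:
    """Drop any parameters (e.g. '; codecs=opus') and normalise case."""
    return value.split(";", 1)[0].strip().lower()


# Hypothetical allowlist of bare audio types.
SUPPORTED = {"audio/ogg", "audio/mpeg", "audio/wav"}

reported = "audio/ogg; codecs=opus"  # MIME type from the gateway log

print(reported in SUPPORTED)                  # False: exact-string comparison
print(base_mime_type(reported) in SUPPORTED)  # True: parameters stripped first
```

If the detection code does something like the first comparison, WhatsApp voice notes would be skipped silently, which would match the absence of errors in the logs.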

Logs Available

  • Gateway logs showing audio arrival (no transcription attempt)
  • Config dump from openclaw config get tools.media.audio
  • Manual whisper.cpp test results (proves CLI works)
  • Old backup configs where voice worked pre-update

Request

Please investigate:

  1. Is there a known issue with audio transcription in 2026.3.1?
  2. Are OGG/Opus voice messages from WhatsApp supported?
  3. Should scope rules with chatType: "dm" match WhatsApp DMs?
  4. Is there additional debugging we can enable to trace the audio pipeline?
  5. Could compaction/gateway state corruption cause persistent audio routing failure?

Additional Context

  • Backup configs exist from before update where voice worked (can compare configs)
  • Compaction incident occurred (~90 min lockup); the failure has persisted across multiple restarts
  • Voice worked briefly post-update before failing (suggests regression or state issue)
  • Multiple voice messages tested — all fail to trigger transcription
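
For the config comparison mentioned above, a small diff over flattened key paths may help spot what changed between the backup and the current config. This is a generic sketch; the two dictionaries in the example are invented placeholders, not the actual backup or current config.

```python
MISSING = "<missing>"  # marker for a path present on only one side


def flatten(node, prefix=""):
    """Map a nested dict/list structure to {dotted.path: leaf_value}."""
    if isinstance(node, dict) and node:
        items = list(node.items())
    elif isinstance(node, list) and node:
        items = [(str(i), v) for i, v in enumerate(node)]
    else:
        return {prefix: node}  # scalar or empty container
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_configs(old, new):
    """Return {path: (old_value, new_value)} for every path that differs."""
    flat_old, flat_new = flatten(old), flatten(new)
    changes = {}
    for path in sorted(flat_old.keys() | flat_new.keys()):
        before = flat_old.get(path, MISSING)
        after = flat_new.get(path, MISSING)
        if before != after:
            changes[path] = (before, after)
    return changes


# Placeholder data for illustration only.
old = {"tools": {"media": {"audio": {"enabled": True, "maxBytes": 20971520}}}}
new = {"tools": {"media": {"audio": {"enabled": True, "maxBytes": 20971520,
                                     "scope": {"default": "allow"}}}}}

for path, (before, after) in diff_configs(old, new).items():
    print(f"{path}: {before!r} -> {after!r}")
```

In practice the two inputs would come from json.load on the backup file and the current config file.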

Contact: @tomdailey55 (GitHub)
Date: 2026-03-02
Severity: High — breaks voice I/O for WhatsApp channel
