Bug Report: WhatsApp Voice Message Transcription Not Triggering
Summary
WhatsApp voice messages arrive, but the transcription pipeline never triggers. Audio files are received and stored, but either the tools.media.audio scope rules are not being evaluated correctly or the audio is never routed to transcription at all.
Version Information
- OpenClaw Version: 2026.3.1 (updated morning of 2026-03-02)
- Previous Version: Unknown (working before update)
- OS: Linux Mint 22.3 "Zena" (Ubuntu Noble based)
- Node: v22.22.0
Timeline
- Before update: Voice transcription working via WhatsApp with whisper.cpp
- Morning of 2026-03-02: Updated OpenClaw to 2026.3.1
- Post-update: Voice worked briefly
- ~1:16 PM EST: System experienced compaction lockup (~90 minutes unresponsive)
- After recovery: Voice transcription broken
- Post-recovery: Config reverted to backup, scope rules re-applied
- Current state: Audio arrives, transcription never triggers
Configuration
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "scope": {
          "default": "allow",
          "rules": [
            { "action": "allow", "match": { "chatType": "dm" } },
            { "action": "allow", "match": { "channel": "whatsapp" } }
          ]
        },
        "maxBytes": 20971520,
        "models": [
          {
            "type": "cli",
            "command": "/home/tom/bin/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin",
              "-f",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 120
          }
        ]
      }
    }
  }
}
Config validated: ✅ JSON syntax valid
Config loaded: ✅ openclaw config get tools.media.audio shows correct config
Gateway restarted: ✅ Multiple times with clean restarts
Evidence
1. Audio Files Arrive
WhatsApp voice messages are received and stored:
/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg
MIME type: audio/ogg; codecs=opus
2. whisper.cpp Works Manually
Testing whisper.cpp directly on the same audio files:
$ /home/tom/bin/whisper.cpp/build/bin/whisper-cli \
    -m /home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin \
    -f /home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg
# Output: "i sure hope this fixes it"
✅ Transcription works perfectly (~10 seconds)
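To rule out differences between the manual test and what the gateway would run, the same invocation can be built from the config entry itself. This is a minimal sketch: `render_argv` and `transcribe` are hypothetical helpers, and the plain string substitution of `{{MediaPath}}` is an assumption about how the placeholder is expanded, not OpenClaw's actual templating.

```python
import subprocess

# Model entry copied from the tools.media.audio config in this report.
MODEL = {
    "command": "/home/tom/bin/whisper.cpp/build/bin/whisper-cli",
    "args": [
        "-m",
        "/home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin",
        "-f",
        "{{MediaPath}}",
    ],
    "timeoutSeconds": 120,
}


def render_argv(model: dict, media_path: str) -> list:
    """Build the argv list, replacing the {{MediaPath}} placeholder.

    Assumption: plain string substitution; the gateway's real templating
    may behave differently.
    """
    args = [arg.replace("{{MediaPath}}", media_path) for arg in model["args"]]
    return [model["command"], *args]


def transcribe(model: dict, media_path: str) -> str:
    """Run the configured CLI under the configured timeout and return stdout."""
    result = subprocess.run(
        render_argv(model, media_path),
        capture_output=True,
        text=True,
        timeout=model["timeoutSeconds"],
        check=True,  # raise if the CLI exits non-zero
    )
    return result.stdout.strip()
```

Calling `transcribe(MODEL, path)` on a stored inbound file exercises the configured command, arguments, and 120-second timeout together, which the hand-typed command does not.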
3. Gateway Logs Show Audio Arrives
From openclaw logs:
2026-03-02T21:11:23.086Z info web-inbound {"from":"+19412844426","body":"<media:audio>","mediaPath":"/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg","mediaType":"audio/ogg; codecs=opus"}
2026-03-02T21:11:23.090Z info gateway/channels/whatsapp/inbound Inbound message +19412844426 -> +19412844426 (direct, audio/ogg; codecs=opus, 77 chars)
4. Zero Evidence of Transcription Attempt
Critical: Despite correct config and working whisper.cpp, logs show:
- ❌ No whisper.cpp execution
- ❌ No "transcription started" log entries
- ❌ No audio pipeline activity beyond file receipt
- ❌ No errors — transcription simply never attempted
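The absence of transcription entries can be checked mechanically rather than by eye. The sketch below parses log lines in the `<timestamp> <level> <subsystem> <rest>` shape shown above, collects inbound audio paths, and looks for any sign of transcription. The hint substrings are an assumption: this report does not show what a successful transcription logs.

```python
import json

# Assumption: a transcription attempt would leave one of these substrings
# in some log line. The real log wording is not known from this report.
TRANSCRIPTION_HINTS = ("transcri", "whisper")


def parse_line(line: str):
    """Split '<timestamp> <level> <subsystem> <rest>' into four parts."""
    parts = line.strip().split(" ", 3)
    return parts if len(parts) == 4 else None


def inbound_audio_paths(lines):
    """Return mediaPath values from web-inbound entries with an audio mediaType."""
    paths = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[2] != "web-inbound":
            continue
        try:
            payload = json.loads(parsed[3])
        except json.JSONDecodeError:
            continue
        if not isinstance(payload, dict):
            continue
        if str(payload.get("mediaType", "")).startswith("audio/"):
            paths.append(payload.get("mediaPath"))
    return paths


def has_transcription_activity(lines) -> bool:
    """True if any line contains one of the assumed transcription hints."""
    return any(
        hint in line.lower() for line in lines for hint in TRANSCRIPTION_HINTS
    )
```

Feeding the saved output of `openclaw logs` through both functions gives a list of received audio files and a yes/no answer on whether anything resembling a transcription attempt was logged.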
Expected Behavior
When WhatsApp voice message arrives:
- OpenClaw detects audio attachment
- Scope rules evaluated (should match: chatType: "dm", channel: "whatsapp")
- whisper.cpp called with audio file path
- Transcription replaces message body (or adds {{Transcript}})
- Command parsing runs on transcript
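The scope step can be reasoned about with a small model. This sketch assumes first-match-wins semantics with a fallback to `default`; that is an assumption for illustration, not confirmed OpenClaw behavior, and `evaluate_scope` is a hypothetical name.

```python
# Scope block copied from the tools.media.audio config in this report.
SCOPE = {
    "default": "allow",
    "rules": [
        {"action": "allow", "match": {"chatType": "dm"}},
        {"action": "allow", "match": {"channel": "whatsapp"}},
    ],
}


def evaluate_scope(scope: dict, message: dict) -> str:
    """Return the action of the first rule whose match keys all equal the
    message's values, falling back to scope["default"].

    Assumption: first-match-wins; not verified against the gateway source.
    """
    for rule in scope.get("rules", []):
        match = rule.get("match", {})
        if all(message.get(key) == value for key, value in match.items()):
            return rule["action"]
    return scope["default"]
```

Under these assumed semantics, this config resolves to "allow" for every message, because both rules and the default are "allow". That holds even if the gateway reports the chat type as "direct" (as the inbound log line does) rather than "dm". If the evaluator works roughly like this, scope rules alone would not explain the failure.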
Actual Behavior
- ✅ Audio file received and stored
- ❌ Transcription pipeline never invoked
- ❌ Message processed as <media:audio> without transcript
- ❌ Agent receives audio but no transcription
Reproduction Steps
- Configure OpenClaw 2026.3.1 with WhatsApp channel
- Add tools.media.audio config with whisper.cpp CLI model
- Add scope rules for DM + WhatsApp channel
- Send voice message via WhatsApp
- Observe: Audio file arrives, transcription never triggers
- Check logs: No evidence of transcription attempt
Workarounds Attempted
- ✅ Config validation: JSON syntax correct, loaded by gateway
- ✅ Scope rules: Tried chatType: "dm" (not "private")
- ✅ Gateway restart: Multiple clean restarts
- ✅ Config revert: Restored to known-good backup from before incident
- ✅ Manual transcription test: whisper.cpp works perfectly on audio files
- ❌ Result: None of these fix the issue
Hypothesis
Post-compaction gateway state corruption OR bug in audio routing logic
Possibilities:
- OGG/Opus MIME type (audio/ogg; codecs=opus) not recognized by media detection
- Scope rule evaluation bug (rules not matching despite correct syntax)
- WhatsApp extension not passing audio to media pipeline
- Gateway internal state corrupted during compaction incident (persisted across restarts)
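The first possibility is easy to illustrate. If media detection compares the full MIME string against a fixed list, the `; codecs=opus` parameter is enough to make it miss. The sketch below contrasts that with a check on the base type only. Both functions and the `known` list are hypothetical; nothing here is taken from OpenClaw's detection code.

```python
def is_audio_exact(media_type: str, known=("audio/ogg", "audio/mpeg", "audio/wav")) -> bool:
    """Naive check: exact string comparison, which parameters defeat."""
    return media_type in known


def base_media_type(media_type: str) -> str:
    """Strip parameters such as '; codecs=opus' and lowercase the result."""
    return media_type.split(";", 1)[0].strip().lower()


def is_audio_normalized(media_type: str) -> bool:
    """Check only the base type, ignoring any parameters."""
    return base_media_type(media_type).startswith("audio/")
```

With the value from the logs, `is_audio_exact("audio/ogg; codecs=opus")` is false while `is_audio_normalized("audio/ogg; codecs=opus")` is true, which is the kind of mismatch this hypothesis describes.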
Logs Available
- Gateway logs showing audio arrival (no transcription attempt)
- Config dump from openclaw config get tools.media.audio
- Manual whisper.cpp test results (proves CLI works)
- Old backup configs where voice worked pre-update
Request
Please investigate:
- Is there a known issue with audio transcription in 2026.3.1?
- Are OGG/Opus voice messages from WhatsApp supported?
- Should scope rules with chatType: "dm" match WhatsApp DMs?
- Is there additional debugging we can enable to trace audio pipeline?
- Could compaction/gateway state corruption cause persistent audio routing failure?
Additional Context
- Backup configs exist from before update where voice worked (can compare configs)
- Compaction incident occurred (~90 min lockup, survived multiple restarts)
- Voice worked briefly post-update before failing (suggests regression or state issue)
- Multiple voice messages tested — all fail to trigger transcription
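Comparing the pre-update backup with the current config can be done key by key. This is a minimal sketch with hypothetical helper names; it flattens both JSON documents into dotted paths and reports every path whose value differs. Empty objects and arrays are ignored, which is acceptable for a quick comparison.

```python
MISSING = "<missing>"  # marker for a path present in only one config


def flatten(config, prefix=""):
    """Flatten nested dicts/lists into {'a.b[0].c': value} pairs."""
    flat = {}
    if isinstance(config, dict):
        for key, value in config.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(config, list):
        for index, value in enumerate(config):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = config
    return flat


def diff_configs(old: dict, new: dict) -> dict:
    """Map each differing path to an (old_value, new_value) pair."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        path: (old_flat.get(path, MISSING), new_flat.get(path, MISSING))
        for path in sorted(old_flat.keys() | new_flat.keys())
        if old_flat.get(path, MISSING) != new_flat.get(path, MISSING)
    }
```

Loading both files with `json.load` and passing them to `diff_configs` would show whether anything under `tools.media.audio`, or elsewhere, changed between the working and broken states.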
Contact: @tomdailey55 (GitHub)
Date: 2026-03-02
Severity: High — breaks voice I/O for WhatsApp channel