WhatsApp Voice Transcription Not Triggering (2026.3.1)

# Bug Report: WhatsApp Voice Message Transcription Not Triggering

## Summary

WhatsApp voice messages arrive, but the transcription pipeline never triggers. Audio files are received and stored, yet either the `tools.media.audio` scope rules are not being evaluated correctly or the audio is never routed to transcription at all.

## Version Information

- **OpenClaw Version:** 2026.3.1 (updated morning of 2026-03-02)
- **Previous Version:** Unknown (working before update)
- **OS:** Linux Mint 22.3 "Zena" (Ubuntu Noble based)
- **Node:** v22.22.0

## Timeline

1. **Before update:** Voice transcription working via WhatsApp with whisper.cpp
2. **Morning of 2026-03-02:** Updated OpenClaw to 2026.3.1
3. **Post-update:** Voice worked briefly
4. **~1:16 PM EST:** System experienced compaction lockup (~90 minutes unresponsive)
5. **After recovery:** Voice transcription broken
6. **Post-recovery:** Config reverted to backup, scope rules re-applied
7. **Current state:** Audio arrives, transcription never triggers

## Configuration

```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "scope": {
          "default": "allow",
          "rules": [
            { "action": "allow", "match": { "chatType": "dm" } },
            { "action": "allow", "match": { "channel": "whatsapp" } }
          ]
        },
        "maxBytes": 20971520,
        "models": [
          {
            "type": "cli",
            "command": "/home/tom/bin/whisper.cpp/build/bin/whisper-cli",
            "args": [
              "-m",
              "/home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin",
              "-f",
              "{{MediaPath}}"
            ],
            "timeoutSeconds": 120
          }
        ]
      }
    }
  }
}
```

**Config validated:** ✅ JSON syntax valid  
**Config loaded:** ✅ `openclaw config get tools.media.audio` shows correct config  
**Gateway restarted:** ✅ Multiple times with clean restarts
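As written, this scope has no way to deny anything: `default` is `"allow"` and both rules are `allow`. The WhatsApp inbound log line in Evidence section 3 labels the chat `direct` rather than `dm`, so it is unclear whether the first rule ever matches. Under a "first matching rule wins, otherwise `default`" model, though, the result is `allow` either way. The function below is a hypothetical model of that evaluation written for this report, not OpenClaw's evaluator; the rule order and semantics are assumptions.

```bash
# Hypothetical model of tools.media.audio.scope, assuming "first matching rule
# wins, otherwise use default". This is NOT OpenClaw's actual evaluator.
# Usage: scope_decision <chatType> <channel>
scope_decision() {
  chat_type=$1
  channel=$2

  # Rule 1: { "action": "allow", "match": { "chatType": "dm" } }
  if [ "$chat_type" = "dm" ]; then
    echo allow
    return 0
  fi

  # Rule 2: { "action": "allow", "match": { "channel": "whatsapp" } }
  if [ "$channel" = "whatsapp" ]; then
    echo allow
    return 0
  fi

  # No rule matched: fall back to "default": "allow"
  echo allow
}
```

With the rules as configured, `scope_decision direct whatsapp` prints `allow` through rule 2, and `scope_decision group telegram` prints `allow` through the default. For hypothesis 2 to explain the failure, the scope would therefore have to be applied differently from how it is written, or not read at all.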

## Evidence

### 1. Audio Files Arrive

WhatsApp voice messages are received and stored:
```
/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg
```
MIME type: `audio/ogg; codecs=opus`
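The first possibility in the Hypothesis section depends on how this value is compared. A detector that matches the full string against `audio/ogg` would miss it because of the `; codecs=opus` parameter, while a base-type or `audio/*` prefix check would not. The helpers below sketch the base-type comparison. They assume, without confirmation from OpenClaw's source, that audio is recognised by an `audio/*` prefix. Separately, `file --mime-type -b <path>` shows what libmagic reports for the stored file.

```bash
# Hypothetical helpers, not OpenClaw code.
# mime_base: drop parameters such as "; codecs=opus", lowercase the type,
# and trim trailing whitespace.
mime_base() {
  printf '%s\n' "${1%%;*}" | tr '[:upper:]' '[:lower:]' | sed 's/[[:space:]]*$//'
}

# is_audio_mime: succeed if the base type is audio/* (assumed detection rule).
is_audio_mime() {
  case $(mime_base "$1") in
    audio/*) return 0 ;;
    *) return 1 ;;
  esac
}

# The raw value from the gateway log fails an exact string comparison...
[ "audio/ogg; codecs=opus" = "audio/ogg" ] || echo "exact match: no"
# ...but passes once the parameters are stripped.
is_audio_mime "audio/ogg; codecs=opus" && echo "base-type match: yes"
```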

### 2. whisper.cpp Works Manually

Testing whisper.cpp directly on the same audio files:
```bash
$ /home/tom/bin/whisper.cpp/build/bin/whisper-cli \
  -m /home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin \
  -f /home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg

# Output: "i sure hope this fixes it"
```
✅ Transcription works perfectly (~10 seconds)
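To rule out a file-specific problem, you can repeat the same invocation over every stored voice note and apply the config's 120-second limit with coreutils `timeout`. This sketch is for reproduction only. It assumes the inbound directory shown above and assumes OpenClaw passes the `args` list unchanged, with `{{MediaPath}}` replaced by the file path. `WHISPER_BIN`, `WHISPER_MODEL`, and `TIMEOUT_SECONDS` are variables introduced here, not OpenClaw settings.

```bash
# Reproduction sketch: run the configured CLI model on each stored voice note.
# Assumes OpenClaw substitutes {{MediaPath}} into "args" and enforces
# "timeoutSeconds"; the variable names below are local to this script.
WHISPER_BIN=${WHISPER_BIN:-/home/tom/bin/whisper.cpp/build/bin/whisper-cli}
WHISPER_MODEL=${WHISPER_MODEL:-/home/tom/bin/whisper.cpp/models/ggml-large-v3-turbo.bin}
TIMEOUT_SECONDS=${TIMEOUT_SECONDS:-120}   # mirrors "timeoutSeconds": 120

# Returns non-zero if any file fails or times out.
transcribe_inbound() {
  dir=${1:-$HOME/.openclaw/media/inbound}
  status=0
  for f in "$dir"/*.ogg; do
    [ -e "$f" ] || continue   # no .ogg files: the glob stays unexpanded
    printf '== %s\n' "$f"
    if ! timeout "$TIMEOUT_SECONDS" "$WHISPER_BIN" -m "$WHISPER_MODEL" -f "$f"; then
      printf 'FAILED (non-zero exit or timeout): %s\n' "$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `transcribe_inbound` with no argument uses `~/.openclaw/media/inbound`. If every file transcribes well within the limit, the timeout and the audio files themselves are unlikely causes.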

### 3. Gateway Logs Show Audio Arriving

From `openclaw logs`:
```
2026-03-02T21:11:23.086Z info web-inbound {"from":"+19412844426","body":"<media:audio>","mediaPath":"/home/tom/.openclaw/media/inbound/1edd552e-e05d-44b1-b51a-773215a2a715.ogg","mediaType":"audio/ogg; codecs=opus"}
2026-03-02T21:11:23.090Z info gateway/channels/whatsapp/inbound Inbound message +19412844426 -> +19412844426 (direct, audio/ogg; codecs=opus, 77 chars)
```

### 4. Zero Evidence of Transcription Attempt

**Critical:** Despite correct config and working whisper.cpp, logs show:
- ❌ No whisper.cpp execution
- ❌ No "transcription started" log entries
- ❌ No audio pipeline activity beyond file receipt
- ❌ No errors; transcription is simply never attempted
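To turn "no evidence" into a definite answer, point the model's `command` at a logging wrapper. If the log stays empty after a voice note arrives, the gateway never spawned the CLI. If it has entries, the failure happens after whisper.cpp runs, for example in how its output is read back. The generator below is a hypothetical diagnostic written for this report, and the wrapper and log paths are placeholders.

```bash
# Hypothetical diagnostic: write a wrapper that records each invocation
# (timestamp + arguments) and then execs the real binary unchanged.
# Usage: make_whisper_wrapper <wrapper-path> <real-binary> <log-file>
# e.g. make_whisper_wrapper "$HOME/bin/whisper-wrap.sh" /home/tom/bin/whisper.cpp/build/bin/whisper-cli /tmp/whisper-calls.log
make_whisper_wrapper() {
  wrapper=$1
  real_bin=$2
  log=$3
  # Unquoted heredoc: $log and $real_bin expand now; \$ stays literal so the
  # wrapper evaluates its own date and arguments at run time.
  cat > "$wrapper" <<EOF
#!/bin/sh
printf '%s %s\n' "\$(date -u +%Y-%m-%dT%H:%M:%SZ)" "\$*" >> "$log"
exec "$real_bin" "\$@"
EOF
  chmod +x "$wrapper"
}
```

After generating the wrapper, set `command` in the config to the wrapper path, restart the gateway, send a voice note, and check the log file.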

## Expected Behavior

When a WhatsApp voice message arrives:
1. OpenClaw detects audio attachment
2. Scope rules evaluated (should match: `chatType: "dm"`, `channel: "whatsapp"`)
3. whisper.cpp called with audio file path
4. Transcription replaces message body (or adds `{{Transcript}}`)
5. Command parsing runs on transcript

## Actual Behavior

1. ✅ Audio file received and stored
2. ❌ Transcription pipeline never invoked
3. ❌ Message processed as `<media:audio>` without transcript
4. ❌ Agent receives audio but no transcription

## Reproduction Steps

1. Configure OpenClaw 2026.3.1 with WhatsApp channel
2. Add `tools.media.audio` config with whisper.cpp CLI model
3. Add scope rules for DM + WhatsApp channel
4. Send voice message via WhatsApp
5. Observe: Audio file arrives, transcription never triggers
6. Check logs: No evidence of transcription attempt

## Workarounds Attempted

1. ✅ **Config validation:** JSON syntax correct, loaded by gateway
2. ✅ **Scope rules:** Tried `chatType: "dm"` (not `"private"`)
3. ✅ **Gateway restart:** Multiple clean restarts
4. ✅ **Config revert:** Restored to known-good backup from before incident
5. ✅ **Manual transcription test:** whisper.cpp works perfectly on audio files
6. ❌ **Result:** None of these fix the issue

## Hypothesis

**Post-compaction gateway state corruption OR bug in audio routing logic**

Possibilities:
1. OGG/Opus MIME type (`audio/ogg; codecs=opus`) not recognized by media detection
2. Scope rule evaluation bug (rules not matching despite correct syntax)
3. WhatsApp extension not passing audio to media pipeline
4. Gateway internal state corrupted during compaction incident (persisted across restarts)

## Logs Available

- Gateway logs showing audio arrival (no transcription attempt)
- Config dump from `openclaw config get tools.media.audio`
- Manual whisper.cpp test results (proves CLI works)
- Old backup configs where voice worked pre-update

## Request

Please investigate:
1. Is there a known issue with audio transcription in 2026.3.1?
2. Are OGG/Opus voice messages from WhatsApp supported?
3. Should scope rules with `chatType: "dm"` match WhatsApp DMs?
4. Is there additional debugging we can enable to trace audio pipeline?
5. Could compaction/gateway state corruption cause persistent audio routing failure?

## Additional Context

- **Backup configs exist** from before update where voice worked (can compare configs)
- **Compaction incident occurred** (~90 min lockup, survived multiple restarts)
- **Voice worked briefly post-update** before failing (suggests regression or state issue)
- **Multiple voice messages tested** — all fail to trigger transcription
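Because pre-update backups exist, a key-order-insensitive diff of the current config against a backup would show whether anything under `tools.media` differs beyond the re-applied scope rules. This sketch normalises both files with Python's standard `json.tool` module before diffing. It takes the two file paths as arguments because the config location is not stated here. It also assumes both files are strict JSON, so a config containing comments or trailing commas would need converting first.

```bash
# Sketch: compare two JSON config files, ignoring key order and whitespace.
# Assumes strict JSON and python3 on PATH.
# Usage: config_diff <backup.json> <current.json>
# Exit status: 0 identical, 1 different, 2 a file could not be parsed.
config_diff() {
  a=$(mktemp)
  b=$(mktemp)
  if ! python3 -m json.tool --sort-keys "$1" > "$a" ||
     ! python3 -m json.tool --sort-keys "$2" > "$b"; then
    rm -f "$a" "$b"
    return 2
  fi
  diff -u "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```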

---

**Contact:** @tomdailey55 (GitHub)  
**Date:** 2026-03-02  
**Severity:** High — breaks voice I/O for WhatsApp channel


**Issue:** #32200