
WhatsApp voice messages broken on all model providers - audio sent as image with undefined MIME type #13924

@josteins

Description


Bug Report: WhatsApp Voice Messages Broken on All Model Providers

Summary

WhatsApp voice messages fail on all model providers (Google Antigravity, OpenAI, Anthropic/OpenRouter) because OpenClaw sends audio content with incorrect MIME type formatting, treating audio as image content.

Environment

  • OpenClaw Version: 2026.2.9
  • OS: Ubuntu 22.04 (Linux 5.15.0-151-generic x86_64)
  • Node: 22.22.0
  • WhatsApp Channel: Baileys (web)

Steps to Reproduce

  1. Configure OpenClaw with WhatsApp channel
  2. Set any model provider as primary (tested: google-antigravity, openai, openrouter)
  3. Send a voice message from WhatsApp (phone or desktop)
  4. Observe error in response

Expected Behavior

Voice messages should either:

  1. Be transcribed automatically before sending to models that don't support audio, OR
  2. Be sent with correct audio MIME type to models that do support audio (e.g., GPT-4o)

Actual Behavior

OpenClaw sends the audio content as if it were an image, with a wrong or undefined MIME type:

Error Messages by Provider

| Provider | Error |
| --- | --- |
| Google Antigravity | Cloud Code Assist API error (400): `Unsupported MIME type:` |
| OpenAI GPT-4o | HTTP 400: `Invalid 'input[1].content[0].image_url'. Expected a base64-encoded data URL with an image MIME type (e.g. 'data:image/png;base64,...'), but got unsupported MIME type 'undefined'.` |
| OpenRouter/Claude | `messages.0.content.0.image.source.base64.media_type: Input should be 'image/jpeg', 'image/gif', 'image/webp' or 'image/png'` |

Technical Details

  • WhatsApp voice messages arrive as audio/ogg; codecs=opus
  • Files are correctly downloaded to ~/.openclaw/media/inbound/*.ogg
  • When building the API request, OpenClaw appears to:
    1. Detect media content
    2. Treat it as an image (wrong content type handling)
    3. Set MIME type incorrectly or as undefined
    4. Send to model API, which rejects it
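The broken step above (treating audio as an image) suggests the fix is a MIME-based switch when building content parts. A minimal sketch, assuming a hypothetical `buildContentPart` helper and the OpenAI Chat Completions part shapes (`image_url` for images, `input_audio` for audio-capable models) — these names are illustrative, not actual OpenClaw internals:

```typescript
// Hypothetical sketch of MIME-aware content-part construction.
// Part shapes follow the OpenAI Chat Completions API; function and
// type names are illustrative, not actual OpenClaw internals.
type ContentPart =
  | { type: "image_url"; image_url: { url: string } }
  | { type: "input_audio"; input_audio: { data: string; format: "wav" | "mp3" } }
  | { type: "text"; text: string };

function buildContentPart(mime: string, base64: string): ContentPart {
  if (mime.startsWith("image/")) {
    // Images: base64 data URL with an image MIME type.
    return {
      type: "image_url",
      image_url: { url: `data:${mime};base64,${base64}` },
    };
  }
  if (mime.startsWith("audio/")) {
    // Audio: send as input_audio, never as image_url.
    // WhatsApp's "audio/ogg; codecs=opus" would need transcoding to a
    // supported format (wav/mp3) before this point.
    return { type: "input_audio", input_audio: { data: base64, format: "wav" } };
  }
  throw new Error(`Unsupported MIME type: ${mime}`);
}
```

The reported errors are consistent with the `audio/*` branch being absent, so `audio/ogg; codecs=opus` falls through to the image path with an undefined MIME type.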

Log Evidence

Inbound message +447432727972 -> +447432727972 (direct, audio/ogg; codecs=opus, 67 chars)

The audio is correctly identified on inbound but incorrectly formatted in the outbound request to the model API.

Suggested Fix

Option A: Auto-transcription

  • Add config option: channels.whatsapp.transcribeAudio: true
  • When enabled, auto-invoke openai-whisper-api skill for voice messages before sending to model
  • Send transcribed text instead of audio content

Option B: Proper audio content handling

  • Detect audio MIME types (audio/*)
  • For models with native audio support (GPT-4o, Gemini), send with correct audio content type
  • For models without audio support, fall back to transcription
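Options A and B could share one routing rule: transcribe unless the target model natively accepts audio. A sketch under that assumption — the capability set, `routeAudio`, and the `transcribe` stub (standing in for the `openai-whisper-api` skill) are all hypothetical names, not existing OpenClaw code:

```typescript
// Illustrative routing for inbound WhatsApp audio; all identifiers are
// hypothetical except the skill/provider names quoted from this report.
const NATIVE_AUDIO_MODELS = new Set(["gpt-4o-audio-preview", "gemini-2.0-flash"]);

interface AudioMessage {
  mime: string;   // e.g. "audio/ogg; codecs=opus"
  base64: string; // downloaded media, base64-encoded
}

// Stub for the openai-whisper-api skill mentioned in this report.
async function transcribe(msg: AudioMessage): Promise<string> {
  return "(transcription placeholder)";
}

async function routeAudio(
  model: string,
  msg: AudioMessage,
  transcribeAudio: boolean, // channels.whatsapp.transcribeAudio (Option A)
): Promise<{ kind: "audio" | "text"; payload: string }> {
  if (!transcribeAudio && NATIVE_AUDIO_MODELS.has(model)) {
    // Option B: model accepts audio natively; send it with the correct
    // audio content type instead of an image part.
    return { kind: "audio", payload: msg.base64 };
  }
  // Option A, or the fallback for audio-incapable models:
  // transcribe first and send plain text.
  return { kind: "text", payload: await transcribe(msg) };
}
```

Either branch avoids the current failure mode, since audio never reaches a provider disguised as an image.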

Workaround

None currently available. Text messages work fine; only voice messages are affected.

Additional Context

  • The openai-whisper-api skill is installed and working for manual transcription
  • A transcribe-with-retry.sh script exists but isn't auto-invoked
  • This affects both phone and desktop WhatsApp voice messages
  • Issue started appearing after testing multiple model providers

Related

  • WhatsApp media download 0-byte issue (intermittent, separate bug)
