fix(media): normalize MIME kind detection for WhatsApp audio transcription by Lucenx9 · Pull Request #32280 · openclaw/openclaw

Lucenx9 · 2026-03-02T23:03:09Z

Summary

Describe the problem and fix in 2–5 bullets:

Problem: media attachment kind detection classified raw MIME strings, so it could fail on values with mixed casing, surrounding whitespace, or parameters.
Why it matters: when MIME classification fails and filename extension is missing/ambiguous, audio attachments can be skipped and transcription never starts.
What changed: normalized MIME before kindFromMime classification, and added a regression test for WhatsApp-style audio/ogg; codecs=opus with scope rules (chatType: "dm", channel: "whatsapp").
What did NOT change (scope boundary): no change to provider selection, execution model order, or WhatsApp transport/download path.
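
The normalization described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/media/mime.ts: only the names normalizeMimeType and kindFromMime come from this PR, while the function bodies, the MediaKind type, and the classifyRaw helper are hypothetical.

```typescript
// Hypothetical sketch; the real helpers in src/media/mime.ts may differ.
type MediaKind = "image" | "audio" | "video" | "document" | "unknown";

// Trim whitespace, drop parameters such as "; codecs=opus", and lowercase.
function normalizeMimeType(mime?: string | null): string | undefined {
  if (!mime) return undefined;
  const base = mime.split(";")[0]?.trim().toLowerCase();
  return base ? base : undefined;
}

// Assumed classifier that expects an already canonical MIME type.
function classifyRaw(mime?: string): MediaKind {
  if (!mime) return "unknown";
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  if (mime === "application/pdf") return "document";
  return "unknown";
}

// The fix in spirit: normalize first, then classify.
function kindFromMime(mime?: string | null): MediaKind {
  return classifyRaw(normalizeMimeType(mime));
}
```

In this sketch, kindFromMime(" Audio/Ogg; codecs=opus ") returns "audio", while passing the same raw string straight to classifyRaw returns "unknown" because of the leading space and the capital letters.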

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related WhatsApp Voice Transcription Not Triggering (2026.3.1) #32200

User-visible / Behavior Changes

Audio transcription trigger is now more robust when inbound MIME contains casing/whitespace/parameter variations.

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: Ubuntu 24.04 (dev env)
Runtime/container: Node 22 + pnpm
Model/provider: mocked provider in tests (Vitest)
Integration/channel (if any): WhatsApp-style MIME scenario in media-understanding tests
Relevant config (redacted): tools.media.audio.scope.rules with chatType: "dm" and channel: "whatsapp"

Steps

Provide an audio attachment with MIME " Audio/Ogg; codecs=opus " and a filename that has no audio extension (voice-note).
Run media understanding with audio enabled and WhatsApp-like scope rules.
Confirm audio transcription is applied.
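
The repro relies on both signals failing at once: the raw MIME string does not classify, and the filename carries no usable extension. A rough sketch of that fallback chain is shown below. All names here (resolveKind, kindFromMimeSketch, kindFromFileName, AUDIO_EXTENSIONS) are hypothetical and the extension list is an assumption; the real resolution logic in the project may be structured differently.

```typescript
// Hypothetical sketch of "MIME first, filename extension second" resolution.
type MediaKind = "image" | "audio" | "video" | "unknown";

// Assumed extension list, for illustration only.
const AUDIO_EXTENSIONS = new Set([".ogg", ".opus", ".mp3", ".m4a", ".wav"]);

// The `normalize` flag toggles between the pre-fix and post-fix behavior.
function kindFromMimeSketch(mime: string | undefined, normalize: boolean): MediaKind {
  const value = normalize ? mime?.split(";")[0]?.trim().toLowerCase() : mime;
  if (!value) return "unknown";
  if (value.startsWith("audio/")) return "audio";
  if (value.startsWith("image/")) return "image";
  if (value.startsWith("video/")) return "video";
  return "unknown";
}

// Only audio extensions are modeled in this sketch.
function kindFromFileName(fileName?: string): MediaKind {
  if (!fileName) return "unknown";
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return "unknown";
  return AUDIO_EXTENSIONS.has(fileName.slice(dot).toLowerCase()) ? "audio" : "unknown";
}

function resolveKind(
  mime: string | undefined,
  fileName: string | undefined,
  normalize: boolean,
): MediaKind {
  const byMime = kindFromMimeSketch(mime, normalize);
  return byMime !== "unknown" ? byMime : kindFromFileName(fileName);
}
```

With normalize set to false, the inputs from the steps above (" Audio/Ogg; codecs=opus " and "voice-note") resolve to "unknown", which is the case where transcription would be skipped. With normalize set to true, the same inputs resolve to "audio".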

Expected

Audio is classified correctly and transcription runs.

Actual

With this patch: transcription runs and transcript is applied.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- pnpm vitest run src/media-understanding/apply.test.ts src/media/mime.test.ts
- Added regression case: transcribes WhatsApp audio with parameterized MIME despite casing/whitespace
Edge cases checked:
- MIME with parameters and mixed casing/whitespace
- scope rules including chatType: "dm" + channel: "whatsapp"
What you did not verify:
- Full end-to-end WhatsApp runtime on a live account in this environment

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly:
- Revert commit fix(media): normalize MIME kind detection for audio transcription.
Files/config to restore:
- src/media/mime.ts
- src/media-understanding/apply.test.ts (test-only)
Known bad symptoms reviewers should watch for:
- Unexpected media kind changes for non-audio MIME inputs.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk:
- MIME normalization could alter classification behavior for unusual mixed-case MIME strings.
Mitigation:
- Added regression coverage and kept behavior scoped to kindFromMime normalization only.
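
One way to reason about this risk is to check that normalization leaves already canonical MIME types untouched and only collapses casing, whitespace, and parameters. The snippet below is a small self-check along those lines. It assumes a simple normalizer (trim, strip parameters, lowercase) that may not match the project's normalizeMimeType exactly, and the sample inputs are illustrative.

```typescript
// Assumed normalizer for illustration; not the project's implementation.
function normalizeMimeSketch(mime: string): string {
  return (mime.split(";")[0] ?? "").trim().toLowerCase();
}

// Canonical inputs should pass through unchanged, so their kind cannot shift.
const canonicalInputs = ["image/png", "video/mp4", "application/pdf", "text/plain", "audio/ogg"];
const canonicalUnchanged = canonicalInputs.every((m) => normalizeMimeSketch(m) === m);

// Non-audio variants should map to the same type they already meant.
const nonAudioVariants: Array<[string, string]> = [
  ["IMAGE/PNG", "image/png"],
  [" video/mp4 ", "video/mp4"],
  ["text/plain; charset=utf-8", "text/plain"],
];
const variantsPreserved = nonAudioVariants.every(
  ([input, expected]) => normalizeMimeSketch(input) === expected,
);
```

Under these assumptions, the only inputs whose classification can change are those that previously failed to classify because of casing, whitespace, or parameters.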

AI Assistance

AI-assisted: Yes (Codex)
Testing level: Lightly tested (targeted Vitest suites above)

greptile-apps · 2026-03-02T23:08:09Z

Greptile Summary

This PR normalizes MIME types before kindFromMime classification in src/media/mime.ts, fixing audio attachment kind detection for WhatsApp-style MIME strings with mixed casing, leading/trailing whitespace, and codec parameters (e.g. " Audio/Ogg; codecs=opus "). The companion regression test in apply.test.ts verifies the MIME normalization path, though there is a scope rule shadowing issue in the test setup (see inline comment).

Changes:

src/media/mime.ts: kindFromMime now wraps its input in normalizeMimeType before calling mediaKindFromMime — consistent with the normalization already applied in detectMimeImpl for headerMime.
src/media-understanding/apply.test.ts: Adds a regression test for WhatsApp audio with parameterized MIME (" Audio/Ogg; codecs=opus ").

Issue found:

The regression test sets ctx.ChatType = "direct" alongside scope rules that include { chatType: "dm" }. Since "dm" normalizes to "direct", Rule 1 matches before the intended channel: "whatsapp" Rule 2 is ever evaluated. The WhatsApp channel scope rule is not actually exercised by this test, weakening its value as a WhatsApp-specific regression case.
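
The shadowing described here follows from ordered, first-match-wins rule evaluation. The sketch below illustrates that behavior. It is not the project's scope resolver: the types, resolveScope, and the body of normalizeChatType are assumptions written only to mirror what the review describes ("dm" normalizing to "direct", rules checked in order).

```typescript
// Hypothetical sketch of ordered, first-match-wins scope resolution.
type ScopeAction = "allow" | "deny";

interface ScopeRule {
  action: ScopeAction;
  match: { chatType?: string; channel?: string };
}

interface ScopeConfig {
  default: ScopeAction;
  rules: ScopeRule[];
}

interface ScopeContext {
  chatType?: string;
  channel?: string;
}

// Assumed alias handling: "dm" is treated as "direct".
function normalizeChatType(raw?: string): string | undefined {
  const value = raw?.trim().toLowerCase();
  if (!value) return undefined;
  return value === "dm" ? "direct" : value;
}

// Returns the decision and the index of the rule that produced it (-1 = default).
function resolveScope(
  scope: ScopeConfig,
  ctx: ScopeContext,
): { action: ScopeAction; ruleIndex: number } {
  const chatType = normalizeChatType(ctx.chatType);
  const channel = ctx.channel?.trim().toLowerCase();
  for (let i = 0; i < scope.rules.length; i++) {
    const rule = scope.rules[i];
    if (!rule) continue;
    const { match } = rule;
    if (match.chatType !== undefined && normalizeChatType(match.chatType) !== chatType) continue;
    if (match.channel !== undefined && match.channel.trim().toLowerCase() !== channel) continue;
    return { action: rule.action, ruleIndex: i };
  }
  return { action: scope.default, ruleIndex: -1 };
}

const scope: ScopeConfig = {
  default: "deny",
  rules: [
    { action: "allow", match: { chatType: "dm" } },
    { action: "allow", match: { channel: "whatsapp" } },
  ],
};

// Rule 1 (index 0) wins, so the WhatsApp rule is never consulted.
const shadowed = resolveScope(scope, { chatType: "direct", channel: "whatsapp" });
// Without a chat type on the context, evaluation falls through to index 1.
const fallsThrough = resolveScope(scope, { channel: "whatsapp" });
```

In this sketch both calls return "allow", which is why the original test could pass without ever exercising the channel rule; only the ruleIndex differs (0 versus 1).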

Confidence Score: 4/5

Safe to merge — the production fix is correct and minimal; the only issue is a test design flaw that doesn't affect runtime behaviour.
The one-line change to kindFromMime is correct, well-scoped, and consistent with existing normalization in detectMimeImpl. The only concern is the regression test which does not fully exercise the WhatsApp channel scope rule it implies, reducing test coverage confidence for that specific path. No production behaviour is broken.
src/media-understanding/apply.test.ts — the WhatsApp channel scope rule (Rule 2) is shadowed by the chatType rule (Rule 1) and is never evaluated in the new test case.

Last reviewed commit: 154703a

2 files reviewed, 1 comment

greptile-apps · 2026-03-02T23:08:13Z

src/media-understanding/apply.test.ts

+    ctx.ChatType = "direct";
+    ctx.Surface = "whatsapp";
+
+    const cfg: OpenClawConfig = {
+      tools: {
+        media: {
+          audio: {
+            enabled: true,
+            maxBytes: 1024 * 1024,
+            scope: {
+              default: "deny",
+              rules: [
+                { action: "allow", match: { chatType: "direct" } },
+                { action: "allow", match: { channel: "whatsapp" } },


WhatsApp channel scope rule is never evaluated in this test

The test is titled "transcribes WhatsApp audio with parameterized MIME despite casing/whitespace" and includes two scope rules to represent a WhatsApp-like configuration:

1. { action: "allow", match: { chatType: "dm" } }, which normalizes to "direct"
2. { action: "allow", match: { channel: "whatsapp" } }, intended to cover the WhatsApp surface

However, ctx.ChatType is set to "direct", and normalizeChatType("dm") also resolves to "direct" (see src/channels/chat-type.ts). The scope resolver iterates rules in order, so Rule 1 matches first and returns "allow" — Rule 2 (channel: "whatsapp") is never reached.

This means the test passes regardless of whether the WhatsApp channel rule (or even ctx.Surface) is present. If the intent is to specifically validate that the channel: "whatsapp" rule enables transcription, the chatType on the context should NOT match Rule 1. For example, omitting ctx.ChatType (leaving it undefined) would force evaluation to fall through to the channel rule:

Suggested change

-    ctx.ChatType = "direct";
-    ctx.Surface = "whatsapp";
-
-    const cfg: OpenClawConfig = {
-      tools: {
-        media: {
-          audio: {
-            enabled: true,
-            maxBytes: 1024 * 1024,
-            scope: {
-              default: "deny",
-              rules: [
-                { action: "allow", match: { chatType: "direct" } },
-                { action: "allow", match: { channel: "whatsapp" } },
+    const ctx = await createAudioCtx({
+      fileName: "voice-note",
+      mediaType: " Audio/Ogg; codecs=opus ",
+    });
+    ctx.Surface = "whatsapp";
+    // ChatType intentionally not set; scope relies on the channel: "whatsapp" rule
+    const cfg: OpenClawConfig = {
+      tools: {
+        media: {
+          audio: {
+            enabled: true,
+            maxBytes: 1024 * 1024,
+            scope: {
+              default: "deny",
+              rules: [
+                { action: "allow", match: { channel: "whatsapp" } },
+              ],
+            },


Good catch — updated. The test now relies only on the channel: "whatsapp" scope rule (no matching chatType rule), so it explicitly exercises the WhatsApp channel path.


steipete · 2026-03-02T23:31:29Z

Landed via temp rebase onto main.

Gate: pnpm -s vitest run src/media/mime.test.ts src/media-understanding/apply.test.ts && pnpm -s tsgo
Land commit: fae7947
Merge commit: de77a36

Thanks @Lucenx9!

mizoz

LGTM ✅ - Fixes MIME kind detection for WhatsApp audio. The fix normalizes MIME type before classification, handling parameterized MIME strings like audio/ogg; codecs=opus with mixed casing/whitespace. The regression test covers the WhatsApp-specific scenario well.


openclaw-barnacle bot added the size: XS label Mar 2, 2026

Lucenx9 mentioned this pull request Mar 2, 2026

WhatsApp Voice Transcription Not Triggering (2026.3.1) #32200

Open

greptile-apps bot reviewed Mar 2, 2026


Lucenx9 added 3 commits March 2, 2026 23:30

fix(media): normalize MIME kind detection for audio transcription

339623d

test(media): use direct chatType in WhatsApp MIME regression case

1d037a0

test(media): ensure WhatsApp scope rule is exercised in MIME regression

86c19d2

openclaw-barnacle bot added size: M and removed size: XS labels Mar 2, 2026

test: harden MIME normalization regression coverage (#32280) (thanks @…

fae7947

…Lucenx9)

steipete force-pushed the fix/whatsapp-voice-transcription-32200 branch from bb3e053 to fae7947 Compare March 2, 2026 23:31

steipete merged commit de77a36 into openclaw:main Mar 2, 2026

openclaw-barnacle bot added size: XS and removed size: M labels Mar 2, 2026

mizoz approved these changes Mar 3, 2026


dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026

test: harden MIME normalization regression coverage (openclaw#32280) …

8d0b2c0

…(thanks @Lucenx9)

OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026

test: harden MIME normalization regression coverage (openclaw#32280) …

785038c

…(thanks @Lucenx9)

zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026

test: harden MIME normalization regression coverage (openclaw#32280) …

39fd770

…(thanks @Lucenx9)

steipete merged 4 commits into openclaw:main from Lucenx9:fix/whatsapp-voice-transcription-32200
