fix(media): normalize MIME kind detection for WhatsApp audio transcription #32280
Greptile Summary
This PR normalizes MIME types before
Changes:
Issue found:
Confidence Score: 4/5
Last reviewed commit: 154703a
    ctx.ChatType = "direct";
    ctx.Surface = "whatsapp";

    const cfg: OpenClawConfig = {
      tools: {
        media: {
          audio: {
            enabled: true,
            maxBytes: 1024 * 1024,
            scope: {
              default: "deny",
              rules: [
                { action: "allow", match: { chatType: "direct" } },
                { action: "allow", match: { channel: "whatsapp" } },
WhatsApp channel scope rule is never evaluated in this test
The test is titled "transcribes WhatsApp audio with parameterized MIME despite casing/whitespace" and includes two scope rules to represent a WhatsApp-like configuration:
1. `{ action: "allow", match: { chatType: "dm" } }` normalizes to `"direct"`
2. `{ action: "allow", match: { channel: "whatsapp" } }` is intended to cover the WhatsApp surface
However, ctx.ChatType is set to "direct", and normalizeChatType("dm") also resolves to "direct" (see src/channels/chat-type.ts). The scope resolver iterates rules in order, so Rule 1 matches first and returns "allow" — Rule 2 (channel: "whatsapp") is never reached.
This means the test passes regardless of whether the WhatsApp channel rule (or even ctx.Surface) is present. If the intent is to specifically validate that the channel: "whatsapp" rule enables transcription, the chatType on the context should NOT match Rule 1. For example, omitting ctx.ChatType (leaving it undefined) would force evaluation to fall through to the channel rule:
    const ctx = await createAudioCtx({
      fileName: "voice-note",
      mediaType: " Audio/Ogg; codecs=opus ",
    });
    ctx.Surface = "whatsapp";
    // ChatType intentionally not set — scope relies on channel: "whatsapp" rule
    const cfg: OpenClawConfig = {
      tools: {
        media: {
          audio: {
            enabled: true,
            maxBytes: 1024 * 1024,
            scope: {
              default: "deny",
              rules: [
                { action: "allow", match: { channel: "whatsapp" } },
              ],
            },
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/media-understanding/apply.test.ts
Line: 369-382
Comment:
**WhatsApp channel scope rule is never evaluated in this test**
The test is titled "transcribes WhatsApp audio with parameterized MIME despite casing/whitespace" and includes two scope rules to represent a WhatsApp-like configuration:
1. `{ action: "allow", match: { chatType: "dm" } }` — normalizes to `"direct"`
2. `{ action: "allow", match: { channel: "whatsapp" } }` — intended to cover WhatsApp surface
However, `ctx.ChatType` is set to `"direct"`, and `normalizeChatType("dm")` also resolves to `"direct"` (see `src/channels/chat-type.ts`). The scope resolver iterates rules in order, so Rule 1 matches first and returns `"allow"` — Rule 2 (`channel: "whatsapp"`) is never reached.
This means the test passes regardless of whether the WhatsApp channel rule (or even `ctx.Surface`) is present. If the intent is to specifically validate that the `channel: "whatsapp"` rule enables transcription, the `chatType` on the context should NOT match Rule 1. For example, omitting `ctx.ChatType` (leaving it `undefined`) would force evaluation to fall through to the channel rule:
```suggestion
const ctx = await createAudioCtx({
  fileName: "voice-note",
  mediaType: " Audio/Ogg; codecs=opus ",
});
ctx.Surface = "whatsapp";
// ChatType intentionally not set — scope relies on channel: "whatsapp" rule
const cfg: OpenClawConfig = {
  tools: {
    media: {
      audio: {
        enabled: true,
        maxBytes: 1024 * 1024,
        scope: {
          default: "deny",
          rules: [
            { action: "allow", match: { channel: "whatsapp" } },
          ],
        },
```
How can I resolve this? If you propose a fix, please make it concise.
Good catch — updated. The test now relies only on the channel: "whatsapp" scope rule (no matching chatType rule), so it explicitly exercises the WhatsApp channel path.
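The ordering problem discussed in this thread can be illustrated with a minimal first-match resolver. This is a sketch, not the project's resolver: `resolveScopeSketch`, `normalizeChatTypeSketch`, and the `Scope*` types are hypothetical names, and the only behaviors assumed are the ones the review states (rules are checked in order, and `"dm"` normalizes to `"direct"`).

```typescript
// Hypothetical types for illustration; the real OpenClaw scope types may differ.
type ScopeAction = "allow" | "deny";

interface ScopeMatch {
  chatType?: string;
  channel?: string;
}

interface ScopeRule {
  action: ScopeAction;
  match: ScopeMatch;
}

interface ScopeConfig {
  default: ScopeAction;
  rules: ScopeRule[];
}

// Assumption: "dm" is an alias for "direct", as described in the review.
function normalizeChatTypeSketch(value?: string): string | undefined {
  const v = value?.trim().toLowerCase();
  if (!v) return undefined;
  return v === "dm" ? "direct" : v;
}

// Returns the decision and the index of the rule that produced it (-1 means default).
function resolveScopeSketch(
  scope: ScopeConfig,
  ctx: ScopeMatch,
): { action: ScopeAction; ruleIndex: number } {
  const ctxChatType = normalizeChatTypeSketch(ctx.chatType);
  const ctxChannel = ctx.channel?.trim().toLowerCase();

  // First matching rule wins; later rules are never consulted.
  const ruleIndex = scope.rules.findIndex((rule) => {
    const wantChatType = normalizeChatTypeSketch(rule.match.chatType);
    const wantChannel = rule.match.channel?.trim().toLowerCase();
    if (wantChatType && wantChatType !== ctxChatType) return false;
    if (wantChannel && wantChannel !== ctxChannel) return false;
    return true;
  });

  const matched = ruleIndex >= 0 ? scope.rules[ruleIndex] : undefined;
  return { action: matched ? matched.action : scope.default, ruleIndex };
}
```

Under these assumptions, a context with `chatType: "direct"` is decided by the first rule when both rules are configured, so the `channel: "whatsapp"` rule is only exercised once `chatType` is left unset or the chat-type rule is removed.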
bb3e053 to fae7947
mizoz left a comment
LGTM ✅ - Fixes MIME kind detection for WhatsApp audio. The fix normalizes MIME type before classification, handling parameterized MIME strings like audio/ogg; codecs=opus with mixed casing/whitespace. The regression test covers the WhatsApp-specific scenario well.
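The normalization described in this comment might look roughly like the following. It is a sketch under assumptions: `normalizeMimeSketch` and `kindFromMimeSketch` are hypothetical stand-ins rather than the code in `src/media/mime.ts`, and the set of kinds is illustrative.

```typescript
// Illustrative kinds only; the project's actual kind set is not shown in this PR page.
type MediaKindSketch = "image" | "audio" | "video" | "unknown";

// Drop parameters (everything after the first ";"), trim whitespace, lowercase.
function normalizeMimeSketch(mime?: string | null): string | undefined {
  if (!mime) return undefined;
  const base = mime.split(";")[0]?.trim().toLowerCase();
  return base || undefined;
}

// Classify on the normalized type so casing, padding, and parameters do not matter.
function kindFromMimeSketch(mime?: string | null): MediaKindSketch {
  const normalized = normalizeMimeSketch(mime);
  if (!normalized) return "unknown";
  if (normalized.startsWith("image/")) return "image";
  if (normalized.startsWith("audio/")) return "audio";
  if (normalized.startsWith("video/")) return "video";
  return "unknown";
}
```

With this shape, the string used in the regression test, `" Audio/Ogg; codecs=opus "`, reduces to `audio/ogg` before the prefix check, so it is classified as audio even though the raw string starts with a space and an uppercase letter.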
Summary
Describe the problem and fix in 2–5 bullets:
`kindFromMime` classification, and added a regression test for WhatsApp-style `audio/ogg; codecs=opus` with scope rules (`chatType: "dm"`, `channel: "whatsapp"`).
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Security Impact (required)
No) No) No) No) No)
Yes, explain risk + mitigation:
Repro + Verification
Environment
`tools.media.audio.scope.rules` with `chatType: "dm"` and `channel: "whatsapp"`
Steps
`" Audio/Ogg; codecs=opus "` and non-audio filename (`voice-note`).
Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
`pnpm vitest run src/media-understanding/apply.test.ts src/media/mime.test.ts`
`transcribes WhatsApp audio with parameterized MIME despite casing/whitespace`
`chatType: "dm"` + `channel: "whatsapp"`
Compatibility / Migration
Yes) No) No)
Failure Recovery (if this breaks)
`fix(media): normalize MIME kind detection for audio transcription.`
`src/media/mime.ts`
`src/media-understanding/apply.test.ts` (test-only)
Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write `None`.
`kindFromMime` normalization only.
AI Assistance