Closed
Description
Summary
Voice messages (OGG audio) leak 300KB+ of raw binary into context as <file mime="text/plain"> blocks after successful transcription.
Root Cause
In src/media-understanding/apply.ts:367, audio files were only skipped from file extraction if they failed the text heuristic:
    if (!forcedTextMimeResolved && kind === "audio" && !textLike) { continue; }
Why this fails for OGG:
- looksLikeUtf8Text() samples the first 4KB and returns true if >85% of the chars are printable
- OGG files start with OggS magic bytes (valid ASCII: 0x4F 0x67 0x67 0x53)
- Compressed audio data often has >85% printable bytes in the sample
- When textLike is true, the skip is bypassed → binary becomes a file block
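To illustrate the failure mode, here is a minimal sketch of a printable-ratio heuristic of the kind described above. It is not the actual looksLikeUtf8Text() implementation: the function names printableRatio and looksTextLike, the exact definition of "printable", and the synthetic buffer are assumptions made for illustration only.

```typescript
// Hypothetical stand-in for looksLikeUtf8Text(); the real code may differ.
// Returns the fraction of printable bytes in the first `sampleSize` bytes.
function printableRatio(bytes: Uint8Array, sampleSize: number = 4096): number {
  const sample = bytes.subarray(0, sampleSize);
  if (sample.length === 0) return 0;
  let printable = 0;
  for (let i = 0; i < sample.length; i++) {
    const b = sample[i];
    // Assumption: printable ASCII plus tab, LF and CR count as text.
    if ((b >= 0x20 && b <= 0x7e) || b === 0x09 || b === 0x0a || b === 0x0d) {
      printable++;
    }
  }
  return printable / sample.length;
}

// Assumed threshold of 85%, taken from the description in this issue.
function looksTextLike(bytes: Uint8Array, threshold: number = 0.85): boolean {
  return printableRatio(bytes) > threshold;
}

// The OGG capture pattern "OggS" consists of four printable ASCII bytes.
const oggMagic = new Uint8Array([0x4f, 0x67, 0x67, 0x53]);
const magicRatio = printableRatio(oggMagic);

// Synthetic buffer (not real OGG data): one NUL byte in every ten bytes.
// It is clearly binary, yet roughly 90% of its bytes are printable.
const synthetic = new Uint8Array(4096);
for (let i = 0; i < synthetic.length; i++) {
  synthetic[i] = i % 10 === 0 ? 0x00 : 0x41;
}
const syntheticPasses = looksTextLike(synthetic);
```

The synthetic buffer shows that a byte-ratio check alone cannot distinguish text from binary data that happens to fall mostly in the printable range, which is why a MIME-based skip is the safer guard for audio.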
Fix
Remove && !textLike so that audio files are always skipped from file extraction:
    if (!forcedTextMimeResolved && kind === "audio") { continue; }
Audio should be transcribed or skipped entirely, never included as raw binary.
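The difference between the two conditions can be sketched as a pair of pure predicates. This is an illustration under stated assumptions, not the code in apply.ts: the Candidate type, the MediaKind values other than "audio", and the function names shouldSkipBefore and shouldSkipAfter are hypothetical.

```typescript
// Hypothetical types; only the "audio" kind is confirmed by this issue.
type MediaKind = "audio" | "image" | "video" | "document";

interface Candidate {
  kind: MediaKind;
  textLike: boolean;               // result of the text heuristic
  forcedTextMimeResolved: boolean; // assumed: caller explicitly forced a text MIME
}

// Old condition: audio is skipped only when the heuristic says "not text".
function shouldSkipBefore(c: Candidate): boolean {
  return !c.forcedTextMimeResolved && c.kind === "audio" && !c.textLike;
}

// New condition: audio is always skipped unless a text MIME was forced.
function shouldSkipAfter(c: Candidate): boolean {
  return !c.forcedTextMimeResolved && c.kind === "audio";
}

// An OGG voice message that the heuristic misclassified as text-like.
const oggVoiceMessage: Candidate = {
  kind: "audio",
  textLike: true,
  forcedTextMimeResolved: false,
};

const skippedBefore = shouldSkipBefore(oggVoiceMessage); // false: binary leaks through
const skippedAfter = shouldSkipAfter(oggVoiceMessage);   // true: file extraction is skipped
```

With the old condition the misclassified file falls through to file extraction; with the new one the heuristic result no longer matters for audio.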
Reproduction
- Send voice message via Telegram/WhatsApp
- Transcription succeeds (transcript appears)
- Raw OGG binary also appears as a <file mime="text/plain"> block