[Bug]: Voice message binary leaks into context after transcription #7333

@Diaspar4u

Summary

Voice messages (OGG audio) leak 300KB+ of raw binary into context as <file mime="text/plain"> blocks after successful transcription.

Root Cause

In src/media-understanding/apply.ts:367, audio files are skipped from file extraction only if they fail the text heuristic:

if (!forcedTextMimeResolved && kind === "audio" && !textLike) { continue; }

Why this fails for OGG:

  • looksLikeUtf8Text() samples first 4KB, returns true if >85% printable chars
  • OGG files start with OggS magic bytes (valid ASCII: 0x4F 0x67 0x67 0x53)
  • Compressed audio data often has >85% printable bytes in the sample
  • When textLike is true, the skip is bypassed → binary becomes a file block
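
To illustrate the bullets above, here is a minimal sketch of a printable-byte heuristic of the kind described. It is not the project's actual looksLikeUtf8Text() implementation: the function name looksLikeTextSketch, the exact set of bytes counted as printable, and the parameter names are assumptions made for illustration only. The 4KB sample size and 85% threshold are taken from the description above.

```typescript
// Hypothetical re-creation of the heuristic described in this issue.
// The real looksLikeUtf8Text() may differ in which bytes it treats as printable.
function looksLikeTextSketch(
  buf: Uint8Array,
  sampleSize: number = 4096, // "samples first 4KB"
  threshold: number = 0.85   // ">85% printable chars"
): boolean {
  const sample = buf.subarray(0, sampleSize);
  if (sample.length === 0) {
    return false;
  }
  let printable = 0;
  for (let i = 0; i < sample.length; i++) {
    const b = sample[i];
    // Assumption: tab, LF, CR and ASCII 0x20..0x7E count as printable.
    if (b === 0x09 || b === 0x0a || b === 0x0d || (b >= 0x20 && b <= 0x7e)) {
      printable++;
    }
  }
  return printable / sample.length > threshold;
}

// The OGG magic bytes "OggS" are all printable ASCII, so on their own
// they pass a check like this one.
const oggMagic = new Uint8Array([0x4f, 0x67, 0x67, 0x53]);
const magicLooksLikeText = looksLikeTextSketch(oggMagic);

// A buffer of NUL bytes, by contrast, is rejected.
const zerosLookLikeText = looksLikeTextSketch(new Uint8Array(64));
```

The point of the sketch is that a byte-ratio check cannot distinguish real text from binary data that happens to contain many bytes in the printable range, so it is a weak basis for deciding whether an audio attachment is safe to inline.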

Fix

Remove && !textLike — audio files should always be skipped from file extraction:

if (!forcedTextMimeResolved && kind === "audio") { continue; }

Audio should be transcribed or skipped entirely, never included as raw binary.
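
The difference between the two conditions can be sketched as a pair of standalone predicates. The AttachmentSketch type and both function names are hypothetical, and forcedTextMimeResolved is modeled as a plain boolean here, which may not match its real type in apply.ts.

```typescript
// Hypothetical shapes for illustration; the real code in apply.ts uses
// local variables inside a loop rather than a record like this.
type MediaKindSketch = "audio" | "image" | "video" | "document";

interface AttachmentSketch {
  kind: MediaKindSketch;
  textLike: boolean;               // result of the text heuristic
  forcedTextMimeResolved: boolean; // assumption: truthy when a text MIME was forced
}

// Condition before the fix: audio is skipped only when it does NOT look like text.
function skipsBeforeFix(a: AttachmentSketch): boolean {
  return !a.forcedTextMimeResolved && a.kind === "audio" && !a.textLike;
}

// Condition after the fix: audio is always skipped unless a text MIME was forced.
function skipsAfterFix(a: AttachmentSketch): boolean {
  return !a.forcedTextMimeResolved && a.kind === "audio";
}

// An OGG voice message that the heuristic misclassifies as text-like.
const oggVoiceNote: AttachmentSketch = {
  kind: "audio",
  textLike: true,
  forcedTextMimeResolved: false,
};

const skippedBefore = skipsBeforeFix(oggVoiceNote); // false: binary becomes a file block
const skippedAfter = skipsAfterFix(oggVoiceNote);   // true: audio is left out of file extraction
```

Non-audio attachments behave the same under both predicates, so the change only affects audio files that the heuristic classifies as text-like.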

Reproduction

  1. Send a voice message via Telegram or WhatsApp
  2. Transcription succeeds (the transcript appears)
  3. The raw OGG binary also appears as a <file mime="text/plain"> block
