[Bug]: LINE voice messages (M4A) not transcribed — detectContentType() classifies M4A as video/mp4 #29751

@Digihankie

Summary

detectContentType() in src/line/download.ts checks the MPEG-4 ftyp magic bytes before the M4A-specific check, causing LINE voice messages (M4A / AAC-LC) to be classified as video/mp4 instead of audio/mp4. This makes isAudioAttachment() return false, so the entire audio transcription pipeline is skipped and the agent receives no transcript.

Root Cause

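The MP4 check at bytes 4–7 (ftyp) fires for all MPEG-4 containers, including M4A. The M4A-specific check below it is dead code: it tests the same ftyp bytes at positions 4–7, and any buffer that would match has already returned "video/mp4" from the earlier branch: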

// Matches ANY ftyp box, including M4A → returns "video/mp4"
if (buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70) {
  return "video/mp4";
}
// Dead code: never reached for M4A files
if (buffer[0] === 0x00 && buffer[1] === 0x00 && buffer[2] === 0x00) {
  if (buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70) {
    return "audio/mp4";
  }
}

M4A magic bytes: 00 00 00 1c 66 74 79 70 4d 34 41 20 — positions 4–7 are ftyp, so the MP4 rule fires first.
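
The ordering problem can be reproduced in isolation. The following is a minimal sketch, not the actual source file: `detectContentTypeBuggy` is a hypothetical name, the two checks are copied from the snippet above, and the `application/octet-stream` fallback is an assumption added so the function always returns a value.

```typescript
// Simplified copy of the two checks quoted in this issue, in their current order.
// The function name and the final fallback are illustrative assumptions.
function detectContentTypeBuggy(buffer: Uint8Array): string {
  // Generic ftyp check: true for every MPEG-4 container, audio or video.
  if (buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70) {
    return "video/mp4";
  }
  // Intended M4A check: it needs the same bytes 4-7, so a matching buffer
  // has already returned above and this branch can never return.
  if (buffer[0] === 0x00 && buffer[1] === 0x00 && buffer[2] === 0x00) {
    if (buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70) {
      return "audio/mp4";
    }
  }
  return "application/octet-stream"; // assumed fallback, not taken from the source
}

// First 12 bytes of the LINE voice message, as shown in the xxd output in this issue.
const m4aHeaderBytes = new Uint8Array([
  0x00, 0x00, 0x00, 0x1c, 0x66, 0x74, 0x79, 0x70, 0x4d, 0x34, 0x41, 0x20,
]);

const buggyResult = detectContentTypeBuggy(m4aHeaderBytes);
console.log(buggyResult); // "video/mp4"
```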

Downstream Impact (verified from source)

  1. detectContentType() → "video/mp4" (wrong)
  2. getExtensionForContentType("video/mp4") → .mp4 (wrong extension)
  3. mediaKindFromMime("video/mp4") → "video" (src/media/constants.ts: mime.startsWith("video/"))
  4. resolveAttachmentKind() → returns "video" — never reaches isAudioFileName() fallback (src/media-understanding/attachments.ts)
  5. isAudioAttachment() → false
  6. selectAttachments({capability: "audio"}) → empty array
  7. transcribeFirstAudio() in audio-preflight.ts → firstAudio = undefined → returns undefined
  8. No ctx.Transcript set, agent receives raw placeholder without transcription
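
The chain above can be sketched to show why a single wrong MIME type empties the audio selection. This is a simplified stand-in under stated assumptions: `kindFromMime` only mirrors the prefix rule quoted in step 3, `selectAudio` is a hypothetical replacement for selectAttachments({capability: "audio"}), and the `Attachment` shape and file paths are invented for illustration.

```typescript
type MediaKind = "audio" | "video" | "image" | "unknown";

// Stand-in for mediaKindFromMime(): only the prefix rule is taken from the issue.
function kindFromMime(mime: string): MediaKind {
  if (mime.startsWith("video/")) return "video";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("image/")) return "image";
  return "unknown";
}

// Hypothetical attachment shape, for illustration only.
interface Attachment {
  path: string;
  mime: string;
}

// Hypothetical stand-in for selectAttachments({capability: "audio"}).
function selectAudio(attachments: Attachment[]): Attachment[] {
  return attachments.filter((a) => kindFromMime(a.mime) === "audio");
}

const misdetected: Attachment = { path: "voice.mp4", mime: "video/mp4" };
const corrected: Attachment = { path: "voice.m4a", mime: "audio/mp4" };

console.log(selectAudio([misdetected]).length); // 0: nothing reaches transcription
console.log(selectAudio([corrected]).length); // 1
```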

Note: The file-type library (used in src/media/mime.ts sniffMime()) correctly identifies M4A as audio/x-m4a, but it is never called because the LINE download path uses its own detectContentType() implementation.

Suggested Fix

Option A — Check the ftyp sub-brand to distinguish M4A from MP4 video:

if (buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70) {
  const subBrand = String.fromCharCode(buffer[8], buffer[9], buffer[10], buffer[11]);
  if (subBrand === 'M4A ' || subBrand === 'M4B ') {
    return "audio/mp4";
  }
  return "video/mp4";
}
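
A slightly more defensive variant of Option A is sketched below, assuming the check is factored into a helper. The name `classifyFtyp` and the `undefined` return for non-ftyp input are assumptions; the brand list is limited to the two brands named above. It adds a length guard so that buffers shorter than 12 bytes are not read past their end.

```typescript
// Brands treated as audio. Only the two values from Option A are listed;
// whether other brands should count as audio is left open.
const AUDIO_BRANDS = ["M4A ", "M4B "];

// Hypothetical helper: returns undefined when the buffer is not an ftyp container,
// so the caller can continue with its other magic-byte checks.
function classifyFtyp(buffer: Uint8Array): "audio/mp4" | "video/mp4" | undefined {
  // 4-byte box size + "ftyp" + 4-byte brand = 12 bytes minimum.
  if (buffer.length < 12) return undefined;
  const isFtyp =
    buffer[4] === 0x66 && buffer[5] === 0x74 && buffer[6] === 0x79 && buffer[7] === 0x70;
  if (!isFtyp) return undefined;
  const brand = String.fromCharCode(buffer[8], buffer[9], buffer[10], buffer[11]);
  return AUDIO_BRANDS.indexOf(brand) !== -1 ? "audio/mp4" : "video/mp4";
}

// Header from the LINE voice message in this issue (brand "M4A ").
const lineVoiceHeader = new Uint8Array([
  0x00, 0x00, 0x00, 0x1c, 0x66, 0x74, 0x79, 0x70, 0x4d, 0x34, 0x41, 0x20,
]);
// Same header with brand "isom", as used by typical MP4 video.
const isomHeader = new Uint8Array([
  0x00, 0x00, 0x00, 0x1c, 0x66, 0x74, 0x79, 0x70, 0x69, 0x73, 0x6f, 0x6d,
]);

console.log(classifyFtyp(lineVoiceHeader)); // "audio/mp4"
console.log(classifyFtyp(isomHeader)); // "video/mp4"
```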

Option B — Use the existing file-type library (already a dependency) via detectMime() from src/media/mime.ts instead of reimplementing magic-byte detection.
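
One detail to consider with Option B: the library reports audio/x-m4a, while the expected behavior in this issue is audio/mp4. If downstream helpers such as getExtensionForContentType() only know audio/mp4, the sniffed value may need normalizing. The sketch below is an assumption about how that could look; `normalizeSniffedMime`, the alias table, and the `application/octet-stream` fallback are hypothetical and not part of the existing code.

```typescript
// Hypothetical alias table. Only the audio/x-m4a entry is motivated by this issue.
const MIME_ALIASES = new Map<string, string>([["audio/x-m4a", "audio/mp4"]]);

// Maps a sniffed MIME type to the value the rest of the pipeline expects.
// `sniffed` is whatever the library-backed helper returned, or undefined
// when the type could not be detected.
function normalizeSniffedMime(sniffed: string | undefined): string {
  if (sniffed === undefined) return "application/octet-stream"; // assumed fallback
  return MIME_ALIASES.get(sniffed) ?? sniffed;
}

console.log(normalizeSniffedMime("audio/x-m4a")); // "audio/mp4"
console.log(normalizeSniffedMime("video/mp4")); // "video/mp4"
```

If the audio pipeline already accepts any audio/ prefix and the extension lookup handles audio/x-m4a, this step would be unnecessary.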

Steps to reproduce

  1. Configure a LINE channel with tools.media.audio in auto-detection mode (OPENAI_API_KEY present, no explicit enabled: false).
  2. Send a voice message from the LINE mobile app to the bot.
  3. Observe the downloaded file and its detected content type in verbose logs.
  4. Check whether transcribeFirstAudio() produces a transcript.

Expected behavior

LINE voice messages (M4A) should be detected as audio/mp4, saved with .m4a extension, and automatically transcribed via the audio understanding pipeline (Whisper / OpenAI). The agent should receive the transcript in ctx.Transcript.

Actual behavior

Voice message is saved as .mp4 with MIME video/mp4. The audio transcription pipeline is completely skipped:

  • isAudioAttachment() returns false
  • selectAttachments({capability: "audio"}) returns empty array
  • transcribeFirstAudio() returns undefined
  • Agent receives <media:audio> placeholder + [media attached: /tmp/openclaw/line-media-xxx.mp4 (video/mp4)] but no transcript

The file-type library in src/media/mime.ts correctly identifies M4A as audio/x-m4a, but it is never invoked because the LINE download path uses its own detectContentType() implementation.

OpenClaw version

2026.2.26

Operating system

Linux (NVIDIA Jetson AGX Orin, aarch64, Ubuntu-based JetPack)

Install method

docker (openclaw:local-docker image)

Logs, screenshots, and evidence

# Actual LINE voice message file identified by `file` command
$ file /tmp/openclaw/line-media-1772274968613-*.mp4
Apple iTunes ALAC/AAC-LC (.M4A) Audio

# Magic bytes confirm ftyp M4A sub-brand at offset 8
$ xxd /tmp/openclaw/line-media-*.mp4 | head -1
00000000: 0000 001c 6674 7970 4d34 4120 0000 0000  ....ftypM4A ....


The `ftyp` box type (bytes 4–7) is identical for MP4 video and M4A audio. What distinguishes them is the brand at bytes 8–11 (the major brand in ISO base media file format terms, called the sub-brand in this issue): `M4A ` for audio vs `isom`/`mp42` for video.

Impact and severity

  • Affected: All LINE channel users who send voice messages
  • Severity: High — completely blocks voice message understanding on LINE
  • Frequency: 100% reproducible (every LINE voice message)
  • Consequence: Agent cannot process voice messages at all; users get no response to voice input. This is platform-specific to LINE (Telegram/WhatsApp use different audio formats).

Additional information

Related issues:

This is likely a latent bug introduced when detectContentType() was first written. LINE is the only channel that uses M4A (AAC-LC in MPEG-4 container) for voice messages, which is why it wasn't caught by earlier OGG/Opus fixes for Telegram.

The existing file-type npm package (already a dependency) handles this correctly via fileTypeFromBuffer(). The simplest fix would be to delegate to it from downloadLineMedia() or to check the ftyp sub-brand at bytes 8–11.
