Improve audio MIME normalization and validation in multimodal file reads #21635

@junaiddshaukat

Summary

During file reading, audio files currently rely on the raw result of mime.getType(), which can produce inconsistent or overly generic MIME types for multimodal inputs.

This is especially relevant for supported audio formats used with Gemini multimodal flows.

Problem

Today, audio handling in packages/core/src/utils/fileUtils.ts has a few gaps:

  • Common audio MIME aliases like audio/x-wav or audio/mp3 are not normalized before being returned as inline data.
  • Supported audio files may be missed when the MIME lookup returns no result or an inconsistent one.
  • Unsupported audio MIME types can flow through the read path and fail later with less clear downstream errors.

Proposed Change

  • Normalize common audio MIME aliases to a canonical supported MIME type.
  • Add extension-based fallback for supported audio formats such as:
    • .mp3
    • .wav
    • .aiff / .aif
    • .aac
    • .ogg
    • .flac
  • Return a clear early error when an audio file is detected but its format is not supported for inline multimodal reading.
  • Add focused tests for:
    • extension fallback
    • MIME normalization
    • unsupported audio rejection
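
The three behaviors above (alias normalization, extension fallback, early rejection) could be sketched roughly as follows. This is a minimal illustration, not the actual fileUtils.ts code: resolveAudioMime, getExtension, the alias table, and the choice of canonical MIME strings (for example audio/mpeg for MP3) are all assumptions made for this example, since the issue does not specify them.

```typescript
// Hypothetical sketch: names and canonical MIME choices are assumptions,
// not the real implementation in packages/core/src/utils/fileUtils.ts.

// Assumed alias table: common non-canonical audio MIME types -> canonical form.
const AUDIO_MIME_ALIASES = new Map<string, string>([
  ['audio/x-wav', 'audio/wav'],
  ['audio/wave', 'audio/wav'],
  ['audio/mp3', 'audio/mpeg'],
  ['audio/x-aiff', 'audio/aiff'],
  ['audio/x-aac', 'audio/aac'],
  ['audio/x-flac', 'audio/flac'],
]);

// Extension fallback for the formats listed in the proposal.
const AUDIO_MIME_BY_EXTENSION = new Map<string, string>([
  ['.mp3', 'audio/mpeg'],
  ['.wav', 'audio/wav'],
  ['.aiff', 'audio/aiff'],
  ['.aif', 'audio/aiff'],
  ['.aac', 'audio/aac'],
  ['.ogg', 'audio/ogg'],
  ['.flac', 'audio/flac'],
]);

const SUPPORTED_AUDIO_MIMES = new Set<string>(AUDIO_MIME_BY_EXTENSION.values());

type AudioMimeResult =
  | { kind: 'audio'; mimeType: string }
  | { kind: 'unsupported-audio'; error: string }
  | { kind: 'not-audio' };

// Returns the lowercased extension (with leading dot), or '' if there is none.
function getExtension(filePath: string): string {
  const lastSep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
  const base = filePath.slice(lastSep + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// lookedUpMime is whatever the MIME lookup returned (null when it found nothing).
function resolveAudioMime(filePath: string, lookedUpMime: string | null): AudioMimeResult {
  // Drop any parameters (e.g. "; codecs=...") and normalize case.
  const raw = (lookedUpMime ?? '').split(';', 1)[0]?.trim().toLowerCase() ?? '';
  const normalized = AUDIO_MIME_ALIASES.get(raw) ?? raw;

  // 1. The (normalized) lookup result is already a supported audio type.
  if (SUPPORTED_AUDIO_MIMES.has(normalized)) {
    return { kind: 'audio', mimeType: normalized };
  }

  // 2. Lookup was absent or unhelpful: fall back to the file extension.
  const byExtension = AUDIO_MIME_BY_EXTENSION.get(getExtension(filePath));
  if (byExtension !== undefined) {
    return { kind: 'audio', mimeType: byExtension };
  }

  // 3. Clearly audio, but not a supported format: fail early with a clear message.
  if (normalized.startsWith('audio/')) {
    return {
      kind: 'unsupported-audio',
      error: `Unsupported audio format '${normalized}' for inline multimodal reading: ${filePath}`,
    };
  }

  return { kind: 'not-audio' };
}
```

In the read path, a helper like this would presumably be called with the result of mime.getType(filePath), and the unsupported-audio case would be turned into an error result before any inline data is built. Returning a tagged result rather than throwing is a design choice of this sketch only.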

Why This Matters

This improves reliability for multimodal/audio workflows and lays better groundwork for future voice-related work in Gemini CLI.

Affected Area

  • packages/core/src/utils/fileUtils.ts
  • packages/core/src/utils/fileUtils.test.ts

Metadata

Labels

  • area/core: Issues related to User Interface, OS Support, Core Functionality
  • help wanted: We will accept PRs from all issues marked as "help wanted". Thanks for your support!