Summary
File reading currently uses the raw mime.getType() result for audio files, which can produce inconsistent or overly generic MIME types for multimodal inputs.
This is especially relevant for supported audio formats used with Gemini multimodal flows.
Problem
Today, audio handling in packages/core/src/utils/fileUtils.ts has a few gaps:
- Common audio MIME aliases like audio/x-wav or audio/mp3 are not normalized before being returned as inline data.
- Supported audio files may go undetected when the MIME lookup returns no result or an inconsistent type.
- Unsupported audio MIME types can flow through the read path and fail later with less clear downstream errors.
Proposed Change
- Normalize common audio MIME aliases to a canonical supported MIME type.
- Add extension-based fallback for supported audio formats such as:
  - .mp3
  - .wav
  - .aiff / .aif
  - .aac
  - .ogg
  - .flac
- Return a clear early error when an audio file is detected but its format is not supported for inline multimodal reading.
- Add focused tests for:
  - extension fallback
  - MIME normalization
  - unsupported audio rejection
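
The three behaviors proposed above could be combined roughly as follows. This is a minimal sketch, not the actual fileUtils.ts code: resolveAudioMimeType, the lookup tables, and the AudioResolution type are hypothetical names, and the canonical targets (for example audio/mpeg for MP3) and the alias entries beyond audio/x-wav and audio/mp3 are assumptions that this issue does not specify. The second argument stands in for whatever the MIME lookup returned (a string, or null when nothing was found).

```typescript
// Hypothetical sketch; none of these names exist in fileUtils.ts today.

// Assumed canonical MIME types for the supported formats listed in this issue.
const SUPPORTED_AUDIO_MIME_TYPES: ReadonlySet<string> = new Set([
  'audio/mpeg',
  'audio/wav',
  'audio/aiff',
  'audio/aac',
  'audio/ogg',
  'audio/flac',
]);

// Assumed alias table (alias -> canonical type).
const AUDIO_MIME_ALIASES: ReadonlyMap<string, string> = new Map([
  ['audio/mp3', 'audio/mpeg'],
  ['audio/x-wav', 'audio/wav'],
  ['audio/wave', 'audio/wav'],
  ['audio/x-aiff', 'audio/aiff'],
  ['audio/x-flac', 'audio/flac'],
]);

// Extension fallback for the formats named in the proposal.
const AUDIO_EXTENSION_TO_MIME: ReadonlyMap<string, string> = new Map([
  ['.mp3', 'audio/mpeg'],
  ['.wav', 'audio/wav'],
  ['.aiff', 'audio/aiff'],
  ['.aif', 'audio/aiff'],
  ['.aac', 'audio/aac'],
  ['.ogg', 'audio/ogg'],
  ['.flac', 'audio/flac'],
]);

type AudioResolution =
  | { kind: 'supported'; mimeType: string }
  | { kind: 'unsupported'; error: string }
  | { kind: 'not-audio' };

// Returns the lowercased extension including the dot, or '' if there is none.
function getLowercaseExtension(filePath: string): string {
  const lastSeparator = Math.max(
    filePath.lastIndexOf('/'),
    filePath.lastIndexOf('\\'),
  );
  const fileName = filePath.slice(lastSeparator + 1);
  const dotIndex = fileName.lastIndexOf('.');
  return dotIndex > 0 ? fileName.slice(dotIndex).toLowerCase() : '';
}

function resolveAudioMimeType(
  filePath: string,
  lookedUpMime: string | null,
): AudioResolution {
  // 1. Normalize known aliases to their canonical type.
  const raw = lookedUpMime === null ? null : lookedUpMime.toLowerCase();
  const normalized = raw === null ? null : (AUDIO_MIME_ALIASES.get(raw) ?? raw);
  if (normalized !== null && SUPPORTED_AUDIO_MIME_TYPES.has(normalized)) {
    return { kind: 'supported', mimeType: normalized };
  }

  // 2. Fall back to the file extension when the lookup did not help.
  const fromExtension = AUDIO_EXTENSION_TO_MIME.get(
    getLowercaseExtension(filePath),
  );
  if (fromExtension !== undefined) {
    return { kind: 'supported', mimeType: fromExtension };
  }

  // 3. Reject audio that is recognized but not supported, with a clear message.
  if (normalized !== null && normalized.startsWith('audio/')) {
    return {
      kind: 'unsupported',
      error: `Unsupported audio format for inline reading: ${normalized} (${filePath})`,
    };
  }

  return { kind: 'not-audio' };
}
```

In this sketch the extension fallback runs before the rejection step, so a file with a supported extension is accepted even if the lookup reported an unfamiliar audio type. Whether that ordering is desirable is a design choice left open here.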
Why This Matters
This improves reliability for multimodal/audio workflows and lays better groundwork for future voice-related work in Gemini CLI.
Affected Area
packages/core/src/utils/fileUtils.ts
packages/core/src/utils/fileUtils.test.ts