Skip to content

fix: verify audio/video MIME types with content check#16907

Merged
Adib234 merged 2 commits intogoogle-gemini:mainfrom
maru0804:fix/16888-adp-mime-type-detection
Jan 28, 2026
Merged

fix: verify audio/video MIME types with content check#16907
Adib234 merged 2 commits intogoogle-gemini:mainfrom
maru0804:fix/16888-adp-mime-type-detection

Conversation

@maru0804
Copy link
Copy Markdown
Contributor

Summary

Files with extensions like .adp (Eclipse 4diac Adapter Type definitions) are incorrectly identified as audio/adpcm by the mime-types library, causing the CLI to treat them as binary audio files and send them to the API with an incorrect MIME type, resulting in 400 INVALID_ARGUMENT errors.

Changes

  • Add content-based verification for audio/video MIME types in detectFileType()
  • If the file content is actually text (not binary), treat it as text instead of audio/video
  • Update tests to use actual binary content for audio/video file tests

Technical Details

The isBinaryFile() check is heuristic-based (BOM detection + null-byte/non-printable ratio, sampling first 4KB). BOM-encoded UTF-16/32 files are handled correctly. This approach is similar to how TypeScript files (.ts, .mts, .cts) are handled to avoid MPEG transport stream misidentification.

Fixes #16888

…16888)

.adp files are misidentified as audio/adpcm by mime-types library.
Add content-based check for audio/video: if not binary, treat as text.

Fixes google-gemini#16888
@maru0804 maru0804 requested a review from a team as a code owner January 17, 2026 03:53
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @maru0804, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the accuracy of file type detection by implementing a content-based verification step for files that are initially identified as audio or video. This change prevents incorrect MIME type assignments for text-based files that might share extensions with multimedia formats, thereby resolving API errors caused by misclassified uploads.

Highlights

  • MIME Type Misidentification Fix: Addresses an issue where the mime-types library incorrectly identifies certain text files (e.g., .adp files) as audio/video, leading to 400 INVALID_ARGUMENT errors when sent to the API.
  • Content-Based Verification: Introduces a content-based check using isBinaryFile() for files initially classified as audio/ or video/. If the file content is determined to be text, it is reclassified as 'text' to prevent misidentification.
  • Updated Test Cases: Modified existing tests and added new ones to validate the improved file type detection logic. This includes using actual binary content for audio/video file tests and a specific test for .adp files containing XML content.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@maru0804
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request resolves an issue where text-based files were misidentified as audio/video. While the implementation is clean and tested, a security audit identified a pre-existing high-severity Path Traversal vulnerability in the detectFileType function, now exercised by the new code path. This vulnerability, which allows reading arbitrary files due to a lack of path validation, aligns with the rule requiring sanitization of user-provided file paths. A recommendation to remediate this has been provided.

Comment on lines +310 to 318
if (
lookedUpMimeType.startsWith('audio/') ||
lookedUpMimeType.startsWith('video/')
) {
if (!(await isBinaryFile(filePath))) {
return 'text';
}
return lookedUpMimeType.startsWith('audio/') ? 'audio' : 'video';
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The detectFileType function is vulnerable to a path traversal attack. The filePath parameter, which can be controlled by a user, is passed to the isBinaryFile function without proper sanitization. The isBinaryFile function then uses this path to open and read a file from the filesystem. An attacker could provide a malicious path (e.g., ../../../../etc/passwd) to read arbitrary files on the system, leading to sensitive information disclosure. The new logic to verify audio/video files introduces a new code path that is vulnerable to this attack.

Remediation:
Before using the filePath in any file system operations, validate that it is within the intended project directory. This check should be performed in the callers of detectFileType, such as processSingleFileContent, to ensure all file operations are protected.

Example fix in processSingleFileContent:

if (!isWithinRoot(filePath, rootDirectory)) {
  return {
    llmContent: 'File path is outside the workspace.',
    returnDisplay: 'Error: File path is outside the workspace.',
    error: `Path is not in workspace: ${filePath}`,
    errorType: ToolErrorType.PATH_NOT_IN_WORKSPACE,
  };
}
References
  1. Sanitize user-provided file paths used in file system operations to prevent path traversal vulnerabilities.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path traversal concern is valid, but this is handled at the caller level. In read-file.ts, there's already a validation check:

if (!workspaceContext.isPathWithinWorkspace(resolvedPath) && !isWithinTempDir) {
  return `File path must be within one of the workspace directories...`;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that processSingleFileContent in read-file.ts includes a workspaceContext.isPathWithinWorkspace check, which mitigates the path traversal vulnerability for calls originating from that function. However, the detectFileType function itself, and specifically the isBinaryFile function it calls, still performs file system operations on the filePath parameter without internal validation. This means that if detectFileType or isBinaryFile were to be called from other parts of the codebase that do not implement the same isPathWithinWorkspace check, a path traversal vulnerability would still exist. For example, packages/core/src/services/fileSystemService.ts exports an isBinaryFile function that directly calls fileUtils.isBinaryFile without performing path validation. To ensure robust security, it is a best practice for utility functions that perform file system operations to either validate their inputs internally or to explicitly document the requirement for callers to pre-validate paths. Given the current architecture, consider adding the isPathWithinWorkspace check directly within fileUtils.isBinaryFile or ensuring that all direct and indirect callers of isBinaryFile (including detectFileType) perform this critical validation. This remains a high severity concern due to the potential for sensitive information disclosure if an unvalidated path reaches these file system operations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be addressed?

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the misidentification of certain text-based files as audio/video by introducing a content-based verification step. However, a high-severity Path Traversal vulnerability was identified: file handling functions use filePath in file system operations without validating that the path is within the expected workspace, potentially allowing an attacker to read arbitrary files from the server's filesystem. Additionally, there's a high-severity concern regarding error handling within the new logic; if the isBinaryFile check fails, binary files could be silently misclassified as text, leading to garbled content being sent to the LLM.

Comment thread packages/core/src/utils/fileUtils.ts
@gemini-cli gemini-cli Bot added area/core Issues related to User Interface, OS Support, Core Functionality priority/p2 Important but can be addressed in a future release. labels Jan 19, 2026
@jacob314 jacob314 added the help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support! label Jan 22, 2026
@Adib234 Adib234 self-requested a review January 24, 2026 14:42
@diplfranzhoepfinger
Copy link
Copy Markdown

@maru0804 @Adib234 how to test it ?

@Adib234 Adib234 enabled auto-merge January 28, 2026 14:58
@Adib234 Adib234 added this pull request to the merge queue Jan 28, 2026
Merged via the queue into google-gemini:main with commit 06c7dc5 Jan 28, 2026
25 checks passed
sidwan02 pushed a commit to sidwan02/gemini-cli-gemma that referenced this pull request Feb 6, 2026
kuishou68 pushed a commit to iOfficeAI/aioncli that referenced this pull request Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core Issues related to User Interface, OS Support, Core Functionality help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support! priority/p2 Important but can be addressed in a future release.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: CLI incorrectly treats .adp files as binary audio (MIME-type mismatch)

4 participants