fix: verify audio/video MIME types with content check#16907
fix: verify audio/video MIME types with content check#16907Adib234 merged 2 commits intogoogle-gemini:mainfrom
Conversation
…16888) .adp files are misidentified as audio/adpcm by mime-types library. Add content-based check for audio/video: if not binary, treat as text. Fixes google-gemini#16888
Summary of ChangesHello @maru0804, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the accuracy of file type detection by implementing a content-based verification step for files that are initially identified as audio or video. This change prevents incorrect MIME type assignments for text-based files that might share extensions with multimedia formats, thereby resolving API errors caused by misclassified uploads. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request resolves an issue where text-based files were misidentified as audio/video. While the implementation is clean and tested, a security audit identified a pre-existing high-severity Path Traversal vulnerability in the detectFileType function, now exercised by the new code path. This vulnerability, which allows reading arbitrary files due to a lack of path validation, aligns with the rule requiring sanitization of user-provided file paths. A recommendation to remediate this has been provided.
| if ( | ||
| lookedUpMimeType.startsWith('audio/') || | ||
| lookedUpMimeType.startsWith('video/') | ||
| ) { | ||
| if (!(await isBinaryFile(filePath))) { | ||
| return 'text'; | ||
| } | ||
| return lookedUpMimeType.startsWith('audio/') ? 'audio' : 'video'; | ||
| } |
There was a problem hiding this comment.
The detectFileType function is vulnerable to a path traversal attack. The filePath parameter, which can be controlled by a user, is passed to the isBinaryFile function without proper sanitization. The isBinaryFile function then uses this path to open and read a file from the filesystem. An attacker could provide a malicious path (e.g., ../../../../etc/passwd) to read arbitrary files on the system, leading to sensitive information disclosure. The new logic to verify audio/video files introduces a new code path that is vulnerable to this attack.
Remediation:
Before using the filePath in any file system operations, validate that it is within the intended project directory. This check should be performed in the callers of detectFileType, such as processSingleFileContent, to ensure all file operations are protected.
Example fix in processSingleFileContent:
if (!isWithinRoot(filePath, rootDirectory)) {
return {
llmContent: 'File path is outside the workspace.',
returnDisplay: 'Error: File path is outside the workspace.',
error: `Path is not in workspace: ${filePath}`,
errorType: ToolErrorType.PATH_NOT_IN_WORKSPACE,
};
}References
- Sanitize user-provided file paths used in file system operations to prevent path traversal vulnerabilities.
There was a problem hiding this comment.
The path traversal concern is valid, but this is handled at the caller level. In read-file.ts, there's already a validation check:
if (!workspaceContext.isPathWithinWorkspace(resolvedPath) && !isWithinTempDir) {
return `File path must be within one of the workspace directories...`;
}
There was a problem hiding this comment.
You are correct that processSingleFileContent in read-file.ts includes a workspaceContext.isPathWithinWorkspace check, which mitigates the path traversal vulnerability for calls originating from that function. However, the detectFileType function itself, and specifically the isBinaryFile function it calls, still performs file system operations on the filePath parameter without internal validation. This means that if detectFileType or isBinaryFile were to be called from other parts of the codebase that do not implement the same isPathWithinWorkspace check, a path traversal vulnerability would still exist. For example, packages/core/src/services/fileSystemService.ts exports an isBinaryFile function that directly calls fileUtils.isBinaryFile without performing path validation. To ensure robust security, it is a best practice for utility functions that perform file system operations to either validate their inputs internally or to explicitly document the requirement for callers to pre-validate paths. Given the current architecture, consider adding the isPathWithinWorkspace check directly within fileUtils.isBinaryFile or ensuring that all direct and indirect callers of isBinaryFile (including detectFileType) perform this critical validation. This remains a high severity concern due to the potential for sensitive information disclosure if an unvalidated path reaches these file system operations.
There was a problem hiding this comment.
Code Review
This pull request effectively addresses the misidentification of certain text-based files as audio/video by introducing a content-based verification step. However, a high-severity Path Traversal vulnerability was identified: file handling functions use filePath in file system operations without validating that the path is within the expected workspace, potentially allowing an attacker to read arbitrary files from the server's filesystem. Additionally, there's a high-severity concern regarding error handling within the new logic; if the isBinaryFile check fails, binary files could be silently misclassified as text, leading to garbled content being sent to the LLM.
Summary
Files with extensions like
.adp(Eclipse 4diac Adapter Type definitions) are incorrectly identified asaudio/adpcmby themime-typeslibrary, causing the CLI to treat them as binary audio files and send them to the API with an incorrect MIME type, resulting in400 INVALID_ARGUMENTerrors.Changes
detectFileType()textinstead ofaudio/videoTechnical Details
The
isBinaryFile()check is heuristic-based (BOM detection + null-byte/non-printable ratio, sampling first 4KB). BOM-encoded UTF-16/32 files are handled correctly. This approach is similar to how TypeScript files (.ts,.mts,.cts) are handled to avoid MPEG transport stream misidentification.Fixes #16888