fix: treat only .zip as archive; avoid unzipping ZIP-based container …#410
Merged
qin-ctx merged 1 commit into volcengine:main on Mar 4, 2026
qin-ctx
reviewed
Mar 4, 2026
openviking/utils/media_processor.py
Outdated
resource_name=file_path.stem,
)
# Check if it's a zip file
if zipfile.is_zipfile(file_path):
Collaborator
The old code was not cleaned up. Alternatively, would changing just this one line be enough?
if ext == ".zip" and zipfile.is_zipfile(file_path):
53968ca to 5b7f542
5b7f542 to 23c96aa
qin-ctx
approved these changes
Mar 4, 2026
fix: treat only .zip as archive; avoid unzipping ZIP-based container formats (OOXML)
Description
This PR fixes an issue where ZIP-based container formats (such as .docx, .xlsx, and .pptx) were incorrectly identified as generic ZIP archives and unzipped, bypassing their dedicated parsers. The fix checks for the .zip file extension before attempting to unzip, so that only true ZIP archives are treated as compressed directories. Office documents, which are technically ZIP files, now fall through to their specialized parsers (e.g., via registry.get_parser_for_file).
Related Issue
Fixes #407
Type of Change
Changes Made
- Added a check that combines the .zip extension with zipfile.is_zipfile in openviking/utils/media_processor.py.
- Restricted archive extraction to .zip handling to prevent generic ZIP detection from intercepting OOXML files.
Testing
Checklist
Screenshots (if applicable)
Additional Notes
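For reviewers who want to reproduce the behavior locally, the following is a minimal, self-contained sketch of the idea, not the actual code in openviking/utils/media_processor.py. The helper names (is_plain_zip_archive, make_zip_container, demo) and the lowercase normalization of the extension are assumptions made for illustration. It shows that zipfile.is_zipfile alone matches any ZIP container regardless of its extension, while the combined check accepts only files ending in .zip.

```python
import tempfile
import zipfile
from pathlib import Path


def is_plain_zip_archive(file_path: Path) -> bool:
    """Return True only for real .zip archives (hypothetical helper name)."""
    # Check the extension first: OOXML files (.docx, .xlsx, .pptx) are ZIP
    # containers, so zipfile.is_zipfile alone would also match them.
    # Lowercasing the suffix is an assumption, not confirmed by the PR.
    return file_path.suffix.lower() == ".zip" and zipfile.is_zipfile(file_path)


def make_zip_container(path: Path) -> Path:
    """Write a minimal ZIP container to `path`, whatever its extension."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("placeholder.txt", "hello")
    return path


def demo() -> dict:
    """Map each file name to (is_zipfile result, combined check result)."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("archive.zip", "report.docx"):
            path = make_zip_container(Path(tmp) / name)
            results[name] = (zipfile.is_zipfile(path), is_plain_zip_archive(path))
    return results


if __name__ == "__main__":
    for name, (generic, combined) in demo().items():
        print(f"{name}: is_zipfile={generic}, treated_as_archive={combined}")
```

In this sketch the file named report.docx is a stand-in ZIP container rather than a real Word document, which is enough to show that the generic check cannot tell the two apart. A file rejected by the combined check would then be free to reach a format-specific parser, as the description above states.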