Closed
Labels
- bug: Something isn't working
- security: Security-related changes or concerns
- skills: Copilot skill packages (SKILL.md)
Description
Summary
`extract_image()` in `extract_content.py` writes the `img.blob` bytes of PPTX image parts directly to disk without content validation, size limits, or path sanitization. A crafted PPTX file could use this to deliver oversized payloads or content that exploits downstream consumers of the extracted images.
Location
- File: `.github/skills/experimental/powerpoint/scripts/extract_content.py`
- Function: `extract_image()`, specifically the `img.blob` write to disk
Risk Assessment
- Severity: HIGH
- Attack Vector: A crafted PPTX file containing malicious image blobs (oversized, polyglot, or path-traversal filenames) could abuse the extraction process
- Impact: Disk exhaustion via oversized blobs, path traversal if filenames are derived from PPTX metadata, or delivery of malicious content disguised as images
- CWE Classification: CWE-434 (Unrestricted Upload of File with Dangerous Type) / CWE-22 (Path Traversal)
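The path-traversal risk can be seen with the standard library alone: joining an untrusted name onto an output directory does not confine it, and an absolute name replaces the directory entirely. The sketch below uses `posixpath` so it behaves the same on every platform; `naive_output_path()`, `stays_inside()`, and the sample names are hypothetical illustrations, not code from `extract_content.py`.

```python
import posixpath


def naive_output_path(out_dir: str, name: str) -> str:
    """Join an untrusted name onto out_dir without any checks, then normalize."""
    return posixpath.normpath(posixpath.join(out_dir, name))


def stays_inside(out_dir: str, path: str) -> bool:
    """Lexical test: is the normalized path at or below out_dir?"""
    base = posixpath.normpath(out_dir)
    return path == base or path.startswith(base + "/")


if __name__ == "__main__":
    out_dir = "extracted/images"  # hypothetical output directory
    for name in ["../../../home/user/.bashrc", "/etc/cron.d/job", "image1.png"]:
        path = naive_output_path(out_dir, name)
        print(f"{name!r} -> {path!r} inside={stays_inside(out_dir, path)}")
```

The first two crafted names resolve outside `extracted/images` (to `../home/user/.bashrc` and `/etc/cron.d/job`); only `image1.png` stays inside. A purely lexical check like `stays_inside()` ignores symlinks, which is why a `realpath`-based containment check is the stronger option.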
Current Behavior
```python
# Writes raw blob bytes from PPTX image part directly to output path
with open(output_path, 'wb') as f:
    f.write(img.blob)
```

The image blob is written without:
- Validating that the content is actually an image (magic bytes / content-type verification)
- Enforcing maximum file size limits
- Sanitizing the output path against directory traversal
- Checking for polyglot files (files valid as both image and executable formats)
Expected Behavior
Image extraction should include defensive checks:
- Size limit: Reject blobs exceeding a reasonable maximum (e.g., 50 MB)
- Content validation: Verify that magic bytes match expected image formats (PNG, JPEG, EMF, WMF, SVG); SVG is XML text with no fixed magic number, so it needs a structural check instead
- Path sanitization: Ensure output paths are confined to the expected output directory (no `../` traversal)
- Filename sanitization: Strip or reject filenames containing path separators or null bytes
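A minimal sketch of the size limit and filename sanitization, assuming the 50 MB example above as the default limit and an allow-list of letters, digits, `.`, `_`, and `-`. `check_blob_size()`, `sanitize_filename()`, and `ImageValidationError` are hypothetical names; the sketch rejects null bytes and separators rather than stripping them, which is one of the two options listed above.

```python
import re

# Assumed default: the 50 MB example from the issue; callers can override it.
DEFAULT_MAX_IMAGE_BYTES = 50 * 1024 * 1024

# Hypothetical allow-list: any other character is replaced with "_".
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


class ImageValidationError(ValueError):
    """Raised when an image blob or its filename fails a defensive check."""


def check_blob_size(blob: bytes, max_bytes: int = DEFAULT_MAX_IMAGE_BYTES) -> None:
    """Reject a blob larger than max_bytes before anything is written."""
    if len(blob) > max_bytes:
        raise ImageValidationError(f"image blob is {len(blob)} bytes; limit is {max_bytes}")


def sanitize_filename(name: str) -> str:
    """Reduce an untrusted name to one safe path component.

    Null bytes and path separators are rejected outright. Other characters
    outside the allow-list become "_", and leading dots are dropped so the
    result can be neither "." nor ".." nor a hidden file.
    """
    if "\x00" in name:
        raise ImageValidationError("filename contains a null byte")
    if "/" in name or "\\" in name:
        raise ImageValidationError("filename contains a path separator")
    cleaned = _UNSAFE_CHARS.sub("_", name).lstrip(".")
    if not cleaned:
        raise ImageValidationError("filename is empty after sanitization")
    return cleaned
```

Rejecting, rather than silently rewriting, a name like `../x.png` keeps the crafted input visible to the caller instead of turning it into something that merely looks safe.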
RPI Framework
task-researcher
- Identify all locations where PPTX blob data is written to disk
- Determine what PPTX metadata controls the output filename/path
- Catalog the image formats that python-pptx can extract (PNG, JPEG, EMF, WMF, TIFF, SVG, etc.)
- Assess maximum reasonable image sizes for presentation content
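The format catalog maps naturally onto a table of leading-byte signatures. The sketch below recognizes PNG, JPEG, GIF, BMP, TIFF, WMF, and EMF by their file signatures, treats SVG with a coarse text heuristic because it has no fixed magic number, and cross-checks the result against a part's declared content type. The function names and the content-type table are assumptions for illustration; they are not python-pptx APIs.

```python
from typing import Optional

# (signature, format) pairs matched against the first bytes of the blob.
_PREFIX_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),                # weak: only two bytes
    (b"II*\x00", "tiff"),          # little-endian TIFF
    (b"MM\x00*", "tiff"),          # big-endian TIFF
    (b"\xd7\xcd\xc6\x9a", "wmf"),  # placeable WMF key 0x9AC6CDD7
    (b"\x01\x00\x09\x00", "wmf"),  # standard WMF header, memory metafile
    (b"\x02\x00\x09\x00", "wmf"),  # standard WMF header, disk metafile
]

# Assumed mapping from declared part content types to detected formats.
_CONTENT_TYPE_FORMATS = {
    "image/png": "png",
    "image/jpeg": "jpeg",
    "image/gif": "gif",
    "image/bmp": "bmp",
    "image/tiff": "tiff",
    "image/x-wmf": "wmf",
    "image/x-emf": "emf",
    "image/svg+xml": "svg",
}


def detect_image_format(blob: bytes) -> Optional[str]:
    """Return a format name if the leading bytes look like a known image, else None."""
    for signature, fmt in _PREFIX_SIGNATURES:
        if blob.startswith(signature):
            return fmt
    # EMF: EMR_HEADER record type (1) at offset 0 and the " EMF" signature at offset 40.
    if len(blob) >= 44 and blob[:4] == b"\x01\x00\x00\x00" and blob[40:44] == b" EMF":
        return "emf"
    # SVG is XML text with no fixed magic number; this is only a coarse heuristic.
    head = blob[:1024]
    if head.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte-order mark
        head = head[3:]
    head = head.lstrip()
    if head.startswith((b"<?xml", b"<svg")) and b"<svg" in head:
        return "svg"
    return None


def matches_content_type(blob: bytes, content_type: str) -> bool:
    """True only if the declared content type and the detected format agree."""
    expected = _CONTENT_TYPE_FORMATS.get(content_type.lower())
    return expected is not None and detect_image_format(blob) == expected
```

A prefix match only shows that a blob starts like an image; it does not rule out polyglot files, and an SVG that passes can still carry script, so SVG output may need separate handling.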
task-planner
- Define size limits and content validation strategy
- Plan path sanitization approach (os.path.realpath containment check)
- Determine if content-type verification is sufficient or if magic byte checking is needed
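A sketch of the `os.path.realpath` containment check planned above. It resolves both the output directory and the candidate path (following symlinks) and compares them with `os.path.commonpath`, which works on whole path components, so a sibling such as `/out-evil` is not mistaken for a child of `/out`. `resolve_inside()` and `PathTraversalError` are hypothetical names.

```python
import os


class PathTraversalError(ValueError):
    """Raised when a resolved output path is not inside the output directory."""


def resolve_inside(out_dir: str, filename: str) -> str:
    """Return the symlink-resolved absolute path of filename under out_dir.

    Raises PathTraversalError if the result is the directory itself or lies
    outside it (covers "..", absolute names, and symlinks that point away).
    """
    base = os.path.realpath(out_dir)
    target = os.path.realpath(os.path.join(base, filename))
    try:
        # commonpath compares whole components, so "/out" is not a prefix of "/out-evil".
        common = os.path.commonpath([base, target])
    except ValueError:  # e.g. paths on different drives on Windows
        raise PathTraversalError(f"{filename!r} resolves outside {base!r}") from None
    if common != base or target == base:
        raise PathTraversalError(f"{filename!r} resolves outside {base!r}")
    return target
```

Only the returned path would then be opened for writing, so a name that passed filename sanitization still cannot escape through a pre-existing symlink in the output directory.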
task-implementor
- Add blob size validation before write
- Add output path containment check (resolved path must be under expected directory)
- Add content-type / magic byte verification for known image formats
- Add tests with oversized blobs, path traversal filenames, and non-image content
- Sanitize any filename derived from PPTX metadata
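One way the tests could look, using `unittest` from the standard library. To keep the block runnable on its own it defines `save_image_blob()`, a hypothetical and deliberately short stand-in (PNG-only content check, 1 KB limit) for a hardened `extract_image()`; a real suite would target the actual function instead, whose signature this sketch does not assume.

```python
import os
import tempfile
import unittest

MAX_BYTES = 1024  # deliberately tiny limit so the tests stay fast (assumption)
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def save_image_blob(blob: bytes, filename: str, out_dir: str, max_bytes: int = MAX_BYTES) -> str:
    """Hypothetical stand-in for a hardened extract_image(): validate, then write."""
    if len(blob) > max_bytes:
        raise ValueError("blob exceeds size limit")
    if not blob.startswith(PNG_MAGIC):  # PNG only, to keep the stand-in short
        raise ValueError("blob is not a recognized image")
    if "\x00" in filename or "/" in filename or "\\" in filename:
        raise ValueError("unsafe filename")
    base = os.path.realpath(out_dir)
    target = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, target]) != base or target == base:
        raise ValueError("path escapes output directory")
    with open(target, "wb") as f:
        f.write(blob)
    return target


class SaveImageBlobTests(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.out_dir = self._tmp.name

    def tearDown(self):
        self._tmp.cleanup()

    def test_valid_png_is_written(self):
        path = save_image_blob(PNG_MAGIC + bytes(16), "slide1.png", self.out_dir)
        with open(path, "rb") as f:
            self.assertTrue(f.read().startswith(PNG_MAGIC))

    def test_oversized_blob_rejected(self):
        with self.assertRaises(ValueError):
            save_image_blob(PNG_MAGIC + bytes(MAX_BYTES), "big.png", self.out_dir)
        self.assertEqual(os.listdir(self.out_dir), [])  # nothing reached the disk

    def test_traversal_filename_rejected(self):
        for name in ("../escape.png", "..\\escape.png", "/tmp/abs.png", "a\x00.png", "..", "."):
            with self.subTest(name=name), self.assertRaises(ValueError):
                save_image_blob(PNG_MAGIC, name, self.out_dir)

    def test_non_image_content_rejected(self):
        with self.assertRaises(ValueError):
            save_image_blob(b"MZ\x90\x00 not an image", "fake.png", self.out_dir)
```

Saved as, for example, `test_extract_image.py` (a hypothetical filename), the cases run with `python -m unittest test_extract_image`.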
Acceptance Criteria
- Blob size is validated against a configurable maximum before writing to disk
- Output paths are verified to be within the expected output directory (no path traversal)
- Content validation confirms blob data matches expected image formats
- Filenames derived from PPTX metadata are sanitized (no path separators, null bytes, or special characters)
- Tests verify rejection of oversized blobs, path traversal attempts, and non-image content
- No raw blob data is written to arbitrary locations without validation