Description
Overview
Establish a comprehensive security testing and fuzzing strategy for the PowerPoint skill Python codebase (~4,700 lines across 14 modules under .github/skills/experimental/powerpoint/scripts/).
This initiative implements Scenario D (Hypothesis + pip-audit) from the ClusterFuzzLite research evaluation, leveraging the existing CodeQL Python analysis for SAST coverage. The approach was selected for its highest value-to-effort ratio given the codebase's structured-input nature and size.
Background
A security-focused codebase analysis identified 5 security findings (1 CRITICAL, 3 HIGH, 1 MODERATE), a complete absence of property-based or fuzz testing (all 300+ existing tests are deterministic), and no dependency CVE scanning. ClusterFuzzLite was evaluated but rejected as the primary approach due to structured input mismatch, Python >=3.11 incompatibility with the default base image, and disproportionate setup complexity.
Three-Phase Implementation
Phase 1: Hypothesis Property Tests (High Priority)
Add hypothesis>=6.100 to dev dependencies and write property tests targeting priority modules:
- validate_slides.py / validate_deck.py: input validation robustness
- build_deck.py: element builder dispatch with arbitrary element definitions
- pptx_colors.py: hex color parsing edge cases
- pptx_tables.py: merge bounds and out-of-range handling
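As a rough illustration of the kind of property test Phase 1 describes, the sketch below targets hex color parsing. The function `parse_hex_color` is a hypothetical stand-in written for this example; the real parser in pptx_colors.py may have a different name, signature, and accepted formats. The Hypothesis import is guarded only so the snippet still loads where dev dependencies are not installed.

```python
import re

try:
    from hypothesis import given, strategies as st
except ImportError:  # dev dependency missing; the property tests are skipped
    given = st = None

# ASCII-only hex digits, optional leading '#'.
_HEX_RE = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_hex_color(value: str) -> tuple[int, int, int]:
    """Hypothetical parser (assumption): returns (r, g, b) or raises ValueError."""
    match = _HEX_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a 6-digit hex color: {value!r}")
    digits = match.group(1)
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)


if given is not None:

    @given(st.integers(0, 255), st.integers(0, 255), st.integers(0, 255))
    def test_round_trip(r, g, b):
        # Formatting a color and parsing it back must be lossless,
        # regardless of letter case.
        assert parse_hex_color(f"#{r:02x}{g:02X}{b:02x}") == (r, g, b)

    @given(st.text())
    def test_arbitrary_text_never_crashes(value):
        # Any string either parses to in-range channels or raises
        # ValueError; any other exception type is a bug.
        try:
            result = parse_hex_color(value)
        except ValueError:
            return
        assert all(0 <= channel <= 255 for channel in result)
```

The second property is the one that deterministic tests tend to miss: it states the contract for all inputs rather than for a handful of chosen ones.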
Phase 2: pip-audit Dependency CVE Scanning (High Priority)
Add pip-audit CI step to scan pyproject.toml dependencies (python-pptx, pyyaml, pymupdf, lxml) for known CVEs using open vulnerability databases (PyPI Advisory Database, OSV).
Phase 3: OSSF Scorecard Fuzzing Compliance (Medium Priority)
Add a thin Atheris wrapper using the polyglot pattern so that import atheris is detectable by OSSF Scorecard's Fuzzing check. Hypothesis alone scores 0/10 since Scorecard only recognizes import atheris for Python.
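One possible reading of the polyglot pattern is a single bytes-in entry point that Atheris can drive directly and that other test runners can also call. The sketch below follows that reading, which is an assumption, not something the issue spells out. `validate_merge` is a hypothetical stand-in for a merge-bounds check in pptx_tables.py, and `fuzz_one_input` and `main` are illustrative names.

```python
import sys

try:
    import atheris  # the import the Scorecard Fuzzing check is said to detect
except ImportError:  # lets the module load where Atheris is not installed
    atheris = None


def validate_merge(rows: int, cols: int, top: int, left: int,
                   bottom: int, right: int) -> None:
    """Hypothetical bounds check (assumption): raises ValueError when out of range."""
    if rows < 1 or cols < 1:
        raise ValueError("table needs at least one row and one column")
    if not (0 <= top <= bottom < rows and 0 <= left <= right < cols):
        raise ValueError("merge range falls outside the table")


def fuzz_one_input(data: bytes) -> None:
    """Shared entry point: map raw bytes to six small integers and call the target."""
    if len(data) < 6:
        return
    # Shift each byte into [-128, 127] so negative indices are exercised too.
    args = [byte - 128 for byte in data[:6]]
    try:
        validate_merge(*args)
    except ValueError:
        pass  # rejecting bad input is the expected behavior, not a crash


def main() -> None:
    """Run the Atheris fuzzing loop; requires Atheris to be installed."""
    if atheris is None:
        raise SystemExit("atheris is not installed")
    atheris.Setup(sys.argv, fuzz_one_input)
    atheris.Fuzz()
```

In a real harness, `main()` would be called from an `if __name__ == "__main__":` guard, and the same `fuzz_one_input` could be exercised under pytest by a Hypothesis `st.binary()` strategy, so both tools share one target.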
Security Findings to Address
| Severity | Finding | Location |
|---|---|---|
| CRITICAL | Arbitrary code execution via importlib | build_deck.py |
| HIGH | XML parsing (XXE vector) via lxml.etree.fromstring() | extract_content.py |
| HIGH | Untrusted binary blob writes | extract_content.py |
| HIGH | PyMuPDF C extension attack surface | export_slides.py, render_pdf_images.py |
| MODERATE | Recursive processing without depth limits | Multiple modules |
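For the CRITICAL importlib finding, one common mitigation is to resolve module names through a fixed allowlist, so that input never reaches `importlib.import_module` directly. The sketch below is illustrative only: how build_deck.py actually uses importlib is not described here, and `ALLOWED_BUILDERS`, `load_builder`, and the mapped module names are all hypothetical (standard-library modules are used as placeholders so the snippet runs).

```python
import importlib
from types import ModuleType

# Hypothetical allowlist: element type -> module that may be imported.
# Standard-library modules stand in for the real builder modules.
ALLOWED_BUILDERS = {
    "text": "json",
    "chart": "math",
}


def load_builder(element_type: object) -> ModuleType:
    """Import a builder module only if its element type is on the allowlist."""
    # Deck definitions parsed from YAML can yield lists or dicts here,
    # so reject anything that is not a plain string first.
    if not isinstance(element_type, str):
        raise ValueError(f"element type must be a string, got {type(element_type).__name__}")
    module_name = ALLOWED_BUILDERS.get(element_type)
    if module_name is None:
        raise ValueError(f"unsupported element type: {element_type!r}")
    # Only names from the fixed mapping ever reach import_module.
    return importlib.import_module(module_name)
```

The key design choice is that the input selects a key, never a module path, so a definition such as `os` or `subprocess` is rejected rather than imported.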
Existing Security Coverage
- CodeQL: security-extended and security-and-quality query suites for actions and python; runs on every PR, on demand, and weekly
- OpenSSF Scorecard: weekly runs on Sundays, plus on push to main
- gitleaks: Secret scanning (devcontainer-only)
Sub-Issues
This epic tracks the following work items:
- Hypothesis property tests for priority modules
- pip-audit dependency CVE scanning
- Atheris wrapper for Scorecard compliance
- importlib code execution remediation (CRITICAL)
- lxml XXE vector remediation (HIGH)
- Untrusted blob writes remediation (HIGH)
- PyMuPDF attack surface assessment (HIGH)
- Recursive processing depth limits (MODERATE)
- Contribute Hypothesis detection to ossf/scorecard (upstream)
Acceptance Criteria
- All three implementation phases have corresponding issues with clear task-* RPI deliverables
- All five security findings have corresponding issues with severity labels and remediation guidance
- Sub-issue relationships are established for dependency tracking
- Labels are applied consistently across all child issues