epic: Python Security Testing & Fuzzing Initiative for PowerPoint Skill #1012

@WilliamBerryiii

Overview

Establish a comprehensive security testing and fuzzing strategy for the PowerPoint skill Python codebase (~4,700 lines across 14 modules under .github/skills/experimental/powerpoint/scripts/).

This initiative implements Scenario D (Hypothesis + pip-audit) from the ClusterFuzzLite research evaluation and relies on the existing CodeQL Python analysis for SAST coverage. This approach was selected because it offers the highest value-to-effort ratio given the codebase's size and structured-input nature.

Background

A security-focused analysis of the codebase identified five security findings (1 CRITICAL, 3 HIGH, 1 MODERATE), no property-based or fuzz testing (all 300+ existing tests are deterministic), and no dependency CVE scanning. ClusterFuzzLite was evaluated but rejected as the primary approach because of its mismatch with structured input, the incompatibility of its default base image with Python >=3.11, and disproportionate setup complexity.

Three-Phase Implementation

Phase 1: Hypothesis Property Tests (High Priority)

Add hypothesis>=6.100 to dev dependencies and write property tests targeting priority modules:

  • validate_slides.py / validate_deck.py — input validation robustness
  • build_deck.py — element builder dispatch with arbitrary element definitions
  • pptx_colors.py — hex color parsing edge cases
  • pptx_tables.py — merge bounds and out-of-range handling
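As an illustration of the kind of property test Phase 1 has in mind, here is a minimal sketch for hex color parsing. The function parse_hex_color is a hypothetical stand-in, not the actual pptx_colors.py API, and the guarded import only keeps the sketch importable where Hypothesis is not installed.

```python
import re

try:
    from hypothesis import given, strategies as st
except ImportError:  # dev dependency missing; the property tests are skipped
    given = None

_HEX_RE = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_hex_color(value: str) -> tuple[int, int, int]:
    """Hypothetical stand-in for a pptx_colors.py helper: '#RRGGBB' -> (r, g, b)."""
    match = _HEX_RE.fullmatch(value.strip()) if isinstance(value, str) else None
    if match is None:
        raise ValueError(f"invalid hex color: {value!r}")
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if given is not None:
    channel = st.integers(min_value=0, max_value=255)

    @given(st.text())
    def test_arbitrary_text_parses_or_raises_value_error(value):
        # Property: any string either parses to three byte-sized channels
        # or raises ValueError; any other exception type fails the test.
        try:
            rgb = parse_hex_color(value)
        except ValueError:
            return
        assert len(rgb) == 3 and all(0 <= c <= 255 for c in rgb)

    @given(st.tuples(channel, channel, channel))
    def test_format_then_parse_round_trips(rgb):
        # Property: formatting channels as '#RRGGBB' and parsing them back
        # returns the original values.
        assert parse_hex_color("#{:02X}{:02X}{:02X}".format(*rgb)) == rgb
```

Under pytest, the decorated functions are collected like ordinary tests, and Hypothesis supplies and shrinks the inputs. The same two properties (reject cleanly or parse correctly, and round-trip) would likely carry over to the validation and table-merge modules with different strategies.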

Phase 2: pip-audit Dependency CVE Scanning (High Priority)

Add pip-audit CI step to scan pyproject.toml dependencies (python-pptx, pyyaml, pymupdf, lxml) for known CVEs using open vulnerability databases (PyPI Advisory Database, OSV).
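One way a CI step could consume the scan results is sketched below. It assumes pip-audit is on PATH, that the project directory containing pyproject.toml is passed as the positional argument, and that the JSON report has a top-level "dependencies" list whose entries carry "name", "version", and "vulns" (the shape produced by recent pip-audit releases; worth confirming against the pinned version). The function names are illustrative.

```python
import json
import subprocess


def summarize_audit(report: dict) -> list[str]:
    """Flatten a pip-audit JSON report into one line per vulnerability."""
    lines = []
    for dep in report.get("dependencies", []):
        # Skipped dependencies have no "vulns" key, hence the default.
        for vuln in dep.get("vulns", []):
            fixes = ", ".join(vuln.get("fix_versions", [])) or "none listed"
            lines.append(
                f"{dep['name']} {dep.get('version', '?')}: {vuln['id']} (fix: {fixes})"
            )
    return lines


def run_pip_audit(project_dir: str = ".") -> list[str]:
    """Run pip-audit against a project directory and summarize its findings."""
    # check=False because pip-audit exits non-zero when it finds vulnerabilities,
    # and the JSON report on stdout is still wanted in that case.
    proc = subprocess.run(
        ["pip-audit", project_dir, "--format", "json"],
        capture_output=True,
        text=True,
        check=False,
    )
    return summarize_audit(json.loads(proc.stdout))
```

A CI job could fail the build when run_pip_audit() returns a non-empty list, or simply rely on pip-audit's own exit code and use the summary for log output.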

Phase 3: OSSF Scorecard Fuzzing Compliance (Medium Priority)

Add a thin Atheris wrapper using the polyglot pattern so that import atheris is detectable by OSSF Scorecard's Fuzzing check. Hypothesis alone scores 0/10 since Scorecard only recognizes import atheris for Python.
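A thin wrapper along these lines might look like the sketch below: one file that pytest can run as a seed-corpus smoke test and that Atheris can drive when invoked as a script. The target function parse_hex_color is a hypothetical stand-in for a real skill function, and whether Scorecard's detection accepts an import inside a try block is an assumption that should be checked against the Scorecard source.

```python
import sys

try:
    import atheris  # literal import that the Scorecard Fuzzing check is described as detecting
except ImportError:  # assumption: atheris is only installed for fuzzing runs
    atheris = None


def parse_hex_color(value: str) -> tuple[int, int, int]:
    """Hypothetical fuzz target; stands in for a real PowerPoint-skill function."""
    digits = value.strip().removeprefix("#")
    if len(digits) != 6 or any(c not in "0123456789abcdefABCDEF" for c in digits):
        raise ValueError(f"invalid hex color: {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def fuzz_one_input(data: bytes) -> None:
    """Shared entry point: called by Atheris with raw bytes, or by pytest seeds."""
    text = data.decode("utf-8", errors="replace")
    try:
        parse_hex_color(text)
    except ValueError:
        pass  # clean rejection is expected; any other exception is a finding


def test_seed_corpus():
    # pytest-discoverable smoke test over a few fixed seeds.
    for seed in (b"", b"#FFFFFF", b"#GGGGGG", b"\xff\xfe"):
        fuzz_one_input(seed)


if __name__ == "__main__" and atheris is not None:
    # Only reached when run as a script with atheris installed.
    atheris.Setup(sys.argv, fuzz_one_input)
    atheris.Fuzz()
```

The entry point is deliberately not named with a test prefix so that pytest does not try to collect it as a test requiring a fixture.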

Security Findings to Address

| Severity | Finding | Location |
| --- | --- | --- |
| CRITICAL | Arbitrary code execution via importlib | build_deck.py |
| HIGH | XML parsing (XXE vector) via lxml.etree.fromstring() | extract_content.py |
| HIGH | Untrusted binary blob writes | extract_content.py |
| HIGH | PyMuPDF C extension attack surface | export_slides.py, render_pdf_images.py |
| MODERATE | Recursive processing without depth limits | Multiple modules |
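Two of these findings lend themselves to small guard patterns, sketched below under stated assumptions. How build_deck.py actually uses importlib is not described here, so the allowlist contents, the depth limit, and both function names are purely illustrative rather than the planned remediation.

```python
import importlib
from types import ModuleType

# Assumption: placeholder names; a real allowlist would name the builder
# modules that build_deck.py is intended to load.
ALLOWED_MODULES = frozenset({"json", "math"})

# Assumption: an arbitrary ceiling chosen for the sketch.
MAX_DEPTH = 32


def load_allowed_module(name: str) -> ModuleType:
    """Import a module by name only if it is explicitly allowlisted."""
    if name not in ALLOWED_MODULES:
        raise PermissionError(f"module {name!r} is not on the allowlist")
    return importlib.import_module(name)


def count_leaves(node, depth: int = 0) -> int:
    """Count leaf values in nested dicts/lists, refusing excessive nesting."""
    if depth > MAX_DEPTH:
        raise ValueError(f"nesting deeper than {MAX_DEPTH} levels")
    if isinstance(node, dict):
        return sum(count_leaves(v, depth + 1) for v in node.values())
    if isinstance(node, list):
        return sum(count_leaves(v, depth + 1) for v in node)
    return 1
```

Checking the name before calling importlib.import_module means untrusted input never reaches the import machinery, and threading an explicit depth counter through the recursion turns a potential RecursionError or resource exhaustion into a predictable ValueError.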

Existing Security Coverage

  • CodeQL: security-extended,security-and-quality query suites for actions and python — runs on every PR, on-demand, and weekly
  • OpenSSF Scorecard: Weekly runs on Sundays + push to main
  • gitleaks: Secret scanning (devcontainer-only)

Sub-Issues

This epic tracks the following work items:

  • Hypothesis property tests for priority modules
  • pip-audit dependency CVE scanning
  • Atheris wrapper for Scorecard compliance
  • importlib code execution remediation (CRITICAL)
  • lxml XXE vector remediation (HIGH)
  • Untrusted blob writes remediation (HIGH)
  • PyMuPDF attack surface assessment (HIGH)
  • Recursive processing depth limits (MODERATE)
  • Contribute Hypothesis detection to ossf/scorecard (upstream)

Acceptance Criteria

  • All three implementation phases have corresponding issues with clear task-* RPI deliverables
  • All five security findings have corresponding issues with severity labels and remediation guidance
  • Sub-issue relationships are established for dependency tracking
  • Labels are applied consistently across all child issues

Labels

  • enhancement: New feature or request
  • security: Security-related changes or concerns
  • skills: Copilot skill packages (SKILL.md)
  • testing: Test infrastructure and test files
