-
Notifications
You must be signed in to change notification settings - Fork 125
Closed
Copy link
Labels
bugSomething isn't workingSomething isn't workingsecuritySecurity-related changes or concernsSecurity-related changes or concernsskillsCopilot skill packages (SKILL.md)Copilot skill packages (SKILL.md)
Description
Summary
_resolve_theme_colors() in extract_content.py calls lxml.etree.fromstring() on raw XML blobs extracted from PPTX relationship parts without explicit parser configuration. The default lxml parser may resolve external entities, creating a potential XML External Entity (XXE) injection vector.
Location
- File:
.github/skills/experimental/powerpoint/scripts/extract_content.py - Function:
_resolve_theme_colors()—lxml.etree.fromstring()call on PPTX relationship part blobs
Risk Assessment
- Severity: HIGH
- Attack Vector: A crafted PPTX file containing a malicious theme XML blob with external entity declarations could trigger XXE when the file is processed
- Impact: Information disclosure (local file read), server-side request forgery (SSRF), or denial of service via recursive entity expansion (billion laughs)
- CVSS Category: CWE-611 (Improper Restriction of XML External Entity Reference)
Current Behavior
tree = etree.fromstring(blob) # Parses XML from PPTX relationship partThe XML blob comes from PPTX internal parts (relationship.target_part.blob). While PPTX files are ZIP archives with XML content, a deliberately crafted PPTX can embed malicious XML with DTD declarations containing external entity references.
Expected Behavior
XML parsing should use a hardened parser configuration that disables external entity resolution:
parser = etree.XMLParser(resolve_entities=False, no_network=True)
tree = etree.fromstring(blob, parser=parser)Or use defusedxml for defense-in-depth:
from defusedxml.lxml import fromstring
tree = fromstring(blob)RPI Framework
task-researcher
- Identify all
etree.fromstring()call sites in the codebase - Determine which calls process external/untrusted input vs internal data
- Check if
defusedxmlis available as a dependency or if parser configuration is sufficient - Review lxml's default parser behavior for the installed version
task-planner
- Decide between explicit parser configuration vs
defusedxmlwrapper - Plan consistent XML parsing hardening across all call sites
- Evaluate whether
defusedxmlshould be added as a dependency
task-implementor
- Configure all
etree.fromstring()calls withresolve_entities=False, no_network=True - Alternatively, replace with
defusedxml.lxml.fromstring() - Add tests with XML payloads containing entity declarations to verify they are blocked
- Verify CodeQL XXE query (
py/xxe) passes after remediation
Acceptance Criteria
- All
lxml.etree.fromstring()calls processing external input use a hardened parser configuration - External entity resolution is disabled (
resolve_entities=False) ordefusedxmlis used - Network access during XML parsing is disabled (
no_network=True) - Tests verify that XML with external entity declarations does not trigger entity resolution
- CodeQL
py/xxequery does not flag any call sites after remediation
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingsecuritySecurity-related changes or concernsSecurity-related changes or concernsskillsCopilot skill packages (SKILL.md)Copilot skill packages (SKILL.md)