fix(skills): harden XML parsing and blob writes in powerpoint extract by WilliamBerryiii · Pull Request #1053 · microsoft/hve-core

WilliamBerryiii · 2026-03-15T21:17:17Z

Description

Hardens the PowerPoint content extraction skill against two classes of security vulnerability:

XXE entity expansion (security(powerpoint): lxml XXE vector in extract_content.py theme color resolution #1014) — _resolve_theme_colors() now creates an XMLParser with resolve_entities=False and no_network=True, preventing external entity injection and network-based SSRF. Malformed XML is caught and logged rather than raising.
Unrestricted blob writes (security(powerpoint): untrusted blob writes in extract_content.py image extraction #1016) — _save_image_blob() is rewritten with defense-in-depth:
- Content-type allowlist limits output to bmp, gif, jpeg, png, tiff, and x-wmf
- 100 MB size cap prevents decompression-bomb writes
- Path containment via os.path.realpath blocks directory-traversal attacks
- WMF magic-byte validation rejects blobs that claim image/x-wmf but lack a valid Aldus placeable or standard WMF header prefix (historical CVE-2005-4560 mitigation)

Both fixes follow a fail-closed design: invalid inputs are rejected or logged, never silently passed through.

Related Issue(s)

Closes #1014
Closes #1016

Type of Change

Select all that apply:

Code & Documentation:

Bug fix (non-breaking change fixing an issue)
New feature (non-breaking change adding functionality)
Breaking change (fix or feature causing existing functionality to change)
Documentation update

Infrastructure & Configuration:

AI Artifacts:

Reviewed contribution with prompt-builder agent and addressed all feedback
Copilot instructions (.github/instructions/*.instructions.md)
Copilot prompt (.github/prompts/*.prompt.md)
Copilot agent (.github/agents/*.agent.md)
Copilot skill (.github/skills/*/SKILL.md)

Other:

Script/automation (.ps1, .sh, .py)
Other (please describe):

Testing

9 new regression tests added to test_extract_content.py covering every hardening layer:
- Delegation: extract_image correctly delegates to _save_image_blob
- Content-type allowlist: parametrized across 5 allowed and 5 rejected MIME types
- Size limit: boundary tests at exactly 100 MB and 100 MB + 1
- Path containment: traversal attempt (../escape.png) is rejected
- WMF magic bytes: valid Aldus placeable, valid standard prefix, invalid magic, and too-short blob
- XXE prevention: entity expansion attempt raises no error and produces no expansion
- Predefined entities: &, <, etc. are preserved correctly
Full test suite: 591 tests passing

Checklist

Required Checks

Documentation is updated (if applicable)
Files follow existing naming conventions
Changes are backwards compatible (if applicable)
Tests added for new functionality (if applicable)

Required Automated Checks

The following validation commands must pass before merging:

Markdown linting: npm run lint:md
Spell checking: npm run spell-check
Frontmatter validation: npm run lint:frontmatter
Skill structure validation: npm run validate:skills
Link validation: npm run lint:md-links (12 pre-existing broken links in unrelated files; 0 in changed files)
PowerShell analysis: npm run lint:ps
Plugin freshness: npm run plugin:generate (no plugin changes in this PR)

Security Considerations

This PR does not contain any sensitive or NDA information
Any new dependencies have been reviewed for security issues
Security-related scripts follow the principle of least privilege

Additional Notes

The no_network=True flag on XMLParser is an additional hardening measure that blocks SSRF attempts via DTD fetches, complementing the resolve_entities=False defense.
WMF validation addresses the historical CVE-2005-4560 attack vector where crafted WMF files could trigger arbitrary code execution.
All guards are fail-closed: unrecognized content types, oversized blobs, path escapes, and invalid WMF headers all result in ValueError with descriptive messages.

github-actions · 2026-03-15T21:17:29Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

- add XMLParser(resolve_entities=False) to _resolve_theme_colors for XXE prevention - rewrite _save_image_blob with content-type allowlist, size limit, path containment - refactor extract_image to delegate blob writes through secured helper - add 9 regression tests covering XXE and blob-write vectors - gitignore .hypothesis test artifact directories 🔒 - Generated by Copilot

…-blob-validation-1014-1016

codecov-commenter · 2026-03-15T21:44:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.12%. Comparing base (f85fe02) to head (b13b881).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1053      +/-   ##
==========================================
+ Coverage   88.01%   88.12%   +0.10%     
==========================================
  Files          45       45              
  Lines        7886     7931      +45     
==========================================
+ Hits         6941     6989      +48     
+ Misses        945      942       -3

Flag	Coverage Δ
pester	`86.91% <ø> (ø)`
pytest	`90.72% <100.00%> (+0.28%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...experimental/powerpoint/scripts/extract_content.py	`83.88% <100.00%> (+1.74%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

katriendg

The XXE and path-traversal hardening is solid, and the test coverage is thorough. One blocker needs to be resolved before merge, plus two smaller items below.

🔴 Blocker — EMF images missing from allowlist (`extract_content.py` lines ~42–51)

_CONTENT_TYPE_TO_EXT omits image/emf and image/x-emf (Enhanced Metafile Format). This is a hard regression for real-world PPTX files. python-pptx's own source explicitly notes:

"Images produced by PowerPoint itself are generally EMF"

Charts, SmartArt, and diagrams exported from PowerPoint commonly embed EMF blobs. With this PR, any such file will crash the entire extraction run with:

ValueError: Unsupported image content type: image/x-emf

The error propagates entirely uncaught: _save_image_blob → extract_image → _extract_shape_by_type → extract_slide (catches only AttributeError, TypeError) → main(). A single EMF shape anywhere in a presentation aborts all output.

Suggested fix — add both EMF MIME types to the allowlist, with optional magic-byte validation (EMF header: \x01\x00\x00\x00 at offset 0, b" EMF" at offset 40):

_CONTENT_TYPE_TO_EXT: dict[str, str] = {
    "image/bmp": "bmp",
    "image/emf": "emf",       # add
    "image/gif": "gif",
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/tiff": "tiff",
    "image/x-emf": "emf",    # add — common python-pptx MIME type for EMF
    "image/x-wmf": "wmf",
}

The rejected-type test already includes "image/x-emf" as an expected failure, which confirms the omission — that test entry should be removed once EMF is added to the allowlist.

⚠️ Medium — Unguarded `ValueError` crashes full extraction on any validation failure (`extract_content.py` lines ~114–129)

All four guard clauses in _save_image_blob raise ValueError with no upstream handler. For the content-type and size cases this means a single malformed-but-benign image (e.g., an oversized embedded asset or an unusual MIME type) aborts the whole extraction. The fail-closed design is correct for the path-traversal and WMF-magic cases (active security threats), but for the size and type cases a log-and-skip pattern would be more resilient without weakening the security boundary.

Consider catching ValueError in _extract_shape_by_type (or extract_image) and returning None with a logged warning for the non-security cases, or documenting the hard-crash behaviour as intentional in the function docstring so callers know they must handle it.

💡 Low — Duplicate XXE test coverage (`test_extract_content.py` lines ~1073 and ~1099)

test_xxe_entity_not_resolved and test_resolve_theme_colors_xxe_payload_blocked exercise the exact same code path (_resolve_theme_colors with a SYSTEM entity), differing only in the target path (/dev/null vs /etc/passwd). One of the two is redundant. Consolidating them into a single parametrized test would sharpen intent without losing coverage.

The branch is also behind main and will need a rebase before merge.

katriendg · 2026-03-18T09:17:18Z

@WilliamBerryiii I did a full automated review here, including posting the comments :) YOLO - I believe the 'blocker' one is accurate.

WilliamBerryiii · 2026-03-18T17:08:26Z

@WilliamBerryiii I did a full automated review here, including posting the comments :) YOLO - I believe the 'blocker' one is accurate.

Ah ha ha ha ha ... this is solid ... I was testing with one of the decks made by the tool which was using wmf and I neglected to test with a prebuilt standard PPTX deck where the blocker would have surfaced. Addressing now.

- add image/emf and image/x-emf to allowlist with magic-byte validation - replace bare ValueError with _ImageSecurityError subclass - consolidate duplicate XXE tests into parametrized coverage - add SVG sanitization via hardened XMLParser with DTD rejection - add cairosvg SVG-to-PNG conversion with 7 new tests 🔒 - Generated by Copilot

.github/skills/experimental/powerpoint/scripts/extract_content.py

WilliamBerryiii · 2026-03-19T01:09:53Z

@katriendg All three items from your review have been addressed in 476bca6. Here's the breakdown:

🔴 Blocker — EMF images missing from allowlist

Resolved. Added image/emf and image/x-emf to _CONTENT_TYPE_TO_EXT with a new _validate_emf_magic_bytes() guard that verifies the EMF header (\x01\x00\x00\x00 at offset 0) before accepting the blob. Removed the image/x-emf entry from the rejected-type test since it's now in the allowlist.

⚠️ Medium — Unguarded ValueError crashes full extraction

Resolved. Introduced _ImageSecurityError(ValueError) as a dedicated subclass for security-critical validation failures (path traversal, WMF magic-byte mismatch). extract_image now catches plain ValueError with log-and-skip behavior for benign cases (unsupported type, oversize blob) while letting _ImageSecurityError propagate uncaught to callers. This preserves fail-closed semantics for the security boundary without aborting extraction on non-threatening validation failures.

💡 Low — Duplicate XXE test coverage

Resolved. Consolidated test_xxe_entity_not_resolved and test_resolve_theme_colors_xxe_payload_blocked into a single parametrized test_xxe_payload_blocked covering multiple entity-injection paths. Removed the redundant test.

Also included in this push: SVG-to-PNG extraction support via cairosvg with XML sanitization through the hardened XMLParser and explicit DTD-declaration rejection. 7 new tests added. 608 total tests passing, lint clean.

🔧 - Generated by Copilot

- Add missing blank line after _ImageSecurityError class definition - Unwrap single-line raise statement in _validate_emf_magic_bytes

katriendg

Thanks, also for the notes!

🤖 I have created a release *beep* *boop* --- ## [3.2.0](hve-core-v3.1.46...hve-core-v3.2.0) (2026-03-20) ### ✨ Features * add -OutputPath parameter to Validate-MarkdownFrontmatter.ps1 ([#1134](#1134)) ([fdf1bcf](fdf1bcf)), closes [#1006](#1006) * add action version consistency scan workflow ([#1127](#1127)) ([4229df1](4229df1)) * **agent:** MVE Experiment Designer ([#976](#976)) ([70f86ca](70f86ca)) * **agents:** add ADO Backlog Manager orchestrator agent ([#800](#800)) ([fae3987](fae3987)) * **agents:** add meeting analyst agent for transcript analysis using work-iq ([#502](#502)) ([5345b5b](5345b5b)) * **agents:** add quick-reference line to RPI Phase 5 suggestions ([#897](#897)) ([9a90f39](9a90f39)) * **agents:** add RAI Planner, enhance SSSC Planner, and redesign Security Planner ([#979](#979)) ([06f826c](06f826c)) * **agents:** add symmetric cross-system handoff to GitHub Backlog Manager ([#952](#952)) ([ba34a35](ba34a35)) * **agents:** Functional Code Review Agent — pre-PR functional correctness reviewer ([#733](#733)) ([9cf63b7](9cf63b7)) * **build:** add Python extensions and uv 0.10.8 to devcontainer ([#920](#920)) ([9ca0579](9ca0579)) * **build:** add uv ecosystem to Dependabot configuration ([#913](#913)) ([2a4bd39](2a4bd39)) * **build:** enable npm pinning enforcement in dependency scan ([#838](#838)) ([4e9e31f](4e9e31f)) * **build:** migrate attestation actions to v4.1.0 and add SBOM verification docs ([#841](#841)) ([ca1e65b](ca1e65b)) * **collections:** add four new validator checks (orphan, duplicate, companion, coverage) ([#869](#869)) ([1a96b73](1a96b73)) * **devcontainer,security:** add enterprise artifact hub configuration ([#1032](#1032)) ([1d56d25](1d56d25)) * **docs:** add Rust coding standards and guidelines ([#809](#809)) ([d4c4899](d4c4899)) * **extension:** add Microsoft logo icon to VS Code Marketplace listings ([#906](#906)) ([82aca41](82aca41)) * **github:** add declarative label management ([#953](#953)) ([a1a6845](a1a6845)) * **instructions:** add ADO backlog shared infrastructure ([#786](#786)) ([1914078](1914078)) * **instructions:** add ADO backlog sprint planning and capacity tracking ([#788](#788)) ([d6fb77d](d6fb77d)) * **instructions:** add ADO triage workflow and prompt ([#787](#787)) ([cde0190](cde0190)) * **instructions:** add shared story quality conventions and sprint planning ([#803](#803)) ([a2f18e3](a2f18e3)) * **prompts:** add ADO discovery and work item prompts with agent routing ([#790](#790)) ([7e74523](7e74523)) * **prompts:** add security review prompts ([#1118](#1118)) ([ad30967](ad30967)) * **scripts:** add dynamic Python skill discovery for lint/test ([#957](#957)) ([0a90f57](0a90f57)) * **scripts:** add Get-StandardTimestamp utility to CIHelpers module ([#1126](#1126)) ([b273a4b](b273a4b)) * **scripts:** add Python copyright header validation ([#905](#905)) ([67df902](67df902)) * **scripts:** add Python skill support to Validate-SkillStructure ([#903](#903)) ([68479d9](68479d9)) * **scripts:** add workflow npm command scanning to dependency pinning ([#837](#837)) ([6b5ae06](6b5ae06)) * **security:** add basic security reviewer agent with owasp skills ([#1008](#1008)) ([cb1fd05](cb1fd05)) * **security:** add sigstore attestation bundles and fix component-detection action ([#1148](#1148)) ([f79c272](f79c272)) * **skills:** add Atheris fuzz harness with CI workflow integration ([#1102](#1102)) ([d337e1d](d337e1d)) * **skills:** add PowerPoint automation skill with YAML-driven deck generation ([#868](#868)) ([00465cd](00465cd)) * **skills:** convert hve-core-installer agent to self-contained skill ([#846](#846)) ([1d821fb](1d821fb)) * **skills:** enhance pr-reference skill with flexible filtering and base branch detection ([#1095](#1095)) ([26a32ea](26a32ea)) * **workflows:** add devcontainer infrastructure change log workflow ([#899](#899)) ([8aca446](8aca446)) * **workflows:** add milestone auto-close on stable and pre-release publishes ([#834](#834)) ([79362b1](79362b1)) * **workflows:** add ms.date documentation freshness checking ([#969](#969)) ([3ed441c](3ed441c)) * **workflows:** add Python linting CI workflow with Ruff ([#951](#951)) ([f89f0eb](f89f0eb)) * **workflows:** add Python testing CI workflow with pytest and Codecov ([#934](#934)) ([5e8306f](5e8306f)) * **workflows:** add uv and Python package sync to copilot-setup-steps ([#921](#921)) ([45d517d](45d517d)) ### 🐛 Bug Fixes * **build:** override Linguist vendored flag for Python skill files ([#1155](#1155)) ([0eee5b6](0eee5b6)) * **build:** override serialize-javascript to >=7.0.3 for RCE fix ([#876](#876)) ([e49039a](e49039a)) * **build:** resolve Pinned-Dependencies alerts for vsce npm commands in extension workflows ([#782](#782)) ([89dad9d](89dad9d)) * **build:** update undici and yauzl overrides for security audit ([#1030](#1030)) ([2c2f92f](2c2f92f)) * **docs:** add CLI Plugins to install.md navigation surfaces ([#902](#902)) ([79d6595](79d6595)) * **docs:** add sidebar ordering for Design Thinking documentation ([#832](#832)) ([551fddc](551fddc)), closes [#830](#830) * **docs:** graduate design-thinking to preview and correct stale collection references ([#831](#831)) ([5110e35](5110e35)) * **docs:** include project-planning in UX Designer install guidance ([#908](#908)) ([e7aa9bc](e7aa9bc)) * **docs:** remediate writing-style convention violations ([#865](#865)) ([68b04bc](68b04bc)) * **docs:** remove draft content announcement banner ([#825](#825)) ([b45de80](b45de80)) * **docs:** remove unbounded path-to-regexp override breaking SSG ([#1153](#1153)) ([d810018](d810018)) * **docs:** use actual clone paths instead of folder display names in multi-root workspace settings ([#984](#984)) ([5dbab82](5dbab82)) * **instructions:** replace black with ruff in uv-projects ([#898](#898)) ([b0c06d9](b0c06d9)) * **scripts:** cover .github/ skill files in copyright header validation ([#1055](#1055)) ([#1098](#1098)) ([27fbd33](27fbd33)) * **scripts:** eliminate phantom git changes from plugin generation ([#1035](#1035)) ([e49a1b5](e49a1b5)) * **scripts:** enable JSON log output for lint:version-consistency ([#1033](#1033)) ([52b0885](52b0885)) * **security:** calculate compliance score from total scanned dependencies ([#930](#930)) ([c112c3d](c112c3d)) * **skills:** add AST validation and namespace restriction for content-extra.py ([#1027](#1027)) ([c50c7a3](c50c7a3)) * **skills:** add depth limits to recursive PowerPoint processing functions ([#1028](#1028)) ([bf08994](bf08994)) * **skills:** harden XML parsing and blob writes in powerpoint extract ([#1053](#1053)) ([89d24b1](89d24b1)) * **skills:** resolve ruff lint and format violations in powerpoint skill ([#1048](#1048)) ([17bbe7a](17bbe7a)) * **workflows:** add uv.lock dependencies submission have fork-skip condition ([#1109](#1109)) ([dec56ac](dec56ac)) * **workflows:** automate weekly SHA staleness check with issue creation ([#975](#975)) ([1ea4caa](1ea4caa)) * **workflows:** close Codecov integration gaps for Pester and pytest flags ([#1106](#1106)) ([cca29b7](cca29b7)) * **workflows:** propagate uv sync errors in copilot-setup-steps ([#961](#961)) ([df88d7c](df88d7c)) * **workflows:** resolve release-please skip cascade and Python project discovery ([#1043](#1043)) ([79993e2](79993e2)) * **workflows:** scan only commit subjects for breaking change detection ([#1157](#1157)) ([a38a657](a38a657)) ### 📚 Documentation * clarify HVE Core Extension vs Installer messaging across documentation ([#965](#965)) ([0fceb8f](0fceb8f)) * **docs:** add ADO integration user documentation ([#935](#935)) ([ec89302](ec89302)) * **docs:** add Project Planning agent documentation ([#936](#936)) ([3a3a0fd](3a3a0fd)) * **onboarding:** overhaul marketplace onboarding and documentation site ([#982](#982)) ([4309e10](4309e10)) ### ♻️ Refactoring * **build:** merge code-review collection into coding-standards ([#863](#863)) ([8027e7b](8027e7b)) * **workflows:** rename release pipeline workflows and add marketplace automation triggers ([#829](#829)) ([b6397f4](b6397f4)) ### 🔧 Maintenance * **build:** add clean:logs npm script ([#1122](#1122)) ([f85fe02](f85fe02)), closes [#988](#988) * **build:** add JSON reporter for cspell ([#1123](#1123)) ([6d59f67](6d59f67)) * **ci:** add multi-arch support to copilot-setup-steps binary downloads ([#955](#955)) ([8d0c706](8d0c706)) * **deps-dev:** bump cspell from 9.6.4 to 9.7.0 in the npm-dependencies group ([#839](#839)) ([3fa16ff](3fa16ff)) * **deps:** bump actions/dependency-review-action from 4.8.3 to 4.9.0 in the github-actions group across 1 directory ([#942](#942)) ([1a9b858](1a9b858)) * **deps:** bump cairosvg from 2.8.2 to 2.9.0 in /.github/skills/experimental/powerpoint ([#1025](#1025)) ([f4deda7](f4deda7)) * **deps:** bump dompurify from 3.3.1 to 3.3.2 in /docs/docusaurus ([#924](#924)) ([d2060d6](d2060d6)) * **deps:** bump svgo from 3.3.2 to 3.3.3 in /docs/docusaurus ([#880](#880)) ([6dc2406](6dc2406)) * **deps:** bump the github-actions group across 1 directory with 4 updates ([#1100](#1100)) ([2290dc0](2290dc0)) * **deps:** bump the github-actions group with 6 updates ([#840](#840)) ([f57bc01](f57bc01)) * **docs:** correct New-MsDateReport table rendering and refresh stale docs ([#1114](#1114)) ([c2b806f](c2b806f)) * **settings:** remove orphaned Checkov config and stale gitignore entries ([#870](#870)) ([98fcd74](98fcd74)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: hve-core-release-please[bot] <254602402+hve-core-release-please[bot]@users.noreply.github.com> Co-authored-by: Bill Berry <wberry@microsoft.com>

WilliamBerryiii requested a review from a team as a code owner March 15, 2026 21:17

WilliamBerryiii force-pushed the fix/pptx-security-xxe-blob-validation-1014-1016 branch from 7dc2faa to efd34b0 Compare March 15, 2026 21:36

Merge remote-tracking branch 'origin/main' into fix/pptx-security-xxe…

a67c19a

…-blob-validation-1014-1016

This was referenced Mar 15, 2026

security(skills): audit codebase for unhardened XML parsers beyond PowerPoint skill #1056

Open

fix(docs): resolve 12 pre-existing broken links found during validation #1058

Closed

katriendg requested changes Mar 18, 2026

View reviewed changes

github-advanced-security bot found potential problems Mar 19, 2026

View reviewed changes

.github/skills/experimental/powerpoint/scripts/extract_content.py Fixed Show fixed Hide fixed

Bill Berry and others added 4 commits March 18, 2026 18:17

fix(skills): add comment to empty except clause in _resolve_theme_colors

31cf06b

🔧 - Generated by Copilot

Merge branch 'main' into fix/pptx-security-xxe-blob-validation-1014-1016

a6c325f

style(skills): fix ruff format violations in extract_content.py

a48e56c

- Add missing blank line after _ImageSecurityError class definition - Unwrap single-line raise statement in _validate_emf_magic_bytes

Merge branch 'main' into fix/pptx-security-xxe-blob-validation-1014-1016

b13b881

katriendg approved these changes Mar 19, 2026

View reviewed changes

WilliamBerryiii merged commit 89d24b1 into main Mar 19, 2026
30 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(skills): harden XML parsing and blob writes in powerpoint extract#1053

fix(skills): harden XML parsing and blob writes in powerpoint extract#1053
WilliamBerryiii merged 7 commits intomainfrom
fix/pptx-security-xxe-blob-validation-1014-1016

WilliamBerryiii commented Mar 15, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 15, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Mar 15, 2026 •

edited

Loading

Uh oh!

katriendg left a comment

Uh oh!

katriendg commented Mar 18, 2026

Uh oh!

WilliamBerryiii commented Mar 18, 2026

Uh oh!

Uh oh!

WilliamBerryiii commented Mar 19, 2026

Uh oh!

katriendg left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

WilliamBerryiii commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue(s)

Type of Change

Testing

Checklist

Required Checks

Required Automated Checks

Security Considerations

Additional Notes

Uh oh!

github-actions bot commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

codecov-commenter commented Mar 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

katriendg left a comment

Choose a reason for hiding this comment

🔴 Blocker — EMF images missing from allowlist (extract_content.py lines ~42–51)

⚠️ Medium — Unguarded ValueError crashes full extraction on any validation failure (extract_content.py lines ~114–129)

💡 Low — Duplicate XXE test coverage (test_extract_content.py lines ~1073 and ~1099)

Uh oh!

katriendg commented Mar 18, 2026

Uh oh!

WilliamBerryiii commented Mar 18, 2026

Uh oh!

Uh oh!

WilliamBerryiii commented Mar 19, 2026

🔴 Blocker — EMF images missing from allowlist

⚠️ Medium — Unguarded ValueError crashes full extraction

💡 Low — Duplicate XXE test coverage

Uh oh!

katriendg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

WilliamBerryiii commented Mar 15, 2026 •

edited

Loading

github-actions bot commented Mar 15, 2026 •

edited

Loading

codecov-commenter commented Mar 15, 2026 •

edited

Loading

🔴 Blocker — EMF images missing from allowlist (`extract_content.py` lines ~42–51)

⚠️ Medium — Unguarded `ValueError` crashes full extraction on any validation failure (`extract_content.py` lines ~114–129)

💡 Low — Duplicate XXE test coverage (`test_extract_content.py` lines ~1073 and ~1099)