fix: harden XML parsing with defusedxml across gateway, skills, and tests#28585
fix: harden XML parsing with defusedxml across gateway, skills, and tests#28585lanzhi-lee wants to merge 3 commits into
Conversation
|
Please use PULL_REQUEST_TEMPLATE.md |
There was a problem hiding this comment.
Pull request overview
Hardens XML parsing across three call sites by replacing xml.etree.ElementTree parsing with defusedxml equivalents, to defend against XXE/DTD/entity-expansion attacks on potentially attacker-controlled XML inputs (WeCom callbacks, RSS feeds, and .docx tracked-change extraction). Adds defusedxml==0.7.1 to the [messaging] extra.
Changes:
- Replace
ET.fromstringwithdefusedxml.ElementTree.fromstringingateway/platforms/wecom_callback.py(decrypt + event build paths). - Apply the same hardening to
optional-skills/devops/watchers/scripts/watch_rss.pyandskills/productivity/powerpoint/scripts/office/helpers/simplify_redlines.py(via_safe_parse). - Declare
defusedxml==0.7.1in the[messaging]optional-dependency group (and correspondinguv.lockupdate).
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| gateway/platforms/wecom_callback.py | Swap ET.fromstring for safe defusedxml variant in WeCom decrypt + event paths. |
| optional-skills/devops/watchers/scripts/watch_rss.py | Parse RSS/Atom feeds via defusedxml. |
| skills/productivity/powerpoint/scripts/office/helpers/simplify_redlines.py | Parse word/document.xml via defusedxml parse. |
| pyproject.toml | Add defusedxml==0.7.1 to [messaging] extra. |
| uv.lock | Lockfile update for defusedxml. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ests Replace xml.etree.ElementTree.fromstring/parse with defusedxml equivalents to prevent XXE injection in all code paths that parse external or untrusted XML: - gateway/platforms/wecom_callback.py: WeCom callback payloads - optional-skills/devops/watchers/scripts/watch_rss.py: RSS/Atom feeds - skills/productivity/powerpoint/.../simplify_redlines.py: .docx XML - skills/research/arxiv/scripts/search_arxiv.py: arXiv API responses - tests/acp/test_registry_manifest.py: SVG icon parsing - tests/gateway/test_wecom_callback.py: test XML parsing Promote defusedxml to a core dependency (zero transitive deps) since it is used across gateway, skills, optional-skills, and tests that have no relation to any single optional-dependency group. Catch DefusedXmlException alongside ET.ParseError in watch_rss.py and simplify_redlines.py so hostile XML triggers the existing graceful fallback instead of an uncaught traceback. Co-Authored-By: MiMo-V2.5-Pro <noreply@xiaomi.com>
402ad11 to
2c3e7d7
Compare
|
Done — updated the PR description to use the template. Thanks! |
|
@austinpickett Updated — the PR description now uses the template and covers all changed files. Thanks for the review! |
austinpickett
left a comment
There was a problem hiding this comment.
Code Review — fix: harden XML parsing with defusedxml
8 files, +29/-16. Replaces stdlib xml.etree.ElementTree parsing with defusedxml to prevent XXE attacks. Clean, well-scoped security hardening.
⚠️ Minor fix needed
(item 1) Dead import in wecom_callback.py — after replacing both ET.fromstring() calls with _safe_fromstring, the from xml.etree import ElementTree as ET import (line 21) is completely unused. No remaining ET. references in the file. Remove it.
💡 Suggestions
(item 2) Four documentation files still show xml.etree.ElementTree in code examples (research-arxiv.md and SKILL.md). While not production code, they contradict the security posture. Could be a follow-up PR.
✅ Looks Good
- All parsing call sites migrated: 8
ET.fromstring()/ET.parse()calls across 5 production files and 2 test files correctly replaced withdefusedxmlequivalents - Exception handling updated:
DefusedXmlExceptioncorrectly added alongsideET.ParseErrorin except clauses - XML construction left alone:
wecom_crypto.pycorrectly keepsET.Element,ET.SubElement,ET.tostringon stdlib — these are generation, not parsing defusedxml==0.7.1properly pinned — latest release, zero transitive deps- Protections: XXE, billion laughs, DTD retrieval, processing instructions all blocked by default
Verdict: Approve with (item 1) dead-import cleanup. The core security changes are correct and complete. LGTM.
Reviewed by Hermes Agent
Co-Authored-By: MiMo-V2.5-Pro <noreply@xiaomi.com>
Co-Authored-By: MiMo-V2.5-Pro <noreply@xiaomi.com>
|
@austinpickett All review items addressed:
Ready for another look when you get a chance. Thanks! |
What does this PR do?
Hardens all XML parsing call sites across the codebase by replacing
xml.etree.ElementTree.fromstring/parsewith theirdefusedxmlequivalents. The stdlib parser resolves external entities by default, which could allow an attacker to read local files, perform SSRF, or trigger denial-of-service via entity expansion (XXE).defusedxmlis the Python-recommended drop-in replacement that blocks external entity resolution, DTD fetching, and entity expansion attacks by default.Related Issue
Type of Change
Changes Made
gateway/platforms/wecom_callback.py— replaceET.fromstringwithdefusedxml.ElementTree.fromstringin_decrypt_requestand_build_event(WeCom callback payloads)optional-skills/devops/watchers/scripts/watch_rss.py— replaceET.fromstringwithdefusedxml.fromstring; widenexceptto also catchDefusedXmlExceptionskills/productivity/powerpoint/scripts/office/helpers/simplify_redlines.py— replaceET.parsewithdefusedxml.parse; widen bothexcept ET.ParseErrorclauses to also catchDefusedXmlExceptionskills/research/arxiv/scripts/search_arxiv.py— replaceET.fromstringwithdefusedxml.fromstring(arXiv API responses)tests/acp/test_registry_manifest.py— replaceET.fromstringwithdefusedxml.fromstringtests/gateway/test_wecom_callback.py— replaceET.fromstringwithdefusedxml.fromstringpyproject.toml+uv.lock— promotedefusedxml==0.7.1to a core dependency (zero transitive deps), since it is now used across gateway, skills, optional-skills, and tests that have no relation to any single optional-dependency groupHow to Test
pytest tests/ -q— all existing tests should passdefusedxml.fromstringraisesEntitiesForbiddenwhen given XML with<!DOCTYPE ...>declarations:watch_rss.py: verify a malicious feed XML with DTD triggers the gracefulsys.exit(2)path instead of an uncaught tracebackChecklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
cli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/A🤖 Generated with Claude Code