Skip to content

fix: harden XML parsing with defusedxml across gateway, skills, and tests#28585

Open
lanzhi-lee wants to merge 3 commits into
NousResearch:mainfrom
lanzhi-lee:fix/wecom-callback-xxe
Open

fix: harden XML parsing with defusedxml across gateway, skills, and tests#28585
lanzhi-lee wants to merge 3 commits into
NousResearch:mainfrom
lanzhi-lee:fix/wecom-callback-xxe

Conversation

@lanzhi-lee

@lanzhi-lee lanzhi-lee commented May 19, 2026

Copy link
Copy Markdown

What does this PR do?

Hardens all XML parsing call sites across the codebase by replacing xml.etree.ElementTree.fromstring / parse with their defusedxml equivalents. The stdlib parser resolves external entities by default, which could allow an attacker to read local files, perform SSRF, or trigger denial-of-service via entity expansion (XXE).

defusedxml is the Python-recommended drop-in replacement that blocks external entity resolution, DTD fetching, and entity expansion attacks by default.

Related Issue

Type of Change

  • 🔒 Security fix

Changes Made

  • gateway/platforms/wecom_callback.py — replace ET.fromstring with defusedxml.ElementTree.fromstring in _decrypt_request and _build_event (WeCom callback payloads)
  • optional-skills/devops/watchers/scripts/watch_rss.py — replace ET.fromstring with defusedxml.fromstring; widen except to also catch DefusedXmlException
  • skills/productivity/powerpoint/scripts/office/helpers/simplify_redlines.py — replace ET.parse with defusedxml.parse; widen both except ET.ParseError clauses to also catch DefusedXmlException
  • skills/research/arxiv/scripts/search_arxiv.py — replace ET.fromstring with defusedxml.fromstring (arXiv API responses)
  • tests/acp/test_registry_manifest.py — replace ET.fromstring with defusedxml.fromstring
  • tests/gateway/test_wecom_callback.py — replace ET.fromstring with defusedxml.fromstring
  • pyproject.toml + uv.lock — promote defusedxml==0.7.1 to a core dependency (zero transitive deps), since it is now used across gateway, skills, optional-skills, and tests that have no relation to any single optional-dependency group

How to Test

  1. Run pytest tests/ -q — all existing tests should pass
  2. Confirm defusedxml.fromstring raises EntitiesForbidden when given XML with <!DOCTYPE ...> declarations:
    from defusedxml.ElementTree import fromstring
    fromstring('<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>')
    # raises EntitiesForbidden
  3. For watch_rss.py: verify a malicious feed XML with DTD triggers the graceful sys.exit(2) path instead of an uncaught traceback

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've tested on my platform: macOS 15.3

Documentation & Housekeeping

  • I've updated relevant documentation — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

🤖 Generated with Claude Code

@lanzhi-lee lanzhi-lee requested a review from a team May 19, 2026 07:07
@alt-glitch alt-glitch added type/security Security vulnerability or hardening P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/wecom WeCom / WeChat Work adapter labels May 19, 2026
@austinpickett austinpickett requested a review from Copilot May 19, 2026 10:34
@austinpickett

Copy link
Copy Markdown
Collaborator

Please use PULL_REQUEST_TEMPLATE.md

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens XML parsing across three call sites by replacing xml.etree.ElementTree parsing with defusedxml equivalents, to defend against XXE/DTD/entity-expansion attacks on potentially attacker-controlled XML inputs (WeCom callbacks, RSS feeds, and .docx tracked-change extraction). Adds defusedxml==0.7.1 to the [messaging] extra.

Changes:

  • Replace ET.fromstring with defusedxml.ElementTree.fromstring in gateway/platforms/wecom_callback.py (decrypt + event build paths).
  • Apply the same hardening to optional-skills/devops/watchers/scripts/watch_rss.py and skills/productivity/powerpoint/scripts/office/helpers/simplify_redlines.py (via _safe_parse).
  • Declare defusedxml==0.7.1 in the [messaging] optional-dependency group (and corresponding uv.lock update).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
gateway/platforms/wecom_callback.py Swap ET.fromstring for safe defusedxml variant in WeCom decrypt + event paths.
optional-skills/devops/watchers/scripts/watch_rss.py Parse RSS/Atom feeds via defusedxml.
skills/productivity/powerpoint/scripts/office/helpers/simplify_redlines.py Parse word/document.xml via defusedxml parse.
pyproject.toml Add defusedxml==0.7.1 to [messaging] extra.
uv.lock Lockfile update for defusedxml.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread optional-skills/devops/watchers/scripts/watch_rss.py
Comment thread optional-skills/devops/watchers/scripts/watch_rss.py
Comment thread pyproject.toml Outdated
…ests

Replace xml.etree.ElementTree.fromstring/parse with defusedxml
equivalents to prevent XXE injection in all code paths that parse
external or untrusted XML:

- gateway/platforms/wecom_callback.py: WeCom callback payloads
- optional-skills/devops/watchers/scripts/watch_rss.py: RSS/Atom feeds
- skills/productivity/powerpoint/.../simplify_redlines.py: .docx XML
- skills/research/arxiv/scripts/search_arxiv.py: arXiv API responses
- tests/acp/test_registry_manifest.py: SVG icon parsing
- tests/gateway/test_wecom_callback.py: test XML parsing

Promote defusedxml to a core dependency (zero transitive deps) since
it is used across gateway, skills, optional-skills, and tests that have
no relation to any single optional-dependency group.

Catch DefusedXmlException alongside ET.ParseError in watch_rss.py and
simplify_redlines.py so hostile XML triggers the existing graceful
fallback instead of an uncaught traceback.

Co-Authored-By: MiMo-V2.5-Pro <noreply@xiaomi.com>
@lanzhi-lee lanzhi-lee force-pushed the fix/wecom-callback-xxe branch from 402ad11 to 2c3e7d7 Compare May 19, 2026 11:19
@lanzhi-lee lanzhi-lee changed the title fix(gateway): use defusedxml to prevent XXE in WeCom callback fix: harden XML parsing with defusedxml across gateway, skills, and tests May 19, 2026
@lanzhi-lee

Copy link
Copy Markdown
Author

Done — updated the PR description to use the template. Thanks!

@lanzhi-lee

Copy link
Copy Markdown
Author

@austinpickett Updated — the PR description now uses the template and covers all changed files. Thanks for the review!

austinpickett
austinpickett previously approved these changes May 20, 2026

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — fix: harden XML parsing with defusedxml

8 files, +29/-16. Replaces stdlib xml.etree.ElementTree parsing with defusedxml to prevent XXE attacks. Clean, well-scoped security hardening.


⚠️ Minor fix needed

(item 1) Dead import in wecom_callback.py — after replacing both ET.fromstring() calls with _safe_fromstring, the from xml.etree import ElementTree as ET import (line 21) is completely unused. No remaining ET. references in the file. Remove it.


💡 Suggestions

(item 2) Four documentation files still show xml.etree.ElementTree in code examples (research-arxiv.md and SKILL.md). While not production code, they contradict the security posture. Could be a follow-up PR.


✅ Looks Good

  • All parsing call sites migrated: 8 ET.fromstring()/ET.parse() calls across 5 production files and 2 test files correctly replaced with defusedxml equivalents
  • Exception handling updated: DefusedXmlException correctly added alongside ET.ParseError in except clauses
  • XML construction left alone: wecom_crypto.py correctly keeps ET.Element, ET.SubElement, ET.tostring on stdlib — these are generation, not parsing
  • defusedxml==0.7.1 properly pinned — latest release, zero transitive deps
  • Protections: XXE, billion laughs, DTD retrieval, processing instructions all blocked by default

Verdict: Approve with (item 1) dead-import cleanup. The core security changes are correct and complete. LGTM.

Reviewed by Hermes Agent

lanzhi-lee and others added 2 commits May 21, 2026 09:46
Co-Authored-By: MiMo-V2.5-Pro <noreply@xiaomi.com>
Co-Authored-By: MiMo-V2.5-Pro <noreply@xiaomi.com>
@lanzhi-lee

Copy link
Copy Markdown
Author

@austinpickett All review items addressed:

  • item 1: Removed the unused import from
  • item 2: Updated all 4 code examples in and to use

Ready for another look when you get a chance. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/wecom WeCom / WeChat Work adapter type/security Security vulnerability or hardening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants