Support additional namespaces in XML documents #728
Description
This issue proposes a feature that adds support for sanitizing XML documents with custom namespaces.
Background & Context
I want to use DOMPurify to sanitize XML documents with custom namespaces; however, it does not support that. All elements with custom namespaces are stripped out whether or not they are included in the ADD_TAGS allowlist.
Example
Input: XML document with custom namespaces
<section xmlns="http://www.ibm.com/events" xmlns:bk="urn:loc.gov:books" xmlns:pi="urn:personalInformation"
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Book-Signing Event</title>
  <signing>
    <bk:author pi:title="Mr" pi:name="Jim Ross" />
    <bk:book bk:title="Writing COBOL for Fun and Profit" isbn:number="0426070806" />
    <comment>What a great issue!</comment>
    <dirty onload="alert()"/>
  </signing>
</section>
Current approach with DOMPurify
DOMPurify.sanitize(dirty, {
  PARSER_MEDIA_TYPE: "application/xhtml+xml",
  ADD_ATTR: ['pi:title', 'bk:title', 'isbn:number'],
  ADD_TAGS: ['section', 'title', 'signing', 'bk:author', 'bk:book', 'comment'],
})
Given Output (incorrect):
''
The result is empty because _checkValidNamespace(element) only checks the element's namespaceURI against the standard XHTML, SVG, and MathML namespaces. An element in any other namespace, however valid, is removed. Properly structured XML usually includes namespaces, so this is an important use case.
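To make the failure mode concrete, here is a deliberately simplified sketch of a fixed namespace allowlist. It is not DOMPurify's actual _checkValidNamespace. The helper name isBuiltInNamespace and the BUILT_IN_NAMESPACES set are made up for illustration; only the three namespace URIs are the standard ones.

```javascript
// Simplified illustration only. This is NOT DOMPurify's real implementation.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";
const SVG_NAMESPACE = "http://www.w3.org/2000/svg";
const MATHML_NAMESPACE = "http://www.w3.org/1998/Math/MathML";

// Hypothetical fixed allowlist: nothing outside these three URIs can pass.
const BUILT_IN_NAMESPACES = new Set([
  HTML_NAMESPACE,
  SVG_NAMESPACE,
  MATHML_NAMESPACE,
]);

// Hypothetical helper: true only for one of the three built-in namespaces.
function isBuiltInNamespace(namespaceURI) {
  return BUILT_IN_NAMESPACES.has(namespaceURI);
}

console.log(isBuiltInNamespace("http://www.w3.org/2000/svg")); // true
console.log(isBuiltInNamespace("http://www.ibm.com/events"));  // false
console.log(isBuiltInNamespace("urn:loc.gov:books"));          // false
```

With a check of this shape, the root <section> element from the example is rejected because of its namespace, regardless of what ADD_TAGS contains, which would explain the empty output.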
Feature
To solve this, we propose a new CONFIG option ADD_NAMESPACES (additional namespaces) to allow users to specify valid namespaces for the input string. This builds on @tosmolka's work in Add PARSER_MEDIA_TYPE option to support strict XHTML documents.
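One way the proposed option could work is to merge the user-supplied namespaces into the allowlist before elements are checked. The following is a rough sketch under that assumption, not the actual patch. The function names buildAllowedNamespaces and isAllowedNamespace are hypothetical, and ADD_NAMESPACES is the option proposed in this issue, not an existing one.

```javascript
// Hypothetical sketch of the proposed ADD_NAMESPACES behavior.
// None of these helpers exist in DOMPurify; they only illustrate the idea.
const DEFAULT_NAMESPACES = [
  "http://www.w3.org/1999/xhtml",
  "http://www.w3.org/2000/svg",
  "http://www.w3.org/1998/Math/MathML",
];

// Build the effective allowlist: the defaults plus any namespaces from config.
function buildAllowedNamespaces(config = {}) {
  const extra = Array.isArray(config.ADD_NAMESPACES) ? config.ADD_NAMESPACES : [];
  return new Set([...DEFAULT_NAMESPACES, ...extra]);
}

// An element would be kept only if its namespaceURI is in the allowlist.
function isAllowedNamespace(namespaceURI, allowedNamespaces) {
  return allowedNamespaces.has(namespaceURI);
}

const allowed = buildAllowedNamespaces({
  ADD_NAMESPACES: ["http://www.ibm.com/events", "urn:loc.gov:books"],
});

console.log(isAllowedNamespace("urn:loc.gov:books", allowed));            // true
console.log(isAllowedNamespace("http://www.w3.org/2000/svg", allowed));   // true
console.log(isAllowedNamespace("urn:some:unknown:namespace", allowed));   // false
```

Keeping the defaults and only ever adding to them means that omitting ADD_NAMESPACES would leave the current behavior unchanged.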
Using proposed feature in DOMPurify
DOMPurify.sanitize(dirty, {
  PARSER_MEDIA_TYPE: "application/xhtml+xml",
  ADD_TAGS: ['section', 'title', 'signing', 'bk:author', 'bk:book', 'comment'],
  ADD_ATTR: ['pi:title', 'bk:title', 'isbn:number'],
  ADD_NAMESPACES: ["http://www.ibm.com/events", "urn:loc.gov:books", "urn:personalInformation", "urn:ISBN:0-395-36341-6"]
})
Expected & Given Output (correct):
<section xmlns="http://www.ibm.com/events">
  <title>Book-Signing Event</title>
  <signing>
    <bk:author xmlns:bk="urn:loc.gov:books"
               xmlns:pi="urn:personalInformation" pi:title="Mr"/>
    <bk:book xmlns:bk="urn:loc.gov:books" bk:title="Writing COBOL for Fun and Profit"/>
    <comment>What a great issue!</comment>
  </signing>
</section>
Only the unrecognized tag <dirty> is removed.