
Support additional namespaces in XML documents #728

@kevin-deyoungster

Description


This issue proposes a feature that adds support for sanitizing XML documents that use custom namespaces.

Background & Context

I want to use DOMPurify to sanitize XML documents with custom namespaces, but DOMPurify does not currently support this. Every element in a custom namespace is stripped out, whether or not it is included in the ADD_TAGS allowlist.

Example

Input (passed to DOMPurify as the string dirty): an XML document with custom namespaces
<section xmlns="http://www.ibm.com/events" xmlns:bk="urn:loc.gov:books" xmlns:pi="urn:personalInformation"
  xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Book-Signing Event</title>
  <signing>
    <bk:author pi:title="Mr" pi:name="Jim Ross" />
    <bk:book bk:title="Writing COBOL for Fun and Profit" isbn:number="0426070806" />
    <comment>What a great issue!</comment>
    <dirty onload="alert()"/>
  </signing>
</section>
Current approach with DOMPurify
DOMPurify.sanitize(dirty, {
    PARSER_MEDIA_TYPE: "application/xhtml+xml",
    ADD_ATTR: ['pi:title', 'bk:title', 'isbn:number'],
    ADD_TAGS: ['section', 'title', 'signing', 'bk:author', 'bk:book', 'comment'],
})
Actual output (incorrect):
''

The result is empty because _checkValidNamespace(element) only checks the element's namespaceURI against the standard XHTML, SVG, and MathML namespaces. Any other namespace, however valid, is rejected, and since the root <section> element lives in the custom http://www.ibm.com/events namespace, the whole document is removed. Well-structured XML documents commonly declare their own namespaces, so this is an important use case.
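To illustrate the failure mode, here is a deliberately simplified model of a namespace check that only recognizes the three built-in namespaces. It is a sketch based on the description above, not DOMPurify's actual _checkValidNamespace implementation; the function isKnownNamespace and the plain-object "elements" are hypothetical stand-ins for DOM nodes.

```javascript
// Hypothetical, simplified model of the namespace check described above.
// NOT DOMPurify's real _checkValidNamespace; it only shows why an element
// in any other namespace is rejected.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";
const SVG_NAMESPACE = "http://www.w3.org/2000/svg";
const MATHML_NAMESPACE = "http://www.w3.org/1998/Math/MathML";

// The only namespaces this simplified check recognizes.
const BUILT_IN_NAMESPACES = new Set([HTML_NAMESPACE, SVG_NAMESPACE, MATHML_NAMESPACE]);

// Accepts a minimal element-like object: { namespaceURI, localName }.
function isKnownNamespace(element) {
  return BUILT_IN_NAMESPACES.has(element.namespaceURI);
}

const xhtmlElement = { namespaceURI: HTML_NAMESPACE, localName: "p" };
const bookElement = { namespaceURI: "urn:loc.gov:books", localName: "book" };

console.log(isKnownNamespace(xhtmlElement)); // true
console.log(isKnownNamespace(bookElement)); // false: bk:book is dropped even if listed in ADD_TAGS
```

Because the check is keyed on namespaceURI rather than on the tag name, adding "bk:book" to ADD_TAGS cannot rescue the element in this model.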

Feature

To solve this, we propose a new config option, ADD_NAMESPACES, that lets users list additional namespaces to accept in the input. This builds on @tosmolka's work in "Add PARSER_MEDIA_TYPE option to support strict XHTML documents".
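One possible shape for the option, sketched under assumptions: ADD_NAMESPACES extends the built-in set rather than replacing it, in the same spirit as ADD_TAGS and ADD_ATTR. Only the name ADD_NAMESPACES comes from this proposal; buildAllowedNamespaces and isAllowedNamespace are hypothetical helpers, not existing DOMPurify functions.

```javascript
// Hypothetical sketch of how ADD_NAMESPACES could extend the accepted namespaces.
// Helper names are illustrative only; they are not part of DOMPurify.
const BUILT_IN_NAMESPACES = [
  "http://www.w3.org/1999/xhtml",
  "http://www.w3.org/2000/svg",
  "http://www.w3.org/1998/Math/MathML",
];

// Build the allowed set once per sanitize() call from the user's config.
function buildAllowedNamespaces(config = {}) {
  const allowed = new Set(BUILT_IN_NAMESPACES);
  for (const ns of config.ADD_NAMESPACES || []) {
    allowed.add(ns);
  }
  return allowed;
}

// An element passes the namespace gate only if its namespaceURI is in the set.
function isAllowedNamespace(element, allowedNamespaces) {
  return allowedNamespaces.has(element.namespaceURI);
}

const allowed = buildAllowedNamespaces({
  ADD_NAMESPACES: ["http://www.ibm.com/events", "urn:loc.gov:books"],
});

console.log(isAllowedNamespace({ namespaceURI: "urn:loc.gov:books" }, allowed)); // true
console.log(isAllowedNamespace({ namespaceURI: "urn:example:unlisted" }, allowed)); // false
```

Extending rather than replacing the defaults means existing HTML, SVG, and MathML content keeps working when a caller only adds custom namespaces.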

Using the proposed feature in DOMPurify
DOMPurify.sanitize(dirty, {
    PARSER_MEDIA_TYPE: "application/xhtml+xml",
    ADD_TAGS: ['section', 'title', 'signing', 'bk:author', 'bk:book', 'comment'],
    ADD_ATTR: ['pi:title', 'bk:title', 'isbn:number'],
    ADD_NAMESPACES: ["http://www.ibm.com/events", "urn:loc.gov:books", "urn:personalInformation", "urn:ISBN:0-395-36341-6"]
})
Expected & Given Output (correct):
<section xmlns="http://www.ibm.com/events">
  <title>Book-Signing Event</title>
  <signing>
    <bk:author xmlns:bk="urn:loc.gov:books"
      xmlns:pi="urn:personalInformation" pi:title="Mr"/>
    <bk:book xmlns:bk="urn:loc.gov:books" bk:title="Writing COBOL for Fun and Profit"/>
    <comment>What a great issue!</comment>
  </signing>
</section>

The unrecognized <dirty> element is removed, and attributes outside the ADD_ATTR allowlist, such as pi:name, are dropped.
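Under the proposal, the namespace check and the tag allowlist would act as two independent filters: <dirty> sits in an allowed namespace (the default http://www.ibm.com/events) but is still removed because it is not in ADD_TAGS. The following hypothetical model makes that explicit. It assumes elements are plain objects with namespaceURI, tagName, and attributes fields, and that tags and attributes are matched by qualified name; filterElement is an illustrative helper, not DOMPurify code.

```javascript
// Hypothetical model of the checks each element would pass under the proposal.
// Not DOMPurify's real code; plain objects stand in for DOM nodes.
const allowedNamespaces = new Set([
  "http://www.ibm.com/events",
  "urn:loc.gov:books",
  "urn:personalInformation",
  "urn:ISBN:0-395-36341-6",
]);
const allowedTags = new Set(["section", "title", "signing", "bk:author", "bk:book", "comment"]);
const allowedAttrs = new Set(["pi:title", "bk:title", "isbn:number"]);

// Returns a filtered copy of the element, or null if it should be removed.
function filterElement(element) {
  // 1. Namespace gate (what ADD_NAMESPACES would control).
  if (!allowedNamespaces.has(element.namespaceURI)) return null;
  // 2. Tag allowlist (what ADD_TAGS controls), keyed on the qualified name.
  if (!allowedTags.has(element.tagName)) return null;
  // 3. Attribute allowlist (what ADD_ATTR controls).
  const attributes = {};
  for (const [name, value] of Object.entries(element.attributes)) {
    if (allowedAttrs.has(name)) attributes[name] = value;
  }
  return { ...element, attributes };
}

const dirtyElement = {
  namespaceURI: "http://www.ibm.com/events",
  tagName: "dirty",
  attributes: { onload: "alert()" },
};
const author = {
  namespaceURI: "urn:loc.gov:books",
  tagName: "bk:author",
  attributes: { "pi:title": "Mr", "pi:name": "Jim Ross" },
};

console.log(filterElement(dirtyElement)); // null: namespace allowed, but "dirty" is not in ADD_TAGS
console.log(filterElement(author).attributes); // { 'pi:title': 'Mr' }
```

Keeping the namespace gate separate from the tag and attribute allowlists means ADD_NAMESPACES only widens which documents can be parsed into candidates; it never bypasses the existing allowlists.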
