XML parsing crashes if the text contains any illegal Unicode characters #375

@RBusarow

The most likely culprit seems to be non-printable control characters.

GitHub probably won't retain it, but there's a zero-width space (\u200B) after the closing style tag in this example:

<?xml version="1.0" encoding="utf-8"?>
<resources>​
  <style name="Foo">
  </style>
  ​
</resources>

XML 1.0 has a spec for allowed characters: https://www.w3.org/TR/xml/#charsets

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]   /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
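
For anyone who wants to check a file against that production before handing it to a parser, here is a minimal sketch in Python. It is not taken from this project's code; the names `find_illegal_xml_chars` and `strip_illegal_xml_chars` are made up for illustration, and the regex is just the complement of the Char ranges quoted above.

```python
import re

# Complement of the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (index, 'U+XXXX') for every character outside the Char production."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]


def strip_illegal_xml_chars(text):
    """Return text with every character outside the Char production removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "<resources>\x00\u200b</resources>"
    # Reports the NUL only; U+200B is inside [#x20-#xD7FF].
    print(find_illegal_xml_chars(sample))
```

One thing this makes visible: U+200B falls inside [#x20-#xD7FF], so a check built only on the Char production accepts the zero-width space and flags control characters such as U+0000 or U+000B instead. If the crash really is triggered by the zero-width space, the cause may be something other than the Char rule, for example how the parser treats non-whitespace text between elements. That part is a guess.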
