
[core] PMD/CPD produces invalid XML (insufficient escaping/wrong encoding) #2615

@maikelsteneker


Affects PMD Version: PMD 6.24.0

Description:

When analyzing a source file whose encoding differs from the system default encoding, PMD can produce invalid XML.
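To illustrate why a charset mismatch breaks the report, the following Java sketch encodes one character in both charsets. It assumes the duplicated literal contains Hebrew letters such as U+05D9 (YOD); the actual contents of MyClass.java are not shown in this issue. In Windows-1255 that letter is the single byte 0xE9, the byte value xmllint reports in the steps to reproduce, whereas in UTF-8 it is the two-byte sequence 0xD7 0x99. The class name is hypothetical, and the `windows-1255` charset requires a JDK that ships the extended charsets (standard OpenJDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Formats a byte array as space-separated hex, e.g. "D7 99".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the literal contains HEBREW LETTER YOD (U+05D9).
        String literal = "\u05D9";
        byte[] asUtf8 = literal.getBytes(StandardCharsets.UTF_8);
        byte[] asCp1255 = literal.getBytes(Charset.forName("windows-1255"));
        System.out.println("UTF-8:        " + hex(asUtf8));   // D7 99
        System.out.println("windows-1255: " + hex(asCp1255)); // E9
    }
}
```

A file that declares `encoding="UTF-8"` but contains the Windows-1255 bytes is therefore not well-formed XML: 0xE9 starts a three-byte UTF-8 sequence, and the following 0xE9 is not a valid continuation byte.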

Steps to reproduce:


  1. Use the following UTF-8 encoded source file: MyClass.java.zip
  2. Invoke PMD on Windows 10 as follows: pmd -d C:/tmp/MyClass.java -rulesets category/java/errorprone.xml/AvoidDuplicateLiterals -format xml -language java > out.xml
    This produces the following XML output: out.xml.zip
    The encoding of this file should be UTF-8 (as declared at the start of the file), but it contains characters in a different encoding. Running xmllint on it produces:
    /tmp/out.xml:8: parser error : Input is not proper UTF-8, indicate encoding !
    Bytes: 0xE9 0xE9 0xE9 0xE9
    The String literal "��������" appears 5 times in this file; the first
    ^
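The xmllint failure can also be reproduced from Java with a strict `CharsetDecoder`, which is handy when checking reports without xmllint installed. This is a minimal sketch, not part of PMD; the class name `Utf8Check` and the default file name `out.xml` are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path report = Paths.get(args.length > 0 ? args[0] : "out.xml");
        byte[] bytes = Files.readAllBytes(report);
        System.out.println(report + (isValidUtf8(bytes) ? ": valid UTF-8" : ": NOT valid UTF-8"));
    }
}
```

Run as `java Utf8Check out.xml`; for the report attached above it should flag the file, since the run of 0xE9 bytes cannot be decoded as UTF-8.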

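In my case, these characters are in the Windows-1255 encoding. I suspect PMD writes the report in this encoding because it relies on the system property file.encoding (the JVM default charset), while the XML header declares UTF-8. One workaround is to run PMD with a default encoding that matches the encoding of the source file (UTF-8 by default).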
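If the diagnosis above is right, the fix is to derive the XML declaration and the writer's actual charset from the same `Charset` instead of from file.encoding. The sketch below shows that idea only; `ConsistentXmlWriter` and `writeReport` are hypothetical names, not PMD's renderer API, and the escaping covers only the basic XML special characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsistentXmlWriter {
    // Declared encoding and byte encoding both come from `cs`,
    // so the header can never disagree with the bytes that follow it.
    static void writeReport(OutputStream out, Charset cs, String message) throws IOException {
        Writer w = new OutputStreamWriter(out, cs);
        w.write("<?xml version=\"1.0\" encoding=\"" + cs.name() + "\"?>\n");
        w.write("<pmd><violation>" + escape(message) + "</violation></pmd>\n");
        w.flush(); // flush but do not close, so the caller keeps ownership of `out`
    }

    // Minimal escaping of XML special characters; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeReport(buf, StandardCharsets.UTF_8,
                "The String literal \"\u05D9\u05D9\" appears 5 times in this file");
        System.out.print(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

UTF-8 is the safest choice here: with a narrower charset such as ISO-8859-1, `OutputStreamWriter` silently replaces unmappable characters (like the Hebrew letters) with `?`, so the report would be well-formed but lossy unless such characters were emitted as numeric character references.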

Running PMD through: [CLI]

Labels

a:bug (PMD crashes or fails to analyse a file.)
