Skip to content

[core] Fix XMLRenderer encoding issues#2633

Merged
oowekyala merged 10 commits into
pmd:masterfrom
adangel:issue-2615
Jul 23, 2020
Merged

[core] Fix XMLRenderer encoding issues#2633
oowekyala merged 10 commits into
pmd:masterfrom
adangel:issue-2615

Conversation

@adangel

@adangel adangel commented Jul 5, 2020

Copy link
Copy Markdown
Member

Describe the PR

This PR tries to fix #2615 . In order to use the correct encoding through the rendering process, the new API Renderer::setReportFile is added.

I'm not entirely sure about CPD renderer, whether we have a similar issue there. But as far as I can see, only the default platform encoding is used, there is no way to change the encoding. There is also no way to specify a output file as a command line option, so CPD renderers always render to Sysout and hence should use the default platform encoding.

Related issues

Ready?

  • Added unit tests for fixed bug/feature
  • Passing all unit tests
  • Complete build ./mvnw clean verify passes (checked automatically by travis)
  • Added (in-code) documentation (if needed).

adangel added 4 commits July 3, 2020 21:11
The XMLRenderer uses by default UTF-8 encoding, but the
writer uses the system default encoding, which doesn't work
well together.

Provide new experimental API Renderer::setReportFile, so that
renderer implementations can create their own writers.
The default implementation in AbstractRenderer is backwards compatible.
@adangel adangel added this to the 6.26.0 milestone Jul 5, 2020
Comment thread pmd-core/src/main/java/net/sourceforge/pmd/cpd/XMLRenderer.java Outdated
Comment thread pmd-core/src/main/java/net/sourceforge/pmd/renderers/XMLRenderer.java Outdated
@ghost

ghost commented Jul 5, 2020

Copy link
Copy Markdown
1 Message
📖 This changeset introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset introduces 106260 new violations, 0 new errors and 0 new configuration errors,
removes 122673 violations, 0 errors and 0 configuration errors.
Full report
This changeset introduces 106260 new violations, 0 new errors and 0 new configuration errors,
removes 122673 violations, 0 errors and 0 configuration errors.
Full report

Generated by 🚫 Danger

@adangel adangel added the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Jul 5, 2020
@adangel

adangel commented Jul 5, 2020

Copy link
Copy Markdown
Member Author

This PR showed actually a bug in pmd-regression-tester: With this PR, we are now escaping certain characters in XML, which we previously didn't:

before:

<?xml version="1.0" encoding="UTF-8"?>
<pmd xmlns="http://pmd.sourceforge.net/report/2.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pmd.sourceforge.net/report/2.0.0 http://pmd.sourceforge.net/report_2_0_0.xsd"
    version="6.24.0" timestamp="2020-07-05T21:26:00.982">
<file name="/home/andreas/Downloads/chunk-diff-issue-2615/src/TestFile.java">
<violation beginline="2" endline="2" begincolumn="21" endcolumn="35" rule="MethodArgumentCouldBeFinal" ruleset="Code Style" class="TestFile" method="bar" variable="fileName" externalInfoUrl="https://pmd.github.io/pmd-6.24.0/pmd_rules_java_codestyle.html#methodargumentcouldbefinal" priority="3">
Parameter 'fileName' is not assigned and could be declared final
</violation>
</file>
</pmd>

after:

<?xml version="1.0" encoding="UTF-8"?>
<pmd xmlns="http://pmd.sourceforge.net/report/2.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pmd.sourceforge.net/report/2.0.0 http://pmd.sourceforge.net/report_2_0_0.xsd"
    version="6.26.0-SNAPSHOT" timestamp="2020-07-05T21:26:32.079">
<file name="/home/andreas/Downloads/chunk-diff-issue-2615/src/TestFile.java">
<violation beginline="2" endline="2" begincolumn="21" endcolumn="35" rule="MethodArgumentCouldBeFinal" ruleset="Code Style" class="TestFile" method="bar" variable="fileName" externalInfoUrl="https://pmd.github.io/pmd/pmd_rules_java_codestyle.html#methodargumentcouldbefinal" priority="3">
Parameter &apos;fileName&apos; is not assigned and could be declared final
</violation>
</file>
</pmd>

pmd-regression-tester doesn't parse the character data correctly - with entities, the character data between tags (where the description is), is now split into multiple chunks of character data, which needs to be accumulated. The parser currently only takes the last character data. Hence the difference in the violations, e.g. before: Parameter 'fileName' is not assigned and could be declared final, after: is not assigned and could be declared final.

I'll fix that tomorrow.

adangel added 4 commits July 6, 2020 21:44
In order to properly support different encodings, a OutputStream
is needed. Then Java will take care of unmappaple characters
and encode them as entities for XML.

For backwards compatibility, a writer is still created and exposed.
And use it in both CPD/PMD XMLRenderers.
@adangel adangel removed the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Jul 9, 2020
@adangel

adangel commented Jul 9, 2020

Copy link
Copy Markdown
Member Author

I think, this is ready now.
I refactored the XMLRenderer for PMD to use XMLStreamWriter instead of manually creating XML via StringBuilder...
I learned that the standard XML libraries mostly work correctly, at least in terms of encoding - the only thing to keep in mind is: it works only, if you provide a OutputStream - if you provide a Writer, then now charset specific encoding is happening (like dealing with unmappable characters). Invalid characters are still kept unfortunately, hence I created a new StringUtil method....

I've added the experimental API Renderer::setReportFile which is good for now. But for PMD 7, we should rethink the API: Maybe we can say, that each renderer works with a OutputStream and has to know the encoding (falling back to platform default). It would be nice, to have the Renderer API decoupled from File - at least in unit tests, you often just want to render into memory.

@adangel

adangel commented Jul 16, 2020

Copy link
Copy Markdown
Member Author

@oowekyala oowekyala self-assigned this Jul 19, 2020

@oowekyala oowekyala left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now. As we said last time, the added API is experimental. We can iron it out in PMD 7.

I'll merge once the travis build goes green

Comment thread pmd-core/src/main/java/net/sourceforge/pmd/renderers/Renderer.java
@oowekyala oowekyala merged commit d239987 into pmd:master Jul 23, 2020
@adangel adangel deleted the issue-2615 branch July 24, 2020 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[core] PMD/CPD produces invalid XML (insufficient escaping/wrong encoding)

2 participants