Skip to content

[core] Refactor CPD token filtering#1039

Merged
adangel merged 12 commits into
pmd:masterfrom
jsotuyod:issue-695
Apr 23, 2018
Merged

[core] Refactor CPD token filtering#1039
adangel merged 12 commits into
pmd:masterfrom
jsotuyod:issue-695

Conversation

@jsotuyod

@jsotuyod jsotuyod commented Apr 15, 2018

Copy link
Copy Markdown
Member
  • Define a generic TokenFilter interface in pmd-core
  • Provide a base, extension-friendly JavaCCTokenFilter to process and filter JavaCC token streams, honoring CPD-OFF and CPD-ON comments
  • Refactor the JavaTokenizer to use JavaCCTokenFilter by extending it and adding custom Java-specific token filters
  • Resolves [core] Extend comment-based suppression to all JavaCC languages #695

Supported languages:

  • C/C++ support
  • Java
  • Ecmascript
  • JSP - Can't be supported as is, comments are actual AST nodes, changing them to special tokens would be a breaking API change as PMD rules visit those nodes and analyze them
  • Matlab
  • Objective-C
  • PL/SQL
  • Python
  • Visualforce - Whitespace is handled very weirdly, needs major refactors to the grammar, not within the scope of this PR
  • Velocity - Doesn't currently support CPD, not within the scope of this PR

 - Define a generic `TokenFilter` interface in pmd-core
 - Provide a base, extension-friendly `JavaCCTokenFilter` to process and
filter JavaCC token streams, honoring `CPD-OFF` and `CPD-ON` comments
 - Refactor the `JavaTokenizer` to use `JavaCCTokenFilter` by extending
it and adding custom Java-specific token filters
@jsotuyod jsotuyod added an:enhancement An improvement on existing features / rules is:WIP For PRs that are not fully ready, or issues that are actively being tackled in:cpd Affects the copy-paste detector labels Apr 15, 2018
@jsotuyod jsotuyod added this to the 6.3.0 milestone Apr 15, 2018
@jsotuyod

Copy link
Copy Markdown
Member Author

The core is done, I'm starting to add it to different languages.
CPP is done, but needed changes to the grammar, as comments were dropped. Other languages currently using JavaCC will follow.

@jsotuyod jsotuyod added is:WIP For PRs that are not fully ready, or issues that are actively being tackled and removed is:WIP For PRs that are not fully ready, or issues that are actively being tackled labels Apr 22, 2018
@jsotuyod

Copy link
Copy Markdown
Member Author

@adangel @oowekyala this is ready now. I left out 3 JavaCC based languages, the reasons for each are detailed on the PR notes.

@oowekyala oowekyala mentioned this pull request Apr 22, 2018
64 tasks

@adangel adangel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Looks good to me. I'll merge it tomorrow.

Comment thread docs/pages/release_notes.md Outdated
Back in PMD 5.6.0 we introduced the ability to suppress CPD warnings in Java using comments, by
including `CPD-OFF` (to start ignoring code), or `CPD-ON` (to resume analysis) during CPD execution.
This has proved to be much more flexible and versatile than the old annotation-based approach,
and has since been the prefered way to suppress CPD warnings.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: preferred

Comment thread docs/pages/release_notes.md Outdated
This has proved to be much more flexible and versatile than the old annotation-based approach,
and has since been the prefered way to suppress CPD warnings.

On this ocassion, we are xtending support for comment-based suppressions to many other languages:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: extending

.getDefaultVersion().getLanguageVersionHandler();
reader = new StringReader(buffer.toString());
TokenManager tokenManager = languageVersionHandler
TokenFilter tokenFilter = new JavaCCTokenFilter(languageVersionHandler

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed, that we are using Rhino for parsing javascript, but have an extra es5 javacc grammar for tokenizing. I think, we should streamline this and use both Rhino for parsing and tokenizing. This could be done together with #699. What do you think?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable, at the very least for consistency.

@adangel adangel self-assigned this Apr 23, 2018
@adangel adangel merged commit f855b57 into pmd:master Apr 23, 2018
@jsotuyod jsotuyod deleted the issue-695 branch April 23, 2018 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

an:enhancement An improvement on existing features / rules in:cpd Affects the copy-paste detector

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants