Skip to content

[core] Token scheme generalization#679

Merged
jsotuyod merged 16 commits into
pmd:masterfrom
gibarsin:tokenGeneralization
Oct 30, 2017
Merged

[core] Token scheme generalization#679
jsotuyod merged 16 commits into
pmd:masterfrom
gibarsin:tokenGeneralization

Conversation

@gibarsin

@gibarsin gibarsin commented Oct 21, 2017

Copy link
Copy Markdown
Contributor

Summary

It is intended to a make language-independent token interface to be accessed in a generic way to access them when making auto-fixes or comment-based CPD suppressions.

Any Token should have access to the next token according to the input stream which created it, as well as comment-type tokens which are to be used by each language. Furthermore, these tokens should have a text representation and the limits to where it belongs to the file. This limits are represented by the tuple (beginLine, beginColumn, endLine, endColumn).

These are ideas are reflected by changing the current type of GenericToken to an interface type declaring these methods and having the JavaCC Token classes be modified to suit this interface.

Changes

  • Changed GenericToken from class to interface type and declarated methods to be implemented in every language-specific Token class
  • Updated Token classes after being generated by JavaCC using Ant tasks, to comply with the implementation of the GenericToken. The addressed languages are Java, JSP, VF, C/C++, Javascript, Matlab, Objective-C, PL/SQL, Python and VM.

@jsotuyod jsotuyod self-assigned this Oct 22, 2017
@jsotuyod

Copy link
Copy Markdown
Member

Thanks for the PR. This looks very interesting, and is something I've been looking forward in order to bring comment-based CPD suppression to other languages for quite some time (see #250)

Current implementations of GenericToken (Java, JSP, VisualForce) only return a new instance of the RegionByLine when needed, which addresses the following issues:

I can't help but wonder if this is actually a good idea. Both JavaCC and Antlr expose the token's location on the text directly. What are we seeing they are not? These guys have been working with tokens for ages, and I don't presume to know better than they do on this topic.

To take into account. This is a breaking API change, meaning it has to make it for 6.0.0, or wait until 7.0.0. Moreover, whatever form this change takes (either a new object or not) it will have to remain supported for the whole major version it gets in. I'm not so comfortable with the "let's do this and measure memory usage once the feature is ready" approach unless we are certain of an issue we can avoid this way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the concept of special token is a little too JavaCC-related... Antlr just sends the tokens to a different channel.

Maybe we should be a little more semantic here on what we expect from grammars for PMD. Maybe something such as getPrevComment() or something of the sort?

Of course the docs would need to be updated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so in the cases of auto-fixes or CPD we want to preserve comments so what you say fits in perfectly.

However, if we go through that way and the case is presented where a non-comment and "non-regular" token is needed to be used, the solution would be either to add a specific method for that case or to deprecate this method and make a more general one.

I currently cannot foresee if PMD will ever need to address this cases, so I'm just speculating.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A special token (JavaCC) is a token that:

  1. can appear at any point in the grammar
  2. is not part of the AST itself

I can think of no thing other than comments that may fit this criteria under a programming language perspective. Remember, what we define here is jut what we expect from PMD grammars. I feel comfortable with such semantics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, I have just updated the interface's method. I will now follow to update the PR's comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep it simple, getNext()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gets the token's text

Comment thread pmd-java/src/main/ant/alljavacc.xml Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a little too generic... remember this will replace all occurrences of specialToken; with your contents

Comment thread pmd-jsp/src/main/ant/alljavacc.xml Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@gibarsin

Copy link
Copy Markdown
Contributor Author

Let me address this by doing more research in ANTLR's code about why did they take the design decision of exposing directly those values.

@jsotuyod

Copy link
Copy Markdown
Member

I see the change is impacted on Java, JSP and VF, but not on C/C++, Javascript, Matlab, Objective-C, PL/SQL, Python or VM. Any reason for this?

@gibarsin

Copy link
Copy Markdown
Contributor Author

Reason at the time was that there did not exist a dependency over the GenericToken class for those languages. However, the whole reason for this generalization is to make it language-independent, so it doesn't make sense those cases not to have their GenericToken interface dependency. I'll put myself to work to impact these changes on those languages. Thanks for noticing.

@gibarsin gibarsin force-pushed the tokenGeneralization branch 2 times, most recently from 60b8875 to e6487f6 Compare October 22, 2017 15:31
@jsotuyod jsotuyod added the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Oct 23, 2017
@gibarsin

Copy link
Copy Markdown
Contributor Author

For the record, the decision to have a separated class for storing the tuple (beginLine, beginColumn, endLine, endColumn) of a Token has been discontinued, due to the lack of proof that the creation of these instances do or do not affect the memory/cpu performance when running the analysis. The solution is to have a separated method for each of the fields in this tuple, directly exposed in the GenericToken interface.

@jsotuyod jsotuyod removed the is:WIP For PRs that are not fully ready, or issues that are actively being tackled label Oct 23, 2017
@jsotuyod

Copy link
Copy Markdown
Member

@adangel I'm having trouble figuring out why Travis fails with an IncompatibleClassChangeError analyzing pmd-visualforce here https://travis-ci.org/pmd/pmd/jobs/291306568#L8386

The change to GenericToken is the same to all modules, yet only pmd-visualforce is failing. Also, Maven works if we do mvn clean pmd:pmd, but fails if the code is compiled before pmd runs. Is this maybe an issue on how the Maven plugin populates the auxclasspath?

@jsotuyod

Copy link
Copy Markdown
Member

@adangel upon further review, I believe this is down to the parent-first classloading strategy (which may cause trouble when analyzing PMD itself AND when analyzing a third-party app that uses a different version of a dependency we use). However, I'm still not certain why this doesn't happen on some modules such as pmd-java...

@jsotuyod

Copy link
Copy Markdown
Member

Depends on #680

@gibarsin gibarsin force-pushed the tokenGeneralization branch from 99f8e22 to 91b8a22 Compare October 28, 2017 17:44
@jsotuyod jsotuyod added an:enhancement An improvement on existing features / rules in:pmd-internals Affects PMD's internals labels Oct 28, 2017
@jsotuyod

Copy link
Copy Markdown
Member

@adangel in my opinion this is ready to be merged. However, this means Travis will now fail on master, and all new PRs until we can bump the PMD version we use to analyze PMD from 5.8.1 to a 6.0.0 version.

Unfortunately, doing so is non-trivial, and we can't even use a snapshot in the meantime. The maven plugin 3.8 creates instances of RuleSetFactory directly, and those constructors were changed as part of #680 which is the PR that changed classloading strategy for auxclasspath.

I see 3 alternatives:

  • we hold this PR until we are ready to publish PMD 6.0.0 (with ruleset reorganization being the biggest pending issue)
  • we merge as is and deal with broken Travis until we release 6.0.0
  • any chance to update the PMD plugin to work against the latest PMD snapshot, and publish a snapshot of that plugin we may use until we finally release 6.0.0?

@adangel

adangel commented Oct 29, 2017

Copy link
Copy Markdown
Member

@jsotuyod
I vote against merging right now and leave master broken.

One alternative would be, to temporarily skip the pmd plugin (-Dpmd.skip=true). Given, that we anyway don't enforce PMD on PMD (see #361), this sounds doable.

any chance to update the PMD plugin to work against the latest PMD snapshot, and publish a snapshot of that plugin we may use until we finally release 6.0.0?

Possibly: The current snapshots are available here: https://repository.apache.org/content/groups/snapshots/org/apache/maven/plugins/maven-pmd-plugin/3.9.0-SNAPSHOT/
However, I'm not sure, whether this will work: I just tried to set the property pmdVersion in our pom.xml to 6.0.0-SNAPSHOT (which should be available at least in my local repo, since I ran mvn clean install before), but maven sees now a cyclic dependency... So, I didn't even run into the changed API problem yet...

Btw. I still don't know, why the problem only occurs with the visualforce module and not earlier (e.g. java). It also appears with plsql.

@jsotuyod

Copy link
Copy Markdown
Member

@adangel sounds reasonable, I'll merge disabling PMD check on Travis then.

However, I'm not sure, whether this will work: I just tried to set the property pmdVersion in our pom.xml to 6.0.0-SNAPSHOT (which should be available at least in my local repo, since I ran mvn clean install before), but maven sees now a cyclic dependency... So, I didn't even run into the changed API problem yet...

Yes, you would have to fix it at a particular snapshot build for it to work.

Possibly: The current snapshots are available here: https://repository.apache.org/content/groups/snapshots/org/apache/maven/plugins/maven-pmd-plugin/3.9.0-SNAPSHOT/

is this already compatible with the latest 6.0.0?

@jsotuyod jsotuyod merged commit 91b8a22 into pmd:master Oct 30, 2017
jsotuyod added a commit that referenced this pull request Oct 30, 2017
@jsotuyod

Copy link
Copy Markdown
Member

Merged as part of 6.0.0, PMD checks temporarily disabled from Travis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

an:enhancement An improvement on existing features / rules in:autofixes Affects the autofixes framework in:pmd-internals Affects PMD's internals

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants