[core] Token scheme generalization#679
Conversation
|
Thanks for the PR. This looks very interesting, and is something I've been looking forward in order to bring comment-based CPD suppression to other languages for quite some time (see #250)
I can't help but wonder if this is actually a good idea. Both JavaCC and Antlr expose the token's location on the text directly. What are we seeing they are not? These guys have been working with tokens for ages, and I don't presume to know better than they do on this topic. To take into account. This is a breaking API change, meaning it has to make it for 6.0.0, or wait until 7.0.0. Moreover, whatever form this change takes (either a new object or not) it will have to remain supported for the whole major version it gets in. I'm not so comfortable with the "let's do this and measure memory usage once the feature is ready" approach unless we are certain of an issue we can avoid this way. |
There was a problem hiding this comment.
the concept of special token is a little too JavaCC-related... Antlr just sends the tokens to a different channel.
Maybe we should be a little more semantic here on what we expect from grammars for PMD. Maybe something such as getPrevComment() or something of the sort?
Of course the docs would need to be updated
There was a problem hiding this comment.
Ok so in the cases of auto-fixes or CPD we want to preserve comments so what you say fits in perfectly.
However, if we go through that way and the case is presented where a non-comment and "non-regular" token is needed to be used, the solution would be either to add a specific method for that case or to deprecate this method and make a more general one.
I currently cannot foresee if PMD will ever need to address this cases, so I'm just speculating.
There was a problem hiding this comment.
A special token (JavaCC) is a token that:
- can appear at any point in the grammar
- is not part of the AST itself
I can think of no thing other than comments that may fit this criteria under a programming language perspective. Remember, what we define here is jut what we expect from PMD grammars. I feel comfortable with such semantics.
There was a problem hiding this comment.
Perfect, I have just updated the interface's method. I will now follow to update the PR's comment
There was a problem hiding this comment.
this seems a little too generic... remember this will replace all occurrences of specialToken; with your contents
|
Let me address this by doing more research in ANTLR's code about why did they take the design decision of exposing directly those values. |
|
I see the change is impacted on Java, JSP and VF, but not on C/C++, Javascript, Matlab, Objective-C, PL/SQL, Python or VM. Any reason for this? |
|
Reason at the time was that there did not exist a dependency over the GenericToken class for those languages. However, the whole reason for this generalization is to make it language-independent, so it doesn't make sense those cases not to have their GenericToken interface dependency. I'll put myself to work to impact these changes on those languages. Thanks for noticing. |
60b8875 to
e6487f6
Compare
|
For the record, the decision to have a separated class for storing the tuple (beginLine, beginColumn, endLine, endColumn) of a Token has been discontinued, due to the lack of proof that the creation of these instances do or do not affect the memory/cpu performance when running the analysis. The solution is to have a separated method for each of the fields in this tuple, directly exposed in the GenericToken interface. |
|
@adangel I'm having trouble figuring out why Travis fails with an The change to |
|
@adangel upon further review, I believe this is down to the parent-first classloading strategy (which may cause trouble when analyzing PMD itself AND when analyzing a third-party app that uses a different version of a dependency we use). However, I'm still not certain why this doesn't happen on some modules such as pmd-java... |
|
Depends on #680 |
99f8e22 to
91b8a22
Compare
|
@adangel in my opinion this is ready to be merged. However, this means Travis will now fail on master, and all new PRs until we can bump the PMD version we use to analyze PMD from 5.8.1 to a 6.0.0 version. Unfortunately, doing so is non-trivial, and we can't even use a snapshot in the meantime. The maven plugin 3.8 creates instances of I see 3 alternatives:
|
|
@jsotuyod One alternative would be, to temporarily skip the pmd plugin (
Possibly: The current snapshots are available here: https://repository.apache.org/content/groups/snapshots/org/apache/maven/plugins/maven-pmd-plugin/3.9.0-SNAPSHOT/ Btw. I still don't know, why the problem only occurs with the visualforce module and not earlier (e.g. java). It also appears with plsql. |
|
@adangel sounds reasonable, I'll merge disabling PMD check on Travis then.
Yes, you would have to fix it at a particular snapshot build for it to work.
is this already compatible with the latest 6.0.0? |
|
Merged as part of 6.0.0, PMD checks temporarily disabled from Travis. |
Summary
It is intended to a make language-independent token interface to be accessed in a generic way to access them when making auto-fixes or comment-based CPD suppressions.
Any Token should have access to the next token according to the input stream which created it, as well as comment-type tokens which are to be used by each language. Furthermore, these tokens should have a text representation and the limits to where it belongs to the file. This limits are represented by the tuple (beginLine, beginColumn, endLine, endColumn).
These are ideas are reflected by changing the current type of GenericToken to an interface type declaring these methods and having the JavaCC Token classes be modified to suit this interface.
Changes