Skip to content

[core] Use annotations to resolve rule dependencies to frameworks#437

Closed
oowekyala wants to merge 4 commits into
pmd:masterfrom
oowekyala:annotations
Closed

[core] Use annotations to resolve rule dependencies to frameworks#437
oowekyala wants to merge 4 commits into
pmd:masterfrom
oowekyala:annotations

Conversation

@oowekyala

@oowekyala oowekyala commented Jun 11, 2017

Copy link
Copy Markdown
Member

Changes description:

Rules written in Java can now use annotations to specify their dependencies to a specific framework (@UsesDfa, @UsesTypeResolution, @UsesMetrics). The concept can be extended to other languages.

Rationale:

Atm these dependencies must be specified as an xml attribute in the ruleset. However, the dependencies really are part of the implementation of the rules, and the rules can't work properly without them. For example, DataflowAnomalyAnalysisRule throws hundreds of NPEs if the dfa="true" attribute is ommited in the xml. Correct me if I'm wrong, but that's not a feature, that's a bug.

I think it's safer that those dependencies be specified with the implementation of the rule (and annotations are an elegant way to do so). That way, the user need not worry about them (the implementation is none of their concern), and their rulesets can be a little bit cleaner. And of course old rulesets still work.

Possible improvements and discussion

  • One thing could justify the need for a dfa attribute, and it would be precisely to set it to false: should a user not want to use DFA (for performance issues, or benchmarking), setting the attribute to false should deactivate the rule (to prevent its malfunction). I could implement that if you agree.
  • Do you think the class attribute of a <rule> tag is necessary? I understand it helps the user find the class if they want to modify it, but wouldn't rulesets be easier to make and read if we could identify the class of the rule from its name?

@jsotuyod

Copy link
Copy Markdown
Member

@oowekyala this is really interesting. I have to confess I've been thinking about this myself (actually, I even spent some time pondering on this earlier today!).

Pros of this approach:

  • ease of use
  • self containment of the rule (keep all implementation details in a single place)

Cons of this approach:

  • the startup penalty of using reflection (not necessarily significant)
  • the possible inconsistencies (have the XML say it's false, and the annotation it's true, and you end up with bugs, misconfigured rules enabling features they don't really need or a very confused developer)
  • this alone doesn't guarantee self-containment yet. Messages, description, and rule setup (class reference / site url) are all still done in the ruleset.xml
  • there is no way to kill the XML tags AFAIK, since XPath rules wouldn't be able to use annotations; and I don't think we can tell ahead if an XPath expression uses a function short of awefull and hacky regular expressions; so we are stuck with 2 different ways to do the same, and therefore loose consistency

I think this is very interesting, but needs a deep discussion and a proper planning on how such feature should work / rollout plan shall we decide to proceed with it.

@jsotuyod jsotuyod added the a:suggestion An idea, with little analysis on feasibility, to be considered label Jun 12, 2017
@oowekyala

oowekyala commented Jun 13, 2017

Copy link
Copy Markdown
Member Author

@jsotuyod Thanks for your input!

I don't think the startup penalty would be significant. Besides, we already use reflection through instantiating rules with newInstance ^^

I agree we must think more deeply about how to tackle inconsistencies between the ruleset and the implementation of the rules. Dependencies are part of the implementation of a rule, so it makes sense that Java rules declare theirs inside the class and not the ruleset. However XPath rules have their implementation contained in the rulesets files, so we could make it compulsory for xpath rules to declare their dependencies, something like declaring an xpath rule like the following:

<rule name="xxx"
      description="xxxx" >
      <xpath>
          <dependencies typeResolution="true" /> <!-- the others are implicitly false -->
          <expression>
 <![CDATA[
//ForStatement (...)
 ]]>
          </expression>
      </xpath>
</rule>

This would make it implicit that the class is net.sourceforge.pmd.lang.rule.XPathRule, but make the dependencies explicit (the dependencies tag would be required) and warn programmers to declare them. Moreover it would encapsulate the actual implementation of XPath rules (instances of XPathRule with an xpath property), which might do us some good should we tweak it oneday.

Then we could drop the metrics, dfa and typeResolution attributes from the rule tag itself, instead specifying the dependencies only in the class for Java rules and only in the xpath/dependencies tag for XPath rules.

We could enforce stricter validation assertions in the XML schema, for example making sure that

  • no class attribute or xpath tag are specified in a rule that has a ref attribute.
  • no class attribute is specified when there is an xpath tag and conversely

Custom java rules would still use the class attribute and annotations.

this alone doesn't guarantee self-containment yet. Messages, description, and rule setup (class reference / site url) are all still done in the ruleset.xml

Message, description and site URL are metadata, so I don't think they should stick with the implementation of a rule. We could however broaden the contract of the Rule interface to include methods that return description and default message as strings ---but then xpath rules would have to declare them in the xpath tag.

All in all, this approach would make implementation stick with implementation, and metadata with metadata, which I think is great. Plus, that would be a best effort to ensure that rule dependencies can't be misconfigured in the ruleset.
...
Such a change in the schema would be IMO for the better, but obviously not retrocompatible (although old rulesets could be ported somewhat easily to the new version). This would only be appropriate for a major release. It would imply modifying the schema, parse XPath rules separately in the RuleSetFactory, and refactoring all rulesets to conform to the new schema.

(Besides, RuleSetFactory could benefit from a little refreshing ---the study the other day saying it was a god class is probably not so wrong.)

@adangel

adangel commented Jun 16, 2017

Copy link
Copy Markdown
Member

Nice idea, indeed!

Just my 2 cents:

  • Users of rules shouldn't be bothered by the implementation - and they aren't right now: When using a PMD-provided rule, you just reference the rule (and you should not define a new rule referencing the same class). Say, you want to use "CollapsibleIfStatements" from the java basic ruleset, you'd use in your custom ruleset:

      <rule ref="rulesets/java/basic.xml#CollapsibleIfStatements" />
    

    You can override some parts, like priority, description, message, properties ... . I'm not sure, whether we currently prevent the users from overriding the class attribute, dfa attribute, typeresolution attributes. This should not be possible.

  • In terms of possible inconsistencies - the annotation would always need to be the "truth". Disabling typeresolution for a rule, which depends on typeresolution will make the rule failing (that would good) or run the rule and yield many false positives (bad).

  • Could we somehow detect automatically for XPath rules, whether they depend on type resolution? E.g. they could call specific helper XPath functions or access specific attributes of nodes (type). That way, we could move this entirely to the implementation side out of the meta data.

@jsotuyod

Copy link
Copy Markdown
Member
  • Could we somehow detect automatically for XPath rules, whether they depend on type resolution? E.g. they could call specific helper XPath functions or access specific attributes of nodes (type). That way, we could move this entirely to the implementation side out of the meta data.

I've been researching this a bit and there may be a way to do this...

For Saxon we can write a visitor, start from xpathExpression.getInternalExpression() and recursively iterateSubExpressions(), looking for expressions of type FunctionCall (or one of it's subtypes more accurate?), cast it, and we may check the function being referenced before evaluation.

For Jaxen it's a little more complicated. We would need to take the BaseXPath.getExpr(), and recursively ask if it's an instance of BinaryExpression or whatever to cast it and be able to transverse it, and then again check for FunctionCallExpr. Upon finding one, we may get the prefix and function name to know which one we are referring to.

Of course we would want all of this to be done just once upon creation.

Also of note, during this research I found that xpath expressions are recreated (and therefore re-parsed) each time they are evaluated. I'm not certain this is actually needed.

@oowekyala

Copy link
Copy Markdown
Member Author

@jsotuyod Out of curiosity, I wonder why we still support Jaxen. I mean, from what I understand, XPath 2.0 is a superset of XPath 1.0 so maybe there's no point in maintaining support for both separately? This would make the change you describe easier to implement in case we decide to proceed with it

@jsotuyod

jsotuyod commented Nov 1, 2017

Copy link
Copy Markdown
Member

@oowekyala I'm not sure why, but our current implementation of XPath 2.0 is not necessarily compatible with XPath 1.0. All XPath rules are currently defaulting to 1.0 (Jaxen), and most of the time, changing them to 2.0 (or sometimes even 1.0 compatibility mode) will result in failure to match nodes. This may be an issue on our implementation (specially due to the shortcuts taken to use the Rule Chain), but at this point I'm not certain.

For future releases we may check on those difference, check performance differences between both implementations, and deprecate whichever one we want to drop (also, we use a very old Saxon version, but updating is non-trivial). For the time being we are stuck with supporting both.

@oowekyala

Copy link
Copy Markdown
Member Author

Interesting, thanks!

@oowekyala

Copy link
Copy Markdown
Member Author

Closing this because of clutter. The discussion may be continued here or on an RFC when the time comes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a:suggestion An idea, with little analysis on feasibility, to be considered

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants