[core] Match abstract types in XPath queries #1787

@oowekyala

Closely related to #1785

Turning some nodes into interfaces like in #1759 removes some nodes from the AST, and makes us rely more on the Java way of abstracting stuff (interfaces). XPath uses other mechanisms and we must also cater to them so as not to lose expressivity.

Today an XPath expression //Expression would match most expression nodes (though not all, see footnote 1).

Turning ASTExpression into an interface improves the Java side of the API, but then //Expression no longer matches anything, so there is no direct way left to match all expressions. One possible solution that's already in #1759 is to expose an attribute on the relevant nodes:

https://github.com/oowekyala/pmd/blob/f978a1e71171dd30a40573f9ef6f944e032d8b2f/pmd-java/src/main/java/net/sourceforge/pmd/lang/java/ast/ASTExpression.java#L41-L48

This makes it possible to match expressions with //*[@Expression=true()]
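As a rough illustration of the attribute approach (not the actual code behind the link above), the sketch below gives the interface a default getter and mimics the predicate [@Expression=true()] with a reflective getter lookup. The node types are hypothetical stand-ins, and the reflective lookup is only an assumed simplification of how getters end up exposed as XPath attributes.

```java
import java.lang.reflect.Method;

public class ExpressionAttributeSketch {

    // Hypothetical stand-ins for the real AST node types.
    interface Node {}

    interface ASTExpression extends Node {
        // Always true; only there so that XPath can see an "Expression" attribute.
        default boolean isExpression() {
            return true;
        }
    }

    static class ASTInstanceOfExpression implements ASTExpression {}

    static class ASTIfStatement implements Node {}

    // Mimics [@Name=true()]: looks for a public isName() getter and checks it returns true.
    // Nodes without such a getter simply do not match.
    static boolean hasTrueAttribute(Node node, String attributeName) {
        try {
            Method getter = node.getClass().getMethod("is" + attributeName);
            return Boolean.TRUE.equals(getter.invoke(node));
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasTrueAttribute(new ASTInstanceOfExpression(), "Expression")); // true
        System.out.println(hasTrueAttribute(new ASTIfStatement(), "Expression"));          // false
    }
}
```

Every supertype that should be matchable needs its own getter of this kind, and each one is useless from Java code.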

Doing that for every interesting supertype adds noise to the Java API though, and I don't think it's a viable solution in the long run. It's also not very readable.

Towards a better solution

The XPath specification mandates that an element have a single local name. This makes it impossible to specify that, e.g., an InstanceOfExpression can be matched by a set of names, such as both InstanceOfExpression and Expression. So there's no way a segment /Expression/ in the new grammar can match an InstanceOfExpression.

So we can rule out matching supertypes by element name.

XPath Schema

XPath has a type system and some syntax to match some elements by type. The syntax is described here. If we could do that, then a test for an expression could look like //element(*, Expression).

This syntax is nice, but schema-aware processing is not available in the open-source edition of Saxon; it requires a commercial license.

So we can't implement that directly with Saxon.

Some custom function

We could also expose a custom function that directly tests the type of the node.

E.g.

  • //*[nodeIs('net.sourceforge.pmd.lang.java.ast.ASTExpression')],
  • or with some sugar //*[nodeIs('Expression')], where we add the necessary prefix to the string 'Expression'
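A minimal sketch of the type test such a function could perform, leaving out the wiring into the XPath engine. The node types and the PREFIX constant are hypothetical stand-ins; in PMD the prefix would presumably be net.sourceforge.pmd.lang.java.ast.AST.

```java
public class NodeIsSketch {

    // Hypothetical stand-ins for the real AST node types.
    interface Node {}

    interface ASTExpression extends Node {}

    static class ASTInstanceOfExpression implements ASTExpression {}

    static class ASTIfStatement implements Node {}

    // Assumption: here the prefix points at the nested stand-in types above,
    // so that 'Expression' resolves to NodeIsSketch$ASTExpression.
    static final String PREFIX = NodeIsSketch.class.getName() + "$AST";

    // Names containing a dot are treated as fully qualified; short names get the prefix.
    static boolean nodeIs(Node node, String typeName) {
        String className = typeName.indexOf('.') >= 0 ? typeName : PREFIX + typeName;
        try {
            return Class.forName(className).isInstance(node);
        } catch (ClassNotFoundException e) {
            return false; // unknown type names match nothing
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeIs(new ASTInstanceOfExpression(), "Expression")); // true
        System.out.println(nodeIs(new ASTIfStatement(), "Expression"));          // false
    }
}
```

Resolving names through Class.forName means any loadable Java type can be named in a query, including types that have nothing to do with the AST.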

XPath expression rewrite

If we have #1243, then we can actually rewrite segments like //element(*, Expression) to some type test like above, and thus mimic the normal XPath syntax transparently.
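To make the idea concrete, here is a deliberately naive, text-based sketch of that rewrite. It is not how #1243 works; the class and method names are hypothetical, and a real implementation would presumably operate on a parsed expression rather than on the raw string (this regex would also rewrite matches inside string literals).

```java
import java.util.regex.Pattern;

public class ElementTestRewriteSketch {

    // Matches element(*, TypeName), allowing whitespace around the arguments.
    private static final Pattern ELEMENT_TEST =
        Pattern.compile("element\\(\\s*\\*\\s*,\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*\\)");

    // Replaces each schema-style element test with a wildcard plus a nodeIs predicate.
    static String rewrite(String xpath) {
        return ELEMENT_TEST.matcher(xpath).replaceAll("*[nodeIs('$1')]");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("//element(*, Expression)")); // //*[nodeIs('Expression')]
        System.out.println(rewrite("//IfStatement"));            // //IfStatement
    }
}
```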

Closed set of matchable supertypes

The above solution would technically allow matching any Java type. Do we want that though? Should someone be able to match AbstractJavaNode, Node, or some interfaces that are just there for implementation convenience, like AnnotatableNode? That would increase coupling between XPath queries and the Java code...

If the answer to the above questions is no, then we should design a way to specify which types should be matchable. I expect this to use exactly the same info as #1786, so implementing it could allow us to do that.
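One way to picture a closed set is an explicit registry of matchable names, sketched below. The node types and the MATCHABLE map are hypothetical; where that information would really come from (for instance whatever #1786 ends up providing) is left open.

```java
import java.util.Map;

public class MatchableTypesSketch {

    // Hypothetical stand-ins for the real AST node types.
    interface Node {}

    interface AnnotatableNode extends Node {} // implementation convenience, deliberately not matchable

    interface ASTExpression extends Node {}

    static class ASTInstanceOfExpression implements ASTExpression {}

    // The closed set: only names listed here may be used in a type test (requires Java 9+ for Map.of).
    static final Map<String, Class<? extends Node>> MATCHABLE =
        Map.of("Expression", ASTExpression.class);

    // Rejects names outside the closed set instead of silently matching nothing,
    // so that a query relying on an internal type fails loudly.
    static boolean nodeIs(Node node, String typeName) {
        Class<? extends Node> type = MATCHABLE.get(typeName);
        if (type == null) {
            throw new IllegalArgumentException("Not a matchable type: " + typeName);
        }
        return type.isInstance(node);
    }

    public static void main(String[] args) {
        System.out.println(nodeIs(new ASTInstanceOfExpression(), "Expression")); // true
    }
}
```

Throwing on unknown names is a design choice of this sketch; returning false would also work but would hide typos in queries.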


Footnote 1: There are some corner cases in the current grammar where no ASTExpression node is pushed, even if the node is an expression. That happens, e.g., for the concise try-with-resources statement, which pushes a bare Name...

Labels: in:ast (about the AST structure or API, the parsing step); in:xpath (relating to XPath support at large, e.g. Jaxen / Saxon, custom functions, attribute resolution)
