[core] Jaxen adapter implements document nodes incorrectly #1938

@oowekyala

The XPath spec requires that the tree being queried (exposed by our adapter implementation) conform to the XPath data model. Among other things, the data model mandates a separation between regular element nodes and the document node:

  • Element nodes are regular XML elements, eg <foo>text</foo>. In our model, they correspond to AST nodes.
  • The document node is virtual. It is not an element. It is always the root of the DOM. Perhaps surprisingly, it may have several children. The key restriction is that only a single child may be an element node; the other children can be comments or processing instructions. (Processing instructions are not well known, but they are the nodes delimited by <? ?>, eg FXML imports.)
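To make the distinction concrete, here is a minimal sketch using the JDK's built-in DOM parser rather than PMD's adapters. The class name, the helper names and the sample XML are illustrative assumptions, not anything taken from the PMD code base.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DocumentNodeChildrenDemo {

    // Illustrative input: a comment and a processing instruction
    // precede the single root element.
    static final String XML =
        "<!-- license header --><?import javafx.scene.control.Button?><foo>text</foo>";

    // Parses the string into a DOM document; checked exceptions are wrapped.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Maps a DOM node type to a readable name.
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE: return "document";
            case Node.ELEMENT_NODE: return "element";
            case Node.COMMENT_NODE: return "comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "processing-instruction";
            default: return "other";
        }
    }

    // Lists the kinds of the direct children of the document node.
    static String childKinds(String xml) {
        NodeList children = parse(xml).getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(kind(children.item(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(kind(doc) + " -> [" + childKinds(XML) + "]");
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
    }
}
```

The document node here has three children, only one of which is an element, and the root element <foo> is a child of the document node rather than the document node itself.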

The point is that equating the node at the top of the tree with the document node doesn't respect this specification. Our Jaxen adapter currently does exactly that, which causes yet another inconsistency between our XPath implementations. The document node contains the root element node as a child, which is why the XPath kind test document-node(element(CompilationUnit)) exists.

Explanation of the linked example

To understand what's happening in the linked example, consider the following:
  • As per XPath specification:
    • the expression / (a single slash) is perfectly valid and yields the special document node (!= root element).
    • Starting a path expression with a slash evaluates the expression against the document node.
    • A segment in a path expression with just a name (eg UserClass) implicitly uses the child axis (so is equivalent to child::UserClass)
  • So what's the difference between Jaxen and Saxon here?
  • With our Jaxen adapter, the document node is the root element node. /UserClass expands to /child::UserClass, but our implementation treats /, the document node, as the root UserClass itself. So /child::UserClass only yields UserClass nodes that are children of the root of the AST, meaning nested classes and not the root UserClass. To actually get the root element, you have to use /self::UserClass.
  • Our Saxon adapter does the right thing: the document node is properly separated from the root UserClass, so / evaluates to a DocumentNode instance whose only child is the root UserClass, and /UserClass yields the top-level class.
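The spec-compliant behaviour described above can be reproduced outside PMD with the JDK's own XPath 1.0 engine. The following is only a sketch: the XML stands in for an AST, and the class name, the names helper and the Name attribute values are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootPathDemo {

    // Stand-in for an AST: a top-level class containing one nested class.
    static final String XML =
        "<UserClass Name='Outer'><UserClass Name='Nested'/></UserClass>";

    // Evaluates expr against the document node and joins the Name
    // attributes of the matched elements with commas.
    static String names(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < hits.getLength(); i++) {
                if (i > 0) {
                    sb.append(",");
                }
                sb.append(((Element) hits.item(i)).getAttribute("Name"));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Child axis from the document node: selects the root element.
        System.out.println("/UserClass           -> [" + names("/UserClass") + "]");
        // Self axis on the document node: it is not an element, so no match.
        System.out.println("/self::UserClass     -> [" + names("/self::UserClass") + "]");
        // What /UserClass effectively selects when / is conflated with the root element.
        System.out.println("/UserClass/UserClass -> [" + names("/UserClass/UserClass") + "]");
    }
}
```

With a separate document node, /UserClass selects the outer class and /self::UserClass selects nothing. An adapter that conflates the two behaves as if /UserClass were /UserClass/UserClass, which only reaches the nested class.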

This is yet another nail in the coffin of our Jaxen adapter.

Anyway, we should pay attention to this when refactoring XPath support, and in particular:

  • Check that the expression / evaluates to the synthetic document node
  • Check that the expression (/)[self::document-node()] yields a non-empty sequence
  • Check that, eg for Java, the expression (/)[self::document-node(element(CompilationUnit))] yields a non-empty sequence
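As a rough illustration of what these checks assert, the sketch below approximates them with the JDK's XPath engine. The document-node() kind test is XPath 2.0 syntax and the JDK only implements XPath 1.0, so the sketch substitutes a DOM node-type check plus the expressions count(/*) and name(/*). The class and method names are hypothetical, and a real regression test would run the original expressions through the Saxon adapter instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RootChecksDemo {

    // Returns true if "/" is a document node with exactly one element
    // child, and that child is named rootName.
    static boolean passes(String xml, String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Approximates: "/" evaluates to the synthetic document node.
            Node root = (Node) xpath.evaluate("/", doc, XPathConstants.NODE);
            boolean isDocumentNode =
                root != null && root.getNodeType() == Node.DOCUMENT_NODE;

            // Approximates document-node(element(rootName)) in XPath 1.0 terms.
            boolean singleElementChild =
                (Boolean) xpath.evaluate("count(/*) = 1", doc, XPathConstants.BOOLEAN);
            String actualName =
                (String) xpath.evaluate("name(/*)", doc, XPathConstants.STRING);

            return isDocumentNode && singleElementChild && rootName.equals(actualName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CompilationUnit><ClassDeclaration/></CompilationUnit>";
        System.out.println(passes(xml, "CompilationUnit"));
    }
}
```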

Labels: in:xpath (Relating to xpath support at large, eg Jaxen / Saxon, custom functions, attribute resolution), was:wontfix