[java] Make nodes have access to their text #2166

Merged: oowekyala merged 35 commits into pmd:java-grammar from oowekyala:node-text-charseq on Jan 7, 2020

@oowekyala

Member

Java nodes now have a method getText(), which provides access to the full text of the node. This is mostly intended as a debugging aid, and not as a way of writing rules (hence the @NoAttribute).

This implementation uses:

  • a special token class that is based on start/end offsets instead of lines and columns. This is more economical memory-wise, and makes it easy to cut a piece out of the file text.
    • one downside of this representation is that getBeginLine/getEndLine for nodes is less efficient, since lines have to be computed from offsets. I don't think this will be a real problem though.
  • a special CharStream that records text offsets of tokens
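
To illustrate the idea, here is a minimal sketch of an offset-based token. It is not the code from this PR: `SourceText`, `OffsetToken` and `textBetween` are hypothetical names invented for the example, and the real token class and CharStream in pmd-core may look quite different. It only shows why offsets make it cheap to cut node text out of the file, and why line lookups now cost a search instead of a field read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the actual pmd-core classes.
final class OffsetTokenSketch {

    /** File text plus the offset at which each line starts. */
    static final class SourceText {
        final String text;
        private final List<Integer> lineStarts = new ArrayList<>();

        SourceText(String text) {
            this.text = text;
            lineStarts.add(0); // line 1 starts at offset 0
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == '\n') {
                    lineStarts.add(i + 1);
                }
            }
        }

        /** 1-based line containing the offset, found by binary search. */
        int lineAt(int offset) {
            int idx = Collections.binarySearch(lineStarts, offset);
            // Exact hit: the offset is a line start. Otherwise binarySearch
            // returns -(insertionPoint) - 1, and insertionPoint is the line.
            return idx >= 0 ? idx + 1 : -idx - 1;
        }
    }

    /** A token that stores only two ints instead of four line/column fields. */
    static final class OffsetToken {
        final SourceText source;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        OffsetToken(SourceText source, int startOffset, int endOffset) {
            this.source = source;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        CharSequence getImage() {
            return source.text.subSequence(startOffset, endOffset);
        }

        // Less efficient than reading a field: each call does a search.
        int getBeginLine() {
            return source.lineAt(startOffset);
        }

        int getEndLine() {
            return source.lineAt(Math.max(startOffset, endOffset - 1));
        }
    }

    /** Text of a node spanning from its first token to its last token. */
    static CharSequence textBetween(OffsetToken first, OffsetToken last) {
        return first.source.text.subSequence(first.startOffset, last.endOffset);
    }

    public static void main(String[] args) {
        SourceText src = new SourceText("int x;\nint y = 1;\n");
        OffsetToken first = new OffsetToken(src, 7, 10);  // "int" on line 2
        OffsetToken last = new OffsetToken(src, 16, 17);  // ";" on line 2
        System.out.println(textBetween(first, last));     // prints "int y = 1;"
        System.out.println(first.getBeginLine());         // prints 2
    }
}
```

In this sketch a node's text is a single `subSequence` call over the shared file text, so nothing is copied per token, while `getBeginLine` pays a logarithmic lookup in the line-start table.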

Both of these classes are added to pmd-core instead of being generated by JavaCC and then hacked through Ant search/replace. I think most of our JavaCC parser implementations could use the same token implementation and get the feature nearly for free. It's also nice to remove the duplication of the Token class (currently one per JavaCC module).

Future work:

  • Refactor and use the Document API that was introduced and abandoned in pmd-core, to represent text regions of a document.
  • Make a separate AbstractNode class that doesn't have the beginLine, endLine, etc. fields. These can be fetched from the tokens directly.
  • Share this feature with other language modules

@oowekyala oowekyala added the in:ast About the AST structure or API, the parsing step label Dec 16, 2019
@oowekyala oowekyala added this to the 7.0.0 milestone Dec 16, 2019
@ghost

ghost commented Dec 16, 2019

1 Message
📖 No java rules are changed!

Generated by 🚫 Danger

@oowekyala oowekyala merged commit ac14db9 into pmd:java-grammar Jan 7, 2020
@oowekyala oowekyala deleted the node-text-charseq branch January 7, 2020 14:39
oowekyala added a commit that referenced this pull request Jan 10, 2020
* Make Java nodes text-available
* Introduce shared JavaccToken in pmd-core
* Use factory to produce char streams

Tests are still on java-grammar,
since they use the DSL & newer
AST structure.

This is to prepare for other changes
that concern all javacc languages and
should not be done on java-grammar
@adangel adangel mentioned this pull request Jan 23, 2023
55 tasks