
[apex] Replace Jorje with fully open source front-end #3766

@aaronhurst-google

Description

PMD utilizes the Apex Jorje library to parse Apex source and generate an AST. The fact that this library is closed-source introduces two major headaches for us:

  • As a matter of policy, code without source transparency faces severe restrictions on how and where it can be run in our environment. The closed-source Jorje dependency therefore taints the whole PMD Apex analyzer.
  • Closed source also limits enhancement or modification, whether to provide early patches for new language features or to add other extensions.

I can't imagine it's particularly pleasant to develop on top of an undocumented API, or to slice-and-dice and integrate new versions.

Describe the solution you'd like

I'd like to explore (and contribute to) the transition to a fully open-source Apex front-end.

In my organization, we're using @nawforce's Apex grammar to parse almost 1 MLoC, now with zero issues. Built on top of this, we've got an implementation that converts the resulting parse tree to an abstract syntax tree data structure, excluding the SOQL query syntax, which isn't relevant here. This is not yet open-sourced, but open-sourcing it is obviously a prerequisite to moving forward.

Integrating this code ought to be fairly mechanical: it amounts to translating this AST data structure into the existing PMD AST. If there are any gaps or incompatibilities, I guess I'm volunteering to accommodate the requested modifications. Validating the semantic equivalence of the two representations is a little trickier, but my sense is that this wouldn't be a big hurdle.
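To give a rough idea of what "mechanical" could mean here, the following is a minimal sketch of a recursive tree-to-tree translation. Everything in it is hypothetical: `SourceNode`, `TargetNode`, `AstTranslator`, and the node kind names are placeholders for illustration, not the actual types of the new front-end or of PMD.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node of the new front-end's AST.
final class SourceNode {
    final String kind;
    final List<SourceNode> children = new ArrayList<>();

    SourceNode(String kind, SourceNode... kids) {
        this.kind = kind;
        for (SourceNode k : kids) {
            children.add(k);
        }
    }
}

// Hypothetical stand-in for a node of the existing PMD AST.
final class TargetNode {
    final String name;
    final List<TargetNode> children = new ArrayList<>();

    TargetNode(String name) {
        this.name = name;
    }
}

final class AstTranslator {
    // Maps a source node kind to a target node name (all names are made up).
    // Unknown kinds fail loudly so that gaps between the two ASTs surface early.
    static String mapKind(String kind) {
        switch (kind) {
            case "ClassDeclaration": return "TargetClass";
            case "MethodDeclaration": return "TargetMethod";
            case "ReturnStatement": return "TargetReturn";
            default:
                throw new IllegalArgumentException("No mapping for node kind: " + kind);
        }
    }

    // Translates a node and, recursively, all of its children, preserving order.
    static TargetNode translate(SourceNode src) {
        TargetNode out = new TargetNode(mapKind(src.kind));
        for (SourceNode child : src.children) {
            out.children.add(translate(child));
        }
        return out;
    }
}
```

Failing on an unmapped kind, rather than silently skipping it, is a deliberate choice in this sketch: it turns each gap or incompatibility into an explicit item to resolve.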

The biggest risk is likely that, after swapping out one parser for another, the tool no longer parses an Apex file that it previously accepted. OTOH, any such issue could actually be fixed upstream, which is not currently a possibility.
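One way to get a handle on that risk would be to run both parsers over the same corpus and list the files that only the old one accepts. The sketch below assumes each parser can be wrapped as a `Predicate<String>` that returns true when the source parses; that wrapper and the `ParserRegressionCheck` class are hypothetical, not part of PMD or either front-end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ParserRegressionCheck {
    // Returns the names of files that the old parser accepts but the new one rejects.
    // corpus maps a file name to its source text; each predicate is an assumed
    // wrapper that returns true when the corresponding parser accepts the source.
    static List<String> regressions(Map<String, String> corpus,
                                    Predicate<String> oldParser,
                                    Predicate<String> newParser) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : corpus.entrySet()) {
            String source = entry.getValue();
            if (oldParser.test(source) && !newParser.test(source)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Files rejected by both parsers are deliberately left out, since they are not regressions introduced by the swap.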

So... that's the idea and the offer... wanted to gauge your interest and gather your concerns.

The chosen open source front-end: Summit AST

Maven Repo for Snapshots: https://google.oss.sonatype.org/content/repositories/snapshots/com/google/summit/summit-ast/

Releases: https://central.sonatype.dev/artifact/com.google.summit/summit-ast/1.0.0/overview
https://repo1.maven.org/maven2/com/google/summit/summit-ast/

PR List

Note: All PRs are merged into branch https://github.com/pmd/pmd/tree/experimental-apex-parser

Related work
