-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[core] Support language dialects #5438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- The idea is to force people defining a metadata to follow a different implementation path (and hence, ensuring it behaves as a dialect)
- This takes in DialectLanguageMetadata instances and ensures the
implementation for the language:
- Uses the same parser as the base language, and therefore the same
AST, so rules of the base language are applicable to a dialect.
- Keeps all decorators from the base language, and adds the
dialect's
- Keeps all XPath functions from the base language, and adds the
dialect's
- Keeps all metrics from the base language, and adds the dialect's
- Keeps all violation suppression from the base language, and adds
the dialect's
- Keeps all language properties from the language, and adds the
dialect's
- Now they are proper dialects, without needing to extend or have a compile-time dependency into the base language.
- When checking applicability we should always pass the proper language, not the applied dialect nonetheless, for proper version checking.
- Tests are passing, but we are not doing proper version check for rule
application.
- Overall, things are still tricky… there are several things that I've
found while working on this that are not really sorted out:
- If more than one language is applicable (even without dialects), rules
get initialized for all processors, even those that don't apply.
- Analysis cache is unaware of the chosen lang-version for the
analysis, so even if the ruleset is the same, results may vary.
- FileVersionDiscoverer configures the forced language, but doesn't
apply it… that is done by the FileCollector, and not on all flows…
- LanguageVersionHandler has no access to the actual
LanguageVersion… at all. These are only available to the
LanguageProcessor / LanguagePropertyBundle, and only to the
instance created in PmdAnalysis
|
Compared to main: |
- They are used in collections, but always assuming they are singletons. - In practice this is true, but inherently, we actually expect QNames to be unique (that is how Saxon finds them underneath), so let's make that explicit in the implementation.
- Make it readily available for all tests instead of defined outside of it only for Saxon tests.
- Add a dialect-level xpath fn, ensure it's available. - Add skeletons for other promises to be tested
- Avoid any binary incompatibility. The classes remain, but unused.
|
Sorry for the long time without following up on this… This is now fully functional, and honors all the promises. There are only 2 things pending, none of which I consider a blocker at this point in time.
Why do I say our lang version management is clanky?
So there is a lot of work to be done with this… but again, none of it seems to be blocking supporting dialects as an experimental feature; nor block in any way the dialects we are already shipping with PMD. |
adangel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I've reviewed it and found a couple of typos/nitpicks/etc. I think, there are only 2 questions left, but since this is for now an experimental feature, I don't think we should wait any further. I'm going to leave these comments open and resolve the others I fixed in the below mentioned commits.
I've fixed the typos/nitpicks here: https://github.com/pmd/pmd/tree/pr-5438 since I can't push to your branch on Monits. This also includes a merge from main and release notes.
Specifically - when I'm going to merge this today or early tomorrow - these are the two commits:
| ## Steps | ||
|
|
||
| ### 1. Create a dialect module | ||
| * Dialects usually reside in the same module of the base language they leverage; but can technically live standalone in a separate module if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future:
Should we demand a more rigorous structure? While it's technically possible, should we allow separate modules?
Grouped together with the base language - Pro:
🟢 This makes sure, that base language is always available at runtime.
🟢 Easy to see which dialects are existing.
🟢 Clear and unambiguous rule
Con:
🔴 It is more a technical grouping, than logical. E.g. Visualforce would belong to a (future) pmd-salesforce module and not to pmd-xml...
🔴 If you add the dependency to pmd-xml, you might get a whole bunch of languages/dialects, you don't necessarily need/aren't interested in.
Since this is experimental, we can still change this. But it would have an impact of currently existing modules, which would need to be refactored, so that PMD itself serves as a good example.
We also should suggest a base package name for all the classes created for the dialect: net.sourceforge.pmd.lang.<base-lang-id>.<dialect-lang-id>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to tighter guidelines.
I'm in for using the same module within PMD… but I felt it was worth documenting this is not required. For instance, if I decide to create a dialect that for some reason I don't want to contribute back to PMD, I can create it without needing to fork PMD… not something we necessarily want to promote, but supported by this design.
All-in on the package convention. It is the one we are following either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think we need to differ between modules inside PMD and distributed as part of the "official" PMD binary and a module developed outside. Maybe we can just add a sentence or so like "..." - hm... actually, it is your sentence already 😄 The only problem I see here we use the word "module" for different things: "Dialect module" (in the heading) and "maven module". Maybe this would work?
Dialects usually reside in the same maven module of the base language they leverage; but can technically live standalone in a separate maven module or project if needed (e.g. to develop the dialect module as a project separate from PMD).
But let's change this in a follow-up PR (including mentioning the desired package name explicitly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
pmd-xml/src/main/java/net/sourceforge/pmd/lang/xml/wsdl/WsdlDialectModule.java
Show resolved
Hide resolved
pmd-xml/src/main/java/net/sourceforge/pmd/lang/xml/wsdl/WsdlLanguageModule.java
Show resolved
Hide resolved
pmd-xml/src/main/java/net/sourceforge/pmd/lang/xml/xsl/XslDialectModule.java
Show resolved
Hide resolved
pmd-xml/src/main/java/net/sourceforge/pmd/lang/xml/xsl/XslLanguageModule.java
Show resolved
Hide resolved
|
Thanks for the feedback and review! 🎉 |
Merge pull request #5438 from Monits:lang-dialects
Motivation
XML is a supported language in PMD. However, there are some specific languages based on XML. One example would be XHTML (but not HTML5...) and XSLT and WSDL. Currently, we have the language XML (https://docs.pmd-code.org/latest/pmd_languages_xml.html) and some related languages (Maven POM, WSDL, XSLT). These are all in the module
pmd-xml. They provide their own rules (e.g. https://docs.pmd-code.org/latest/pmd_rules_xsl.html).In the future, we might want to support other XML-based languages, especially in the Salesforce world, such as XML Metadata (custom fields, etc.) and LWC (Lightning Web Components). Note, that Visualforce (https://docs.pmd-code.org/latest/pmd_languages_visualforce.html) is currently an language of its own with its own grammar (supporting the expression language), but it is actually also XML-based.
These specialized languages, that are based on another one, will be called Language Dialects. They support their own file extensions, own language dialect versions, own rules, own custom XPath functions. At the same time, they "inherit" the features from the base language.
Describe the PR
Provide a way to declare a language as a dialect of another.
When a language is declared as a dialect:
These 2 conditions ensure that all rules that were applicable to the base language are also applicable to the dialect (and do so in a single parser pass).
Additionally, the dialect may extend upon these, adding it's own language properties, XPath functions or metrics for it's own rules to use.
The implementation is completely declarative. This means there is no compile-time dependency on the base language, nor requires the base language classes (language module, property bundles, language version handler, etc.) to provide any special constructors to ensure extensibility. It also means, these may change without breaking the dialect.
Additionally, this introduces a major constraint. Once a
LanguageMetadatais declared to be a dialect (asDialectOf(String id)), it forces users to not useSimpleLanguageModuleBaseas a base, but the newSimpleDialectLanguageModuleBase, which implements all the above features.This aims at forcing a correct dialect implementation. If a non-dialect language metadata is passed to
SimpleDialectLanguageModuleBasea compile time error occurs rather than a runtime error.Additionally, by leading developers to extend
SimpleDialectLanguageModuleBase, a correct implementation is achieved, which honors all dialect guarantees. Unfortunately, it would still be possible to manually extendLanguageModuleBasein a broken way (ie: a dialect not using the base language parser).I couldn't think of a better / more comprehensive way to achieve this.
I've been thinking a lot about how to map dialect lang version to base lang version… The mapping are not required to be 1:1 (ie: my xml dialect may accept xml 1.0 or 1.1). Inferring at runtime which is under use (even if possible in XML) is not done by PMD as it can't be done for most languages. I'm therefore thinking of having this be provided by the user (ie:
--use-version xml-1.0 --use-version pom-4.0.0) or apply the default if not provided. As such, the only thing a dialect may do is state "compatibility", producing a runtime error if not satisfied… but that would hardly be future-proof / I'm not really sure if it's valuable. Hence, I'm considering leaving that out for this iteration of the API (which is still@Experimental).Pending work:
I'm still thinking on how to make it self-evident that a language is a dialect. One plausible way would be to force dialect ids to become
{baselang}/{dialect}. This would force rule definitions in XMLs to use this id, and would change the CLI's usage of{lang}-{version}convention to{lang}/{dialect}-{version}, sopom-4.0.0would becomexml/pom-4.0.0.This obviously is a breaking change, and would require some close-gap patchwork with proper deprecation messages when the old convention is used.
RuleSet.appliesstill needs to be updated to acknowledge version numbers for dialect and base language may be different.Also missing:
Related issues
Ready?
./mvnw clean verifypasses (checked automatically by github actions)