You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I think the most generic solution to the problems we have when it comes to configuring languages, is to configure Language instances directly via properties.
Example use cases:
We currently have a CLI switch for the suppress marker, another for the auxclasspath.
The auxclasspath is only used by JVM languages (currently only Java). We try to reuse our special PmdAsmClassLoader, bound to the classloader created for the auxclasspath. That classloader is stored in a static variable. It's overwritten if you launch an analysis with another auxclasspath. This means, analysing different projects concurrently will overwrite each other's classloader repeatedly, undermining reuse. This also means the only way to reclaim the memory for this classloader is to launch another analysis, otherwise it will stay in the static variable. Clearly there's a problem with lifecycle here. See here
The suppress marker is set for every language globally. It's the only use case for ParserOptions, which otherwise could be removed. It would make sense to configure this language by language
The core of this proposal is to make LanguageRegistry non-static, and giving it a proper lifecycle. This makes it so, that analysis-global state (like the classloader) can be stored in the language instance, and shared with eg the parser and other processing stages, without having to jump through hoops to figure out if we have the same parameters as before.
Consider making the service that is loaded by ServiceLoader not Language directly, but a LanguageLoader. This object creates a Language instance with property descriptors. The initialization takes care of configuring the language with all those things mentioned above.
For example, a LanguageLoader interface could look like so:
newPropertyBundle creates a PropertySource and defines the PropertyDescriptors accepted by the language
createLanguage creates a language instance given a configuration
Eg the java implementation of this would be
classJavaLanguageLoaderimplementsLanguageLoader<JavaLanguageProperties> {
@OverridepublicStringid() {
returnJavaLanguageModule.TERSE_NAME;
}
@OverridepublicJavaLanguagePropertiesnewPropertyBundle() {
returnnewJavaLanguageProperties(); // defines some properties (auxclasspath, suppressMarker, filePatterns)
}
@OverridepublicLanguagecreateLanguage(JavaLanguagePropertiesproperties) {
// in this language instance we can initialize the special classloader before analysing the first filereturnnewJavaLanguageModule(
properties.getFilePatterns(), // more general than file extensions, eg supports "pom.xml"properties.getSuppressMarker(),
properties.getAuxclasspathClassloader()
);
}
}
Language properties can be set with this simple CLI extension: -L<langId>:<propName> <value>.
For example -Lxml:fileNamePatterns 'pom.xml,*.fxml', or -Ljava:auxclasspath 'some;cp'.
The initialization process of a language registry would look like so:
LanguageRegistry uses ServiceLoader to load a bunch of LanguageLoader instances
-L options of the command line args are partitioned by language ID
Foreach language loader, create a language instance like so:
create a property bundle (newPropertyBundle)
call setProperty for each -L switch targeting the given language (this will throw errors for misconfigurations)
call createLanguage with that property bundle
Create a LanguageRegistry wrapping this set of languages
Language and LanguageRegistry can implement AutoCloseable. This allows the language registry to be used in a try statement.
So, very high in the PMD call stack, we can have something like
publicstaticvoidmain(String[] args) {
// langProperties := properties from the CLI, partitioned by language // serviceClassLoader := classloader used by ServiceLoader// LanguageRegistry::loadWithProperties is the loading routine described abovetry (LanguageRegistryregistry = LanguageRegistry.loadWithProperties(serviceClassLoader, langProperties)) {
// load rulesets, execute, etc, using this registry// Other analyses use different LanguageRegistries, so are independent from this one
}
// classloader & other resources allocated on construction are reclaimed
}
I think this mechanism solves long-standing issues with our configuration model, namely
adding options to a language implementation requires changing everything, from the upper layers like Configuration and the CLI itself (or the schema, similar to [core] Abstract away optional AST traversals (first step) #1426), to the implementation of the feature, that is spread ad-hoc over all the pmd-core codebase. Most languages probably don't need all those options anyway (eg the auxclasspath).
there's no "analysis scoped" objects, which means we have to use dirty tricks like threadlocals or
the horror that's PmdAsmClassLoader to share objects through an analysis. Such "tricks" make PMD harder to use in a concurrent setting. There's no cleanup of static state at all.
ParserOptions is a pain to use and could be removed (see wiki)
It gives us a simple and general extension mechanism to configure language-specific behavior,
without affecting other languages, and without needing changes to pmd-core. For example, we could use that to enable logging of detailed type-resolution-specific debug information (missing classes, type inference failures), which would fit well into the updated java module
I think the most generic solution to the problems we have when it comes to configuring languages, is to configure Language instances directly via properties.
Example use cases:
The core of this proposal is to make LanguageRegistry non-static, and giving it a proper lifecycle. This makes it so, that analysis-global state (like the classloader) can be stored in the language instance, and shared with eg the parser and other processing stages, without having to jump through hoops to figure out if we have the same parameters as before.
Consider making the service that is loaded by ServiceLoader not Language directly, but a LanguageLoader. This object creates a Language instance with property descriptors. The initialization takes care of configuring the language with all those things mentioned above.
For example, a LanguageLoader interface could look like so:
newPropertyBundlecreates a PropertySource and defines the PropertyDescriptors accepted by the languagecreateLanguagecreates a language instance given a configurationEg the java implementation of this would be
Language properties can be set with this simple CLI extension:
-L<langId>:<propName> <value>.For example
-Lxml:fileNamePatterns 'pom.xml,*.fxml', or-Ljava:auxclasspath 'some;cp'.The initialization process of a language registry would look like so:
LanguageRegistryuses ServiceLoader to load a bunch of LanguageLoader instances-Loptions of the command line args are partitioned by language IDnewPropertyBundle)createLanguagewith that property bundleLanguage and LanguageRegistry can implement AutoCloseable. This allows the language registry to be used in a try statement.
So, very high in the PMD call stack, we can have something like
I think this mechanism solves long-standing issues with our configuration model, namely
the horror that's PmdAsmClassLoader to share objects through an analysis. Such "tricks" make PMD harder to use in a concurrent setting. There's no cleanup of static state at all.
It gives us a simple and general extension mechanism to configure language-specific behavior,
without affecting other languages, and without needing changes to pmd-core. For example, we could use that to enable logging of detailed type-resolution-specific debug information (missing classes, type inference failures), which would fit well into the updated java module