
[core] Language properties #2518

@oowekyala

I think the most generic solution to our language configuration problems is to configure Language instances directly via properties.

Example use cases:

The core of this proposal is to make LanguageRegistry non-static and give it a proper lifecycle. That way, analysis-global state (like the classloader) can be stored in the language instance and shared with eg the parser and other processing stages, without having to jump through hoops to figure out whether we have the same parameters as before.

Consider making the service that is loaded by ServiceLoader not Language directly, but a LanguageLoader. This object creates a Language instance with property descriptors. The initialization takes care of configuring the language with all those things mentioned above.

For example, a LanguageLoader interface could look like so:

interface LanguageLoader<P extends PropertySource> {

    String id();

    P newPropertyBundle();

    Language createLanguage(P properties);

}
  • newPropertyBundle creates a PropertySource and defines the PropertyDescriptors accepted by the language
  • createLanguage creates a language instance given a configuration

Eg the Java implementation of this would be:

class JavaLanguageLoader implements LanguageLoader<JavaLanguageProperties> {


    @Override
    public String id() {
        return JavaLanguageModule.TERSE_NAME;
    }

    @Override
    public JavaLanguageProperties newPropertyBundle() {
        return new JavaLanguageProperties(); // defines some properties (auxclasspath, suppressMarker, filePatterns)
    }

    @Override
    public Language createLanguage(JavaLanguageProperties properties) {
        // in this language instance we can initialize the special classloader before analysing the first file
        return new JavaLanguageModule(
            properties.getFilePatterns(), // more general than file extensions, eg supports "pom.xml"
            properties.getSuppressMarker(),
            properties.getAuxclasspathClassloader()
        );
    }
}
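
To make the "defines some properties" comment above a bit more concrete, here is a minimal sketch of what such a bundle could look like. Everything in it is an assumption for illustration: the class name JavaLanguagePropertiesSketch, the string-based storage, and the default values are hypothetical, and a real implementation would presumably use PropertyDescriptors rather than a plain map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bundle, not PMD API: property names come from the comment above, defaults are assumed. */
final class JavaLanguagePropertiesSketch {

    private final Map<String, String> values = new LinkedHashMap<>();

    JavaLanguagePropertiesSketch() {
        // In this sketch, registering a default value is what "defines" a property.
        values.put("auxclasspath", "");
        values.put("suppressMarker", "NOPMD");
        values.put("filePatterns", "*.java");
    }

    /** Rejects names that were not defined in the constructor. */
    void setProperty(String name, String value) {
        if (!values.containsKey(name)) {
            throw new IllegalArgumentException(
                "Unknown property '" + name + "', expected one of " + values.keySet());
        }
        values.put(name, value);
    }

    String getAuxclasspath() {
        return values.get("auxclasspath");
    }

    String getSuppressMarker() {
        return values.get("suppressMarker");
    }

    /** Splits the comma-separated pattern list, eg "pom.xml,*.fxml". */
    List<String> getFilePatterns() {
        return Arrays.asList(values.get("filePatterns").split(","));
    }
}
```

The point is only that the bundle knows its own set of accepted names, so a typo in a property name fails early instead of being silently ignored.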

Language properties can be set with this simple CLI extension: -L<langId>:<propName> <value>.
For example -Lxml:fileNamePatterns 'pom.xml,*.fxml', or -Ljava:auxclasspath 'some;cp'.
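
As a rough illustration of how these switches could be grouped by language ID, here is a small sketch. LangOptionParser and partitionByLanguage are hypothetical names, not existing PMD code, and the sketch assumes the value is always passed as the next argument.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of PMD: groups -L<langId>:<propName> <value> switches by language. */
final class LangOptionParser {

    private LangOptionParser() {
    }

    /**
     * Returns langId -> (propName -> value). Arguments that do not start
     * with "-L" are skipped here; a real CLI would handle them elsewhere.
     */
    static Map<String, Map<String, String>> partitionByLanguage(List<String> args) {
        Map<String, Map<String, String>> byLang = new LinkedHashMap<>();
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            if (!arg.startsWith("-L")) {
                continue;
            }
            int colon = arg.indexOf(':');
            // Both the language ID and the property name must be non-empty.
            if (colon <= 2 || colon == arg.length() - 1) {
                throw new IllegalArgumentException("Expected -L<langId>:<propName>, got: " + arg);
            }
            if (i + 1 >= args.size()) {
                throw new IllegalArgumentException("Missing value for " + arg);
            }
            String langId = arg.substring(2, colon);
            String propName = arg.substring(colon + 1);
            String value = args.get(++i); // the value is the following argument
            byLang.computeIfAbsent(langId, k -> new LinkedHashMap<>()).put(propName, value);
        }
        return byLang;
    }
}
```

With the two example switches above, the result would have an "xml" entry mapping fileNamePatterns to 'pom.xml,*.fxml' and a "java" entry mapping auxclasspath to 'some;cp'.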

The initialization process of a language registry would look like so:

  1. LanguageRegistry uses ServiceLoader to load a bunch of LanguageLoader instances
  2. -L options of the command line args are partitioned by language ID
  3. For each language loader, create a language instance like so:
    1. create a property bundle (newPropertyBundle)
    2. call setProperty for each -L switch targeting the given language (this will throw errors for misconfigurations)
    3. call createLanguage with that property bundle
  4. Create a LanguageRegistry wrapping this set of languages
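
The steps above could be sketched roughly as follows. The types here are simplified stand-ins written for this example (PropertySource is reduced to setProperty, Language to id), and fromLoaders and instantiate are hypothetical helper names, so this is a sketch of the idea and not the proposed API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Simplified stand-ins for the types discussed in this proposal (assumptions, not PMD's real API).
interface PropertySource {
    /** Expected to throw IllegalArgumentException for unknown names or invalid values. */
    void setProperty(String name, String value);
}

interface Language {
    String id();
}

interface LanguageLoader<P extends PropertySource> {
    String id();
    P newPropertyBundle();
    Language createLanguage(P properties);
}

final class LanguageRegistry {
    private final Map<String, Language> languages = new LinkedHashMap<>();

    private LanguageRegistry(List<Language> langs) {
        for (Language lang : langs) {
            languages.put(lang.id(), lang);
        }
    }

    Language getLanguage(String id) {
        return languages.get(id);
    }

    /** Step 1: discover loaders with ServiceLoader, then delegate. */
    static LanguageRegistry loadWithProperties(ClassLoader serviceClassLoader,
                                               Map<String, Map<String, String>> langProperties) {
        List<LanguageLoader<?>> loaders = new ArrayList<>();
        for (LanguageLoader<?> loader : ServiceLoader.load(LanguageLoader.class, serviceClassLoader)) {
            loaders.add(loader);
        }
        return fromLoaders(loaders, langProperties);
    }

    /** Steps 3 and 4: configure and create each language, then wrap the result. */
    static LanguageRegistry fromLoaders(List<LanguageLoader<?>> loaders,
                                        Map<String, Map<String, String>> langProperties) {
        List<Language> langs = new ArrayList<>();
        for (LanguageLoader<?> loader : loaders) {
            langs.add(instantiate(loader, langProperties.getOrDefault(loader.id(), Map.of())));
        }
        return new LanguageRegistry(langs);
    }

    // Generic helper so that the bundle type P is captured from the wildcard.
    private static <P extends PropertySource> Language instantiate(LanguageLoader<P> loader,
                                                                   Map<String, String> props) {
        P bundle = loader.newPropertyBundle();   // step 3.1
        props.forEach(bundle::setProperty);      // step 3.2, may throw on misconfiguration
        return loader.createLanguage(bundle);    // step 3.3
    }
}
```

Note that this sketch silently ignores properties that target a language ID for which no loader was found; whether that should be an error is left open here.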

Language and LanguageRegistry can implement AutoCloseable. This allows the language registry to be used in a try statement.
So, very high in the PMD call stack, we can have something like

    public static void main(String[] args) {
           
        // langProperties := properties from the CLI, partitioned by language 
        // serviceClassLoader := classloader used by ServiceLoader
        // LanguageRegistry::loadWithProperties is the loading routine described above

        try (LanguageRegistry registry = LanguageRegistry.loadWithProperties(serviceClassLoader, langProperties)) {
            // load rulesets, execute, etc, using this registry
            // Other analyses use different LanguageRegistries, so are independent from this one
        }
        // classloader & other resources allocated on construction are reclaimed
    }
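
For the cleanup side, a registry's close() could simply close every language it owns. The following is a minimal sketch under that assumption; ClosableLanguage and ClosableRegistry are hypothetical names used to keep the example self-contained, and closing in reverse creation order is a design choice of the sketch, not something this proposal requires.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Language that owns resources (eg a classloader). */
interface ClosableLanguage extends AutoCloseable {
    String id();

    @Override
    void close() throws Exception;
}

/** Hypothetical registry sketch: closing it closes every language it owns. */
final class ClosableRegistry implements AutoCloseable {
    private final List<ClosableLanguage> languages;

    ClosableRegistry(List<ClosableLanguage> languages) {
        this.languages = new ArrayList<>(languages);
    }

    @Override
    public void close() throws Exception {
        Exception first = null;
        // Close in reverse creation order and keep going if one language fails.
        for (int i = languages.size() - 1; i >= 0; i--) {
            try {
                languages.get(i).close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first; // later failures are attached as suppressed exceptions
        }
    }
}
```

Used in a try statement like the one above, this releases per-language resources even when the analysis itself throws.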

I think this mechanism solves long-standing issues with our configuration model, namely

  • adding an option to a language implementation requires changes everywhere, from the upper layers like Configuration and the CLI itself (or the schema, similar to [core] Abstract away optional AST traversals (first step) #1426) down to the implementation of the feature, which is spread ad hoc over the whole pmd-core codebase. Most languages probably don't need all those options anyway (eg the auxclasspath).
  • there are no "analysis-scoped" objects, which means we have to use dirty tricks like threadlocals or the horror that's PmdAsmClassLoader to share objects throughout an analysis. Such "tricks" make PMD harder to use in a concurrent setting. There's no cleanup of static state at all.
  • ParserOptions is a pain to use and could be removed (see wiki)

It gives us a simple and general extension mechanism to configure language-specific behavior,
without affecting other languages, and without needing changes to pmd-core. For example, we could use it to enable logging of detailed debug information specific to type resolution (missing classes, type inference failures), which would fit well into the updated java module.
