[core] Language lifecycle #3782

Split off from #2518. This ticket focuses on the implementation aspects related to language lifecycle.

#2518 proposes that language instances should have a proper lifecycle, allowing them to store analysis-global data (like a classloader or `TypeSystem` instance) and configuration (like tab sizes or auxclasspath). If language instances are analysis-global, then they cannot be created until the analysis starts. However, we still need a way to refer to languages before starting the analysis, e.g. to identify them in the ruleset XML, to figure out what language versions they support, for CLI help, etc. We hence need, for each language:
1. a global object that describes the language, e.g. its name, its language versions and such.
2. an object that encapsulates analysis state during execution. This is stateful and has lifecycle methods.
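
A rough sketch of that split, to make the two roles concrete. Only the names `Language`, `LanguageProcessor` and `buildProcessor` come from this proposal; the method signatures, the use of `AutoCloseable` for the lifecycle, and the `Dummy*` classes are assumptions made for illustration.

```java
import java.util.List;

/** Global, stateless descriptor: safe to use before the analysis starts. */
interface Language {
    String getName();

    List<String> getVersionNames();

    /** Creates the analysis-scoped object; meant to be called when the analysis starts. */
    LanguageProcessor buildProcessor();
}

/** Stateful, analysis-scoped object with explicit lifecycle methods (assumed shape). */
interface LanguageProcessor extends AutoCloseable {
    Language getLanguage();

    boolean isClosed();

    /** Narrows AutoCloseable.close() so that it declares no checked exception. */
    @Override
    void close();
}

/** Hypothetical language used only for this sketch. */
final class DummyLanguage implements Language {
    @Override
    public String getName() {
        return "dummy";
    }

    @Override
    public List<String> getVersionNames() {
        return List.of("1.0", "2.0");
    }

    @Override
    public LanguageProcessor buildProcessor() {
        return new DummyProcessor(this);
    }
}

/** Holds the per-analysis state; here that state is just a "closed" flag. */
final class DummyProcessor implements LanguageProcessor {
    private final Language language;
    private boolean closed;

    DummyProcessor(Language language) {
        this.language = language;
    }

    @Override
    public Language getLanguage() {
        return language;
    }

    @Override
    public boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

With this shape, ruleset parsing and CLI help only ever touch `Language`, while the analysis would wrap the processor in a try-with-resources block so that its state is released when the run ends.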


I'm going to describe how I imagine the final API in PMD 7 working. 
- `Language` instances are stateless and global, like now.
- `Language` instances are loaded through `ServiceLoader` like now, into a `LanguageRegistry`.
- A new class `LanguageProcessor` encapsulates language-specific analysis-scoped state.
- A `Language` instance can create a `LanguageProcessor` instance and configure it via language properties.
- There is no reason to treat language versions as more than just another parameter to the construction of a `LanguageProcessor`. Since language versions should be inspectable from a `Language` instance (e.g. when parsing rulesets, before the analysis), they also need to be global and stateless. `LanguageVersion` instances are simpler than in PMD 6. A `LanguageVersion` just has metadata like a name and can be compared with other versions of the same language. `LanguageVersion` instances do not provide a `LanguageVersionHandler`.

  That is, you won't be calling `language.getLanguageVersion().getLanguageVersionHandler().getParser()` anymore. Rather, you call `language.buildProcessor(languageProperties)`, passing the `LanguageVersion` as a language property, and then `languageProcessor.getParser()`.

- `LanguageVersionHandler` is renamed `PmdExtension`. It remains the extension point for PMD: you have to override `getParser()`, and can override other things used by pmd-core. A `LanguageProcessor` needs to provide a `PmdExtension`.

- A `LanguageProcessor` performs all the analysis and has control over
  - the order in which files are processed
  - which files are processed and how 
  - what to put in the analysis cache 
 
  This makes the analysis process extensible by languages, which allows us to integrate things like #2667. It will also be possible to have fine-grained language-specific caching strategies, for example by inspecting ABI changes in the auxclasspath in the Java module (#2704).

  As a first step, we should provide a single `LanguageProcessor` implementation, which does everything like PMDProcessor does today.
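
To illustrate how the pieces in the list above could fit together, here is a minimal sketch. The names `Language`, `LanguageVersion`, `LanguageProcessor`, `PmdExtension`, `buildProcessor` and `getParser` are taken from this proposal; everything else (the `LanguageProperties` holder, the simplified `Parser` that returns a string, `getExtension`, `processingOrder`, `ToyLanguage`) is hypothetical and not the actual pmd-core API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain metadata: a name and an ordering within one language. No handler attached. */
final class LanguageVersion implements Comparable<LanguageVersion> {
    private final String name;
    private final int ordinal;

    LanguageVersion(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    String getName() {
        return name;
    }

    @Override
    public int compareTo(LanguageVersion other) {
        return Integer.compare(ordinal, other.ordinal);
    }
}

/** Hypothetical property holder: the version is just one property among others. */
final class LanguageProperties {
    final LanguageVersion version;
    final int tabSize;

    LanguageProperties(LanguageVersion version, int tabSize) {
        this.version = version;
        this.tabSize = tabSize;
    }
}

/** Simplified stand-in for a parser: returns a description instead of an AST. */
interface Parser {
    String parse(String sourceText);
}

/** Extension point consumed by the core (the renamed LanguageVersionHandler). */
interface PmdExtension {
    Parser getParser();
}

interface LanguageProcessor {
    PmdExtension getExtension();

    /** Shortcut matching the `languageProcessor.getParser()` call from the proposal. */
    default Parser getParser() {
        return getExtension().getParser();
    }

    /** The processor decides in which order files are analysed. */
    List<String> processingOrder(List<String> fileNames);
}

interface Language {
    List<LanguageVersion> getVersions();

    LanguageProcessor buildProcessor(LanguageProperties properties);
}

/** Hypothetical language used only for this sketch. */
final class ToyLanguage implements Language {
    static final LanguageVersion V1 = new LanguageVersion("1.0", 0);
    static final LanguageVersion V2 = new LanguageVersion("2.0", 1);

    @Override
    public List<LanguageVersion> getVersions() {
        return List.of(V1, V2);
    }

    @Override
    public LanguageProcessor buildProcessor(LanguageProperties properties) {
        // The parser is specialised on the version passed as a property.
        Parser parser = text -> "toy-" + properties.version.getName() + ":" + text.length();
        PmdExtension extension = () -> parser;
        return new LanguageProcessor() {
            @Override
            public PmdExtension getExtension() {
                return extension;
            }

            @Override
            public List<String> processingOrder(List<String> fileNames) {
                // Assumed policy for the sketch: process files in alphabetical order.
                List<String> sorted = new ArrayList<>(fileNames);
                sorted.sort(Comparator.naturalOrder());
                return sorted;
            }
        };
    }
}
```

A caller would then write something like `new ToyLanguage().buildProcessor(new LanguageProperties(ToyLanguage.V2, 4)).getParser()`, and versions stay inspectable through `getVersions()` without building a processor.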

### CPD

See #3919 

CPD-specific extensions, like the `Tokenizer` instance, are provided by a new `CpdExtension` interface, which is similar to `PmdExtension`. As with `PmdExtension`, the `LanguageProcessor` provides access to an instance. You have to override `CpdExtension.getTokenizer()`.
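
As an illustration, a language could plug in its tokenizer roughly as follows. `CpdExtension` and `getTokenizer()` are the names proposed above; the `Tokenizer` interface shown here is a simplified stand-in (it is not the signature of CPD's real `Tokenizer`), and `WhitespaceCpdExtension` is a hypothetical implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a CPD tokenizer: turns source text into token images. */
interface Tokenizer {
    List<String> tokenize(String sourceText);
}

/** CPD counterpart of PmdExtension; getTokenizer() is the method to override. */
interface CpdExtension {
    Tokenizer getTokenizer();
}

/** Hypothetical extension whose tokenizer simply splits on whitespace. */
final class WhitespaceCpdExtension implements CpdExtension {
    @Override
    public Tokenizer getTokenizer() {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String token : text.trim().split("\\s+")) {
                // Splitting an empty string yields one empty token; skip it.
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            return tokens;
        };
    }
}
```

How the `LanguageProcessor` exposes the extension (for instance through a dedicated getter) is left out of the sketch, since the proposal does not fix that part.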


### Additional notes


* The most invasive changes will probably be
   - making LanguageRegistry non-static (#3918)
   - changing the role of LanguageVersion and LanguageVersionHandler

   ~especially in outdated tests that still do everything themselves instead of using a BaseParsingHelper.~ There are no more such tests
