Skip to content

[core] Add file collector and new programmatic API for PMD#3785

Merged
adangel merged 29 commits into
pmd:masterfrom
oowekyala:pmd6-file-collector
Mar 3, 2022
Merged

[core] Add file collector and new programmatic API for PMD#3785
adangel merged 29 commits into
pmd:masterfrom
oowekyala:pmd6-file-collector

Conversation

@oowekyala

@oowekyala oowekyala commented Feb 13, 2022

Copy link
Copy Markdown
Member

So in #3692 I had to modify the entry point for PMD. #3692 (comment) proposed to introduce a forward compatible API on master, which this PR implements.

The entry point for PMD has already been modified several times in PMD 6 to be forward compatible. Unfortunately changes in PMD 7 regularly invalidate these compatibility overloads so that the class PMD is accumulating a lot of cruft. I believe one of the problems is that we conceive this processFiles static method as the entry point of PMD. Maintaining a compatible signature is much harder than if it were hidden behind a configuration builder.

#3692 (comment) further suggests that the API to execute PMD from CLI and the programmatic API should be better separated.

This PR reduces the (supported) functionality of the PMD class to be just CLI-related things:

  • the main function
  • the runPmd overloads, which mimic the main function but return the status code
  • the StatusCode enum

All other members are now deprecated for removal in PMD 7.

The programmatic interface of PMD is now fronted by PmdAnalysis. Instances may be created from a PMDConfiguration easily, and collect files, rulesets and renderers (all the parameters to PMD::processFiles). It encapsulates the logic of processFiles inside a no-arg function performAnalysis. Given a complete PMDConfiguration, the only thing you have to do is the following:

try (PmdAnalysis pmd = PmdAnalysis.create(configuration)) { 
  // `create` reads rulesets and collect analysed files as configured in PMDConfiguration
  // optional further configuration
  pmd.files().addFileOrDirectory(Paths.get("/tmp/sources"));
  pmd.addRenderer(new TextRenderer()); 
  // do the analysis
  pmd.performAnalysis();

} // finally, close all data sources registered in the file collector and release other resources

This will be easier to keep forward compatible, as the user-facing API does not deal with DataSource (which is turned into TextFile in PMD 7), and wraps nicely the logic that handles renderers (which is refactored to use GlobalAnalysisListener in #3692). I think the API will also be compatible with #3782.

About separating CLI and programmatic API: some responsibilities of PMDConfiguration could be gradually shifted into the PmdAnalysis. For instance, all that stuff with input paths and exclude files can be implemented easily using the FileCollector directly (and so getInputPaths and such can be removed from PMDConfiguration). In a cleaner model, PMD's programmatic API would rely less on unvalidated strings given to the PMD configuration (this should instead be done in the CLI module). I think removing them from the PMDConfiguration also removes possible error situations, as the data put in PMDConfiguration for "what rulesets to use" and "what files to process" may be out of sync with the actual contents of the FileCollector and of the rulesets, if the user modified those programmatically. Overall, decreasing the importance of PMDConfiguration is I believe a good idea, as it currently infiltrates the codebase very deep and infects it with irrelevant config options that should be handled elsewhere.

This PR introduces 4 new important types:

  • TextFile: this is something I've extracted from my PMD 7 branches. It's what will replace DataSource in PMD 7 (see [core] Text documents epic #3784). TextFile is related to TextDocument but not identical. The version on master is simplified and converted to DataSource before processing so that everything works like before.
  • FileCollector: this one is tasked with collecting TextFiles to analyse in a PMD run. This is a programmatic API to finely control which files are added by PMD. It now is the implementation of PMD.getApplicableFiles
  • PmdLogger: this is a logging strategy, that I feel is needed to uniformize the logging of traces and errors. Currently it's used by internally FileCollector and forwards to a java.util.Logger. I believe this will make it easier to migrate to SLF4J. Maybe it can be replaced with SLF4J's Logger, or extend from it when we migrate.
  • PmdAnalysis: described above

PmdLogger is internal, and TextFile is experimental until PMD 7 (as it will be modified by #3784).

TODO

  • Update release notes
    • There is a source-incompatible change: PMDConfiguration#prependClasspath does not throw IOException anymore

@oowekyala oowekyala added this to the 6.43.0 milestone Feb 13, 2022
@oowekyala oowekyala mentioned this pull request Feb 13, 2022
6 tasks
@ghost

ghost commented Feb 13, 2022

Copy link
Copy Markdown
1 Message
📖 This changeset changes 86 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 1 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 86 violations,
introduces 1 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report
This changeset changes 0 violations,
introduces 0 new violations, 21269 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
Full report

Generated by 🚫 Danger

@oowekyala

oowekyala commented Feb 14, 2022

Copy link
Copy Markdown
Member Author

We should maybe even create a new PmdCli class in the CLI package to host the main function and runPmd(String...) (and remove runPmd(PMDConfiguration)). This way

  • PMDConfiguration is clearly identified as part of the programmatic API and not mixed with the CLI
  • we can deprecate the PMD entry point in PMD 6, and move PmdCli into a pmd-cli submodule in PMD 7, along with the whole nspmd.cli package.

@adangel adangel self-requested a review February 16, 2022 19:12
@adangel adangel added an:enhancement An improvement on existing features / rules is:deprecation The main focus is deprecating public APIs or rules, eg to make them internal, or removing them labels Feb 20, 2022

@adangel adangel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this!

I'm not sure we need to rush bringing this into 6.43.0 - maybe it's better to wait for 6.44.0.
In general, I like the direction/ideas. I'm not sure yet about the naming and the packages (e.g. TextFile, FileCollector is under n.s.pmd.util.document).

Here are my thoughts:

  • I think, we should not create an own logging implementation. There are enough implementations, we don't need our self made one.
    • it makes it more complex
    • also passing a logger instance from one class to another creates confusion: because one cannot see from the logged message where the message has been logged
    • this is not helpful to diagnose problems. And logging is for developers mostly. Only level INFO would be user facing.
    • rule of thumb: always use the correct logger instance (that is the current class you are in), avoid shared logger instances
    • I'd say, we should remove the classes around it and keep it simple.
  • FileCollectionUtil seems to have the implementation I expected to be in FileCollector... The file collector seems to be mostly a file collection (it doesn't collect=discovers the files, it just keeps a list of files - a collection)
    Ok, after reading the files, I've seen now addDirectory, or addZipFile.
    For me it seems, that FileCollectionUtil does a lot. But it's internal, so we can move it around later anyways. I think, I understand now, what FileCollectionUtil does: It converts the PMDConfiguration into FileCollection calls.
  • Given that we have this support for DBURI, not sure if "FileCollection" etc. is the correct naming. As you mention yourself in "TextFile".
    Maybe a more generic name is "SourceUnit" (similar to compilation unit). It can originate from a file on disk, from a URL, from a database, etc. Or maybe "artifact"? (that's the name used in SARIF)
  • The source level incompatibility affects maven-pmd-plugin - there we explicitly catch IOException. But it's indeed only a source-level incompat, as the dogfood runs fine.
  • Just a note: Handling for absolute / relative paths. I learned recently, that e.g. SARIF/GitHub demands relative paths for the analyzed files (artifacts). Also the current working directory of the tool should be mentioned in the report.
    So we could think about make the paths relative to the current working directory.
    We also have this "shortNames" option for renderers...
    We also had some trouble with the incremental cache regarding absolute paths - and a feature request: currently the cache is bound to one machine, since it contains all absolute paths...
  • Use cases: maven-pmd-plugin collects the files on its own. But I think, this is still possible, since the PMDConfiguration has then "null" for inputPaths, so PMD doesn't do any collection. So m-pmd-p can add its own files via analyzer.files()...

Comment thread docs/pages/release_notes.md
Comment thread pmd-core/src/main/java/net/sourceforge/pmd/PMD.java Outdated
config.getLanguageVersionDiscoverer(),
logger
);
final Level logLevel = configuration.isDebug() ? Level.TRACE : Level.INFO;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting question: Right now, I considered this option "--debug" entirely only for command line and outside the scope of the programmatic api. Maybe we can document it in PMDConfiguration, that it is only considered when PMD is run from cli?
The thing is, logging should be initialized very early and ideally only once - because it is static and global.
Here it would be changed everytime a new PmdAnalysis is created - which could be done multiple times when using PMD programmatically. If PMD is used programmatically, then whoever does the integration, should configure logging outside of PMD in its own software before calling PMD...
Also I think, the logger should be the class, not the package.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #3816 for more context on this topic.

Comment thread pmd-core/src/main/java/net/sourceforge/pmd/PmdAnalysis.java Outdated
Comment thread pmd-core/src/main/java/net/sourceforge/pmd/util/document/FileCollector.java Outdated
Comment thread pmd-core/src/main/java/net/sourceforge/pmd/util/document/NioTextFile.java Outdated
Comment thread pmd-test/src/main/java/net/sourceforge/pmd/cli/BaseCLITest.java
@oowekyala oowekyala modified the milestones: 6.43.0, 6.44.0 Feb 21, 2022
@adangel adangel mentioned this pull request Feb 26, 2022
9 tasks
oowekyala and others added 8 commits February 27, 2022 18:15
I don't think it's worth removing it in master...
Traces could admittedly be reported on a jutil.Logger,
but this makes the code unreadable and it's not
clear why we would do that. Other things like
warnings and errors will be reported through
this interface in pmd 7, so it also doesn't
make sense to update their call sites.
@adangel

adangel commented Mar 3, 2022

Copy link
Copy Markdown
Member

@oowekyala I'm going to merge this today. I see, that the logger is now internal - which gives us time to further improve it. I understand the purpose of it which you summarized in #3816, so further discussions go there.

Comment thread docs/pages/pmd/userdocs/tools/java-api.md
Comment thread docs/pages/pmd/userdocs/tools/java-api.md
adangel added a commit that referenced this pull request Mar 3, 2022
[core] Add file collector and new programmatic API for PMD #3785
@adangel adangel merged commit 6a30e59 into pmd:master Mar 3, 2022
@oowekyala oowekyala deleted the pmd6-file-collector branch March 3, 2022 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

an:enhancement An improvement on existing features / rules is:deprecation The main focus is deprecating public APIs or rules, eg to make them internal, or removing them

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants