[core] Add file collector and new programmatic API for PMD#3785
Conversation
e013ce8 to
8ae1b06
Compare
Generated by 🚫 Danger |
|
We should maybe even create a new
|
adangel
left a comment
There was a problem hiding this comment.
Thanks for your work on this!
I'm not sure we need to rush bringing this into 6.43.0 - maybe it's better to wait for 6.44.0.
In general, I like the direction/ideas. I'm not sure yet about the naming and the packages (e.g. TextFile, FileCollector is under n.s.pmd.util.document).
Here are my thoughts:
- I think, we should not create an own logging implementation. There are enough implementations, we don't need our self made one.
- it makes it more complex
- also passing a logger instance from one class to another creates confusion: because one cannot see from the logged message where the message has been logged
- this is not helpful to diagnose problems. And logging is for developers mostly. Only level INFO would be user facing.
- rule of thumb: always use the correct logger instance (that is the current class you are in), avoid shared logger instances
- I'd say, we should remove the classes around it and keep it simple.
- FileCollectionUtil seems to have the implementation I expected to be in FileCollector... The file collector seems to be mostly a file collection (it doesn't collect=discovers the files, it just keeps a list of files - a collection)
Ok, after reading the files, I've seen now addDirectory, or addZipFile.
For me it seems, that FileCollectionUtil does a lot. But it's internal, so we can move it around later anyways. I think, I understand now, what FileCollectionUtil does: It converts the PMDConfiguration into FileCollection calls. - Given that we have this support for DBURI, not sure if "FileCollection" etc. is the correct naming. As you mention yourself in "TextFile".
Maybe a more generic name is "SourceUnit" (similar to compilation unit). It can originate from a file on disk, from a URL, from a database, etc. Or maybe "artifact"? (that's the name used in SARIF) - The source level incompatibility affects maven-pmd-plugin - there we explicitly catch IOException. But it's indeed only a source-level incompat, as the dogfood runs fine.
- Just a note: Handling for absolute / relative paths. I learned recently, that e.g. SARIF/GitHub demands relative paths for the analyzed files (artifacts). Also the current working directory of the tool should be mentioned in the report.
So we could think about make the paths relative to the current working directory.
We also have this "shortNames" option for renderers...
We also had some trouble with the incremental cache regarding absolute paths - and a feature request: currently the cache is bound to one machine, since it contains all absolute paths... - Use cases: maven-pmd-plugin collects the files on its own. But I think, this is still possible, since the PMDConfiguration has then "null" for inputPaths, so PMD doesn't do any collection. So m-pmd-p can add its own files via analyzer.files()...
| config.getLanguageVersionDiscoverer(), | ||
| logger | ||
| ); | ||
| final Level logLevel = configuration.isDebug() ? Level.TRACE : Level.INFO; |
There was a problem hiding this comment.
That's an interesting question: Right now, I considered this option "--debug" entirely only for command line and outside the scope of the programmatic api. Maybe we can document it in PMDConfiguration, that it is only considered when PMD is run from cli?
The thing is, logging should be initialized very early and ideally only once - because it is static and global.
Here it would be changed everytime a new PmdAnalysis is created - which could be done multiple times when using PMD programmatically. If PMD is used programmatically, then whoever does the integration, should configure logging outside of PMD in its own software before calling PMD...
Also I think, the logger should be the class, not the package.
I don't think it's worth removing it in master... Traces could admittedly be reported on a jutil.Logger, but this makes the code unreadable and it's not clear why we would do that. Other things like warnings and errors will be reported through this interface in pmd 7, so it also doesn't make sense to update their call sites.
|
@oowekyala I'm going to merge this today. I see, that the logger is now internal - which gives us time to further improve it. I understand the purpose of it which you summarized in #3816, so further discussions go there. |
which deals with restoring System.out/err already
[core] Add file collector and new programmatic API for PMD #3785
So in #3692 I had to modify the entry point for PMD. #3692 (comment) proposed to introduce a forward compatible API on master, which this PR implements.
The entry point for PMD has already been modified several times in PMD 6 to be forward compatible. Unfortunately changes in PMD 7 regularly invalidate these compatibility overloads so that the class
PMDis accumulating a lot of cruft. I believe one of the problems is that we conceive thisprocessFilesstatic method as the entry point of PMD. Maintaining a compatible signature is much harder than if it were hidden behind a configuration builder.#3692 (comment) further suggests that the API to execute PMD from CLI and the programmatic API should be better separated.
This PR reduces the (supported) functionality of the
PMDclass to be just CLI-related things:mainfunctionrunPmdoverloads, which mimic the main function but return the status codeAll other members are now deprecated for removal in PMD 7.
The programmatic interface of PMD is now fronted by PmdAnalysis. Instances may be created from a PMDConfiguration easily, and collect files, rulesets and renderers (all the parameters to
PMD::processFiles). It encapsulates the logic ofprocessFilesinside a no-arg functionperformAnalysis. Given a complete PMDConfiguration, the only thing you have to do is the following:This will be easier to keep forward compatible, as the user-facing API does not deal with DataSource (which is turned into TextFile in PMD 7), and wraps nicely the logic that handles renderers (which is refactored to use GlobalAnalysisListener in #3692). I think the API will also be compatible with #3782.
About separating CLI and programmatic API: some responsibilities of PMDConfiguration could be gradually shifted into the PmdAnalysis. For instance, all that stuff with input paths and exclude files can be implemented easily using the FileCollector directly (and so getInputPaths and such can be removed from PMDConfiguration). In a cleaner model, PMD's programmatic API would rely less on unvalidated strings given to the PMD configuration (this should instead be done in the CLI module). I think removing them from the PMDConfiguration also removes possible error situations, as the data put in PMDConfiguration for "what rulesets to use" and "what files to process" may be out of sync with the actual contents of the FileCollector and of the rulesets, if the user modified those programmatically. Overall, decreasing the importance of PMDConfiguration is I believe a good idea, as it currently infiltrates the codebase very deep and infects it with irrelevant config options that should be handled elsewhere.
This PR introduces 4 new important types:
PMD.getApplicableFilesPmdLogger is internal, and TextFile is experimental until PMD 7 (as it will be modified by #3784).
TODO