-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[core] Initial implementation of Source Code Minimizer #2142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Generated by 🚫 Danger |
|
Implemented a greedy interactive (w.r.t. to relying on a feedback from the compiler) strategy. Suppose I copied $ diff pmd-core/src/main/java/net/sourceforge/pmd/RuleSet.java RuleSet.java
336c336
< Objects.requireNonNull(patterns, "Pattern collection was null");
---
> Objects.requireNonNull(patterns, "Pattern collection was null", 123);Now let's try to compile it: Sure, there are other problems, but let's try to minimize for just introduced one: Finally, I got the following test.java: it has lots of comments and compiles like that: |
f031c9b to
cbec142
Compare
cbec142 to
e76c669
Compare
|
With the same test file and minimization request: (Part) of Result: test.java |
|
Hmm... When I just started to check that the trimmed source can be parsed again before starting compiler, the minimization time dropped to ~5min - it is quite expected. But when I then changed |
|
Not commenting on implementation yet. I'm really not sure this is something we want in PMD at this moment. It's a completely independent program, and among other things it tries to introduce its own parallel system of Language (while we already have a messy split between CPD and PMD Language). To be very clear: I don't want to discourage people from using PMD as a generic platform to implement tools working on ASTs. In fact, I think that's something PMD should absolutely strive for. The amount of work that has been put into designing language-independent interfaces, and implementing them, is enormous, and IMO it would be failing PMD's potential to limit it to just "make rules". So in principle I'm totally for basing new programs off PMD's platform. Yet I think the codebase is not ready yet to be used in that way. Currently anyone wanting to use it needs to publish it to pmd-core. Everything's merged into that module with no real structure, and we have to maintain it all. That's btw just the same for language implementations. Eg someone published a pmd-scala module a few months ago, and now we have to maintain it, yet nobody bothered to add rules. We're just a few maintainers, and I feel like those plugins should be factored out of the main pmd distribution and maintained by people that are interested in it/ originated the project. This is btw one of the reasons the designer was moved out of this repo. That would also give them more visibility, and avoid stuff getting buried for decades and forgotten (eg DCD, eg the old designer programs, eg data flow). Speaking in my own name: your program looks interesting but I don't want to be responsible for its development, or future lack thereof. Nothing personal. PMD certainly has the potential to be a great platform for that kind of tools, but I think the core of the app needs to be refactored in a way that is more extension-friendly before we can commit to fully supporting this. This needs a lot of planning and I think should be one of the design goals of the next major version (for which we've already been rethinking APIs for a while). |
|
Just to make it clear: I just though it is the preferred way to register one On the other hand, some language modules may want to interfere with minimizer module (such as describing the dependencies between AST nodes). This is some API, too, but not the entire Language system. Technically, for now, it seems the minimizer module can be a completely separate tool utilizing the main PMD (with its languages, etc.) as a dependency. The only thing that would be great to handle is to have some extension points for language modules to introduce themselves to separate tools. But for this specific need even this is not absolutely required. So, I'm not sure right now, but it may turn out that this tool can be designed utilizing only the public PMD APIs from the outside. In that case this PR can be dropped at all. |
|
Meanwhile, how it would be better to design the language implementation, so it would require as less maintainance as possible for simply not becoming deprecated? Is the Scala module hard to maintain or I just misunderstood you and it is just not too useful without bundled rules? |
I understand why that may be so. Ideally I think the collection of parsers could function independently of even the PMD/CPD/SCM programs, and we'd have a single shared model to represent languages, language versions, etc in all those. But we're stuck with the current design until PMD 7.
This is what I'm talking about, which we should think about for PMD 7.
It's not worse per se than other language modules. It's just still quite immature, and nobody's worked on it since the initial PRs, which is a shame. IMO it's just pretty underwhelming to ship features that are half baked and remain so for years. Afterwards when you want to add one generic feature, all modules need to somehow be aligned on a shared contract, and so the ones that have been left behind either need more work, or get even more behind... The thing with PMD is that each language module would ideally have its own community of developers of that language to drive innovation, but some modules are next to abandoned...
The worst thing about the current implementations of language modules, is the code duplication everywhere, and the fact that most every class is public without reason. Some suboptimal design decisions need to stay for an entire major release cycle because they're public, and may be duplicated tens of times. A nice thing you can do is keep the published API as small as possible. This helps keep your hands free when you want to refactor down the line. So eg if you're implementing an interface from PMD core, consider the implementation internal by default and hide it. Clients can use your implementation through the interface. If you're implementing something new, start by writing everything as package-private, and only make public what needs to be so. Side note: we're not about to deprecate language modules either, so don't worry about modelica (or scala for that matter). |
|
Hi @atrosinenko , thanks for your PoC! Therefore I've moved your code to a separate repository: https://github.com/pmd/pmd-scm .
It's mostly a chicken-egg-problem: Without provided rules to start with ("batteries included"), there are no users. Without users, there are no improvements/developers. One side note about moving stuff out of the main repo: The probable thing, that will happen is: it is simply not maintained and will shortly start to be defunctional. This is prevented by keeping it in the main repo, because then, when we refactor etc. we are forced to keep it running (building+unit tests, at least).
So, that is indeed possible. And that's all what we need here. So, I'm closing this PR. Regards, |
|
Thanks @adangel , Random thoughts on the design. Previously, I tried to completely separate SCM from PMD. On one hand, it could use PMD as one of the backends (the others may be plain ANTLR grammars or even binary formats based on Kaitai Struct for format-aware replacement for In fact, some SCM strategies may work with just a plain ANTLR grammars, while others may utilize as much language support as possible (and it may be tricky to abstract it out completely), so it fills the entire range from CPD to PMD w.r.t. required language support. :) |
|
Another side-effect of such separation is drastically faster |
This PR introduces initial draft of Source Code Minimizer feature, see #2103 (hope it would not name-clash with something source code management related).
Now it is already able to execute something like this:
There are no multi-step interactive strategies implemented yet.