Skip to content

[java] New type resolution#2689

Merged
adangel merged 315 commits into
pmd:java-grammarfrom
oowekyala:typeres-jtypes
Sep 11, 2020
Merged

[java] New type resolution#2689
adangel merged 315 commits into
pmd:java-grammarfrom
oowekyala:typeres-jtypes

Conversation

@oowekyala

@oowekyala oowekyala commented Aug 1, 2020

Copy link
Copy Markdown
Member

Main changes:

Symbols

Instead of using Class instances to back symbols, we use ASM to read class files directly.
This has several advantages

  • Decoupling of the JDK version used to run PMD vs the JDK version of the analysed sources. PMD can now analyse code of future JDK versions, provided the classpath is complete. Since we don't depend on the core reflection API, we can also get access to information that is only available in later JDK versions (eg, Class::isRecord is only there since JDK 14, so we can't use it if we target JDK 8)
  • Performance: we don't need to load the class, decode the body of methods, verify the bytecode etc. Also, we don't have to deal with ClassNotFoundExceptions, which was a performance issue ([java] Don't use ClassNotFoundException to report unresolved classes #2236)
  • Safety: there's no risk of a linkage error. At most we can get an unresolved class reference, which is patched by creating unresolved symbols. Missing dependencies may be reported transparently to the user (that's not implemented, but if we do, there are only a few places to update)
  • Functionality:
    • we can get compile-time constant values from the classfile, an information that the Core reflection API doesn't expose. This means we can resolve the value of constant expressions ([java] Allow @SuppressWarnings with constants instead of literals #520)
    • in the future we may be able to represent annotation values without even having the annotation classes on the classpath too.

Implementation is tricky, because we have to lazy load referenced classes (to prevent stuff like #2685), but we have to do it in a thread-safe way.

This replaces PMDAsmClassLoader. The AsmSymbolResolver uses a classloader as input, but only to resolve URLs to class files.

The reflected symbol implementation is removed. It doesn't carry its weight.

Using ASM means, we don't have access to Class instances at runtime anymore, only symbols. TypeNode::getType is deprecated for removal. I think this is a win: we have a single API to get class information.

Type representation

The core of the framework is a set of interfaces to represent types. The root interface is JTypeMirror.

Type mirrors are created by a TypeSystem object. This object is analysis-global (eventually it should be part of the Language instance, #2518)

TypeSystem has a bunch of final fields that represent important types, eg Object, primitive types, void, etc. The reason those are not static constants, is it makes initialization clearer. It also makes us able to use the Object class that's loaded from the auxclasspath.

JTypeMirror objects give access to operations like subtyping checks, that are aware of type arguments, can test if they are reifiable or erased, etc. This API is meant for rules, though currently it's only been fleshed out enough to support type inference. How rules will use it, we can figure out later.

  • Though note that StringToStringRule has been ported to use the new framework as a proof of concept:
if ("toString".equals(node.getMethodName())
        && node.getArguments().size() == 0
        && TypeTestUtil.isA(String.class, node.getQualifier())) {
        addViolation(data, node);
    }
}

This should precisely fit the contract of the rule, and if there are FPs/FNs, then it's not the fault of the rule, it's the fault of type resolution. Being able to depend on these components is essential to write simpler, more maintainable, and more precise rules. This PR brings us to that stage.

I wanted to keep out null from the API, so there's a special UNRESOLVED_TYPE as a field of TypeSystem, that is used as a sentinel.

Type resolution

Resolution of types is done lazily. See LazyTypeResolver, which is the "better ClassTypeResolver". Laziness is probably not a great benefit, but it at least makes implementation very clear.

The logic in LazyTypeResolver only cares about the so-called "standalone expressions", that is, expressions whose type does not depend on context (eg literals). Expressions that are not standalone are called "poly expressions", these include lambdas, method references, generic method/constructor calls, conditional expressions. Resolving the type of poly expressions is complicated, and in the general case, it needs type inference.

Type inference

A huge chunk of the code additions is the type inference engine. The entry point for this API is the class Infer. An Infer object is created per file as it's not thread-safe.

The engine's crucial responsibility is to do overload selection. This may involve doing type inference when methods are generic, or for diamond invocation. Sometimes type inference is not needed (method is not generic).

The engine doesn't interact with the AST directly. It has its own set of interfaces (ExprMirror), whose implementations wrap AST nodes. This decouples the AST from the engine, for example, we can do overload selection for enum constants as if they were regular constructor calls.

There are sophisticated logging strategies to inspect a run of the inference engine. Some tests in the maven build show a sample output.

All this is internal API, though I hope to use it in some privileged rules to detect unnecessary cast, unnecessary boxing, unnecessary type arguments, while being aware of maybe casts/boxings that disambiguate between some overloads.

Symbol table

Symbol tables resolve not only symbols now, they resolve typed symbols. For example, if you're in a class that inherits from AbstractList<String>, then the method get that the class inherits is not just T get(int), it's String get(int). Same thing for fields, and inherited nested types.

The typed equivalent for symbols are:

  • JTypeParameterSymbol -> JTypeVar
  • JClassSymbol -> JClassType
  • JVariableSymbol -> JVariableSig
  • JMethodSymbol -> JMethodSig

Code maturity

  • There's a TypeResTestRule in category security. This is just a rule I used to test the type resolution on an entire codebase and catch bugs. I'm leaving it in this PR, in case you want to do the same. We should remove it before releasing
  • The type inference logger can be selected by a system property (PMD_DEBUG_LEVEL, FINE or FINEST). This is also just a dev-only feature, that one can use to inspect a run. Logic is in JavaAstProcessor. I expect this can become a forever-experimental language property. Note that level FINEST is too much detail, usually only appropriate for only a single overload resolution.
  • TypeSystem should be truly analysis-global: [core] Language properties #2518. For now it's hacked much in the same way as PMDASMClassLoader: by comparing the classloader. Logic is in JavaAstProcessor.
  • Many tests depend on the toString of JTypeMirror. I haven't had the courage to update them yet... this is done now
  • The old type resolution is completely removed, save from ClassTypeResolver, because some rules use it. It doesn't work anymore, this is just so that the rules compile. We can fix that later
  • There are a lot of open TODOs, most of them either warning about
    • code that would do an infinite loop/infinite recursion if the source code is semantically invalid (eg, cyclic inheritance). This should be caught early in a verification pass.
    • some rare corner case, that's not been implemented yet
  • There are some more detailed TODOs in the package-info of the types package.
  • I just pushed an updated designer version that is compatible with 7.0. Demo, for the inferred type of this variable

typeres_demo



  • I added a lot of tests and cleaned up up all around.
  • Provided the classpath is complete, this now processes at 100% resolution rate (proportion of TypeNodes whose type is not unknown). I tested this on java.base and pmd-core. In java.base there are zero unresolved nodes out of 1.4 million. With no classpath this does about 85/90%, which believe is about the same as current master.
  • This has been built and tested nearly exclusively on valid code, and since everything is pretty defensive, you can eg end up with IllegalArgumentException if you write List<K,V>. Fault tolerance can be explored later. I won't add anything more to this PR

Blockers

  • Deprecate the old symboltable on java-grammar. Many of its internal classes may also be deprecated on master as @InternalApi
  • Deprecate some utilities of TypeHelper on master: unused/very specific stuff like isExactlyNone, and all methods which take TypedNameDeclaration parameters. This class is deprecated. Remaining is only methods is[Exactly]A that take a TypeNode parameter.
  • Fix the remaining bugs about lambdas (this needs a big refactoring of the new things introduced here and I'd rather do this in this PR)

@oowekyala oowekyala added this to the 7.0.0 milestone Aug 1, 2020
@ghost

ghost commented Aug 1, 2020

Copy link
Copy Markdown
1 Message
📖 No java rules are changed!

Generated by 🚫 Danger

@adangel adangel mentioned this pull request Aug 2, 2020
4 tasks
@oowekyala oowekyala marked this pull request as ready for review August 6, 2020 13:26
(But implicitly, every getTypeMirror call may trigger type
resolution, it's the entire point)

Also uniformize the exception, it's an assert

@adangel adangel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the huge work!!
I especially like, that we use ASM now to load the types/classes rather than loading the class vial classloader/reflection. That should make it really possible, to use Java8 as runtime to analyze a newer java version.

I'll add the small stuff (like deprecations for 700 and TODO comments) and merge it then.

* BSD-style license; for more info see http://pmd.sourceforge.net/license.html
*/

package net.sourceforge.pmd.lang.java.rule.security;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could move this later to package "internal" - security doesn't seem to fit. Maybe a ruleset/category "diagnostics" is a better name?

* from {@link JTypeDeclSymbol symbols}, a layer of abstraction above reflection
* classes.
*
* <p>Type mirrors can be obtained {@linkplain TypesFromReflection from reflected types},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow this in the future - TypesFromReflection? I think, we should remove this possibility in the end to avoid mixing auxclasspath and PMD's runtime classpath...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think TypesFromReflection doesn't use the Class instances directly. It will ask the TypeSystem to resolve the class name, and gives up if that class is not on the classpath of the typesystem.

This is a remnant of an early version of the framework, where it was still using the reflected symbol implementation and not ASM. So this was effectively depended on by the reflected symbol implementation to get the type of eg a method parameter, and was important.

I figured, if we keep it we can build complex types with eg typeSystem.fromReflect(new TypeLiteral<List<String>>() { }) (which would produce the type for List<String>) without having to go through hoops with parameterize and such. But it's not a crucial piece of the framework and could be removed. I think, we should decide when we try to port rules, to see if it carries its weight.

Comment thread pmd-java/src/main/java/net/sourceforge/pmd/lang/java/types/Lub.java
Comment thread pmd-java/src/main/resources/category/java/security.xml Outdated
Comment thread pmd-java/src/test/java/net/sourceforge/pmd/lang/java/ast/Java10Test.java Outdated
@adangel adangel merged commit 8751271 into pmd:java-grammar Sep 11, 2020
@oowekyala

Copy link
Copy Markdown
Member Author

Thanks for merging ! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

in:type-resolution Affects the type resolution code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants