[java] Symbol reference framework#1566
Conversation
Generated by 🚫 Danger |
|
Thanks for the PR, I'm still trying to fully grasp your vision as for where this is going, so please, bare with me…
This is not exactly true... for overloaded methods, you need type resolution to known which method is actually being called (which in turn may affect the return type of that method, which feeds back to the type resolution). Symbol table and type resolution are very intertwined; I believe the best approach is to have a few self-contained helper objects for specific tasks:
But a standalone object will have to drive the whole process, calling upon these helpers at specific points on it's own; having these operations as part of the logic, and not separate standalone stages.
I'm a little worried about this... the current status has 2 issues I think are very important to address:
It's not explicit, but you are thinking of having 2 implementations for each interface, right? one using the node (where the overhead is simply the wrapper object), and another one where all fields are populated (ie: taken from a file storage)… So, on multifile, you would need a first pass on all files to populate this cache? we should be aware of not keeping all files' AST in memory. |
The meaning of a name depends on the context the name is found in, and so on the declarations currently in scope, and on their relative precedence, w.r.t. shadowing rules. So essentially, resolving names can't be separated from the scope model. JSymbolTable addresses your three points with the single mechanism of delegation, and encodes the precedence rules into the structure of the symbol table stack. That obviates the need for a master algorithm that knows about all the rules, and is easier to unit test I think.
The approach I'm thinking of for now is the following:
Conceptually, the third stage (type resolution) can easily be made lazy, provided the scope information has already been built. The typing of an expression only depends on the typing of the symbols it references (and its syntactic context, but we have an AST node), so that if references can be resolved, we can resolve types on-demand. Reference resolving needs a more global approach that takes the structure of the whole compilation unit into account, which is why I don't think the first stage can be optional.
Yes it maybe would be best to remove it, I don't expect it to be very useful. Ideally the JSymbolElement would present all useful information directly. That is internal API for now so we can decide on that later I think
I don't think this will be very noticeable, especially if we try to cache symbols/ be somewhat lazy. We'll have to be careful but I'm not too worried. I would think that AST nodes and Tokens would be orders of magnitude heavier
In fact what I've implemented uses the node or Class at the time of construction, and so uses a single implementation for each interface. What you propose may be better, idk. I'd rather talk about that on the implementation PRs |
Yes, I would expect the scope object to handle this, ie: qualify reference types with the proper parent types from the current file. Actually, some of this is already done within the current Scope implementation if I recall correctly… (although probably missing some edge cases). However, going from this "qualified" name to the actual type may be split apart if needed.
I'll dig deeper into it and get back to you.
I'm unsure I follow. As already stated, disambiguation needs type resolution, how can that be a third stage? how is the second stage going to disambiguate without extra info?
Actually, I think the one thing you need to do beforehand is creating the scope hierarchy, noting all declarations and "qualifying" types. Both, type resolution and symbol-usage finding can be made on-demand (for performance we may want to index all occurrences of symbol names, and on-demand simply disambiguate among them). If type resolution results are stored into the nodes (as currently done), then these can be memoized to have each node resolved at most once.
Bare in mind, we keep the AST of a single file per thread in memory at a time. When a thread moves on to the next file, the whole AST and tokens are dropped for garbage collection. We never keep all the files analyzed in memory. For multifile analysis keeping these objects for all files is orders of magnitude more memory than we currently use. |
Yes, there is a semantic gap between names and actual types. JSymbolTable only cares about resolving unqualified names, types can be resolved later-on.
I wrote some more extensive documentation about how those shadowing rules can be encoded in the stack on JSymbolTable, but I planned to introduce it in some later PR. Here are some links:
JSymbolTable doesn't qualify anything in itself. It only holds the knowledge to find the symbol denoted by a given simple name, taking care of the shadowing rules. That way later stages like actual type resolution and disambiguation drive their analysis while supporting them on the table when a reference needs to be resolved, but the table itself is not e.g. a type resolver, just a support system. Tables also can resolve for type names, value names, and method names, provided they're unqualified. The shadowing rules for those different kinds of names are so similar that we can shoot the three birds with one stone.
You're wrong about that. Disambiguation occurs for names, not types, so it doesn't need type resolution. It only needs to occur for names that were syntactically classified as ambiguous. The JLS goes into detail about how reclassification is done, and notice it doesn't mention types at all, only names, and the scope of declarations. Notice that the reclassification algorithm is iterative and starts from the leftmost name, ie. the unqualified one. That one can be resolved to a symbol by the symbol table. The first phase models the scope of declarations, so that after it, we have all the info we need to perform disambiguation. Having disambiguation occur in a second phase allows to separate concerns and take care of obscuring rules separately from shadowing rules. Shadowing rules can be encoded in the symbol table stack because they apply to a whole extent of program points indiscriminately. Obscuring depends on the actual context of individual name references, and only occurs if a type and a value with the same name are in scope at the same time, and they could both occur in the position of the reference. Since JSymbolTable has separate channels to find a type name and a value name, this second pass can use it without the symbol table having to care. Btw symbol tables use the auxclasspath to know about members that were declared in e.g. superclasses of the current class. This is distinct from type resolution itself.
Yes this should be the case in general for expressions. But symbols should also carry type information
I never implied keeping all ASTs in memory, just keeping some symbols in memory. If we keep the |
|
@jsotuyod May I merge this? Not in the main 7.0.x branch if you still have reservations. I think bringing in pieces of the implementation will drive this discussion forward, and I wouldn't like us to stall much longer on the very first PR. |
jsotuyod
left a comment
There was a problem hiding this comment.
I thought we had agreed to proceed with this, but just in case…
There was a problem hiding this comment.
This is the only place where we use a stream instead of a list, any reason for that?
There was a problem hiding this comment.
Possibly, looking up a method may involve looking up all the supertypes of the class in scope (and enclosing classes and their supertypes, for inner/anonymous classes). So since this might be costly, doing it lazily and stopping the hierarchy exploration as soon as we find a matching method might improve performance. A return type that is not a list also documents that the methods may not actually be prefetched.
But maybe all of this should be mentioned on the doc of the method.
There was a problem hiding this comment.
The thing is, unless the method I'm looking for has 0 args / I have an EXACT match on all parameters, I'll always need all to determine which overload best fits my arguments and that requires the complete list.
Probably most methods will fit that bill 'though…
There was a problem hiding this comment.
May be... Tbh it's too soon to say, we'll have to profile this for different scenarios when it's implemented and used by type resolution.
b976d22 to
b5ab656
Compare
4f8f68f to
12a52bc
Compare
|
@oowekyala build is failing due to a missing symbol… maybe a bad rebase? |
This is only done to split the work into two PRs. TODO revert me
Some utilities used by the symbol table tests were deleted in this commit to reduce the diff. TODO restore them with the next PR
About Optional for example
799b517 to
cb65fa3
Compare
This is the first third of #1536. It contains interfaces to represent code declarations (e.g. a class/ method/ field declaration) that abstracts over whether the class was found through reflection or by parsing a source file.
The goals are
For comparison, this kind of framework is e.g. found in Spoon (code analysis and transformation processor). The explanation on that page is very relevant to us. IntelliJ does some voodoo stuff with symbolic AST nodes backed by an on-disk serialized index, that trigger a full file reparse if you ask them a question that the index doesn't know... we probably don't need that do we
About VCS history:
Navigating the changes