Hi,
For my work on the metrics framework, I need a way to find the qualified names of the methods that are called and the fields that are accessed in a block of code. This is necessary to accurately resolve the usage of methods and data across a multi-file project, and I believe it's the next big step of my project.
Type resolution could greatly help here, but after a discussion with @WinterGrascph, it appears it's not exhaustive enough yet: it currently populates the type of AST nodes, and the parser groups some elements into the same node. For instance, an expression such as `a.field.foo()` would be parsed into `Primary[Prefix[a.field.foo], Suffix[()]]`. Type resolution would only populate the type of the prefix (the type of the `foo` method invocation); the suffix containing the `()` arguments and the Primary expression would get that same type. That means we miss the types of `a` and `a.field`. By contrast, when a similar expression is parenthesized (`(a).field.foo()`), it would be parsed into `Primary[Prefix[Expression[(a)]], Suffix[field], Suffix[foo], Suffix[()]]`. Here the prefix with `(a)` would have the type of `a`, the suffix `field` would have the field's type assigned, and so on.
There are a couple of things here that deserve attention:
- These two expressions, while semantically equivalent, are parsed completely differently. This complicates the task of rules that want to, e.g., forbid calls to some methods, as they have to parse the nodes' images and consider every edge case to be exhaustive. Moreover, the parsing rules are quite counterintuitive and hard to grasp for a rule designer.
- Type resolution is able to resolve the type of the entire chain, but in the first case those types are not kept, simply because there's no node to bear them. We therefore don't exploit type resolution to its full power. This also hurts the metrics framework, since some field accesses are not considered in this case.
Since we can't change the way the AST is parsed, one solution to these problems would be to create another layer of abstraction atop `PrimaryExpression` nodes. This layer could bind a data structure that would represent the expression's structure in a more intuitive way. This would:
- Ensure semantically equivalent expressions have an identical representation even though the AST nodes are not identical
- Provide a higher-level interface for programmers, who wouldn't need to parse images anymore
- Enable type resolution to store the type of all subexpressions in a call chain (good for metrics), whatever the underlying AST
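To illustrate the first point, here is a minimal sketch of what normalizing the two parse shapes could look like. It is only a toy: it works on images and assumes exactly the two shapes described above. `ChainNormalizer` and `normalize` are hypothetical names, not existing PMD API, and a real implementation would presumably walk the AST nodes rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PMD: flattens both parse shapes
// into the same list of chain segments.
final class ChainNormalizer {

    // prefixImage: image of the Prefix node, e.g. "a.field.foo" or "(a)"
    // suffixImages: images of the Suffix nodes, e.g. ["()"] or ["field", "foo", "()"]
    static List<String> normalize(String prefixImage, List<String> suffixImages) {
        String prefix = prefixImage;
        // Assumption: the prefix is at most a single parenthesized name.
        while (prefix.length() >= 2 && prefix.startsWith("(") && prefix.endsWith(")")) {
            prefix = prefix.substring(1, prefix.length() - 1);
        }
        List<String> segments = new ArrayList<>(Arrays.asList(prefix.split("\\.")));
        for (String suffix : suffixImages) {
            if (suffix.startsWith("(")) {
                // An arguments suffix turns the previous segment into a call.
                int last = segments.size() - 1;
                segments.set(last, segments.get(last) + suffix);
            } else {
                segments.add(suffix);
            }
        }
        return segments;
    }
}
```

With this sketch, `normalize("a.field.foo", ["()"])` and `normalize("(a)", ["field", "foo", "()"])` both yield the segments `a`, `field`, `foo()`, so a rule would only ever see one shape.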
I imagine such a representation could look like the following for the expression `a.field.foo()`:
- MethodCall
  - image="a.field.foo()"
  - type=<foo's return type>
  - methodName="foo"
  - arguments=...
  - leftHandSide={FieldAccess}
    - image="a.field"
    - type=<type of field>
    - fieldName="field"
    - leftHandSide={Variable}
      - name="a"
      - type=<type of a>
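In Java, the structure above might be sketched roughly as follows. All type and method names (`ChainNode`, `Variable`, `FieldAccess`, `MethodCall`, `Chains`) are hypothetical and do not exist in PMD, and the `typeName` string is just a placeholder for whatever type resolution would actually store.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; none of these types exist in PMD.
interface ChainNode {
    String image();
    String typeName();        // placeholder for the resolved type
    ChainNode leftHandSide(); // null at the start of the chain
}

final class Variable implements ChainNode {
    private final String name;
    private final String typeName;

    Variable(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }

    public String image() { return name; }
    public String typeName() { return typeName; }
    public ChainNode leftHandSide() { return null; }
}

final class FieldAccess implements ChainNode {
    private final ChainNode lhs;
    private final String fieldName;
    private final String typeName;

    FieldAccess(ChainNode lhs, String fieldName, String typeName) {
        this.lhs = lhs;
        this.fieldName = fieldName;
        this.typeName = typeName;
    }

    public String image() { return lhs.image() + "." + fieldName; }
    public String typeName() { return typeName; }
    public ChainNode leftHandSide() { return lhs; }
}

final class MethodCall implements ChainNode {
    private final ChainNode lhs;
    private final String methodName;
    private final List<String> argumentImages;
    private final String typeName; // the method's return type

    MethodCall(ChainNode lhs, String methodName, List<String> argumentImages, String typeName) {
        this.lhs = lhs;
        this.methodName = methodName;
        this.argumentImages = argumentImages;
        this.typeName = typeName;
    }

    public String image() {
        return lhs.image() + "." + methodName + "(" + String.join(", ", argumentImages) + ")";
    }
    public String typeName() { return typeName; }
    public ChainNode leftHandSide() { return lhs; }
}

final class Chains {
    // Walks the chain from the outermost expression down to its first element,
    // so every subexpression (and its type) is reachable.
    static List<ChainNode> subexpressions(ChainNode outermost) {
        List<ChainNode> result = new ArrayList<>();
        for (ChainNode n = outermost; n != null; n = n.leftHandSide()) {
            result.add(n);
        }
        return result;
    }
}
```

A metric could then call `Chains.subexpressions(call)` to visit `a.field.foo()`, `a.field` and `a` in turn, regardless of how the underlying AST was shaped.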
The representation would probably be created and populated by the type resolution visitor. Since the implementation of this feature is necessary for the metrics framework to move forward, I guess I could work on it with @WinterGrascph, though that would need some organisation.
Comments? @adangel @jsotuyod