Shared: use shared access path syntax to parse arguments in CSV rows#8149
Shared: use shared access path syntax to parse arguments in CSV rows#8149asgerf merged 7 commits intogithub:mainfrom
Conversation
smowton
left a comment
There was a problem hiding this comment.
Java version of this looks fine
| regexpCaptureTwo(arg, "(-?\\d+)\\.\\.N-(\\d+)", lo, hi) and | ||
| result = [lo.toInt() .. arity - hi.toInt()] | ||
| or | ||
| // N-x..Ny |
There was a problem hiding this comment.
| // N-x..Ny | |
| // N-x..N-y |
da288ee to
f1bfb31
Compare
| result = FlowSummary::SummaryComponent::arrayElementUnknown() | ||
| or | ||
| exists(int i | | ||
| c.regexpCapture("ArrayElement\\[([0-9]+)\\]", 1).toInt() = i and |
There was a problem hiding this comment.
Note that this regexp did not allow negative indices (but the range-parsing below did). This is why the follow-up commit is necessary, to prevent ArrayElement[-1] from being interpreted as ArrayElement[?]. Some of the dynamically generated input/output specs end up referencing negative array indices (and should not have wrap-around semantics).
There was a problem hiding this comment.
This looks fine – I can see that you've moved the restriction to nonnegative indices up to arrayElementKnown.
|
@nickrolfe can you review the Ruby changes? (see my inline comment) |
| result = FlowSummary::SummaryComponent::arrayElementUnknown() | ||
| or | ||
| exists(int i | | ||
| c.regexpCapture("ArrayElement\\[([0-9]+)\\]", 1).toInt() = i and |
There was a problem hiding this comment.
This looks fine – I can see that you've moved the restriction to nonnegative indices up to arrayElementKnown.
| c.regexpCapture("ArrayElement\\[([-0-9]+)\\.\\.([0-9]+)\\]", 1).toInt() = i1 and | ||
| c.regexpCapture("ArrayElement\\[([-0-9]+)\\.\\.([0-9]+)\\]", 2).toInt() = i2 and | ||
| result = FlowSummary::SummaryComponent::arrayElementKnown([i1 .. i2]) | ||
| ) |
There was a problem hiding this comment.
It took me a second to realise why this could be deleted, but it looks like this is now handled as part of the call to parseInt above, on line 80.
tamasvajk
left a comment
There was a problem hiding this comment.
Are there any performance implications of these changes?
Maybe run DCA before merging? |
|
Absolutely. I've started DCA runs for each language. |
|
DCA evaluations were quite flaky, but overall doesn't indicate a slow-down:
|
michaelnebel
left a comment
There was a problem hiding this comment.
Looks like a really nice improvement on both readability and sharing!
Previously it was up to each language to parse the numeric arguments in an access path token, such as
Argument[1..3]. This has lead to some duplication and inconsistency, with plenty ofregexpCapturecalls doing similar things.This PR places some common parsing primitives into the
AccessPathSyntax.qllfile and uses those where possible.Note that for the identifying access path (in dynamic languages), the syntax
Argument[N-1]means the last argument of a call, andArgument[N-2]the second-last, etc. For the time being at least, this syntax is specific to the identifying access path and therefore to dynamic languages, but I still think the sharedAccessPathSyntax.qllfile is the best place for this code to live. The predicate for parsing it,parseIntWithExplicitArity, can only be invoked with a known arity, and the call-site arity is not generally known when interpreting an input/output spec (for static langauges, it should be known to the author writing the spec, so there is no need to use theN-1syntax). I'm mentioning this here in case you're wondering why this code appears to be unused for your language.Also replaces a few
regexpCapturecalls for extracting a field name from a token such asField[x]. The qualified name of a field can contain commas (at least for C#), so to avoid splitting it, we extract it withgetArgumentList()rather thangetAnArgument().