Skip to content

Shared: use shared access path syntax to parse arguments in CSV rows#8149

Merged
asgerf merged 7 commits intogithub:mainfrom
asgerf:shared/use-shared-access-path-syntax
Feb 25, 2022
Merged

Shared: use shared access path syntax to parse arguments in CSV rows#8149
asgerf merged 7 commits intogithub:mainfrom
asgerf:shared/use-shared-access-path-syntax

Conversation

@asgerf
Copy link
Contributor

@asgerf asgerf commented Feb 21, 2022

Previously it was up to each language to parse the numeric arguments in an access path token, such as Argument[1..3]. This has lead to some duplication and inconsistency, with plenty of regexpCapture calls doing similar things.

This PR places some common parsing primitives into the AccessPathSyntax.qll file and uses those where possible.

Note that for the identifying access path (in dynamic languages), the syntax Argument[N-1] means the last argument of a call, and Argument[N-2] the second-last, etc. For the time being at least, this syntax is specific to the identifying access path and therefore to dynamic languages, but I still think the shared AccessPathSyntax.qll file is the best place for this code to live. The predicate for parsing it, parseIntWithExplicitArity, can only be invoked with a known arity, and the call-site arity is not generally known when interpreting an input/output spec (for static langauges, it should be known to the author writing the spec, so there is no need to use the N-1 syntax). I'm mentioning this here in case you're wondering why this code appears to be unused for your language.

Also replaces a few regexpCapture calls for extracting a field name from a token such as Field[x]. The qualified name of a field can contain commas (at least for C#), so to avoid splitting it, we extract it with getArgumentList() rather than getAnArgument().

@asgerf asgerf added the no-change-note-required This PR does not need a change note label Feb 21, 2022
smowton
smowton previously approved these changes Feb 21, 2022
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java version of this looks fine

regexpCaptureTwo(arg, "(-?\\d+)\\.\\.N-(\\d+)", lo, hi) and
result = [lo.toInt() .. arity - hi.toInt()]
or
// N-x..Ny
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// N-x..Ny
// N-x..N-y

@asgerf asgerf force-pushed the shared/use-shared-access-path-syntax branch from da288ee to f1bfb31 Compare February 23, 2022 13:13
@asgerf asgerf marked this pull request as ready for review February 23, 2022 14:53
@asgerf asgerf requested review from a team as code owners February 23, 2022 14:53
result = FlowSummary::SummaryComponent::arrayElementUnknown()
or
exists(int i |
c.regexpCapture("ArrayElement\\[([0-9]+)\\]", 1).toInt() = i and
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this regexp did not allow negative indices (but the range-parsing below did). This is why the follow-up commit is necessary, to prevent ArrayElement[-1] from being interpreted as ArrayElement[?]. Some of the dynamically generated input/output specs end up referencing negative array indices (and should not have wrap-around semantics).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine – I can see that you've moved the restriction to nonnegative indices up to arrayElementKnown.

@asgerf
Copy link
Contributor Author

asgerf commented Feb 23, 2022

@nickrolfe can you review the Ruby changes? (see my inline comment)

result = FlowSummary::SummaryComponent::arrayElementUnknown()
or
exists(int i |
c.regexpCapture("ArrayElement\\[([0-9]+)\\]", 1).toInt() = i and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine – I can see that you've moved the restriction to nonnegative indices up to arrayElementKnown.

c.regexpCapture("ArrayElement\\[([-0-9]+)\\.\\.([0-9]+)\\]", 1).toInt() = i1 and
c.regexpCapture("ArrayElement\\[([-0-9]+)\\.\\.([0-9]+)\\]", 2).toInt() = i2 and
result = FlowSummary::SummaryComponent::arrayElementKnown([i1 .. i2])
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a second to realise why this could be deleted, but it looks like this is now handled as part of the call to parseInt above, on line 80.

Copy link
Contributor

@tamasvajk tamasvajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any performance implications of these changes?

@michaelnebel
Copy link
Contributor

Are there any performance implications of these changes?

Maybe run DCA before merging?

@asgerf
Copy link
Contributor Author

asgerf commented Feb 24, 2022

Absolutely. I've started DCA runs for each language.

@asgerf
Copy link
Contributor Author

asgerf commented Feb 25, 2022

DCA evaluations were quite flaky, but overall doesn't indicate a slow-down:

  • Ruby
  • Java - this was particularly noisy, but symmetric on both failures and timings.
  • JavaScript
  • C#

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a really nice improvement on both readability and sharing!

@asgerf asgerf merged commit a8bfeba into github:main Feb 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C# Java JS no-change-note-required This PR does not need a change note Ruby

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants