Shared: use shared access path syntax to parse arguments in CSV rows by asgerf · Pull Request #8149 · github/codeql

asgerf · 2022-02-21T14:40:15Z

Previously it was up to each language to parse the numeric arguments in an access path token, such as Argument[1..3]. This has led to some duplication and inconsistency, with plenty of regexpCapture calls doing similar things.

This PR places some common parsing primitives into the AccessPathSyntax.qll file and uses those where possible.

Note that for the identifying access path (in dynamic languages), the syntax Argument[N-1] means the last argument of a call, and Argument[N-2] the second-last, etc. For the time being at least, this syntax is specific to the identifying access path and therefore to dynamic languages, but I still think the shared AccessPathSyntax.qll file is the best place for this code to live. The predicate for parsing it, parseIntWithExplicitArity, can only be invoked with a known arity, and the call-site arity is not generally known when interpreting an input/output spec (for static languages, it should be known to the author writing the spec, so there is no need to use the N-1 syntax). I'm mentioning this here in case you're wondering why this code appears to be unused for your language.
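To make the N-1 semantics concrete, here is a rough Python model of what arity-relative parsing could look like. It is only a sketch and not the QL implementation: the function name mirrors parseIntWithExplicitArity, but the exact set of accepted forms (a single integer, lo..hi, N-x, lo..N-x, N-x..N-y) is an assumption based on the examples in this PR.

```python
import re


def parse_int_with_explicit_arity(arg, arity):
    """Return the set of indices denoted by `arg` for a call with `arity` arguments.

    Hypothetical Python analogue of the QL predicate; an unparseable
    argument yields the empty set, loosely mirroring a predicate with no results.
    """

    def bound(text):
        # "N-x" is relative to the arity: N-1 is the last argument.
        m = re.fullmatch(r"N-(\d+)", text)
        if m:
            return arity - int(m.group(1))
        # Plain (possibly negative) integer.
        if re.fullmatch(r"-?\d+", text):
            return int(text)
        return None

    lo_text, sep, hi_text = arg.partition("..")
    lo = bound(lo_text)
    # Without "..", the argument denotes a single index.
    hi = bound(hi_text) if sep else lo
    if lo is None or hi is None:
        return set()
    return set(range(lo, hi + 1))
```

For a call with four arguments, `parse_int_with_explicit_arity("N-1", 4)` gives `{3}` and `parse_int_with_explicit_arity("1..N-1", 4)` gives `{1, 2, 3}`. The arity has to be supplied by the caller, which is why this form only makes sense where the call site is known.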

Also replaces a few regexpCapture calls for extracting a field name from a token such as Field[x]. The qualified name of a field can contain commas (at least for C#), so to avoid splitting it, we extract it with getArgumentList() rather than getAnArgument().
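The difference between the two accessors can be illustrated with a small Python sketch. The helper names below echo getArgumentList() and getAnArgument() but are hypothetical Python stand-ins, and the token `Field[Ns.Pair<,>.Key]` is a made-up example of a qualified name containing a comma.

```python
import re


def get_argument_list(token):
    """Return the raw text between the brackets of a token, or None."""
    m = re.fullmatch(r"\w+\[(.*)\]", token)
    return m.group(1) if m else None


def get_arguments(token):
    """Return the comma-separated arguments of a token (stand-in for getAnArgument)."""
    arg_list = get_argument_list(token)
    if arg_list is None:
        return []
    return [a.strip() for a in arg_list.split(",")]


# Splitting on commas is what we want for numeric arguments...
print(get_arguments("Argument[0,2]"))
# ...but it cuts a qualified field name in two, so the whole list is used instead.
print(get_arguments("Field[Ns.Pair<,>.Key]"))
print(get_argument_list("Field[Ns.Pair<,>.Key]"))
```

Under these assumptions, taking the full argument list keeps `Ns.Pair<,>.Key` intact, whereas per-argument access would yield the fragments `Ns.Pair<` and `>.Key`.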

smowton

Java version of this looks fine

smowton · 2022-02-21T16:32:46Z

java/ql/lib/semmle/code/java/dataflow/internal/AccessPathSyntax.qll

+      regexpCaptureTwo(arg, "(-?\\d+)\\.\\.N-(\\d+)", lo, hi) and
+      result = [lo.toInt() .. arity - hi.toInt()]
+      or
+      // N-x..Ny


Suggested change

-      // N-x..Ny
+      // N-x..N-y

asgerf · 2022-02-23T15:00:18Z

ruby/ql/lib/codeql/ruby/dataflow/internal/FlowSummaryImplSpecific.qll

  result = FlowSummary::SummaryComponent::arrayElementUnknown()
  or
  exists(int i |
-    c.regexpCapture("ArrayElement\\[([0-9]+)\\]", 1).toInt() = i and


Note that this regexp did not allow negative indices (but the range-parsing below did). This is why the follow-up commit is necessary, to prevent ArrayElement[-1] from being interpreted as ArrayElement[?]. Some of the dynamically generated input/output specs end up referencing negative array indices (and should not have wrap-around semantics).
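As a rough illustration of the point about negative indices, the Python sketch below contrasts the old pattern with a sign-aware one that explicitly rejects negatives. The `classify` helper and its return values are hypothetical; how the real Ruby library falls through to ArrayElement[?] is not shown here and is assumed only from the comment above.

```python
import re

# The old pattern from the removed line: digits only, so "-1" never matches.
OLD_INDEX = re.compile(r"ArrayElement\[([0-9]+)\]")
# A sign-aware pattern, so that negative indices can be recognised and rejected.
SIGNED_INDEX = re.compile(r"ArrayElement\[(-?[0-9]+)\]")


def classify(component):
    """Classify a component as a known index, a rejected negative index, or other."""
    m = SIGNED_INDEX.fullmatch(component)
    if m is None:
        return ("other", None)
    i = int(m.group(1))
    if i < 0:
        # Rejected outright, rather than being treated as an unknown element.
        return ("rejected", i)
    return ("known", i)
```

With the old pattern, `OLD_INDEX.fullmatch("ArrayElement[-1]")` is `None`, so the token is indistinguishable from anything else that is not a known index. Recognising the sign first makes it possible to reject it deliberately.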

asgerf · 2022-02-23T15:01:44Z

@nickrolfe can you review the Ruby changes? (see my inline comment)

nickrolfe · 2022-02-24T12:15:46Z

ruby/ql/lib/codeql/ruby/dataflow/internal/FlowSummaryImplSpecific.qll

  result = FlowSummary::SummaryComponent::arrayElementUnknown()
  or
  exists(int i |
-    c.regexpCapture("ArrayElement\\[([0-9]+)\\]", 1).toInt() = i and


This looks fine – I can see that you've moved the restriction to nonnegative indices up to arrayElementKnown.

nickrolfe · 2022-02-24T12:18:37Z

ruby/ql/lib/codeql/ruby/dataflow/internal/FlowSummaryImplSpecific.qll

-    c.regexpCapture("ArrayElement\\[([-0-9]+)\\.\\.([0-9]+)\\]", 1).toInt() = i1 and
-    c.regexpCapture("ArrayElement\\[([-0-9]+)\\.\\.([0-9]+)\\]", 2).toInt() = i2 and
-    result = FlowSummary::SummaryComponent::arrayElementKnown([i1 .. i2])
-  )


It took me a second to realise why this could be deleted, but it looks like this is now handled as part of the call to parseInt above, on line 80.

tamasvajk

Are there any performance implications of these changes?

michaelnebel · 2022-02-24T13:47:30Z

> Are there any performance implications of these changes?

Maybe run DCA before merging?

asgerf · 2022-02-24T14:56:03Z

Absolutely. I've started DCA runs for each language.

asgerf · 2022-02-25T07:42:55Z

DCA evaluations were quite flaky, but overall they don't indicate a slow-down:

Ruby
Java - this was particularly noisy, but symmetric on both failures and timings.
JavaScript
C#

michaelnebel

Looks like a really nice improvement on both readability and sharing!

asgerf added the no-change-note-required label Feb 21, 2022

github-actions bot added C# Java JS Ruby labels Feb 21, 2022

smowton previously approved these changes Feb 21, 2022

asgerf dismissed smowton’s stale review via 64d7b28 February 22, 2022 08:21

asgerf added 7 commits February 23, 2022 14:13

Shared: move numeric parsing into AccessPathSyntax.qll

abd4933

Shared: sync AccessPathSyntax.qll

5cab737

C#: use AccessPathSyntax library

9cff065

Java: use AccessPathSyntax library

efec348

Ruby: use AccessPathSyntax library

a11c6f0

Ruby: reject ArrayElement[-n] instead of interpreting it as ArrayElement[?]

bb9348d

Shared: fix typo in a comment

f1bfb31

asgerf force-pushed the shared/use-shared-access-path-syntax branch from da288ee to f1bfb31 February 23, 2022 13:13

asgerf marked this pull request as ready for review February 23, 2022 14:53

asgerf requested review from a team as code owners February 23, 2022 14:53

asgerf commented Feb 23, 2022

nickrolfe approved these changes Feb 24, 2022

tamasvajk reviewed Feb 24, 2022

michaelnebel approved these changes Feb 25, 2022

asgerf merged commit a8bfeba into github:main Feb 25, 2022
