Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: add DataFlow::Content for arrays #13741

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

rdmarsh2
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the Swift label Jul 13, 2023
@rdmarsh2
Copy link
Contributor Author

Some more checking to do about Array member functions before I'm ready to merge this, but it seems to work so far.

@rdmarsh2 rdmarsh2 marked this pull request as ready for review July 18, 2023 14:08
@rdmarsh2 rdmarsh2 requested a review from a team as a code owner July 18, 2023 14:08
@MathiasVP
Copy link
Contributor

Also this needs a DCA run. But probably not before the above fixes have been implemented.

@MathiasVP
Copy link
Contributor

Oh, one more thing: We need to add ArrayContent to forceHighPrecision like Java does here: https://github.com/github/codeql/blob/main/java/ql/lib/semmle/code/java/dataflow/internal/DataFlowPrivate.qll#L391C3-L391C28. This will prevent an access-path blowup on certain evil databases.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not looking at this sooner.

This looks great. Thank you for including models-as-data support (ArrayElement), there are a number of models that should benefit from using this in the near future.

var matrix = [[source()]]
sink(arg: matrix[0])
sink(arg: matrix[0][0]) // $ flow=645
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does assignment to a multi-dimensional array work as well?

var matrix2 = [[1]]
matrix2[0][0] = source()
sink(arg: matrix2[0][0])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't, and now that I look at the cases I based this off of I'm not sure we handle nested fields or tuples properly either...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. Then we should probably create an issue for this (in all content types) rather than trying to address it for just arrays here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After adding some tests it seems to work for nested fields... I'm not sure why, though.

Copy link
Contributor

@MathiasVP MathiasVP Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because of "reverse reads": if readStep(n2.(PostUpdateNode).getPreUpdateNode(), f, n1.(PostUpdateNode).getPreUpdateNode()) then the dataflow library generates storeStep(n1, f, n2). So if this doesn't work for array content then it's either because the post-update/pre-update isn't working, or because the readStep isn't working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's actually that there's a step through an InOutExpr between the store and read step:

#  650|     getElement(11): [AssignExpr]  ... = ...
#  650|       getDest(): [SubscriptExpr] ...[...]
#  650|         getBase(): [InOutExpr] &...
#  650|           getSubExpr(): [SubscriptExpr] ...[...]
#  650|             getBase(): [InOutExpr] &...
#  650|               getSubExpr(): [DeclRefExpr] matrix2

var arr3 = [1]
var arr4 = arr2 + arr3
sink(arg: arr3[0])
sink(arg: arr4[0]) // $ MISSING: flow=642
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a model of Array.+(_:_:) in ArraySummaries fix this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should.

sink(arg: arr4[0]) // $ MISSING: flow=642
var arr5 = Array(repeating: source(), count: 2)
sink(arg: arr5[0]) // $ MISSING: flow=654
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again it's possible a model of the initializer is all we need here. It can wait as follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@geoffw0
Copy link
Contributor

geoffw0 commented Jul 19, 2023

DCA shows a 6.6x analysis slowdown, which hopefully @MathiasVP s suggestion will fix. We can tolerate some slowdown for an improvement like this.

@geoffw0
Copy link
Contributor

geoffw0 commented Jul 20, 2023

New DCA run LGTM. Not a hint of performance issues. Good work!

Comment on lines +483 to +484
component.getName() = "ArrayElement" and
content instanceof Content::ArrayContent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be pulled into its own predicate to follow the structure of parseField?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a considerably simpler case than parseField, I'd be happy either way TBH.

@@ -111,7 +111,8 @@ private module Cached {
hasExprNode(n,
[
any(Argument arg | modifiable(arg)).getExpr(), any(MemberRefExpr ref).getBase(),
any(ApplyExpr apply).getQualifier(), any(TupleElementExpr te).getSubExpr()
any(ApplyExpr apply).getQualifier(), any(TupleElementExpr te).getSubExpr(),
any(SubscriptExpr se).getBase(), any(InOutExpr ioe).getSubExpr()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a better fix maybe be to hide InOutExpr from the AST entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe - we might break some other things by doing that... I can try it though

@@ -705,7 +709,7 @@ predicate storeStep(Node node1, ContentSet c, Node node2) {
node1.asExpr() = assign.getSource() and
node2.(PostUpdateNode).getPreUpdateNode().asExpr() = subscript.getBase() and
subscript = assign.getDest() and
subscript.getBase().getType().(InOutType).getObjectType() instanceof ArrayType and
// subscript.getBase().getType().(InOutType).getObjectType() instanceof ArrayType and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to comment this out? Based on 42cc644#r1267210752, I gathered that conjunct was important

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh oops

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants