Ruby: Document flow summary syntax#10899
Conversation
|
@hmac Could you split this PR? Documentation changes for the GA should go into If you think the doc changes are not fit for the GA release, then please retarget this PR to |
|
I don't know why this has a GA label - these are internal docs (and also this PR is unfinished). I've removed it. |
Ok great! I mistakenly labelled it as GA, sorry for that. |
edadd59 to
9776dfd
Compare
hvitved
left a comment
There was a problem hiding this comment.
Thanks for the excellent write-up.
ruby/ql/docs/flow_summaries.md
Outdated
|
|
||
| `input` and `output` define the "from" and "to" locations in the flow summary. | ||
| They use a custom string-based syntax which is similar but not identical to | ||
| Models as Data. These strings are often referred to as access paths. |
There was a problem hiding this comment.
These strings should be identical to the input and output string in MaD.
There was a problem hiding this comment.
As I understand it, MaD access paths map to API graph paths wheras flow summary paths map to dataflow nodes, and so the tokens supported by each can differ. This slack thread highlights one difference. Another example I've just spotted is that MaD has no support for the hash-splat specifier in Argument tokens.
There was a problem hiding this comment.
Ok, I see what you mean. However, I think of MaD collectively as all the components that make up a summary; but you are right that the access path component (which identifies the relevant calls) maps to API graphs, where as the input/output components have their own (but similar) syntax.
There was a problem hiding this comment.
I don't think we should call out the difference with the MaD path column just now (called the "identifying access path"). We can elaborate on this when documenting the MaD format.
There was a problem hiding this comment.
I've slightly reworded this sentence, but I want to retain the reference to MaD because I think it's useful to the reader. Putting myself in the shoes of a newcomer to the codebase: if I've encountered MaD already then this helps provide context and a point of comparison, and if I haven't seen MaD yet then this lets me know that it exists and I can choose to explore more by reading the MaD documentation if I want (and if/when it exists).
| Any keyword argument to the call. | ||
|
|
||
| #### `hash-splat` | ||
| The special "hash splat" argument/parameter, which is written as `**args`. |
There was a problem hiding this comment.
Note that this will also include all explicit keyword arguments, wrapped in an implicit hash splat argument.
There was a problem hiding this comment.
I've pushed a commit that elaborates on this.
| `ReturnValue` refers to the return value of the element identified in the | ||
| preceding access path. For example, `Argument[0].ReturnValue` refers to the | ||
| return value of the first argument. Of course this only makes sense if the first | ||
| argument is a callable. |
There was a problem hiding this comment.
Maybe callback instead of callable.
There was a problem hiding this comment.
Interesting, I wrote callable to mean that, of course, it makes no sense to use ReturnValue on an object. Instead it must be a proc, lambda or block. I guess depending on how you define "callback", then these would all be considered callbacks. Ruby doesn't tend to use that terminology, though, and in cases like arr.detect(&:even?) I would probably not call &:even? a callback. 🤷
Would it improve things if I change this to say if the first argument is callable - i.e. it is a proc, lambda or block.?
ruby/ql/docs/flow_summaries.md
Outdated
| TODO: I've not seen this component used in an output path; I don't know if it makes | ||
| sense to do so, or what meaning it would have. |
There was a problem hiding this comment.
Doesn't make sense to use in ouput.
ruby/ql/docs/flow_summaries.md
Outdated
| TODO: I've not seen this component used in an output path; I don't know if it makes | ||
| sense to do so, or what meaning it would have. |
ruby/ql/docs/flow_summaries.md
Outdated
| ``` | ||
|
|
||
| any data in any index of the first argument will be copied to the return value, | ||
| with the exception of data at index 0. |
There was a problem hiding this comment.
A prominent use case of WithoutElement is when you want to model a method that removes (a set of) element(s) from a collection. For example, if we want to model a method that removes the first element of the receiver collection, we can do so via
input = "Argument[self].WithoutElement[0]" and
output = "Argument[self]"where both the input and the output refer to the same argument. The data-flow library will then prevent use-use flow from any corresponding receivers:
a[0] = taint
a[1] = taint
a.remove_first # use-use flow from `a` to the `a` below will be blocked, but there will still be flow from `[post-update] a` to `a` below
sink(a[0]) # good
sink(a[1]) # badThere was a problem hiding this comment.
Thanks! I've written this example into the docs. If you get time, it would be great to have an example of where you should use .WithoutElement[...].WithElement[...]. I've tried removing instances of this pattern from every flow summary and all tests still pass, which leads me to believe that it is redundant.
There was a problem hiding this comment.
A good example of this is something like Hash#except. If we see a call like hash.except(:foo), we would like that to map to a flow summary with
input = Argument[self].WithoutElement[":foo"].WithElement[any]
output = ReturnValue
I.e. data can flow out of the call, but only when it is stored in some element that is not known to be at index :foo.
There was a problem hiding this comment.
I'm still not sure that WithElement[any] is necessary. Test 53 has an example:
private class S53 extends SimpleSummarizedCallable {
S53() { this = "s53" }
override predicate propagatesFlowExt(string input, string output, boolean preservesValue) {
preservesValue = true and
input = "Argument[self].WithoutElement[:foo]" and
output = "ReturnValue"
}
}def m53(i, h)
h[:foo] = source("a")
h[:bar] = source("b")
h[i] = source("c")
sink h[:foo] # $ hasValueFlow=a hasValueFlow=c
sink h[:bar] # $ hasValueFlow=b hasValueFlow=c
x = h.s53()
sink x[:foo]
sink x[:bar] # $ hasValueFlow=b
endThere was a problem hiding this comment.
This would also incorrectly include flow for sink(source("d").s53()).
There was a problem hiding this comment.
Hmmm... that seems very unintuitive to me, but you're certainly right. How do I explain this? Maybe something like:
An important point to note is that in a summary such as
input = "Argument[self].WithoutElement[0]" and output = "ReturnValue"if
Argument[self]contains data, it will be copied toReturnValue. If you only want to copy data in elements, and not in the container itself, addWithElement[any]to the input path:input = "Argument[self].WithoutElement[0].WithElement[any]" and output = "ReturnValue"See tests 53 and 54 [todo: i'll add this test] for examples of this behaviour.
?
There was a problem hiding this comment.
Yes, that looks good. WithoutElement[x] just says that data must not be stored inside a collection at index x, not that data must be stored inside a collection.
This is also the reason why something like a reverse method needs a summary like
input = "Argument[self].WithElement[any]" and
output = "ReturnValue"and not
input = "Argument[self]" and
output = "ReturnValue"|
Ping @hmac - ok to merge (after resolving conflicts)? |
|
There's a few comments above where I'd like to get a response from @hvitved before merging this. |
59a519c to
18fe233
Compare
These test cases are a companion to the flow summary docs, and ensure that the documentated behaviour matches reality.
c8f67ba to
f49507e
Compare
This PR adds some internal documentation for the Ruby flow summary syntax. There's a few TODOs left where I'm not sure of some behaviour - if you know the answer, let me know and I'll fill them in.