Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby: expand DataFlow API #11024

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Oct 28, 2022

Adds some classes to do basic stuff with DataFlow nodes without having to map down to CFG and AST nodes, and a few other tidbits I found myself missing.

To ensure some degree of battle-hardening I (mostly) ported three library models to use the new nodes where it makes sense. The affected models are Rails.qll, Railties.qll, and ActionController.qll.

Migrating from AST nodes to DataFlow nodes revealed a couple of pre-existing issues, some of which have now been fixed or fixed enough to avoid a regression:

Some other issues which were too hard to fix prior to opening this PR:

  • We're missing flows into captured variables in some cases, in particular when returning a lambda. I attempted a fix in this draft PR which would align closer with how we do things in JS, but after a whole day of debugging strange call graph changes, I've put that aside for now. Instead, I've resorted to a rather unambitious workaround in ab91b57.
  • Blocks are assumed to capture self, but of course that's not always the case. There is currently no way to tell the analysis what the value of self is, so we have to work around it for now. This means using things like getParent+() and treating the module self as if it's an instance of the module.

getConst() and what about API graphs?

One of the changes that may look odd is the introduction of DataFlow::getConst() and ConstRef. It provides convenient access to constants that may resolve to a given qualified name. For example, the incantation

DataFlow::getConst("ActionController").getConst("Base").getADescendentModule()

gives us all descendents of ActionController::Base.

This was probably the biggest use case for API graphs in Ruby, as you can get to (almost) the same thing with API::getTopLevelMember("ActionController").getMember("Base"). So why add getConst()?

  • getConst() doesn't rely on the call graph, which means it can be used for things like getACallSimple() and other pre-call graph customizations (we have quite a few of those in JS and I expect we'll find the need in Ruby as well).
  • I intend to share more of the API graphs implementation between languages (stay tuned!), but the rules for constant lookup in Ruby are pretty language-specific, so it would be nice to have some of this factored out anyway.
  • getConst() handles the rules of constant lookup more precisely than API graphs, in particular when related to include, nested modules, and modules with more than one declaration. I think API graphs could be extended to make this work, but it's not really a natural fit for its use/def tracking, it would need to be a bunch of new predicates specific to constant lookup, which might as well exist earlier in the pipeline.

I replaced some uses of API graphs with getConst, not because it matters in practice, I just had to ensure getConst() is being used enough to test it in the wild.

I hope the name getConst() makes more sense in the context of Ruby. I think the name in API graphs, getMember, was used mostly because that's what JS called it.

Evaluation

Evaluation shows

  • 100 new taint sinks
  • 491 new taint sources, resulting in 1890 new tainted expressions
  • 3 new call edges due to the change in flowsTo
  • There seems to be a slight performance hit, which I might investigate a bit further, but I wanted to get the PR up for now.

The new sources and sinks are mainly due to resolving methods upwards in the hierarchy in ActionController. The combination getADescendentModule().getAnInstanceMethod() takes "sideways" method resolution into account without having to really think about it, so I like that.

@github-actions github-actions bot added the Ruby label Oct 28, 2022
Copy link

@github-code-scanning github-code-scanning bot left a comment

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@asgerf asgerf force-pushed the rb/data-flow-layer-capture2 branch from 99c8eba to be1d931 Compare Oct 28, 2022
@asgerf asgerf force-pushed the rb/data-flow-layer-capture2 branch 2 times, most recently from 04a5c91 to c2e33a2 Compare Oct 28, 2022
These had the same name and ended up being unified
@asgerf asgerf force-pushed the rb/data-flow-layer-capture2 branch from a3b5401 to be87c74 Compare Oct 31, 2022
@asgerf asgerf force-pushed the rb/data-flow-layer-capture2 branch from be87c74 to ab4e341 Compare Oct 31, 2022
@asgerf asgerf changed the title Tests Ruby: expand DataFlow API Nov 1, 2022
@asgerf asgerf marked this pull request as ready for review Nov 1, 2022
@asgerf asgerf requested a review from a team as a code owner Nov 1, 2022
@asgerf asgerf added the no-change-note-required This PR does not need a change note label Nov 1, 2022
Copy link
Contributor

@aibaars aibaars left a comment

Partial review.

@@ -170,6 +170,24 @@ module ConstantValue {

/** A constant `nil` value. */
class ConstantNilValue extends ConstantValue, TNil { }

/** Gets the integer constant `x`. */
ConstantValue getInt(int x) { result.getInt() = x }
Copy link
Contributor

@aibaars aibaars Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit strange that the "inverse" of getInt is named getInt . Wouldn't ConstantValue::fromInt be a better name?

@@ -23,6 +24,20 @@ class Module extends TModule {
/** Gets an `include`d module. */
Module getAnIncludedModule() { result = getAnIncludedModule(this) }

/** Gets the super class or an included or prepended module. */
Copy link
Contributor

@aibaars aibaars Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is very similar to https://github.com/github/codeql/pull/10747/files#diff-9333de60c5c1b8283b484ae3b7e3cbef24b7bf52aee695df420d0e3f4d55403b

The main difference is that you use * instead of + which would make all modules ancestors of themselves, which is odd (even though it is handy in some cases).

@hvitved suggested a pragma[inline] annotation on getAnAncestor , perhaps you should do that as well.

* For example, the canonical enclosing module of `A::B` is `A`, and `A` itself has no canonical enclosing module.
*/
pragma[nomagic]
Module getCanonicalEnclosingModule() { result.getQualifiedName() = this.getEnclosingModuleName() }
Copy link
Contributor

@aibaars aibaars Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what a "canonical" enclosing module means. Could you give an example of a non-canonical enclosing module?

Node getAnElementWriteValue() {
exists(CallNode call |
call = this.getAMethodCall("[]=") and
call.getNumberOfArguments() = 2 and
Copy link
Contributor

@aibaars aibaars Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have assignments like

obj[a, b, c] = x

I'm not sure we want to capture such things in this predicate though. Just wanted to make sure you ignored such calls on purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Ruby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants