Skip to content

Performance: Improve join order in data flow library#6165

Merged
aschackmull merged 1 commit intogithub:mainfrom
edoardopirovano:fix-regression
Jul 1, 2021
Merged

Performance: Improve join order in data flow library#6165
aschackmull merged 1 commit intogithub:mainfrom
edoardopirovano:fix-regression

Conversation

@edoardopirovano
Copy link
Copy Markdown
Contributor

@edoardopirovano edoardopirovano commented Jun 26, 2021

Follows on from #6149

This adds a few more pragmas that are unfortunately needed to convince the optimiser to pick the right join order for this predicate in a few cases. This should hopefully fix the remaining regression from the optimiser's recent heuristics change.

cc. @github/codeql-core

Differences jobs

C++: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/2125/
Python: https://jenkins.internal.semmle.com/job/Changes/job/Python-Differences/596/
C#: https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/1163/
Java: https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1479/

@edoardopirovano
Copy link
Copy Markdown
Contributor Author

Differences jobs all look good to me!

@aschackmull
Copy link
Copy Markdown
Contributor

Differences jobs all look good to me!

Really? CPP looks like it has a consistent slowdown in SignedOverFlowCheck.ql across multiple projects.

Also, these 3 pragmas forces revFlow to be first in the pipeline. I think that's reasonable, but it does render the other two pragmas superfluous, so we should remove those.

@hvitved
Copy link
Copy Markdown
Contributor

hvitved commented Jun 29, 2021

Really? CPP looks like it has a consistent slowdown in SignedOverFlowCheck.ql across multiple projects.

I think that is just noise, but better to have @MathiasVP confirm.

@aschackmull
Copy link
Copy Markdown
Contributor

I think that is just noise, but better to have @MathiasVP confirm.

Yeah, it's hard to imagine how this PR could cause something like that, but it just looks so consistent.

@edoardopirovano
Copy link
Copy Markdown
Contributor Author

Really? CPP looks like it has a consistent slowdown in SignedOverFlowCheck.ql across multiple projects.

It didn't look particularly consistent to me (6 slower projects, 2 unchanged, 5 faster), and the increase in total time is only 0.7%. I don't know how much noise there is in the Differences jobs but less than 1% definitely seems like noise.

@edoardopirovano
Copy link
Copy Markdown
Contributor Author

Also, these 3 pragmas forces revFlow to be first in the pipeline. I think that's reasonable, but it does render the other two pragmas superfluous, so we should remove those.

I think we also need the pragmas on fwdFlow to avoid the other bad pipeline we previously saw:

[2021-06-23 12:58:30] (840s) Starting to evaluate predicate DataFlowImpl::Stage2::parameterMayFlowThrough#ffff/4@cf4805
[2021-06-23 13:21:20] (2210s) Tuple counts for DataFlowImpl::Stage2::parameterMayFlowThrough#ffff/4@cf4805:
42921      ~0%      {5} r1 = SCAN DataFlowImpl::Stage2::parameterFlow#fffff OUTPUT In.1, In.0 'p', In.2, In.3 'c', In.4 'config'
42921      ~1%      {6} r2 = JOIN r1 WITH DataFlowImplCommon::Cached::TBooleanSome#ff@staged_ext ON FIRST 1 OUTPUT Lhs.1 'p', Rhs.0, Rhs.1, Lhs.2, Lhs.3 'c', Lhs.4 'config'
42921      ~0%      {7} r3 = JOIN r2 WITH DataFlowImplCommon::ParamNode::isParameterOf_dispred#fff ON FIRST 1 OUTPUT Lhs.2, Lhs.3, Lhs.5 'config', Lhs.1 'ap', Lhs.0 'p', Lhs.4 'c', Rhs.2
9279868986 ~4%      {9} r4 = JOIN r3 WITH DataFlowImpl::Stage2::fwdFlow#fffff_23401#join_rhs ON FIRST 3 OUTPUT Lhs.3 'ap', Lhs.0, Lhs.4 'p', Lhs.1, Lhs.5 'c', Lhs.2 'config', Lhs.6, Rhs.3, Rhs.4
9279868986 ~4%      {9} r5 = SELECT r4 ON In.8 = true
9279868986 ~0%      {8} r6 = SCAN r5 OUTPUT In.7, In.4 'c', In.0 'ap', In.2 'p', In.3, In.5 'config', In.6, In.7
468670     ~0%      {8} r7 = JOIN r6 WITH DataFlowImplCommon::Cached::nodeEnclosingCallable#ff@staged_ext ON FIRST 2 OUTPUT Lhs.7, true, Lhs.4, Lhs.5 'config', Lhs.2 'ap', Lhs.3 'p', Lhs.1 'c', Lhs.6
398765     ~3%      {7} r8 = JOIN r7 WITH DataFlowImpl::Stage2::revFlow#fffff_01342#join_rhs ON FIRST 4 OUTPUT Rhs.4, Lhs.4 'ap', Lhs.5 'p', Lhs.6 'c', Lhs.3 'config', Lhs.7, Lhs.0
398652     ~25%     {6} r9 = JOIN r8 WITH DataFlowImplCommon::Cached::TBooleanSome#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.6, Lhs.1 'ap', Lhs.2 'p', Lhs.3 'c', Lhs.4 'config', Lhs.5
83682      ~76%     {6} r10 = JOIN r9 WITH DataFlowImplCommon::ReturnNodeExt::getKind_dispred#ff ON FIRST 1 OUTPUT Lhs.1 'ap', Lhs.2 'p', Lhs.3 'c', Lhs.4 'config', Lhs.5, Rhs.1
60523      ~74%     {6} r11 = r10 AND NOT DataFlowImplCommon::Cached::TParamUpdate#ff@staged_ext(Lhs.4, Lhs.5)
60523      ~96%     {4} r12 = SCAN r11 OUTPUT In.1 'p', In.2 'c', In.0 'ap', In.3 'config'
                    return r12

It's not enough to just have revFlow late in the pipeline, we need both revFlow and fwdFlow to be late.

@aschackmull
Copy link
Copy Markdown
Contributor

It's not enough to just have revFlow late in the pipeline, we need both revFlow and fwdFlow to be late.

The introduced pragmas won't put revFlow late - on the contrary it forces revFlow first. And we need at least one of revFlow and fwdFlow to come before parameterFlow, if parameterFlow is first among the 3 then the join-order is guaranteed to be bad.

@edoardopirovano
Copy link
Copy Markdown
Contributor Author

edoardopirovano commented Jun 29, 2021

The introduced pragmas won't put revFlow late - on the contrary it forces revFlow first. And we need at least one of revFlow and fwdFlow to come before parameterFlow, if parameterFlow is first among the 3 then the join-order is guaranteed to be bad.

Right, I follow now and agree we should be able to remove the other two pragmas. I've pushed a commit doing this, not sure if we want to profile again (I've checked both the queries that had bad join orders at some point locally and they are both 👍🏼).

@aschackmull aschackmull merged commit 37f8794 into github:main Jul 1, 2021
@edoardopirovano edoardopirovano deleted the fix-regression branch July 1, 2021 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C# C++ Java no-change-note-required This PR does not need a change note Python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants