Move subqueries to use the operator model#13750
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
38db062 to
a2b6a1c
Compare
a2b6a1c to
1dac013
Compare
933f462 to
fe89b8c
Compare
426840e to
ac78e61
Compare
e604f9b to
3974776
Compare
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
| case *SubQueryContainer: | ||
| return pushOrMergeSubQueryContainer(ctx, in) | ||
| case *QueryGraph: | ||
| return optimizeQueryGraph(ctx, in) |
There was a problem hiding this comment.
Why is *QueryGraph being used here? It seems that the *QueryGraph type has already been handled in the previous transformToPhysical method.
There was a problem hiding this comment.
Great question! This is because we might have subqueries hiding inside horizons, and these won't become available until we do horizon expansion.
I'm not really happy with how this ended up, and I think I want to clean up this part. We shouldn't produce Horizons that are hiding subqueries - these should get expanded straight away. Unfortunately, we depend on the root being a Horizon to know what columns the user originally asked for in addTruncationOrProjectionToReturnOutput. I'll work on cleaning this up in coming PRs
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
| utils.AssertMatches(t, mcmp.VtConn, ` | ||
| select id | ||
| from t1 | ||
| where exists( | ||
| select t2.id, count(*) | ||
| from t2 | ||
| where t1.col = t2.tcol2 | ||
| having count(*) > 0 | ||
| )`, | ||
| `[[INT64(100)]]`) | ||
| utils.AssertMatches(t, mcmp.VtConn, ` | ||
| select id | ||
| from t1 | ||
| where exists( | ||
| select t2.id, count(*) | ||
| from t2 | ||
| where t1.col = t2.tcol1 | ||
| ) order by id`, | ||
| `[[INT64(1)] [INT64(4)] [INT64(100)]]`) |
There was a problem hiding this comment.
any reason for removing these test cases
There was a problem hiding this comment.
yeah, they are not valid. they are not following the ONLY_FULL_GROUP_BY directive, but this test doesn't change sql_mode on the connection, so the query fails.
| func Walk(visit Visit, first SQLNode, nodes ...SQLNode) error { | ||
| err := VisitSQLNode(first, visit) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
I don't see why first field is required. Won't the first element of nodes otherwise be the first to run?
There was a problem hiding this comment.
this is to avoid accidental use of this method with no nodes to visit. We found one case where this was done by accident.
There was a problem hiding this comment.
Shouldn't we then just check the length of nodes and panic/error on 0 length?
There was a problem hiding this comment.
Why is that preferable to handling it like this?
| Assignments map[string]sqlparser.Expr | ||
| ChangedVindexValues map[string]*engine.VindexValues | ||
| OwnedVindexQuery string | ||
| AST *sqlparser.Update |
There was a problem hiding this comment.
Why are we not storing AST in Update anymore, but we are doing so for Delete and Inserts?
There was a problem hiding this comment.
we probably should. do you remember, @harshit-gangal
There was a problem hiding this comment.
we want to get rid of using them for others as well, just that subquery is not supported for delete and insert queries therefore it is not changed in this PR.
Signed-off-by: Andres Taylor <andres@planetscale.com>
| func (sqc *SubQueryContainer) getRootOperator(op ops.Operator) ops.Operator { | ||
| if len(sqc.Inner) == 0 { | ||
| return op | ||
| } | ||
|
|
||
| sqc.Outer = op | ||
| return sqc | ||
| } |
There was a problem hiding this comment.
getRootOperator does not feel like a good name here, we are returning the receiver on line 129
There was a problem hiding this comment.
wdyt would be a better name?
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Harshit Gangal <harshit@planetscale.com> Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Description
This PR moves the last big piece of logic over to the operator side of query planning.
Sub query planning introduces a new step in the query planning process - "subquery settling".
For queries containing subqueries, the process is now:
WHEREclause sub querySubquery containers
When a query has multiple subqueries, we organise them as follows:
In this diagram, SQC is the sub query container, O is the outer query, and I1 and I2 are the two inner queries.
The alternative way of representing the same query could be something like this:
Here SQ1 has O as it's outer query, but the inner query is another subquery, SQ2. It's outer is I1 and it's inner is I2. This is the "natural" way to represent this query pattern, and this is also what the final plan looks like.
The reason for having the sub query container until the subquery settle time is to make it easier to merge the outer with any of the inner queries. It makes it easy to compare the mergability of the outer query with each sub query in an efficient way.
Subquery merging
We are now a bit better when it comes to merging subqueries together. This is due to making it possible to push subquery operators down the operator tree and thus coming close enough to another
Routeso we can merge them together.Related Issue(s)
Tracking issue #11626
Checklist