Move subqueries to use the operator model by systay · Pull Request #13750 · vitessio/vitess

systay · 2023-08-09T09:20:27Z

Description

This PR moves the last big piece of logic over to the operator side of query planning.

Subquery planning introduces a new step in the query planning process: "subquery settling".

For queries containing subqueries, the process is now:

1. Start from an AST with its corresponding semantic state.
2. Build the initial op tree with horizons, query graphs and subquery containers (more about these terms later).
3. Iteratively push as many operators as possible under a route, and merge routes when possible.
4. Once we decide we can't push or merge subqueries any further, we "settle the subqueries":
   - Remove the subquery containers and decide how each subquery needs to be executed.
   - If we notice that we are dealing with a correlated subquery that is not a semi-join, this is where we fail.
   - During settling, rewrite merged subquery expressions back into AST expressions.
   - Finally, inject a filter into the outer side if we are dealing with a WHERE clause subquery.
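The push/merge loop followed by settling can be pictured with a toy model. This is only a sketch, not the planner's code: the `subquery` and `container` types, the `plan` function, and the "same keyspace means mergeable" rule are all assumptions made up for illustration.

```go
package main

import "fmt"

// subquery is a hypothetical, simplified inner query.
type subquery struct {
	name     string
	keyspace string // assumption: same keyspace as the outer means "mergeable"
}

// container is a toy stand-in for a subquery container.
type container struct {
	outerKeyspace string
	inner         []subquery // subqueries not yet merged or settled
	merged        []string   // names of subqueries merged into the outer route
}

// pushOrMerge runs one rewrite pass and reports whether it changed anything.
func (c *container) pushOrMerge() bool {
	for i, sq := range c.inner {
		if sq.keyspace == c.outerKeyspace {
			c.merged = append(c.merged, sq.name)
			c.inner = append(c.inner[:i], c.inner[i+1:]...)
			return true
		}
	}
	return false
}

// settle dissolves the container and returns the subqueries that
// could not be merged and therefore have to be executed separately.
func (c *container) settle() []string {
	separate := make([]string, 0, len(c.inner))
	for _, sq := range c.inner {
		separate = append(separate, sq.name)
	}
	c.inner = nil
	return separate
}

// plan rewrites until a fixed point is reached, then settles.
func plan(c *container) (merged, separate []string) {
	for c.pushOrMerge() {
	}
	return c.merged, c.settle()
}

func main() {
	c := &container{
		outerKeyspace: "user",
		inner: []subquery{
			{name: "sq1", keyspace: "user"},
			{name: "sq2", keyspace: "orders"},
			{name: "sq3", keyspace: "user"},
		},
	}
	merged, separate := plan(c)
	fmt.Println("merged:", merged, "separate:", separate)
}
```

The point of the sketch is the ordering: settling only happens once a full pass makes no further progress, so every merge opportunity is taken before a subquery is committed to separate execution.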

Subquery containers

When a query has multiple subqueries, we organise them as follows:

    ┌───┐
    │SQC│
    └───┘
┌──┐ ┌──┐ ┌──┐
│O │ │I1│ │I2│
└──┘ └──┘ └──┘

In this diagram, SQC is the subquery container, O is the outer query, and I1 and I2 are the two inner queries.

The alternative way of representing the same query could be something like this:

 ┌──────────┐
 │    SQ1   │
 └──────────┘

┌──┐      ┌───┐
│O │      │SQ2│
└──┘      └───┘

       ┌──┐  ┌──┐
       │I1│  │I2│
       └──┘  └──┘

Here SQ1 has O as its outer query, but the inner query is another subquery, SQ2. Its outer is I1 and its inner is I2. This is the "natural" way to represent this query pattern, and this is also what the final plan looks like.

The reason for keeping the subquery container until subquery settle time is to make it easier to merge the outer with any of the inner queries. With all inner queries as siblings, the mergeability of the outer query can be checked against each subquery efficiently.
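To make the two shapes concrete, here is a small sketch that folds a flat container into the nested form, following the nesting described above literally. The `operator`, `route`, `subQuery` and `subQueryContainer` types and the `nest` method are hypothetical simplifications; the real operators carry far more state, and the real settle step may build the tree differently.

```go
package main

import "fmt"

// operator is a hypothetical minimal operator interface.
type operator interface{ describe() string }

// route stands in for a leaf query (O, I1, I2 in the diagrams).
type route struct{ name string }

func (r route) describe() string { return r.name }

// subQuery is the nested form: one outer side, one inner side.
type subQuery struct{ outer, inner operator }

func (s subQuery) describe() string {
	return "SQ(" + s.outer.describe() + ", " + s.inner.describe() + ")"
}

// subQueryContainer is the flat form: one outer and sibling inner queries.
type subQueryContainer struct {
	outer operator
	inner []operator
}

// nest folds the flat container into the nested shape from the second diagram.
// With no inner queries the outer operator is returned unchanged.
func (c subQueryContainer) nest() operator {
	if len(c.inner) == 0 {
		return c.outer
	}
	node := c.inner[len(c.inner)-1]
	for i := len(c.inner) - 2; i >= 0; i-- {
		node = subQuery{outer: c.inner[i], inner: node}
	}
	return subQuery{outer: c.outer, inner: node}
}

func main() {
	sqc := subQueryContainer{
		outer: route{"O"},
		inner: []operator{route{"I1"}, route{"I2"}},
	}
	fmt.Println(sqc.nest().describe()) // SQ(O, SQ(I1, I2))
}
```

In the flat form, comparing O against every inner query is a single loop over `inner`; in the nested form the same comparison would require walking down through intermediate subquery nodes.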

Subquery merging

We are now a bit better at merging subqueries together. Subquery operators can now be pushed down the operator tree, which brings them close enough to another Route that the two can be merged.

Related Issue(s)

Tracking issue #11626

Checklist

"Backport to:" labels have been added if this change should be back-ported
Tests were added or are not required
Did the new or modified tests pass consistently locally and on the CI
Documentation was added or is not required

vitess-bot · 2023-08-09T09:20:30Z

Signed-off-by: Andres Taylor <andres@planetscale.com>

Signed-off-by: Manan Gupta <manan@planetscale.com>

wangweicugw · 2023-09-27T07:25:45Z

go/vt/vtgate/planbuilder/operators/horizon_planning.go

+		case *SubQueryContainer:
+			return pushOrMergeSubQueryContainer(ctx, in)
+		case *QueryGraph:
+			return optimizeQueryGraph(ctx, in)


Why is *QueryGraph being used here? It seems that the *QueryGraph type has already been handled in the previous transformToPhysical method.

Great question! This is because we might have subqueries hiding inside horizons, and these won't become available until we do horizon expansion.

I'm not really happy with how this ended up, and I think I want to clean up this part. We shouldn't produce Horizons that are hiding subqueries - these should get expanded straight away. Unfortunately, we depend on the root being a Horizon to know what columns the user originally asked for in addTruncationOrProjectionToReturnOutput. I'll work on cleaning this up in coming PRs.

Signed-off-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Andres Taylor <andres@planetscale.com>

go/slice/slice.go

harshit-gangal · 2023-09-27T09:57:58Z

go/test/endtoend/vtgate/gen4/gen4_test.go

-	utils.AssertMatches(t, mcmp.VtConn, `
-select id 
-from t1 
-where exists(
-	select t2.id, count(*) 
-	from t2 
-	where t1.col = t2.tcol2
-    having count(*) > 0
-)`,
-		`[[INT64(100)]]`)
-	utils.AssertMatches(t, mcmp.VtConn, `
-select id 
-from t1 
-where exists(
-	select t2.id, count(*) 
-	from t2 
-	where t1.col = t2.tcol1
-) order by id`,
-		`[[INT64(1)] [INT64(4)] [INT64(100)]]`)


Any reason for removing these test cases?

yeah, they are not valid. they are not following the ONLY_FULL_GROUP_BY directive, but this test doesn't change sql_mode on the connection, so the query fails.

go/test/endtoend/vtgate/gen4/gen4_test.go

GuptaManan100 · 2023-09-27T10:23:21Z

go/vt/sqlparser/ast_funcs.go

+func Walk(visit Visit, first SQLNode, nodes ...SQLNode) error {
+	err := VisitSQLNode(first, visit)
+	if err != nil {
+		return err
+	}


I don't see why first field is required. Won't the first element of nodes otherwise be the first to run?

this is to avoid accidental use of this method with no nodes to visit. We found one case where this was done by accident.

Shouldn't we then just check the length of nodes and panic/error on 0 length?

Why is that preferable to handling it like this?
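For readers following this thread, the trade-off is between a compile-time and a run-time check. A minimal sketch of the `first, rest...` pattern is shown below; `node`, `visit`, `visitNode` and `walk` are hypothetical stand-ins, not the sqlparser API.

```go
package main

import "fmt"

// node is a hypothetical stand-in for an AST node.
type node struct {
	name     string
	children []*node
}

// visit is called for each node; returning false skips the node's children.
type visit func(n *node) (bool, error)

// visitNode walks a single subtree in pre-order.
func visitNode(n *node, v visit) error {
	if n == nil {
		return nil
	}
	descend, err := v(n)
	if err != nil || !descend {
		return err
	}
	for _, c := range n.children {
		if err := visitNode(c, v); err != nil {
			return err
		}
	}
	return nil
}

// walk requires at least one node: the mandatory `first` parameter turns
// "called with nothing to visit" into a compile error instead of a silent no-op.
func walk(v visit, first *node, rest ...*node) error {
	if err := visitNode(first, v); err != nil {
		return err
	}
	for _, n := range rest {
		if err := visitNode(n, v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	where := &node{name: "where", children: []*node{{name: "a"}, {name: "b"}}}
	limit := &node{name: "limit"}

	var seen []string
	err := walk(func(n *node) (bool, error) {
		seen = append(seen, n.name)
		return true, nil
	}, where, limit)
	fmt.Println(seen, err)

	// walk(someVisit) with no nodes does not compile:
	// "not enough arguments in call to walk".
}
```

A length check on a plain variadic parameter would catch the same mistake, but only when the offending call is actually executed.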

go/vt/sqlparser/ast_rewriting.go

go/vt/sqlparser/reserved_vars.go

go/vt/vtgate/planbuilder/operator_transformers.go

go/vt/vtgate/planbuilder/operators/phases.go

go/vt/vtgate/planbuilder/operators/queryprojection.go

go/vt/vtgate/planbuilder/operators/route.go

go/vt/vtgate/planbuilder/operators/horizon.go

go/vt/vtgate/semantics/early_rewriter.go

go/vt/vtgate/planbuilder/plan_test.go

GuptaManan100 · 2023-09-27T11:54:00Z

go/vt/vtgate/planbuilder/operators/update.go

-	Assignments         map[string]sqlparser.Expr
-	ChangedVindexValues map[string]*engine.VindexValues
-	OwnedVindexQuery    string
-	AST                 *sqlparser.Update


Why are we not storing AST in Update anymore, but we are doing so for Delete and Inserts?

we probably should. do you remember, @harshit-gangal

We want to stop storing the AST for the others as well. Subqueries are not supported for delete and insert queries, so those operators are not changed in this PR.

go/vt/vtgate/planbuilder/operators/subquery_container.go

go/vt/vtgate/planbuilder/operators/subquery.go

go/vt/vtgate/planbuilder/operators/aggregation_pushing.go

Signed-off-by: Andres Taylor <andres@planetscale.com>

go/slice/slice.go

go/test/endtoend/vtgate/gen4/gen4_test.go

go/vt/sqlparser/reserved_vars.go

go/vt/vtgate/planbuilder/operators/subquery.go

go/vt/vtgate/planbuilder/operators/ast_to_op.go

frouioui · 2023-09-27T12:46:44Z

go/vt/vtgate/planbuilder/operators/ast_to_op.go

+func (sqc *SubQueryContainer) getRootOperator(op ops.Operator) ops.Operator {
+	if len(sqc.Inner) == 0 {
+		return op
+	}
+
+	sqc.Outer = op
+	return sqc
+}


getRootOperator does not feel like a good name here, we are returning the receiver on line 129

wdyt would be a better name?

Signed-off-by: Andres Taylor <andres@planetscale.com>

Signed-off-by: Harshit Gangal <harshit@planetscale.com> Signed-off-by: Andres Taylor <andres@planetscale.com>

Signed-off-by: Andres Taylor <andres@planetscale.com>

vitess-bot bot added the NeedsDescriptionUpdate, NeedsIssue and NeedsWebsiteDocsUpdate labels Aug 9, 2023

github-actions bot added this to the v18.0.0 milestone Aug 9, 2023

systay force-pushed the subq-op branch from 11b681d to 14d62cb Compare August 9, 2023 12:25

frouioui force-pushed the subq-op branch from c741cc8 to fd57d06 Compare August 10, 2023 08:53

systay force-pushed the subq-op branch 3 times, most recently from 38db062 to a2b6a1c Compare August 21, 2023 12:38

harshit-gangal force-pushed the subq-op branch from a2b6a1c to 1dac013 Compare August 22, 2023 07:38

systay force-pushed the subq-op branch 3 times, most recently from 933f462 to fe89b8c Compare August 22, 2023 17:57

harshit-gangal force-pushed the subq-op branch 2 times, most recently from 426840e to ac78e61 Compare August 24, 2023 07:51

systay force-pushed the subq-op branch 3 times, most recently from e604f9b to 3974776 Compare August 25, 2023 13:23

systay added 6 commits August 28, 2023 16:46

renamed planbuilder, operator and engine primitive

a344c3a

Signed-off-by: Andres Taylor <andres@planetscale.com>

wip - handle EXISTS and push it down through an ApplyJoin

98be673

Signed-off-by: Andres Taylor <andres@planetscale.com>

wip - change subquery pushing

ecb83d8

Signed-off-by: Andres Taylor <andres@planetscale.com>

wip - semiJoin from EXISTS now planner correctly

f359ed0

Signed-off-by: Andres Taylor <andres@planetscale.com>

more work on semijoin and exists

756e935

Signed-off-by: Andres Taylor <andres@planetscale.com>

handle IN queries usign SemiJoins

e0f00ce

Signed-off-by: Andres Taylor <andres@planetscale.com>

systay and others added 3 commits September 27, 2023 07:13

remove another query with subq in outer join condition

7ab4d8c

Signed-off-by: Andres Taylor <andres@planetscale.com>

bug: check the output columns on commented queries

894823e

Signed-off-by: Andres Taylor <andres@planetscale.com>

test: fix test expectation and add a comment explaining it

a104a2d

Signed-off-by: Manan Gupta <manan@planetscale.com>

wangweicugw reviewed Sep 27, 2023


feat: fix pushing order by underneath an aggregation

173ea54

Signed-off-by: Manan Gupta <manan@planetscale.com>

systay force-pushed the subq-op branch from e612d46 to 173ea54 Compare September 27, 2023 07:54

bug: fix the subquery merging logic

eb23151

Signed-off-by: Andres Taylor <andres@planetscale.com>

harshit-gangal reviewed Sep 27, 2023

go/slice/slice.go Outdated

harshit-gangal reviewed Sep 27, 2023

go/test/endtoend/vtgate/gen4/gen4_test.go

GuptaManan100 reviewed Sep 27, 2023

go/vt/vtgate/planbuilder/operator_transformers.go

go/vt/vtgate/planbuilder/operator_transformers.go

go/vt/vtgate/planbuilder/operator_transformers.go

GuptaManan100 reviewed Sep 27, 2023

go/vt/vtgate/planbuilder/operators/phases.go Outdated

go/vt/vtgate/planbuilder/operators/queryprojection.go Outdated

go/vt/vtgate/planbuilder/operators/route.go Outdated

GuptaManan100 reviewed Sep 27, 2023

address review comments

baeeabd

Signed-off-by: Andres Taylor <andres@planetscale.com>

frouioui reviewed Sep 27, 2023

go/slice/slice.go

go/test/endtoend/vtgate/gen4/gen4_test.go

go/vt/sqlparser/reserved_vars.go

go/vt/vtgate/planbuilder/operators/subquery.go Outdated

frouioui reviewed Sep 27, 2023

go/vt/vtgate/planbuilder/operators/ast_to_op.go

frouioui reviewed Sep 27, 2023

extract subquery building from subquery container

b804997

Signed-off-by: Andres Taylor <andres@planetscale.com>

systay force-pushed the subq-op branch from b2a6f3c to 4fa471f Compare September 28, 2023 13:45

allow merging but not routing if predicates are deep in expression tree

5f4b40d

Signed-off-by: Andres Taylor <andres@planetscale.com>

GuptaManan100 approved these changes Sep 29, 2023


systay and others added 4 commits September 29, 2023 07:50

clean up projection subquery planning

b513c99

Signed-off-by: Andres Taylor <andres@planetscale.com>

Merge remote-tracking branch 'upstream/main' into subq-op

02afec8

Signed-off-by: Andres Taylor <andres@planetscale.com>

handle subquery with vindex value on update better with blocking merge

826a3bb

Signed-off-by: Harshit Gangal <harshit@planetscale.com> Signed-off-by: Andres Taylor <andres@planetscale.com>

make sure to handle Exists in projections correctly

77e4cb7

Signed-off-by: Andres Taylor <andres@planetscale.com>

frouioui approved these changes Sep 29, 2023


arthurschreiber mentioned this pull request Jun 13, 2024

Bug Report: EXISTS subqueries no longer apply LIMIT 1 #16149

Closed

systay mentioned this pull request Jun 13, 2024

feat: add a LIMIT 1 on EXISTS subqueries to limit network overhead #16153

Merged

5 tasks

frouioui mentioned this pull request Feb 14, 2025

slack-19.0: skip tests that will fail on v15 downgrade testing slackhq/vitess#605

Merged

5 tasks
