-
Notifications
You must be signed in to change notification settings - Fork 409
Description
Enhancement
Current executor test framework only supports in testing three kinds of join: ASTTableJoin::Kind::left, ASTTableJoin::Kind::right and ASTTableJoin::Kind::inner.
We need to support more types of join.
Design Doc
This design doc aims to solve the following drawbacks within current executor test framework
- Incomplete join kind supports, therefore incomplete join test cases
- No
onexpression supports
Detailed work can be separated into several steps.
Step I: Simplify join executor's translation workflow
rationale
Current join executor translation is quite cumbersome. It includes
First, test framework phase: test API -> ASTTableJoin -> DB::mock::Join -> DAGRequest
Then, tiflash execution engine phase: DAGRequest -> part of ASTTableJoin -> TiflashJoin -> DB::Join
As can be seen from the flow, DAGRequest is the bridge between test framework and execution engine being tested. However, ASTTableJoin appears twice and is actually redundant in the first phase because
- DAGRequest and ASTTableJoin has different join type classification approach, unfortunately a many-to-many mapping
- From the ut writer's perspective, the only thing he/she should know is that using framework's API the product is a DAGRequest in line with expectations. Specifically, it means two things
- he/she should be aware of the mapping from DAGRequest to ASTTableJoin, because it is how the execution engine will do towards the input
- he/she should not be aware of the mapping from ASTTableJoin to DAGRequest, it has nothing to do with the code being tested and requires additional mental load
In a word, we should avoid using ASTTableJoin in the first phase and rather make the first phase be like testAPI -> DB::mock::Join -> DAGRequest using tiup::JoinType.
compatibility issue
Some legacy tests rely on clickhouse's native parser to translate sql query string to AST trees and then use toTiPBExecutor to get DAGRequest, since these old tests follow the code path that only supports left/right/inner join:
tiflash/dbms/src/Debug/astToExecutor.cpp
Lines 1281 to 1294 in 3ad743b
| switch (join_params.kind) // todo support more type... | |
| { | |
| case ASTTableJoin::Kind::Inner: | |
| join->set_join_type(tipb::JoinType::TypeInnerJoin); | |
| break; | |
| case ASTTableJoin::Kind::Left: | |
| join->set_join_type(tipb::JoinType::TypeLeftOuterJoin); | |
| break; | |
| case ASTTableJoin::Kind::Right: | |
| join->set_join_type(tipb::JoinType::TypeRightOuterJoin); | |
| break; | |
| default: | |
| throw Exception("Unsupported join type"); | |
| } |
one workaround might be move the translation for ASTTableJoin::Kind->TiPB::JoinType from toTiPBExecutor forward to compileJoin. This is another view that can show the restriction of old API. We might be able to deprecate these tests in the future.
deliverables
After this step, deliverables include
- updated
DAGRequestBuilder::jointest api withtipb::JoinTypeand avoid using ASTTableJoin in the framework phase - passed all the join regression tests
Step II: Enable all join kinds and enhance test cases
rationale
Based on the first step, now it is possible to support all DAGRequest::JoinType given a valid using expression. All join kinds passed from TiDB to TiFlash in DAGRequest are as follows
- InnerJoin
- LeftOuterJoin
- RightOuterJoin
- SemiJoin
- AntiSemiJoin
- LeftOuterSemiJoin
- AntiLeftOuterSemiJoin
deliverables
- All
DAGRequest::JoinTypesupport and corresponding test cases - Increased join executor test coverage
Step III: Support more complicated DAGRequest construction [WIP]
rationale
Though it is possible to build DAGRequest with all kinds of DAGRequest::JoinType, it does not mean that the tests are exhausted because, again, JoinType -> ASTTableJoin::Kind is not a 1-to-1 mapping. Those ASTTableJoin::Kind secretly used within tiflash with prefix cross_ requires DAGRequest to not have join keys.
For instance, if you write a query like select * from t1 left join t2 on t1.a = 1, tidb will construct a DAGRequest with type tipb::JoinType::TypeLeftOuterJoin but no join keys since expression t1.a = 1 only references left table. Then within tiflash, it will interpret the join type as ASTTableJoin::Kind::Cross_Left.
Now it is clear that we can only test those Cross_* types until we first support on expression in the framework. However, to directly support on expression in the framework might not be a good idea because
- complexity: the expression used with
oncan be any conditional expression of the form that can be used in a WHERE clause - compatibility: in order to parse the expression correctly, the parsing logic should have the exactly same behavior as how tidb handles
on - table column reference: our current framework does not support qualified column name
Since the main goal of our framework is to construct the DAGRequest, we can instead choosing a DAGRequest-oriented approach. Namely, ask test writers to be specific about how to set each field of join executor in DAGRequest rather than providing an on expression parser to do this job for them.
draft
tipb::Join has following fields related to the on expression parsing logic
- left_conditions: left join && conditional expressions only reference left table
- right_conditions: right join && conditional expressions only reference right table
- other_conditions: other conditional expressions
- other_eq_conditions_from_in: a special kind of other condition, i.e. equality expressions within
insubquery whose join type should be AntiSemiJoin, AntiLeftOuterSemiJoin or LeftOuterSemiJoin (ref: https://github.com/pingcap/tidb/blob/97f66c3fecebc89fa7e1472c4a53d7a3462432cf/planner/core/plan_to_pb.go#L415-L429)
deliverables
- two join builder APIs:
DAGRequestBuilder & join(const DAGRequestBuilder & right, tipb::JoinType tp,
MockAstVec join_col_exprs,
MockAstVec left_conds,
MockAstVec right_conds,
MockAstVec other_conds,
MockAstVec other_eq_conds_from_in);
DAGRequestBuilder & join(const DAGRequestBuilder & right, tipb::JoinType tp, MockAstVec join_col_exprs);Note: since the new join builder API is kind of harder to use (user must know how TiDB translates sql's join-on clause to DAGRequest's conditional expression fields), instead of using default parameters for conditional expressions, we explicitly require them to supplement these fields and recommend them to use the old one unless they have to test conditional expressions for join in the comment.
- test coverage improvement of join executor
Progress Tracking
- step 1: Simplify join executor's translation workflow
- step 2: Enable all join kinds and enhance test cases
- step 3: Support more complicated DAGRequest construction
08:43:35 UTC+8 Sunday, August 7, 2022