[improvement](mtmv) Optimize the nested materialized view rewrite performance #34050
ClickBench: Total hot run time: 30.96 s
      for (GroupExpression groupExpression : group.getLogicalExpressions()) {
          List<Set<BitSet>> childrenTableMap = new ArrayList<>();
-         boolean needRefresh = false;
+         boolean needRefresh = groupExpressionMap.isEmpty();
Remove it. It seems it is no longer needed.
  public class StructInfoMap {
      private final Map<BitSet, Pair<GroupExpression, List<BitSet>>> groupExpressionMap = new HashMap<>();
      private final Map<BitSet, StructInfo> infoMap = new HashMap<>();
+     private boolean refreshed;
Why add it?
Fixed.
ClickBench: Total hot run time: 31.62 s
PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
…formance (apache#34050)
Optimize nested materialized view rewrite performance when the query contains many joins.
This issue was introduced by apache#33362.
…e is useless in some scene (#41472)
This issue was introduced by #34050.
With `enable_materialized_view_nest_rewrite = false`, rewriting with the top-level materialized view is expected to fail, but it currently succeeds.
For example, the first-level materialized view is defined as:
    CREATE MATERIALIZED VIEW level1
    BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    SELECT l_orderkey, l_linenumber, l_partkey, o_orderkey, o_custkey
    FROM lineitem_2
    INNER JOIN orders_2 ON l_orderkey = o_orderkey;
The second-level materialized view is defined as:
    CREATE MATERIALIZED VIEW level2
    BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    SELECT l_orderkey, l_linenumber, o_orderkey,
           sum(l_partkey) AS total_revenue,
           max(o_custkey) AS max_discount
    FROM join_mv1
    GROUP BY l_orderkey, l_linenumber, o_orderkey;
With `enable_materialized_view_nest_rewrite = false`, only `level1` can be rewritten successfully and chosen by the CBO.
With `enable_materialized_view_nest_rewrite = true`, both `level1` and `level2` can be rewritten successfully, and `level2` should be chosen by the CBO.
This PR fixes the behavior.
Proposed changes
Optimize nested materialized view rewrite performance when the query contains many joins.
This issue was introduced by #33362.
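The diff excerpts in the review thread show maps keyed by `BitSet` (for example `Map<BitSet, StructInfo> infoMap`). As a rough illustration of why that helps when a plan has many joins, here is a minimal sketch of memoizing one value per set of table ids. It is not the code from this PR: `TableSetCache`, `tables`, `get`, and `computeCount` are hypothetical names invented for the example, and the cached value type is left generic.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch, not Doris's StructInfoMap: caches one value per
 * set of table ids so the same table set is never recomputed.
 */
class TableSetCache<V> {
    // BitSet implements content-based equals/hashCode, so it works as a map key.
    private final Map<BitSet, V> cache = new HashMap<>();
    private int computeCount = 0;

    /** Builds a BitSet with a bit set for each given table id. */
    static BitSet tables(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    /** Returns the cached value for tableSet, computing it on the first request only. */
    V get(BitSet tableSet, Function<BitSet, V> compute) {
        V cached = cache.get(tableSet);
        if (cached != null) {
            return cached;
        }
        computeCount++;
        V value = compute.apply(tableSet);
        // BitSet is mutable, so store a copy to keep the key stable.
        cache.put((BitSet) tableSet.clone(), value);
        return value;
    }

    /** Number of times the compute function actually ran. */
    int computeCount() {
        return computeCount;
    }
}
```

In this sketch, two lookups with equal table sets run the compute function once; only a different table set triggers another computation. The assumption is that the expensive step depends only on which tables are involved, which may not match every case the real rewrite has to handle.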