[feature](mtmv)MTMV refresh support multi pct tables #56958
# Conflicts:
# fe/fe-core/src/main/java/org/apache/doris/mtmv/MTMVPartitionUtil.java
# fe/fe-core/src/test/java/org/apache/doris/mtmv/MTMVRewriteUtilTest.java
…union all input when create partition materialized view
run buildall
TPC-DS: Total hot run time: 190086 ms
ClickBench: Total hot run time: 30.38 s
FE Regression Coverage Report: Increment line coverage

run buildall
ClickBench: Total hot run time: 30.94 s
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage

run buildall
TPC-DS: Total hot run time: 190567 ms
ClickBench: Total hot run time: 30.23 s
FE UT Coverage Report: Increment line coverage
FE UT Coverage Report: Increment line coverage

run buildall
ClickBench: Total hot run time: 28.06 s
FE Regression Coverage Report: Increment line coverage

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
### What problem does this PR solve?

Currently, a materialized view can have only one PCT (partition change tracking) table. If the materialized view references multiple partitioned tables and a change occurs in a non-PCT table, the materialized view must be fully refreshed even if only some partitions were modified, which is costly.

Therefore, support for multiple PCT tables is needed, with the following restrictions:

- Only inner joins and unions (including UNION ALL) are supported.
- In the case of unions, all base tables must be PCT tables; otherwise, the derivation fails.
- The partition granularity across multiple PCT tables must align.

For example:

The following scenario is allowed:

t1 has 2 partitions: [2020-01-01, 2020-01-02), [2020-01-02, 2020-01-03)
t2 has 2 partitions: [2020-01-02, 2020-01-03), [2020-01-03, 2020-01-04)

The following scenario is not allowed:

t1 has 2 partitions: [2020-01-01, 2020-01-03), [2020-01-03, 2020-01-05)
t2 has 2 partitions: [2020-01-01, 2020-01-02), [2020-01-02, 2020-01-03)

However, if the materialized view uses monthly partition roll-up, the above scenario is allowed, because the materialized view only needs to generate one partition: [2020-01-01, 2020-02-01).

---------

Co-authored-by: seawinde <daydayup005@yeah.net>
Co-authored-by: seawinde <wusi@selectdb.com>
…ew (#57958)

### What problem does this PR solve?

Related PR: #56423 #56958

Problem Summary:

1. Fix the partition trace failure when creating a partitioned MV with a view and date_trunc.
2. Fix the error raised when date_trunc('day', col) is used as the partition column when creating a partitioned materialized view.

After this fix, the following MV definition SQL succeeds. The view is defined as:

    CREATE VIEW lineitem_daily_summary_view AS
    SELECT DATE_TRUNC('day', L_SHIPDATE) AS ship_date,
           L_RETURNFLAG,
           L_LINESTATUS,
           COUNT(*) AS order_count,
           SUM(L_QUANTITY) AS total_quantity,
           SUM(L_EXTENDEDPRICE) AS total_price,
           AVG(L_DISCOUNT) AS avg_discount
    FROM lineitem
    WHERE L_SHIPDATE IS NOT NULL
    GROUP BY ship_date, L_RETURNFLAG, L_LINESTATUS;

The MV definition is as follows:

    SELECT ship_date,
           L_RETURNFLAG,
           SUM(order_count) AS total_orders,
           SUM(total_quantity) AS sum_quantity,
           SUM(total_price) AS sum_price,
           AVG(avg_discount) AS average_discount
    FROM lineitem_daily_summary_view
    GROUP BY ship_date, L_RETURNFLAG
    ORDER BY ship_date, L_RETURNFLAG, total_orders, sum_quantity, sum_price;
apache#57558) Related PR: apache#49514 apache#56958 Problem Summary: In the scenario of nested materialized view rewriting, if the underlying materialized view is a partitioned materialized view, the rewriting of the upper-level materialized view would fail. This PR fixes the issue.
# Conflicts:
# fe/fe-core/src/main/java/org/apache/doris/mtmv/MTMVPartitionUtil.java
# fe/fe-core/src/main/java/org/apache/doris/mtmv/MTMVRewriteUtil.java
# fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/MaterializedViewUtils.java
# fe/fe-core/src/test/java/org/apache/doris/mtmv/MTMVRewriteUtilTest.java
What problem does this PR solve?
Currently, a materialized view can have only one PCT (partition change tracking) table. If the materialized view references multiple partitioned tables and a change occurs in a non-PCT table, the materialized view must be fully refreshed even if only some partitions were modified, which is costly.
Therefore, support for multiple PCT tables is needed, with the following restrictions:
- Only inner joins and unions (including UNION ALL) are supported.
- In the case of unions, all base tables must be PCT tables; otherwise, the derivation fails.
- The partition granularity across multiple PCT tables must align.
For example:
The following scenario is allowed:
t1 has 2 partitions: [2020-01-01, 2020-01-02), [2020-01-02, 2020-01-03)
t2 has 2 partitions: [2020-01-02, 2020-01-03), [2020-01-03, 2020-01-04)
The following scenario is not allowed:
t1 has 2 partitions: [2020-01-01, 2020-01-03), [2020-01-03, 2020-01-05)
t2 has 2 partitions: [2020-01-01, 2020-01-02), [2020-01-02, 2020-01-03)
However, if the materialized view uses monthly partition roll-up, the above scenario is allowed,
because the materialized view only needs to generate one partition: [2020-01-01, 2020-02-01).
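
The alignment rule in the example above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in this PR: it assumes "aligned" means that any two partition ranges taken from different tables are either identical or disjoint, and that a monthly roll-up maps each base partition to the calendar month containing it. The class `PartitionAlignmentSketch` and its methods `aligned`, `rollUpToMonth`, and `rollUpAllToMonth` are hypothetical names and do not exist in Doris.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the alignment rule; not part of the Doris code base.
final class PartitionAlignmentSketch {

    /** Half-open date range [start, end). */
    static final class Range {
        final LocalDate start;
        final LocalDate end;

        Range(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        Range(String start, String end) {
            this(LocalDate.parse(start), LocalDate.parse(end));
        }

        boolean sameAs(Range other) {
            return start.equals(other.start) && end.equals(other.end);
        }

        boolean overlaps(Range other) {
            return start.isBefore(other.end) && other.start.isBefore(end);
        }
    }

    /** Assumed rule: ranges from the two tables must be identical or disjoint. */
    static boolean aligned(List<Range> a, List<Range> b) {
        for (Range x : a) {
            for (Range y : b) {
                if (x.overlaps(y) && !x.sameAs(y)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Maps a base partition to the calendar month that contains it. */
    static Range rollUpToMonth(Range r) {
        LocalDate monthStart = r.start.withDayOfMonth(1);
        LocalDate monthEnd = monthStart.plusMonths(1);
        if (r.end.isAfter(monthEnd)) {
            throw new IllegalArgumentException("range spans more than one month");
        }
        return new Range(monthStart, monthEnd);
    }

    static List<Range> rollUpAllToMonth(List<Range> ranges) {
        List<Range> result = new ArrayList<>();
        for (Range r : ranges) {
            result.add(rollUpToMonth(r));
        }
        return result;
    }

    public static void main(String[] args) {
        // Allowed scenario: both tables use daily partitions.
        List<Range> t1 = List.of(new Range("2020-01-01", "2020-01-02"),
                                 new Range("2020-01-02", "2020-01-03"));
        List<Range> t2 = List.of(new Range("2020-01-02", "2020-01-03"),
                                 new Range("2020-01-03", "2020-01-04"));
        System.out.println(aligned(t1, t2)); // true

        // Disallowed scenario: two-day partitions against daily partitions.
        List<Range> u1 = List.of(new Range("2020-01-01", "2020-01-03"),
                                 new Range("2020-01-03", "2020-01-05"));
        List<Range> u2 = List.of(new Range("2020-01-01", "2020-01-02"),
                                 new Range("2020-01-02", "2020-01-03"));
        System.out.println(aligned(u1, u2)); // false

        // With monthly roll-up, every range maps to [2020-01-01, 2020-02-01).
        System.out.println(aligned(rollUpAllToMonth(u1), rollUpAllToMonth(u2))); // true
    }
}
```

The pairwise check is quadratic in the number of partitions, which keeps the sketch short; a production implementation would more likely sort the ranges or key them by their bounds.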
Release note
None