[enhance](mtmv)cache table snapshot in refresh context #50855
morrySnow left a review comment: add ut (add unit tests).
The review thread on fe/fe-core/src/test/java/org/apache/doris/mtmv/MTMVPartitionUtilTest.java was resolved.

Total hot run time reported by CI for each "run buildall" that posted results:
Run | TPC-H    | TPC-DS    | ClickBench
1   | 33837 ms | 193941 ms | 29.6 s
2   | 33504 ms | 193779 ms | 29.05 s
3   | 34310 ms | 193713 ms | 29.74 s
4   | 35251 ms | 186865 ms | 28.74 s

An external test run ("run external") was also triggered.
Status: approved with no changes requested (first by a reviewer, then by at least one committer).
Given the SQL definitions below, there are two base tables (t1 and t2) and one materialized view (mv1).
When determining whether mv1 is synchronized with its base tables, the system checks:
Whether mvp1 is synchronized with partition p1 of t1 and with base table t2
Whether mvp2 is synchronized with partition p2 of t1 and with base table t2
Optimization in this PR:
The original logic fetched t2's snapshot twice, once for each of these checks. This PR caches t2's snapshot in the refresh context so it is retrieved only once, avoiding the redundant lookup.
CREATE TABLE t1
(
    k2 TINYINT,
    k3 INT NOT NULL
)
PARTITION BY LIST(`k3`)
(
    PARTITION `p1` VALUES IN ('1'),
    PARTITION `p2` VALUES IN ('2')
);

CREATE TABLE t2
(
    k2 TINYINT,
    k3 INT NOT NULL
);

CREATE MATERIALIZED VIEW mv1
PARTITION BY (k3)
AS
SELECT t1.k2, t1.k3, t2.k2 AS t2_k2, t2.k3 AS t2_k3
FROM t1 JOIN t2;

mv1 will have two partitions, mvp1 and mvp2, corresponding to t1's partitions p1 and p2.
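The optimization can be sketched as a per-refresh memo keyed by table or partition name. The Java below is a minimal illustration under stated assumptions, not the code changed by this PR: SnapshotCacheSketch, SnapshotSource, RefreshContext and isSynced are hypothetical names, a snapshot is modeled as a bare version number, and "synchronized" is assumed to mean that the snapshot recorded at the MV partition's last refresh still equals the current one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these names are Doris FE APIs.
final class SnapshotCacheSketch {

    /** A table or partition snapshot, modeled here as just a version number. */
    record Snapshot(String name, long version) {}

    /** Stands in for the (possibly expensive) metadata lookup and counts calls. */
    static final class SnapshotSource {
        private final Map<String, Long> versions;
        int fetchCount = 0;

        SnapshotSource(Map<String, Long> versions) {
            this.versions = versions;
        }

        Snapshot fetch(String name) {
            fetchCount++;
            return new Snapshot(name, versions.get(name));
        }
    }

    /** Lives for one refresh; each table/partition snapshot is fetched at most once. */
    static final class RefreshContext {
        private final Map<String, Snapshot> cache = new HashMap<>();
        private final SnapshotSource source;

        RefreshContext(SnapshotSource source) {
            this.source = source;
        }

        Snapshot getSnapshot(String name) {
            // Only the first request for a name reaches the source; later ones hit the cache.
            return cache.computeIfAbsent(name, source::fetch);
        }
    }

    /**
     * An MV partition is treated as in sync when the snapshots recorded at its
     * last refresh still equal the current ones for its related partition and
     * for the non-partitioned base table.
     */
    static boolean isSynced(RefreshContext ctx, Map<String, Snapshot> recorded,
                            String relatedPartition, String baseTable) {
        return Objects.equals(recorded.get(relatedPartition), ctx.getSnapshot(relatedPartition))
                && Objects.equals(recorded.get(baseTable), ctx.getSnapshot(baseTable));
    }

    public static void main(String[] args) {
        // Current versions of t1's partitions and of t2.
        SnapshotSource source = new SnapshotSource(Map.of("t1.p1", 10L, "t1.p2", 20L, "t2", 5L));
        // Snapshots recorded when mvp1 and mvp2 were last refreshed.
        Map<String, Snapshot> recorded = Map.of(
                "t1.p1", new Snapshot("t1.p1", 10L),
                "t1.p2", new Snapshot("t1.p2", 20L),
                "t2", new Snapshot("t2", 5L));

        RefreshContext ctx = new RefreshContext(source);
        boolean mvp1 = isSynced(ctx, recorded, "t1.p1", "t2");
        boolean mvp2 = isSynced(ctx, recorded, "t1.p2", "t2");
        // t1.p1, t2 and t1.p2 are fetched; the second t2 lookup is served from the cache.
        System.out.println(mvp1 + " " + mvp2 + " fetches=" + source.fetchCount); // prints "true true fetches=3"
    }
}
```

Checking mvp1 and then mvp2 asks for t1.p1, t2, t1.p2 and t2 again; with the context-level cache the second t2 lookup never reaches the source, so three fetches are made instead of four. Scoping the cache to a single refresh context (an assumption of this sketch) means a later refresh starts with an empty cache and still observes new snapshots.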
Release note: None