Skip to content

Conversation

@seawinde
Copy link
Contributor

Proposed changes

Optimize the nested materialized view rewrite performance when exists many join
This is brought by #33362

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

ClickBench: Total hot run time: 30.96 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f7db5398760b540f34c8c1a1226550871401ee95, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.49	0.52	0.52
query6	1.15	0.87	0.81
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.50	0.45	0.44
query10	0.51	0.52	0.50
query11	0.14	0.12	0.10
query12	0.13	0.11	0.11
query13	0.63	0.64	0.64
query14	0.91	1.07	0.91
query15	0.84	0.84	0.84
query16	0.38	0.36	0.37
query17	1.04	1.03	1.05
query18	0.21	0.25	0.24
query19	1.92	1.80	1.73
query20	0.01	0.02	0.01
query21	15.40	0.66	0.64
query22	4.52	7.81	1.44
query23	18.22	1.36	1.35
query24	1.87	0.27	0.24
query25	0.15	0.09	0.11
query26	0.27	0.18	0.17
query27	0.09	0.09	0.09
query28	13.39	1.04	1.02
query29	12.65	3.43	3.42
query30	0.26	0.08	0.06
query31	2.86	0.40	0.40
query32	3.22	0.49	0.49
query33	2.76	2.86	2.82
query34	17.07	4.64	4.56
query35	4.55	4.69	4.50
query36	0.70	0.47	0.48
query37	0.21	0.17	0.17
query38	0.20	0.18	0.19
query39	0.05	0.05	0.05
query40	0.19	0.16	0.16
query41	0.11	0.06	0.06
query42	0.07	0.06	0.06
query43	0.04	0.04	0.04
Total cold run time: 109.8 s
Total hot run time: 30.96 s

@seawinde
Copy link
Contributor Author

run buildall

@seawinde
Copy link
Contributor Author

run buildall

for (GroupExpression groupExpression : group.getLogicalExpressions()) {
List<Set<BitSet>> childrenTableMap = new ArrayList<>();
boolean needRefresh = false;
boolean needRefresh = groupExpressionMap.isEmpty();
Copy link
Contributor

@keanji-x keanji-x Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it. It's seems not needed any more

public class StructInfoMap {
private final Map<BitSet, Pair<GroupExpression, List<BitSet>>> groupExpressionMap = new HashMap<>();
private final Map<BitSet, StructInfo> infoMap = new HashMap<>();
private boolean refreshed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have fixed it

@doris-robot
Copy link

ClickBench: Total hot run time: 31.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cb3c0afe5a3e9e989f15dfa9ced43df737e9ed57, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.04
query3	0.23	0.04	0.04
query4	1.68	0.07	0.08
query5	0.50	0.50	0.50
query6	1.21	0.86	0.82
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.49	0.45	0.44
query10	0.50	0.49	0.50
query11	0.15	0.10	0.10
query12	0.13	0.11	0.11
query13	0.65	0.62	0.63
query14	0.90	1.00	0.98
query15	0.85	0.86	0.85
query16	0.35	0.37	0.38
query17	1.01	1.04	0.97
query18	0.22	0.23	0.24
query19	1.88	1.77	1.87
query20	0.01	0.01	0.01
query21	15.42	0.66	0.66
query22	4.38	6.60	2.29
query23	18.33	1.32	1.28
query24	1.51	0.29	0.33
query25	0.16	0.09	0.09
query26	0.26	0.17	0.18
query27	0.09	0.09	0.09
query28	13.40	1.02	1.01
query29	12.71	3.42	3.44
query30	0.26	0.08	0.07
query31	2.83	0.39	0.40
query32	3.25	0.49	0.50
query33	2.75	2.73	3.04
query34	17.05	4.49	4.41
query35	4.49	4.58	4.54
query36	0.65	0.49	0.49
query37	0.21	0.18	0.18
query38	0.20	0.19	0.18
query39	0.05	0.05	0.05
query40	0.18	0.14	0.16
query41	0.11	0.06	0.06
query42	0.06	0.06	0.06
query43	0.05	0.05	0.05
Total cold run time: 109.34 s
Total hot run time: 31.62 s

@seawinde
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 24, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit e86c599 into apache:master Apr 24, 2024
seawinde added a commit to seawinde/doris that referenced this pull request Apr 24, 2024
…formance (apache#34050)

Optimize the nested materialized view rewrite performance when exists many join
This is brought by apache#33362
yiguolei pushed a commit that referenced this pull request Apr 24, 2024
…formance (#34050) (#34078)

Optimize the nested materialized view rewrite performance when exists many join
This is brought by #33362
yiguolei pushed a commit that referenced this pull request Apr 24, 2024
…formance (#34050)

Optimize the nested materialized view rewrite performance when exists many join
This is brought by #33362
yiguolei pushed a commit that referenced this pull request Apr 24, 2024
…formance (#34050)

Optimize the nested materialized view rewrite performance when exists many join
This is brought by #33362
yiguolei pushed a commit that referenced this pull request Apr 25, 2024
…formance (#34050)

Optimize the nested materialized view rewrite performance when exists many join
This is brought by #33362
morrySnow pushed a commit that referenced this pull request Oct 11, 2024
…e is useless in some scene (#41472)

This is brought by #34050

if set `enable_materialized_view_nest_rewrite = false`, as expected, top
level materialized view should rewritten fail, but now successfully.

Such as first level materialized view def is

        CREATE MATERIALIZED VIEW level1
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1') 
        AS
    SELECT l_orderkey, l_linenumber, l_partkey, o_orderkey, o_custkey
    FROM lineitem_2 INNER JOIN orders_2
    ON l_orderkey = o_orderkey;

second level materialized view def is

        CREATE MATERIALIZED VIEW level2
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1') 
        AS
    SELECT
    l_orderkey,
    l_linenumber,
    o_orderkey,
    sum(l_partkey) AS total_revenue,
    max(o_custkey) AS max_discount
    FROM join_mv1
    GROUP BY l_orderkey, l_linenumber, o_orderkey;

if set `enable_materialized_view_nest_rewrite = false`, only `level1`
can rewriten succesfully and chosen by cbo
if set `enable_materialized_view_nest_rewrite = true`, both `level1` and
`level2` can rewriten succesfully and `level2` should be chosen by cbo.

This pr fixed this
seawinde added a commit to seawinde/doris that referenced this pull request Oct 21, 2024
…e is useless in some scene (apache#41472)

This is brought by apache#34050

if set `enable_materialized_view_nest_rewrite = false`, as expected, top
level materialized view should rewritten fail, but now successfully.

Such as first level materialized view def is

        CREATE MATERIALIZED VIEW level1
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1')
        AS
    SELECT l_orderkey, l_linenumber, l_partkey, o_orderkey, o_custkey
    FROM lineitem_2 INNER JOIN orders_2
    ON l_orderkey = o_orderkey;

second level materialized view def is

        CREATE MATERIALIZED VIEW level2
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1')
        AS
    SELECT
    l_orderkey,
    l_linenumber,
    o_orderkey,
    sum(l_partkey) AS total_revenue,
    max(o_custkey) AS max_discount
    FROM join_mv1
    GROUP BY l_orderkey, l_linenumber, o_orderkey;

if set `enable_materialized_view_nest_rewrite = false`, only `level1`
can rewriten succesfully and chosen by cbo
if set `enable_materialized_view_nest_rewrite = true`, both `level1` and
`level2` can rewriten succesfully and `level2` should be chosen by cbo.

This pr fixed this
seawinde added a commit to seawinde/doris that referenced this pull request Oct 23, 2024
…e is useless in some scene (apache#41472)

This is brought by apache#34050

if set `enable_materialized_view_nest_rewrite = false`, as expected, top
level materialized view should rewritten fail, but now successfully.

Such as first level materialized view def is

        CREATE MATERIALIZED VIEW level1
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1')
        AS
    SELECT l_orderkey, l_linenumber, l_partkey, o_orderkey, o_custkey
    FROM lineitem_2 INNER JOIN orders_2
    ON l_orderkey = o_orderkey;

second level materialized view def is

        CREATE MATERIALIZED VIEW level2
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1')
        AS
    SELECT
    l_orderkey,
    l_linenumber,
    o_orderkey,
    sum(l_partkey) AS total_revenue,
    max(o_custkey) AS max_discount
    FROM join_mv1
    GROUP BY l_orderkey, l_linenumber, o_orderkey;

if set `enable_materialized_view_nest_rewrite = false`, only `level1`
can rewriten succesfully and chosen by cbo
if set `enable_materialized_view_nest_rewrite = true`, both `level1` and
`level2` can rewriten succesfully and `level2` should be chosen by cbo.

This pr fixed this
seawinde added a commit to seawinde/doris that referenced this pull request Oct 23, 2024
…e is useless in some scene (apache#41472)

This is brought by apache#34050

if set `enable_materialized_view_nest_rewrite = false`, as expected, top
level materialized view should rewritten fail, but now successfully.

Such as first level materialized view def is

        CREATE MATERIALIZED VIEW level1
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1') 
        AS
    SELECT l_orderkey, l_linenumber, l_partkey, o_orderkey, o_custkey
    FROM lineitem_2 INNER JOIN orders_2
    ON l_orderkey = o_orderkey;

second level materialized view def is

        CREATE MATERIALIZED VIEW level2
        BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
        DISTRIBUTED BY RANDOM BUCKETS 2
        PROPERTIES ('replication_num' = '1') 
        AS
    SELECT
    l_orderkey,
    l_linenumber,
    o_orderkey,
    sum(l_partkey) AS total_revenue,
    max(o_custkey) AS max_discount
    FROM join_mv1
    GROUP BY l_orderkey, l_linenumber, o_orderkey;

if set `enable_materialized_view_nest_rewrite = false`, only `level1`
can rewriten succesfully and chosen by cbo
if set `enable_materialized_view_nest_rewrite = true`, both `level1` and
`level2` can rewriten succesfully and `level2` should be chosen by cbo.

This pr fixed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants