@xiaokang xiaokang commented Jan 5, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

Describe your changes.

Optimize ORDER BY key top-N queries such as SELECT * FROM table1 ORDER BY k1, k2 LIMIT n, in which k1 and k2 form a prefix of the table sort key.

This optimization applies only to tables with DUP_KEYS, and to tables with UNIQUE_KEYS that have merge-on-write enabled.
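
To illustrate why a sort-key prefix helps, here is a minimal sketch, not the code in this PR. It assumes each storage segment is already sorted by (k1, k2), so the first n rows overall can be found by merging only the heads of the segments instead of sorting every row. The names `Row`, `Segment`, and `topn_by_key` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical row type: (k1, k2) is the sort-key prefix used by ORDER BY.
using Row = std::pair<int, int>;
// Hypothetical stand-in for a segment whose rows are already sorted by (k1, k2).
using Segment = std::vector<Row>;

// Returns the first n rows in (k1, k2) order across all segments.
// Only the leading rows of each segment are consumed; no full sort is done.
std::vector<Row> topn_by_key(const std::vector<Segment>& segments, std::size_t n) {
    // Heap entry: (row, segment index, offset of the row inside that segment).
    using Entry = std::tuple<Row, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Seed the min-heap with the first row of every non-empty segment.
    for (std::size_t s = 0; s < segments.size(); ++s) {
        if (!segments[s].empty()) {
            heap.emplace(segments[s][0], s, std::size_t{0});
        }
    }

    std::vector<Row> result;
    while (result.size() < n && !heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        result.push_back(std::get<0>(top));
        // Advance within the segment that produced the row just emitted.
        std::size_t seg = std::get<1>(top);
        std::size_t next = std::get<2>(top) + 1;
        if (next < segments[seg].size()) {
            heap.emplace(segments[seg][next], seg, next);
        }
    }
    return result;
}
```

With k segments, the loop pops at most n entries from a heap of at most k entries, so the work depends on n and k rather than on the total row count.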

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

github-actions bot added the area/planner and area/vectorization labels Jan 5, 2023

github-actions bot commented Jan 5, 2023

clang-tidy review says "All clean, LGTM! 👍"


hello-stephen commented Jan 5, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.1 seconds
load time: 476 seconds
storage size: 17172148092 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230202144437_clickbench_pr_89579.html

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

std::string MutableBlock::dump_names() const {
    std::stringstream out;
    for (auto it = _names.begin(); it != _names.end(); ++it) {
        if (it != _names.begin()) out << ", ";

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (it != _names.begin()) out << ", ";
+ if (it != _names.begin()) { out << ", ";
+ }


github-actions bot commented Jan 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

const MutableBlock* _mutable_block = nullptr;
const std::vector<uint32_t>* _compare_columns;
// reverse the compare order
const bool _is_reverse = false;

warning: private field '_is_reverse' is not used [clang-diagnostic-unused-private-field]

        const bool _is_reverse = false;
                   ^


github-actions bot commented Feb 1, 2023

clang-tidy review says "All clean, LGTM! 👍"


github-actions bot commented Feb 2, 2023

clang-tidy review says "All clean, LGTM! 👍"

@Gabriel39 Gabriel39 left a comment

LGTM

@Gabriel39 Gabriel39 merged commit 737c73d into apache:master Feb 6, 2023
luwei16 pushed a commit to luwei16/incubator-doris that referenced this pull request Apr 7, 2023
Issue Number: close http://jira.selectdb-in.cc/browse/CORE-1462

Describe the overview of changes.

commit e1697741a82f875ca42b0d18caa7972eaa225bee
Author: Kang <kxiao.tiger@gmail.com>
Date:   Thu Jan 19 22:59:29 2023 +0800

    [opt](test) scalar_types_p0 use 100k lines dataset and scalar_types_p2 use 1000k (apache#16104)

commit 33a47e8d02644123ffd8c5c4353653c1c175e96a
Author: Kang <kxiao.tiger@gmail.com>
Date:   Wed Jan 18 14:17:24 2023 +0800

    [testcase](bitmap index)bitmap index testcase (apache#15975)

    * add bitmap index testcases for all scalar types

commit 260a631441834ca7e23da4b77c922eb818eddca7
Author: Kang <kxiao.tiger@gmail.com>
Date:   Mon Jan 16 16:49:59 2023 +0800

    [regression-test](topn)add test cases for nonkey topn query for each scalar type (apache#15790)

    related to apache#15558 apache#15693
    1. dup key table with 17 scalar datatypes
    2. unique key table with mow enabled
    3. unique key table with mow disabled

commit 81cea5219ae86df950f10aa123072df78c7cdf23
Author: Kang <kxiao.tiger@gmail.com>
Date:   Sun Feb 19 23:28:33 2023 +0800

    [bugfix](topn) fix topn read_orderby_key_columns nullptr (apache#16896)

    The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5`
    makes the BE core dump because it dereferences a null pointer, `read_orderby_key_columns`, in `VCollectIterator::_topn_next`.
    The bug is triggered by skipping the _colname_to_value_range init in apache#16818.

    This PR makes two changes:
    1. avoid a null read_orderby_key_columns in TabletReader::_init_orderby_keys_param
    2. return an error if read_orderby_key_columns is unexpectedly nullptr in VCollectIterator::_topn_next, to avoid a core dump
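
The second change above can be pictured with the following minimal sketch. It is an illustration only: the `Status` and `ReaderParams` types and the `topn_next` function are hypothetical stand-ins, not the Doris classes, and the error message is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal status type; the real code base has its own Status class.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return Status{true, ""}; }
    static Status InternalError(const std::string& m) { return Status{false, m}; }
};

// Hypothetical stand-in for reader parameters carrying the ORDER BY key column ids.
struct ReaderParams {
    const std::vector<std::uint32_t>* read_orderby_key_columns = nullptr;
};

// Checks the pointer first and reports an error instead of dereferencing nullptr.
Status topn_next(const ReaderParams& params, std::size_t* num_sort_columns) {
    if (params.read_orderby_key_columns == nullptr) {
        return Status::InternalError("read_orderby_key_columns should not be nullptr");
    }
    *num_sort_columns = params.read_orderby_key_columns->size();
    return Status::OK();
}
```

Returning a status lets the caller fail the single query, whereas an unchecked dereference takes down the whole process.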

commit 2fee1d1d79942e49eddaafdc2b49e49b0651b109
Author: Kang <kxiao.tiger@gmail.com>
Date:   Fri Feb 10 12:56:33 2023 +0800

    [Improvement](topn) add limit threshold session variable and fuzzy for topn optimizations (apache#16514)

    1. add limit threshold for topn runtime pushdown and key topn optimization
    2. use unified session variable topn_opt_limit_threshold for all topn optimizations
    3. add fuzzy support for topn_opt_limit_threshold
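
A limit threshold of this kind can be sketched as a simple gate. This is a guess at the shape of the check, not the committed code: the function name `topn_opt_enabled` is hypothetical, and whether the real comparison is inclusive is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical gate: apply the top-N optimizations only when the query has a
// positive LIMIT that does not exceed the session threshold.
// Assumption: the comparison is inclusive; the actual code may differ.
bool topn_opt_enabled(std::int64_t limit, std::int64_t topn_opt_limit_threshold) {
    return limit > 0 && limit <= topn_opt_limit_threshold;
}
```

Sharing one threshold across all top-N optimizations means a single session variable turns them on or off together for a given LIMIT.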

commit 1696bed39129fcc891f32f64ff1fb43f9531fcd4
Author: Kang <kxiao.tiger@gmail.com>
Date:   Thu Feb 2 09:13:32 2023 +0800

    [bugfix](topn) fix topn runtime predicate getting value bug for decimal type (apache#16331)

    * fix topn runtime predicate getting value bug for decimal type

    * fix cast_to_string bug for TYPE_DECIMALV2

commit d70cdf61521a23417c9bc734a3cdb668265a15b0
Author: Kang <kxiao.tiger@gmail.com>
Date:   Wed Feb 22 16:18:46 2023 +0800

    topn sync doris order by key topn query optimization apache#15663

commit 1df514c8f0b66ae9a8438617163a31848e519949
Author: Kang <kxiao.tiger@gmail.com>
Date:   Wed Feb 22 15:14:43 2023 +0800

    sync with doris runtime prune for topn query apache#15558
swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023