@xiaokang
Contributor

@xiaokang xiaokang commented Feb 8, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

  1. add limit threshold for topn runtime pushdown and key topn optimization
  2. use unified session variable topn_opt_limit_threshold for all topn optimizations
  3. add fuzzy support for topn_opt_limit_threshold
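Items 1 and 2 amount to a single gate that both TopN optimizations share: the sort must have a positive LIMIT no larger than `topn_opt_limit_threshold`. This mirrors the `getLimit() > 0 && getLimit() <= topnOptLimitThreshold` condition in the planner code reviewed below. The following Java sketch models that gate. For item 3, it also models a fuzzy mode that randomizes the threshold, presumably so that regression runs exercise both the optimized and the fallback paths. It is a simplified illustration, not Doris code. The class and method names, the 1024 default, and the fuzzy candidate values are all assumptions.

```java
import java.util.Random;

/**
 * Hypothetical, simplified model of the unified topn_opt_limit_threshold gate.
 * Names and values are illustrative assumptions, not Doris's actual code.
 */
public class TopnOptThreshold {
    // Assumed default threshold; the real value is defined in Doris's SessionVariable.
    static final long ASSUMED_DEFAULT_THRESHOLD = 1024;

    // Hypothetical candidate values that a fuzzy run might choose between.
    static final long[] FUZZY_CANDIDATES = {1, ASSUMED_DEFAULT_THRESHOLD};

    /**
     * Shared gate for both TopN optimizations (runtime predicate pushdown and
     * key TopN): the query needs a positive LIMIT that does not exceed the threshold.
     */
    static boolean topnOptApplicable(long limit, long threshold) {
        return limit > 0 && limit <= threshold;
    }

    /** Picks a pseudo-random threshold so test runs hit both the optimized and fallback paths. */
    static long fuzzyThreshold(Random random) {
        return FUZZY_CANDIDATES[random.nextInt(FUZZY_CANDIDATES.length)];
    }

    public static void main(String[] args) {
        long threshold = ASSUMED_DEFAULT_THRESHOLD;
        System.out.println(topnOptApplicable(10, threshold));   // true: small LIMIT
        System.out.println(topnOptApplicable(5000, threshold)); // false: LIMIT above threshold
        System.out.println(topnOptApplicable(0, threshold));    // false: no positive LIMIT

        long fuzzy = fuzzyThreshold(new Random(42));
        System.out.println("fuzzy threshold = " + fuzzy);
    }
}
```

Keeping one threshold for both optimizations means a single setting controls when either one kicks in. This is the point of item 2.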

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Have unit tests been added:
    • Yes
    • No
    • No Need
  3. Has documentation been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner and kind/docs labels Feb 8, 2023
@eldenmoon
Member

We could add this limit to the fuzzy session variables.

@xiaokang
Contributor Author

xiaokang commented Feb 8, 2023

> We could add this limit to the fuzzy session variables.

added

@xiaokang xiaokang changed the title from [Improvement](topn) add limit threshold session variable for topn optimizations to [Improvement](topn) add limit threshold session variable add fuzzy for topn optimizations Feb 9, 2023
@xiaokang xiaokang changed the title from [Improvement](topn) add limit threshold session variable add fuzzy for topn optimizations to [Improvement](topn) add limit threshold session variable and fuzzy for topn optimizations Feb 9, 2023
eldenmoon
eldenmoon previously approved these changes Feb 9, 2023
Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Feb 9, 2023

PR approved by anyone and no changes requested.

SortNode sortNode = (SortNode) node;
PlanNode child = sortNode.getChild(0);
if (child instanceof OlapScanNode && sortNode.getLimit() > 0
&& sortNode.getLimit() <= ConnectContext.get().getSessionVariable().topnOptLimitThreshold
Contributor

@Gabriel39 Gabriel39 Feb 9, 2023

I'm not sure whether the session can be null here, for example when this is a load task or a read issued by the Flink/Spark connector.

Contributor Author

added check for null
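The review concern is that `ConnectContext.get()` may return null outside an interactive session. It is assumed here to read a thread-local context. The PR's actual null check is not shown on this page. The following is a minimal Java sketch that assumes a missing session should simply disable the TopN optimization. `SessionVars`, `CURRENT`, and `limitWithinThreshold` are hypothetical stand-ins, not Doris APIs.

```java
/**
 * Hypothetical sketch of a null-safe threshold check. SessionVars and CURRENT are
 * simplified stand-ins for Doris's SessionVariable and the thread-local ConnectContext.
 */
public class NullSafeTopnCheck {
    /** Minimal stand-in holding only the threshold this check needs. */
    static final class SessionVars {
        final long topnOptLimitThreshold;

        SessionVars(long topnOptLimitThreshold) {
            this.topnOptLimitThreshold = topnOptLimitThreshold;
        }
    }

    /** Thread-local holder; get() returns null when no session is attached to this thread. */
    static final ThreadLocal<SessionVars> CURRENT = new ThreadLocal<>();

    /** Returns true only if a session exists and the LIMIT is within its threshold. */
    static boolean limitWithinThreshold(long limit) {
        SessionVars vars = CURRENT.get();
        if (vars == null) {
            // No session (e.g. a load task or an external connector read):
            // conservatively skip the TopN optimization instead of throwing an NPE.
            return false;
        }
        return limit > 0 && limit <= vars.topnOptLimitThreshold;
    }

    public static void main(String[] args) {
        System.out.println(limitWithinThreshold(5)); // false: no session on this thread
        CURRENT.set(new SessionVars(1024));
        System.out.println(limitWithinThreshold(5)); // true: 0 < 5 <= 1024
        CURRENT.remove();
    }
}
```

Returning false keeps session-less code paths on the unoptimized plan. Falling back to a default threshold instead would be an equally plausible design.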

Contributor

@Gabriel39 Gabriel39 left a comment

LGTM

@github-actions github-actions bot added the approved label Feb 10, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@Gabriel39 Gabriel39 merged commit d9924c9 into apache:master Feb 10, 2023
YangShaw pushed a commit to YangShaw/doris that referenced this pull request Feb 17, 2023
…or topn optimizations (apache#16514)

1. add limit threshold for topn runtime pushdown and key topn optimization
2. use unified session variable topn_opt_limit_threshold for all topn optimizations
3. add fuzzy support for topn_opt_limit_threshold
luwei16 pushed a commit to luwei16/incubator-doris that referenced this pull request Apr 7, 2023
Issue Number: close http://jira.selectdb-in.cc/browse/CORE-1462

commit e1697741a82f875ca42b0d18caa7972eaa225bee
Author: Kang <kxiao.tiger@gmail.com>
Date:   Thu Jan 19 22:59:29 2023 +0800

    [opt](test) scalar_types_p0 use 100k lines dataset and scalar_types_p2 use 1000k (apache#16104)

commit 33a47e8d02644123ffd8c5c4353653c1c175e96a
Author: Kang <kxiao.tiger@gmail.com>
Date:   Wed Jan 18 14:17:24 2023 +0800

    [testcase](bitmap index)bitmap index testcase (apache#15975)

    * add bitmap index testcases for all scalar types

commit 260a631441834ca7e23da4b77c922eb818eddca7
Author: Kang <kxiao.tiger@gmail.com>
Date:   Mon Jan 16 16:49:59 2023 +0800

    [regression-test](topn)add test cases for nonkey topn query for each scalar type (apache#15790)

    related to apache#15558 apache#15693
    1. dup key table with 17 scalar datatypes
    2. unique key table with mow enabled
    3. unique key table with mow disabled

commit 81cea5219ae86df950f10aa123072df78c7cdf23
Author: Kang <kxiao.tiger@gmail.com>
Date:   Sun Feb 19 23:28:33 2023 +0800

    [bugfix](topn) fix topn read_orderby_key_columns nullptr (apache#16896)

    The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5`
    made the BE core dump because it dereferenced a null `read_orderby_key_columns` in `VCollectIterator::_topn_next`,
    triggered by skipping the `_colname_to_value_range` initialization in apache#16818.

    This PR makes two changes:
    1. avoid a nullptr read_orderby_key_columns in TabletReader::_init_orderby_keys_param
    2. return an error if read_orderby_key_columns is unexpectedly nullptr in VCollectIterator::_topn_next, to avoid a core dump

commit 2fee1d1d79942e49eddaafdc2b49e49b0651b109
Author: Kang <kxiao.tiger@gmail.com>
Date:   Fri Feb 10 12:56:33 2023 +0800

    [Improvement](topn) add limit threshold session variable and fuzzy for topn optimizations (apache#16514)

    1. add limit threshold for topn runtime pushdown and key topn optimization
    2. use unified session variable topn_opt_limit_threshold for all topn optimizations
    3. add fuzzy support for topn_opt_limit_threshold

commit 1696bed39129fcc891f32f64ff1fb43f9531fcd4
Author: Kang <kxiao.tiger@gmail.com>
Date:   Thu Feb 2 09:13:32 2023 +0800

    [bugfix](topn) fix topn runtime predicate getting value bug for decimal type (apache#16331)

    * fix topn runtime predicate getting value bug for decimal type

    * fix cast_to_string bug for TYPE_DECIMALV2

commit d70cdf61521a23417c9bc734a3cdb668265a15b0
Author: Kang <kxiao.tiger@gmail.com>
Date:   Wed Feb 22 16:18:46 2023 +0800

    topn sync doris order by key topn query optimization apache#15663

commit 1df514c8f0b66ae9a8438617163a31848e519949
Author: Kang <kxiao.tiger@gmail.com>
Date:   Wed Feb 22 15:14:43 2023 +0800

    sync with doris runtime prune for topn query apache#15558
xiaokang added a commit to xiaokang/doris that referenced this pull request Jun 4, 2023
…or topn optimizations (apache#16514)

1. add limit threshold for topn runtime pushdown and key topn optimization
2. use unified session variable topn_opt_limit_threshold for all topn optimizations
3. add fuzzy support for topn_opt_limit_threshold
morningman pushed a commit that referenced this pull request Jun 6, 2023
1. add testcase for key topn opt
2. disable key topn opt only for DUP_KEYS and UNIQUE_KEYS with MOW
3. cherry pick some bugfix commit from master

commit 58c5108
Author: Kang <kxiao.tiger@gmail.com>
Date:   Sun Feb 19 23:28:33 2023 +0800

    [bugfix](topn) fix topn read_orderby_key_columns nullptr (#16896)

commit 479272f
Author: Kang <kxiao.tiger@gmail.com>
Date:   Fri Mar 31 10:02:07 2023 +0800

    [bugfix](topn) fix topn optimization wrong result for NULL values (#18121)


commit d9924c9
Author: Kang <kxiao.tiger@gmail.com>
Date:   Fri Feb 10 12:56:33 2023 +0800

    [Improvement](topn) add limit threshold session variable and fuzzy for topn optimizations (#16514)

    1. add limit threshold for topn runtime pushdown and key topn optimization
    2. use unified session variable topn_opt_limit_threshold for all topn optimizations
    3. add fuzzy support for topn_opt_limit_threshold
mongo360 pushed a commit to mongo360/doris that referenced this pull request Jul 12, 2023
…20406)

swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023

Labels

approved, area/planner, kind/docs, reviewed


3 participants