Skip to content

planner: push aggregation functions with distinct to cop#15500

Merged
SunRunAway merged 28 commits intopingcap:masterfrom
SunRunAway:issue14623-pushdown-distinct
Apr 1, 2020
Merged

planner: push aggregation functions with distinct to cop#15500
SunRunAway merged 28 commits intopingcap:masterfrom
SunRunAway:issue14623-pushdown-distinct

Conversation

@SunRunAway
Copy link
Contributor

@SunRunAway SunRunAway commented Mar 19, 2020

What problem does this PR solve?

Issue Number: close #14623

Problem Summary:

When variable tidb_opt_distinct_agg_push_down is set to 1, TiDB can push down the DISTINCT part of COUNT DISTINCT(or SUM DISTINCT, etc) into coprocessor.

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:

Planner:

mysql> set @@session.tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> desc select /*+ agg_to_cop() */ avg(b), c, avg(b), count(distinct A, B),  count(distinct A), count(distinct c), sum(b) from t group by c;
+-----------------------------+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                          | estRows | task      | operator info                                                                                                                                                                                                                                                            |
+-----------------------------+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_4                | 4.00    | root      | Column#5, test.t.c, Column#5, Column#6, Column#7, Column#8, Column#9                                                                                                                                                                                                     |
| └─HashAgg_8                 | 4.00    | root      | group by:test.t.c, funcs:avg(Column#10, Column#11)->Column#5, funcs:count(distinct test.t.a, test.t.b)->Column#6, funcs:count(distinct test.t.a)->Column#7, funcs:count(distinct test.t.c)->Column#8, funcs:sum(Column#12)->Column#9, funcs:firstrow(test.t.c)->test.t.c |
|   └─TableReader_9           | 4.00    | root      | data:HashAgg_5                                                                                                                                                                                                                                                           |
|     └─HashAgg_5             | 4.00    | cop[tikv] | group by:test.t.a, test.t.b, test.t.c, funcs:count(test.t.b)->Column#10, funcs:sum(test.t.b)->Column#11, funcs:sum(test.t.b)->Column#12                                                                                                                                  |
|       └─TableFullScan_7     | 5.00    | cop[tikv] | table:t, keep order:false, stats:pseudo                                                                                                                                                                                                                                  |
+-----------------------------+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

Performance tested w/ TiFlash:

Low NDV (right scenario),

MySQL [test]> select count(distinct g), count(g) from test;
+-------------------+----------+
| count(distinct g) | count(g) |
+-------------------+----------+
|                10 | 50000000 |
+-------------------+----------+
1 row in set (0.43 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 0;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select count(distinct g) from test;
+-------------------+
| count(distinct g) |
+-------------------+
|                10 |
+-------------------+
1 row in set (4.37 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select count(distinct g) from test;
+-------------------+
| count(distinct g) |
+-------------------+
|                10 |
+-------------------+
1 row in set (0.43 sec)

High NDV (bad scenario),

MySQL [test]> select count(distinct id), count(id) from test;
+--------------------+-----------+
| count(distinct id) | count(id) |
+--------------------+-----------+
|           10000000 |  50000000 |
+--------------------+-----------+
1 row in set (16.57 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 0;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select count(distinct id) from test;
+--------------------+
| count(distinct id) |
+--------------------+
|           10000000 |
+--------------------+
1 row in set (17.09 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> select count(distinct id) from test;
+--------------------+
| count(distinct id) |
+--------------------+
|           10000000 |
+--------------------+
1 row in set (24.80 sec)

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • Support pushing down the DISTINCT part of COUNT DISTINCT(or SUM DISTINCT, etc) into coprocessor when variable tidb_opt_distinct_agg_push_down is set to 1

@SunRunAway SunRunAway added the sig/planner SIG: Planner label Mar 19, 2020
@SunRunAway SunRunAway requested review from a team as code owners March 19, 2020 13:41
@ghost ghost requested review from XuHuaiyu, lzmhhh123, qw4990 and winoros and removed request for a team March 19, 2020 13:41
@SunRunAway SunRunAway changed the title planner: push count distinct to cop planner: push aggregation functions with distinct to cop Mar 23, 2020
@codecov
Copy link

codecov bot commented Mar 23, 2020

Codecov Report

Merging #15500 into master will not change coverage by %.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #15500   +/-   ##
===========================================
  Coverage   80.6949%   80.6949%           
===========================================
  Files           505        505           
  Lines        136208     136208           
===========================================
  Hits         109913     109913           
  Misses        17810      17810           
  Partials       8485       8485           

@github-actions github-actions bot added the sig/execution SIG execution label Mar 23, 2020
@SunRunAway SunRunAway force-pushed the issue14623-pushdown-distinct branch from bdb4415 to ed5e7e6 Compare March 23, 2020 11:29
@SunRunAway
Copy link
Contributor Author

/run-all-tests

@SunRunAway SunRunAway requested review from eurekaka, francis0407 and winoros and removed request for francis0407 March 23, 2020 15:32
@github-actions github-actions bot added the sig/execution SIG execution label Mar 23, 2020
@SunRunAway SunRunAway added type/new-feature and removed sig/execution SIG execution labels Mar 23, 2020
@SunRunAway SunRunAway force-pushed the issue14623-pushdown-distinct branch from d861858 to e4a8b94 Compare April 1, 2020 10:51
@SunRunAway SunRunAway requested a review from winoros April 1, 2020 11:45
Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 1, 2020
@zz-jason zz-jason added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. status/PTAL labels Apr 1, 2020
@zz-jason
Copy link
Member

zz-jason commented Apr 1, 2020

/run-all-tests

@zz-jason
Copy link
Member

zz-jason commented Apr 1, 2020

/run-check_dev_2
/run-unit-test

@SunRunAway SunRunAway merged commit 4eb9ca3 into pingcap:master Apr 1, 2020
@SunRunAway SunRunAway deleted the issue14623-pushdown-distinct branch April 1, 2020 12:49
sre-bot pushed a commit to sre-bot/tidb that referenced this pull request Apr 1, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Copy link
Contributor

sre-bot commented Apr 1, 2020

cherry pick to release-3.1 in PR #15974

@sre-bot
Copy link
Contributor

sre-bot commented Apr 1, 2020

cherry pick to release-4.0 in PR #15975

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority/release-blocker This issue blocks a release. Please solve it ASAP. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/execution SIG execution sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/new-feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Push down Count(distinct) cases

5 participants