Skip to content

planner: push aggregation functions with distinct to cop (#15500)#15974

Closed
sre-bot wants to merge 1 commit intopingcap:release-3.1from
sre-bot:release-3.1-4eb9ca3d3fc2
Closed

planner: push aggregation functions with distinct to cop (#15500)#15974
sre-bot wants to merge 1 commit intopingcap:release-3.1from
sre-bot:release-3.1-4eb9ca3d3fc2

Conversation

@sre-bot
Copy link
Contributor

@sre-bot sre-bot commented Apr 1, 2020

cherry-pick #15500 to release-3.1


What problem does this PR solve?

Issue Number: close #14623

Problem Summary:

When variable tidb_opt_distinct_agg_push_down is set to 1, TiDB can push down the DISTINCT part of COUNT DISTINCT(or SUM DISTINCT, etc) into coprocessor.

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:

Planner:

mysql> set @@session.tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> desc select /*+ agg_to_cop() */ avg(b), c, avg(b), count(distinct A, B),  count(distinct A), count(distinct c), sum(b) from t group by c;
+-----------------------------+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                          | estRows | task      | operator info                                                                                                                                                                                                                                                            |
+-----------------------------+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_4                | 4.00    | root      | Column#5, test.t.c, Column#5, Column#6, Column#7, Column#8, Column#9                                                                                                                                                                                                     |
| └─HashAgg_8                 | 4.00    | root      | group by:test.t.c, funcs:avg(Column#10, Column#11)->Column#5, funcs:count(distinct test.t.a, test.t.b)->Column#6, funcs:count(distinct test.t.a)->Column#7, funcs:count(distinct test.t.c)->Column#8, funcs:sum(Column#12)->Column#9, funcs:firstrow(test.t.c)->test.t.c |
|   └─TableReader_9           | 4.00    | root      | data:HashAgg_5                                                                                                                                                                                                                                                           |
|     └─HashAgg_5             | 4.00    | cop[tikv] | group by:test.t.a, test.t.b, test.t.c, funcs:count(test.t.b)->Column#10, funcs:sum(test.t.b)->Column#11, funcs:sum(test.t.b)->Column#12                                                                                                                                  |
|       └─TableFullScan_7     | 5.00    | cop[tikv] | table:t, keep order:false, stats:pseudo                                                                                                                                                                                                                                  |
+-----------------------------+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

Performance tested w/ TiFlash:

Low NDV (right scenario),

MySQL [test]> select count(distinct g), count(g) from test;
+-------------------+----------+
| count(distinct g) | count(g) |
+-------------------+----------+
|                10 | 50000000 |
+-------------------+----------+
1 row in set (0.43 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 0;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select count(distinct g) from test;
+-------------------+
| count(distinct g) |
+-------------------+
|                10 |
+-------------------+
1 row in set (4.37 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select count(distinct g) from test;
+-------------------+
| count(distinct g) |
+-------------------+
|                10 |
+-------------------+
1 row in set (0.43 sec)

High NDV (bad scenario),

MySQL [test]> select count(distinct id), count(id) from test;
+--------------------+-----------+
| count(distinct id) | count(id) |
+--------------------+-----------+
|           10000000 |  50000000 |
+--------------------+-----------+
1 row in set (16.57 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 0;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select count(distinct id) from test;
+--------------------+
| count(distinct id) |
+--------------------+
|           10000000 |
+--------------------+
1 row in set (17.09 sec)

MySQL [test]> set @@session.tidb_opt_distinct_agg_push_down = 1;
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> select count(distinct id) from test;
+--------------------+
| count(distinct id) |
+--------------------+
|           10000000 |
+--------------------+
1 row in set (24.80 sec)

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • Support pushing down the DISTINCT part of COUNT DISTINCT(or SUM DISTINCT, etc) into coprocessor when variable tidb_opt_distinct_agg_push_down is set to 1

Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Copy link
Contributor Author

sre-bot commented Apr 1, 2020

/run-all-tests

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SunRunAway please resolve conflicts.

@SunRunAway
Copy link
Contributor

It's hard to pick to release 3.1. Closing it right now. If anyone needs this feature with this branch, feel free to contact me.

@SunRunAway SunRunAway closed this Apr 2, 2020
@SunRunAway SunRunAway added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority/release-blocker This issue blocks a release. Please solve it ASAP. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/execution SIG execution sig/planner SIG: Planner type/new-feature type/3.1-cherry-pick

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants