Skip to content

filter on row_number() can be used on optimize window_sort to window_topN #39792

@windtalker

Description

@windtalker

Enhancement

mysql> desc test;
+-------+---------+------+------+---------+-------+
| Field | Type    | Null | Key  | Default | Extra |
+-------+---------+------+------+---------+-------+
| id    | int(11) | YES  |      | NULL    |       |
| value | date    | YES  |      | NULL    |       |
+-------+---------+------+------+---------+-------+
2 rows in set (0.00 sec)

mysql> explain select * from (select *, row_number() over (order by value) as rn from test) d where rn < 10;
+------------------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------------+
| id                                 | estRows | task         | access object | operator info                                                                                  |
+------------------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------------+
| TableReader_30                     | 1.60    | root         |               | data:ExchangeSender_29                                                                         |
| └─ExchangeSender_29                | 1.60    | mpp[tiflash] |               | ExchangeType: PassThrough                                                                      |
|   └─Selection_28                   | 1.60    | mpp[tiflash] |               | lt(Column#5, 10)                                                                               |
|     └─Window_25                    | 2.00    | mpp[tiflash] |               | row_number()->Column#5 over(order by test.test.value rows between current row and current row) |
|       └─Sort_17                    | 2.00    | mpp[tiflash] |               | test.test.value                                                                                |
|         └─ExchangeReceiver_16      | 2.00    | mpp[tiflash] |               |                                                                                                |
|           └─ExchangeSender_15      | 2.00    | mpp[tiflash] |               | ExchangeType: PassThrough                                                                      |
|             └─TableFullScan_14     | 2.00    | mpp[tiflash] | table:test    | keep order:false, stats:pseudo                                                                 |
+------------------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

As shown above, for window function row_number(), there is a common usage that user only wants to get the first N rows, so user will add a filter on the results of row_number().
However, TiDB's plan for this kind of query requires global sort on all data set, if the orignal data set is big, the performance is bad.
A potential optimization is the optimizer push where rn < 100 to the sort operation, and convert sort to topn

Metadata

Metadata

Assignees

No one assigned

    Labels

    sig/plannerSIG: Plannertype/enhancementThe issue or PR belongs to an enhancement.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions