Skip to content

Implementation of EXTRACT function is very inefficient #22700

@leiysky

Description

@leiysky

For now EXTRACT function takes argument with EvalTypetypes.ETString, which means it will do ParseDatetime for each evaluation.

This inefficient way was introduced by #21601. It fixed a compatibility issue but brought serious performance degradation when argument's type is types.ETDatetime(which is the most common case, IMO 😝), since ParseDatetime is very costly.

I suggest we can add a new extract funciton signature like builtinExtractDatetimeFromStringSig to handle the corner case fixed by #21601, i.e. EXTRACT from a TIME string literal.

In TPCH Query 9, there is a EXTRACT(YEAR FROM orders.o_orderdate), where o_orderdate is a DATE field. We can compare performance of builtinExtractDatetimeFromStringSig and builtinExtractDatetimeSig with this workload.

Here's a simplified benchmark(with branch leiysky@5986878) on TPCH SF1 data set:

# builtinExtractDatetimeFromStringSig
mysql root@localhost:tpch_1> explain analyze select extract(year from o_orderdate) from orders;
+---------------------+------------+-----------+-----------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+----------+--------+
| id                  | estRows    | actRows   | task      | access object   | execution info                                                                                                                                                                                                                         | operator info                                                             | memory   | disk   |
|---------------------+------------+-----------+-----------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+----------+--------|
| Projection_3        | 1500000.00 | 1500000   | root      |                 | time:958.2ms, loops:1466, Concurrency:5                                                                                                                                                                                                | extract(YEAR, cast(tpch_1.orders.o_orderdate, var_string(10)))->Column#11 | 87.3 KB  | N/A    |
| └─TableReader_5     | 1500000.00 | 1500000   | root      |                 | time:573.6ms, loops:1466, cop_task: {num: 1, max: 565.4ms, proc_keys: 1500000, tot_proc: 550ms, rpc_num: 1, rpc_time: 565.3ms, copr_cache_hit_ratio: 0.00}                                                                             | data:TableFullScan_4                                                      | 11.5 MB  | N/A    |
|   └─TableFullScan_4 | 1500000.00 | 1500000   | cop[tikv] | table:orders    | tikv_task:{time:520ms, loops:1469}, scan_detail: {total_process_keys: 1500000, total_keys: 1500001, rocksdb: {delete_skipped_count: 0, key_skipped_count: 1500000, block: {cache_hit_count: 3848, read_count: 0, read_byte: 0 Bytes}}} | keep order:false                                                          | N/A      | N/A    |
+---------------------+------------+-----------+-----------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+----------+--------+
# builtinExtractDatetimeSig
mysql root@localhost:tpch_1> explain analyze select extract(year from o_orderdate) from orders;
+---------------------+------------+-----------+-----------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+----------+--------+
| id                  | estRows    | actRows   | task      | access object   | execution info                                                                                                                                                                                                                         | operator info                                       | memory   | disk   |
|---------------------+------------+-----------+-----------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+----------+--------|
| Projection_3        | 1500000.00 | 1500000   | root      |                 | time:575.2ms, loops:1466, Concurrency:5                                                                                                                                                                                                | extract(YEAR, tpch_1.orders.o_orderdate)->Column#11 | 82.3 KB  | N/A    |
| └─TableReader_5     | 1500000.00 | 1500000   | root      |                 | time:551.6ms, loops:1466, cop_task: {num: 1, max: 547.8ms, proc_keys: 1500000, tot_proc: 540ms, rpc_num: 1, rpc_time: 547.7ms, copr_cache_hit_ratio: 0.00}                                                                             | data:TableFullScan_4                                | 11.5 MB  | N/A    |
|   └─TableFullScan_4 | 1500000.00 | 1500000   | cop[tikv] | table:orders    | tikv_task:{time:490ms, loops:1469}, scan_detail: {total_process_keys: 1500000, total_keys: 1500001, rocksdb: {delete_skipped_count: 0, key_skipped_count: 1500000, block: {cache_hit_count: 3848, read_count: 0, read_byte: 0 Bytes}}} | keep order:false                                    | N/A      | N/A    |
+---------------------+------------+-----------+-----------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/questionThe issue belongs to a question.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions