Skip to content

Conversation

@starocean999
Copy link
Contributor

@starocean999 starocean999 commented Mar 17, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

if the order by keys are not simple slot in sort node, the order by exprs have to been added to sort node's output tuple. In that case, we need add a project node above sort node to eliminate the unused order by exprs. for example:

WITH t0 AS 
    (SELECT DATE_FORMAT(date,
         '%Y%m%d') AS date
    FROM cir_1756_t1 ), t3 AS 
    (SELECT date_format(date,
         '%Y%m%d') AS `date`
    FROM `cir_1756_t2`
    GROUP BY  date_format(date, '%Y%m%d')
    **ORDER BY  date_format(date, '%Y%m%d')** )
SELECT t0.date
FROM t0
LEFT JOIN t3
    ON t0.date = t3.date;

before:

+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[159] ( distinct=false, projects=[date#1], excepts=[], canEliminate=true )                                                         |
| +--LogicalJoin[158] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#1 = date#3)], otherJoinConjuncts=[] ) |
|    |--LogicalProject[151] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`#1], excepts=[], canEliminate=true )                |
|    |  +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=412339, preAgg=ON )              |
|    +--LogicalSort[157] ( orderKeys=[date_format(cast(date#3 as DATETIME), '%Y%m%d') asc null first] )                                            |
|       +--LogicalAggregate[156] ( groupByExpr=[date#3], outputExpr=[date#3], hasRepeat=false )                                                    |
|          +--LogicalProject[155] ( distinct=false, projects=[date_format(date#2, '%Y%m%d') AS `date`#3], excepts=[], canEliminate=true )          |
|             +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=412352, preAgg=ON )        |
+--------------------------------------------------------------------------------------------------------------------------------------------------+

after:

+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[171] ( distinct=false, projects=[date#2], excepts=[], canEliminate=true )                                                         |
| +--LogicalJoin[170] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#2 = date#4)], otherJoinConjuncts=[] ) |
|    |--LogicalProject[162] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`#2], excepts=[], canEliminate=true )                |
|    |  +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=1049812, preAgg=ON )             |
|    +--LogicalProject[169] ( distinct=false, projects=[date#4], excepts=[], canEliminate=false )                                                  |
|       +--LogicalSort[168] ( orderKeys=[date_format(cast(date#4 as DATETIME), '%Y%m%d') asc null first] )                                         |
|          +--LogicalAggregate[167] ( groupByExpr=[date#4], outputExpr=[date#4], hasRepeat=false )                                                 |
|             +--LogicalProject[166] ( distinct=false, projects=[date_format(date#3, '%Y%m%d') AS `date`#4], excepts=[], canEliminate=true )       |
|                +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=1049825, preAgg=ON )    |
+--------------------------------------------------------------------------------------------------------------------------------------------------+

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@starocean999
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

hello-stephen commented Mar 17, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.68 seconds
stream load tsv: 481 seconds loaded 74807831229 Bytes, about 148 MB/s
stream load json: 25 seconds loaded 2358488459 Bytes, about 89 MB/s
stream load orc: 76 seconds loaded 1101869774 Bytes, about 13 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230321085720_clickbench_pr_117877.html

@starocean999
Copy link
Contributor Author

run buildall

@starocean999
Copy link
Contributor Author

run buildall

@924060929
Copy link
Contributor

924060929 commented Mar 20, 2023

the column pruning problem will be complete fixed by #17579

@starocean999
Copy link
Contributor Author

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 22, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 17a1ce5 into apache:master Mar 22, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
… order by keys (apache#17913)

if the order by keys are not simple slot in sort node, the order by exprs have to been added to sort node's output tuple. In that case, we need add a project node above sort node to eliminate the unused order by exprs. for example:

```sql
WITH t0 AS 
    (SELECT DATE_FORMAT(date,
         '%Y%m%d') AS date
    FROM cir_1756_t1 ), t3 AS 
    (SELECT date_format(date,
         '%Y%m%d') AS `date`
    FROM `cir_1756_t2`
    GROUP BY  date_format(date, '%Y%m%d')
    **ORDER BY  date_format(date, '%Y%m%d')** )
SELECT t0.date
FROM t0
LEFT JOIN t3
    ON t0.date = t3.date;
```

before:
```
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[159] ( distinct=false, projects=[date#1], excepts=[], canEliminate=true )                                                         |
| +--LogicalJoin[158] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#1 = date#3)], otherJoinConjuncts=[] ) |
|    |--LogicalProject[151] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`apache#1], excepts=[], canEliminate=true )                |
|    |  +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=412339, preAgg=ON )              |
|    +--LogicalSort[157] ( orderKeys=[date_format(cast(date#3 as DATETIME), '%Y%m%d') asc null first] )                                            |
|       +--LogicalAggregate[156] ( groupByExpr=[date#3], outputExpr=[date#3], hasRepeat=false )                                                    |
|          +--LogicalProject[155] ( distinct=false, projects=[date_format(date#2, '%Y%m%d') AS `date`apache#3], excepts=[], canEliminate=true )          |
|             +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=412352, preAgg=ON )        |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
```

after:
```
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[171] ( distinct=false, projects=[date#2], excepts=[], canEliminate=true )                                                         |
| +--LogicalJoin[170] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#2 = date#4)], otherJoinConjuncts=[] ) |
|    |--LogicalProject[162] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`apache#2], excepts=[], canEliminate=true )                |
|    |  +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=1049812, preAgg=ON )             |
|    +--LogicalProject[169] ( distinct=false, projects=[date#4], excepts=[], canEliminate=false )                                                  |
|       +--LogicalSort[168] ( orderKeys=[date_format(cast(date#4 as DATETIME), '%Y%m%d') asc null first] )                                         |
|          +--LogicalAggregate[167] ( groupByExpr=[date#4], outputExpr=[date#4], hasRepeat=false )                                                 |
|             +--LogicalProject[166] ( distinct=false, projects=[date_format(date#3, '%Y%m%d') AS `date`apache#4], excepts=[], canEliminate=true )       |
|                +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=1049825, preAgg=ON )    |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/nereids kind/test reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants