xzj7019 commented Sep 26, 2023

Proposed changes

The current multi-window plan generation has a problem with the order of the projected columns. For example, given this logical plan:

+--LogicalWindow ( windowExpressions=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116, rank() WindowSpec(...) AS `rn`#117], ...)

the corresponding physical plan is:

+--PhysicalWindow[6572]@16 ( windowFrameGroup=(Funcs=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116], ... )
    +--PhysicalWindow[6568]@29 ( windowFrameGroup=(Funcs=[rank() WindowSpec(...) AS `rn`#117], ...] )
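
To make the ordering difference concrete, here is a minimal sketch, not Doris code. It assumes that a stack of window operators emits the child's columns first and then appends each operator's function outputs from the innermost operator outward. The class and method names (`WindowOutputOrder`, `physicalOutput`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the Doris planner API.
class WindowOutputOrder {

    // Assumption: output = child columns, then the functions of each window
    // operator, appended from the innermost operator to the outermost one.
    static List<String> physicalOutput(List<String> childColumns,
                                       List<List<String>> windowsInnermostFirst) {
        List<String> out = new ArrayList<>(childColumns);
        for (List<String> funcs : windowsInnermostFirst) {
            out.addAll(funcs);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> child = List.of("i_category", "i_brand", "cc_name",
                                     "d_year", "d_moy", "sum_sales");
        // The logical window lists avg_monthly_sales before rn, but in the
        // physical plan rank() sits in the inner PhysicalWindow and avg() in
        // the outer one.
        List<String> physical = physicalOutput(
                child, List.of(List.of("rn"), List.of("avg_monthly_sales")));
        System.out.println(physical);
    }
}
```

Under that assumption the physical output ends with `rn, avg_monthly_sales`, while the logical window declares `avg_monthly_sales, rn`.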

The final plan may then be generated as follows, with no projection to restore the logical column order:

MultiCastDataSinks
STREAM DATA SINK
  EXCHANGE ID: 20
  HASH_PARTITIONED: rn[#208], i_brand[#202], cc_name[#203], i_category[#201]

Until the multi-window issue is properly resolved, we add a projection as follows to force the mapping. This does not cover all potential problems.

MultiCastDataSinks
STREAM DATA SINK
  EXCHANGE ID: 20
  HASH_PARTITIONED: rn[#219], i_brand[#213], cc_name[#214], i_category[#212]
  PROJECTIONS: i_category[#184], i_brand[#185], cc_name[#186], d_year[#187], d_moy[#188], sum_sales[#189], avg_monthly_sales[#191], rn[#190]
  PROJECTION TUPLE: 20
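
The forced mapping can be pictured as an index permutation from the expected column order onto the order the child actually produces. The sketch below is an illustration under stated assumptions, not the change made in this PR: it assumes the child emits columns in ascending slot-id order (#184 to #191, so `rn` before `avg_monthly_sales`), and the names `ProjectionMapping` and `projectionIndexes` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the Doris planner API.
class ProjectionMapping {

    // For each expected output column, find its position in the child's output.
    static List<Integer> projectionIndexes(List<String> physicalOrder,
                                           List<String> expectedOrder) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < physicalOrder.size(); i++) {
            position.put(physicalOrder.get(i), i);
        }
        List<Integer> indexes = new ArrayList<>();
        for (String name : expectedOrder) {
            Integer p = position.get(name);
            if (p == null) {
                throw new IllegalArgumentException("column not produced by child: " + name);
            }
            indexes.add(p);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Assumed child output order (ascending slot ids in the plan above).
        List<String> physical = List.of("i_category", "i_brand", "cc_name", "d_year",
                                        "d_moy", "sum_sales", "rn", "avg_monthly_sales");
        // Order listed in the PROJECTIONS line of the plan above.
        List<String> expected = List.of("i_category", "i_brand", "cc_name", "d_year",
                                        "d_moy", "sum_sales", "avg_monthly_sales", "rn");
        System.out.println(projectionIndexes(physical, expected));
    }
}
```

A name-based lookup like this fails loudly when an expected column is missing, which is one way to surface cases the forced mapping does not cover.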

xzj7019 commented Sep 26, 2023

run buildall

github-actions bot added the approved label Sep 26, 2023
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 44.95 seconds
stream load tsv: 554 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17162321746 Bytes

yiguolei merged commit e863cfe into apache:master Sep 28, 2023
vinlee19 pushed a commit to vinlee19/doris that referenced this pull request Oct 7, 2023
…24912)
