Skip to content

Conversation

@Hastyshell
Copy link
Collaborator

@Hastyshell Hastyshell commented Dec 24, 2022

Proposed changes

Issue Number: close #13982

Problem summary

Enhance aggregate function collect_set and collect_list to support optional max_size param, which enables to limit the number of elements in result array.

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

Regression tests will be added soon.

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/planner Issues or PRs related to the query planner area/sql/function Issues or PRs related to the SQL functions area/vectorization kind/docs Categorizes issue or PR as related to documentation. labels Dec 24, 2022
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Jan 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Jan 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

hello-stephen commented Jan 1, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.3 seconds
stream load tsv: 476 seconds loaded 74807831229 Bytes, about 149 MB/s
stream load json: 40 seconds loaded 2358488459 Bytes, about 56 MB/s
stream load orc: 69 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230222142950_clickbench_pr_101952.html

@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@Hastyshell Hastyshell force-pushed the groupUniqArray branch 7 times, most recently from 4c74780 to 49ada0f Compare January 8, 2023 04:42
@Hastyshell
Copy link
Collaborator Author

run buildall

@Hastyshell
Copy link
Collaborator Author

run p0

@Hastyshell
Copy link
Collaborator Author

run p1

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Hastyshell Hastyshell requested review from HappenLee and removed request for zhangstar333 February 22, 2023 13:32
Copy link
Collaborator

@Yukang-Lian Yukang-Lian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit c0360f8 into apache:master Feb 27, 2023
@BiteTheDDDDt
Copy link
Contributor

Hi, Does the case in this pr need to add some order by to ensure that the results are always consistent?
I'm running into something that looks like it's sorting related.

2023-02-28 16:58:32.363 ERROR [suite-thread-3] (ScriptContext.groovy:121) - Run test_aggregate_collect in /doris/regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_collect.groovy failed
java.lang.IllegalStateException: Check tag 'select' failed:
Check tag 'select' failed, line 1, CHAR result mismatch.
Expect cell is: [1555555555, 3555555555, 9555555555, 255555555, 55555555555]
But real is: [1555555555, 9555555555, 3555555555, 255555555, 55555555555]

@Hastyshell
Copy link
Collaborator Author

Hi, Does the case in this pr need to add some order by to ensure that the results are always consistent? I'm running into something that looks like it's sorting related.

2023-02-28 16:58:32.363 ERROR [suite-thread-3] (ScriptContext.groovy:121) - Run test_aggregate_collect in /doris/regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_collect.groovy failed
java.lang.IllegalStateException: Check tag 'select' failed:
Check tag 'select' failed, line 1, CHAR result mismatch.
Expect cell is: [1555555555, 3555555555, 9555555555, 255555555, 55555555555]
But real is: [1555555555, 9555555555, 3555555555, 255555555, 55555555555]

It seems so, I will add order_by to ensure the stability.

yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
…nd add group_array aliases (apache#15339)

Enhance aggregate function `collect_set` and `collect_list` to support optional `max_size` param,
which enables to limit the number of elements in result array.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/planner Issues or PRs related to the query planner area/sql/function Issues or PRs related to the SQL functions area/vectorization dev/1.2.2 kind/docs Categorizes issue or PR as related to documentation. kind/test reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] group_uniq_array function

7 participants