
Improve performance and lower memory usage of GROUP BY with novel method.#10956

Closed
palasonic1 wants to merge 12 commits into ClickHouse:master from palasonic1:palasonic-draft-group-by

Conversation

@palasonic1
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
placeholder

Detailed description / Documentation draft:
placeholder

@alexey-milovidov changed the title from "group by using shared method" to "Improve performance and lower memory usage of GROUP BY with novel method." on May 16, 2020
@blinkov added the doc-alert and pr-feature (Pull request with new product feature) labels on May 16, 2020
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Maxim Serebryakov does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you already signed the CLA, but the status is still pending? Let us recheck it.

@alexey-milovidov
Member

@nickitat this is at least worth reading:
https://presentations.clickhouse.com/hse_2020/4th/GroupBySpeedup_pres.pdf
https://presentations.clickhouse.com/hse_2020/4th/GroupBySpeedup_full.pdf

@nickitat
I've read it, and the experiment itself looks interesting. Personally, I don't really believe that the fastest aggregation implementation would be a concurrent one (and we see that even on 32 threads the author had to double the number of buckets because of contention). It should be parallel (data-parallel). So IMO a more promising direction would be to implement a splitting aggregator as efficiently as possible: maybe vectorize the hash calculation, don't copy rows (only create a vector of indices for each partition), reuse the calculated hash, and maybe something else. This approach would do only one insertion into the hash table, have constant overhead per row, and have no scalability issues. It would also be reusable in DISTINCT, LIMIT BY, and maybe window functions.
wdyt?

@alexey-milovidov
Yes, I also think similarly. This approach is also harder to use with distributed aggregation.


Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants