Make max_bytes_before_external_sort limit depends on total query memory consumption#72598
Conversation
|
This is an automated comment for commit a7e97aa with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
|
This is ok, but it will be much better to use the same approach as in Aggregator - to use the total memory usage of the query. |
d47232f to
7eb25c1
Compare
Done (was waiting for the #71406, which adds the helper) |
de45f52 to
9441e74
Compare
|
After this change the files on disk can be very small (few kilobytes) and to avoid this, I've added separate setting to control on-disk block size - |
84d7f5a to
608c7ed
Compare
|
Thanks, this sounds good - let's continue. |
|
Right now CI is unstable for external sort, I'm waiting for #73097 to be merged (hope it will help) |
90ec40e to
f347332
Compare
…ry consumption Make max_bytes_before_external_sort more user-friendly, previously it was number of bytes in the sorting block for one sorting thread, now it has the same meaning as max_bytes_before_external_group_by - it is total limit for the whole query memory for all threads. Also one more setting added to control on disk block size - min_external_sort_block_bytes But, after this change the files on disk can be very small (few kilobytes) and to avoid this, I've added separate setting to control on-disk block size - min_external_sort_block_bytes, to avoid too many files. Note, that max_bytes_ratio_before_external_sort already based on the memory consumption not the sorting block size (added in ClickHouse#71406) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
f347332 to
a7e97aa
Compare
|
Test failures does not looks related:
|
Follow-up for: ClickHouse#72598 Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
…xternal ORDER BY In ClickHouse#72598 the behavior of `max_bytes_before_external_sort` has been changed, it has been switched to rely on the total memory usage of the query instead of the size of a sorting data. But, the problem with this approach is that you can have a lot of memory used outside of sorting (FINAL, aggregation), and spilling sorting blocks to disk will not help reduce memory usage obviously, but it will make things even worse, because it will create lots of small files, that will require a lot of memory to merge them. So this patch reverts the behavior of `max_bytes_before_external_sort` back, now it will rely on the sorting block size.
| {"allow_general_join_planning", false, true, "Allow more general join planning algorithm when hash join algorithm is enabled."}, | ||
| {"optimize_extract_common_expressions", false, false, "Introduce setting to optimize WHERE, PREWHERE, ON, HAVING and QUALIFY expressions by extracting common expressions out from disjunction of conjunctions."}, | ||
| {"max_bytes_ratio_before_external_sort", 0., 0., "New setting."}, | ||
| {"min_external_sort_block_bytes", 0., 100_MiB, "New setting."}, |
There was a problem hiding this comment.
The default value is very bad, but, it will be removed anyway - #80777
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Make max_bytes_before_external_sort limit depends on total query memory consumption (previously it was number of bytes in the sorting block for one sorting thread, now it has the same meaning as
max_bytes_before_external_group_by- it is total limit for the whole query memory for all threads). Also one more setting added to control on disk block size -min_external_sort_block_bytesMake max_bytes_before_external_sort more user-friendly, previously it
was number of bytes in the sorting block for one sorting thread, now it
has the same meaning as max_bytes_before_external_group_by - it is total
limit for the whole query memory for all threads.
Also one more setting added to control on disk block size -
min_external_sort_block_bytes
But, after this change the files on disk can be very small (few
kilobytes) and to avoid this, I've added separate setting to control
on-disk block size - min_external_sort_block_bytes, to avoid too many
files.
Note, that max_bytes_ratio_before_external_sort already based on the
memory consumption not the sorting block size (added in #71406)
P.S. maybe it will be better to mark it as
Backward Incompatible Changeto highlight this change in changelogs