Automatic GROUP/ORDER BY to disk based on the memory usage by azat · Pull Request #71406 · ClickHouse/ClickHouse

azat · 2024-11-03T20:19:53Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Automatic GROUP BY/ORDER BY to disk based on the server/user memory usage. Controlled with max_bytes_ratio_before_external_group_by/max_bytes_ratio_before_external_sort query settings.

Fixes: #69286

robot-ch-test-poll2 · 2024-11-03T20:21:04Z

This is an automated comment for commit dad435a with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check name	Description	Status
Integration tests	The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests	❌ failure

Successful checks

Check name	Description	Status
AST fuzzer	Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help	✅ success
Builds	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
ClickBench	Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table	✅ success
Compatibility check	Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help	✅ success
Docker keeper image	The check to build and optionally push the mentioned image to docker hub	✅ success
Docker server image	The check to build and optionally push the mentioned image to docker hub	✅ success
Docs check	Builds and tests the documentation	✅ success
Fast test	Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here	✅ success
Flaky tests	Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc	✅ success
Install packages	Checks that the built packages are installable in a clear environment	✅ success
Performance Comparison	Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests	✅ success
Stateful tests	Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Stateless tests	Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Stress test	Runs stateless functional tests concurrently from several clients to detect concurrency-related errors	✅ success
Style check	Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report	✅ success
Unit tests	Runs the unit tests for different release types	✅ success
Upgrade check	Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts	✅ success

azat · 2024-11-04T12:46:45Z

test_insert_quorum_with_keeper_loss_connection is flaky (see "Test test_quorum_inserts/test.py is flaky", #67604), but this time it failed for a different reason, which will be fixed in "Fix missing cluster startup for test_quorum_inserts::test_insert_quorum_with_keeper_fail" (#71418).

azat · 2024-11-04T15:53:58Z

Stateless tests (ubsan) [2/2] — Server died, fail: 17, passed: 388, skipped: 5

/var/log/clickhouse-server/clickhouse-server.err.log:2024.11.04 09:31:07.829197 [ 1264 ] {} <Fatal> : Logical error: 'Part all_61_61_0 from table test_j2lmd2kq.alter_table1 (b42bc2ca-2251-436a-990e-3104aa0976cb) remains in ZooKeeper after DROP_RANGE all_0_61_999999999_999999999'.

LOGICAL_ERROR: Part {} remains in ZooKeeper after DROP_RANGE {} #56037

azat · 2024-11-05T10:14:08Z

Conflicting files
src/Core/SettingsChangesHistory.cpp

Fixed, and also added a separate setting for ORDER BY.

azat · 2024-11-07T13:07:06Z

Wow, green CI, haven't seen this for a while.

Michicosun

I think we can merge the per-query setting (max_bytes_ratio_before_external_group_by), because it is a sugar extension of the currently used max_bytes_before_external_group_by.

Or use the method suggested in the issue and compute the memory limit for the query once before it starts as ratio_setting * remaining_memory_amount and use it similarly to max_bytes_before_external_group_by. In this case, to compute remaining_memory_amount, we need to find the memory tracker in the hierarchy with the most strict hard limit and check its occupancy.
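The second option can be sketched roughly as follows. This is an illustration in Python, not ClickHouse code: the `Tracker` class, its fields, and the function names are hypothetical, and treating a hard limit of 0 as "unlimited" is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 2**30


@dataclass
class Tracker:
    """Hypothetical stand-in for one level of a memory tracker hierarchy."""
    name: str
    hard_limit: int                      # bytes; 0 is assumed to mean "unlimited"
    used: int = 0                        # bytes currently accounted at this level
    parent: Optional["Tracker"] = None   # next level up (query -> user -> server)


def strictest_tracker(tracker: Optional[Tracker]) -> Optional[Tracker]:
    """Walk up the hierarchy and return the level with the smallest hard limit."""
    strictest = None
    node = tracker
    while node is not None:
        limited = node.hard_limit > 0
        if limited and (strictest is None or node.hard_limit < strictest.hard_limit):
            strictest = node
        node = node.parent
    return strictest


def spill_threshold_at_start(ratio: float, query_tracker: Tracker) -> Optional[int]:
    """Compute ratio * remaining_memory once, before the query starts.

    Returns None when no level has a limit, so there is nothing to take a ratio of.
    """
    strictest = strictest_tracker(query_tracker)
    if strictest is None:
        return None
    remaining = max(0, strictest.hard_limit - strictest.used)
    return int(ratio * remaining)


server = Tracker("server", hard_limit=64 * GIB, used=40 * GIB)
user = Tracker("user", hard_limit=16 * GIB, used=4 * GIB, parent=server)
query = Tracker("query", hard_limit=0, parent=user)

threshold = spill_threshold_at_start(0.5, query)
print(threshold // GIB)  # 6: half of the 12 GiB still free under the user's 16 GiB limit
```

The resulting byte value would then be used the same way as max_bytes_before_external_group_by; note that it is fixed for the lifetime of the query.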

azat · 2024-11-25T16:16:48Z

> I think we can merge the per-query setting (max_bytes_ratio_before_external_group_by), because it is a sugar extension of the currently used max_bytes_before_external_group_by.

How do you think max_bytes_ratio_before_external_group_by can be merged with max_bytes_before_external_group_by?

I agree that it is kind of configuration sugar, but not completely.

The difference is that this ratio is enforced throughout query execution, not only at query start (as described in #69286; the 10% of cases not covered by that proposal are covered by this patch). I do find that useful: imagine a user spawning 100 queries at the same moment, which will eventually require more RAM than the node has.
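A toy simulation can make the "100 queries at once" scenario concrete. It is only a sketch under stated assumptions: the `simulate` function, the lockstep growth model, and all numbers are made up for illustration and do not reflect ClickHouse internals.

```python
def simulate(n_queries, growth_per_tick, limit, ratio, continuous, max_ticks=1000):
    """Grow n_queries in lockstep and report what happens first.

    continuous=False: each query gets a fixed threshold, ratio * free memory,
    computed once at start (all queries start together, so nothing is used yet).
    continuous=True: total usage is compared against ratio * limit on every tick.
    All units are arbitrary (think MiB); the model is a toy, not ClickHouse logic.
    """
    start_threshold = ratio * limit          # free memory == limit at start
    usage = [0] * n_queries
    for tick in range(1, max_ticks + 1):
        usage = [u + growth_per_tick for u in usage]
        total = sum(usage)
        if continuous:
            if total > ratio * limit:
                return ("spill", tick)
        elif any(u > start_threshold for u in usage):
            return ("spill", tick)
        if total > limit:
            return ("limit exceeded", tick)
    return ("finished", max_ticks)


print(simulate(100, 1, limit=1000, ratio=0.5, continuous=False))  # ('limit exceeded', 11)
print(simulate(100, 1, limit=1000, ratio=0.5, continuous=True))   # ('spill', 6)
```

With a threshold computed once at start, no single query ever reaches its own threshold, yet together they exceed the limit; a check that is re-evaluated against current usage starts spilling before that happens.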

> Or use the method suggested in the issue and compute the memory limit for the query once before it starts as ratio_setting * remaining_memory_amount and use it similarly to max_bytes_before_external_group_by. In this case, to compute remaining_memory_amount, we need to find the memory tracker in the hierarchy with the most strict hard limit and check its occupancy.

Again, I don't really like this idea, because it does not cover the case with query peaks.

And I actually looked through the more advanced functionality described in #41887, but I'm not sure the complexity described there is worth it, considering that this patch covers 99% of cases.

P.S. I also followed the same convention as some other settings, like:

remerge_sort_lowered_memory_bytes_ratio
memory_overcommit_ratio_*
max_server_memory_usage_to_ram_ratio

Michicosun · 2024-11-25T19:33:43Z

> How do you think max_bytes_ratio_before_external_group_by can be merged with max_bytes_before_external_group_by?

I meant to merge the changes with this setting (max_bytes_ratio_before_external_group_by).

> I agree that it is kind of configuration sugar, but not completely.

If you set max_bytes_before_external_group_by = max_bytes_ratio_before_external_group_by * memory_limit, then it will have the same logic you wrote for max_bytes_ratio_before_external_group_by.

But I agree that it is more convenient than the previous one.

> And I actually looked through the more advanced functionality described in #41887, but I'm not sure the complexity described there is worth it, considering that this patch covers 99% of cases.

Right now you have implemented 2 new types of settings:

max_bytes_ratio_* - in complexity and in the problems it solves, this is the same as max_bytes_before_*
*_for_server - this adds new behaviour: if the server uses more memory than this setting allows, every running query is forced to spill data to disk.

Obviously the first can't solve #69286, because it has the old logic.

The second, on the other hand, affects all queries at the same time; it is not a per-query setting. For example, if the limit is 0.7, current memory consumption is 0.69, and you start a new query, it will be spilled to disk right after it starts, because total memory consumption will exceed 0.7.
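The 0.7 / 0.69 example can be worked through in a few lines. This is a hedged sketch: `queries_forced_to_spill` and the query names are hypothetical, and the rule "spill everything once total usage exceeds ratio * limit" is a simplified reading of the *_for_server behaviour described above.

```python
GIB = 2**30


def queries_forced_to_spill(server_limit, ratio, usage_by_query, other_usage=0):
    """Return the queries that must spill under a server-wide ratio threshold.

    The check looks only at total server usage, so once the threshold is crossed
    every running query is affected, no matter how little it uses itself.
    """
    total = other_usage + sum(usage_by_query.values())
    if total > ratio * server_limit:
        return sorted(usage_by_query)
    return []


limit = 100 * GIB
running = {"q1": 40 * GIB, "q2": 29 * GIB}           # 69% of the limit in use
print(queries_forced_to_spill(limit, 0.7, running))  # []

running["q3"] = 2 * GIB                              # a new, small query arrives
print(queries_forced_to_spill(limit, 0.7, running))  # ['q1', 'q2', 'q3']
```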

In the proposed solution it is important to recalculate the limit from the free memory available to the current user. So why does your solution handle 99% of cases?

azat · 2024-11-25T22:27:29Z

Thanks, I guess now I understand your point, and it does make sense. Let's verify that I understood you correctly:

leave only max_bytes_ratio_before_external_group_by (no server settings)
use hard memory limit of the server for this ratio
use strictest hard memory limit for this ratio

> I meant to merge the changes with this setting (max_bytes_ratio_before_external_group_by).

But it is still unclear what you are suggesting to merge.

> In the proposed solution it is important to recalculate the limit from the free memory available to the current user. So why does your solution handle 99% of cases?

I hadn't thought about this case. My point was that #41887 looks too complex to me (with an extra hook), and I think it is better to try a dumb implementation first (like this patch, but using per-user memory limits).

…r_server under TSan

The problem is that TSan has some memory overhead which increases process RSS to 10Gi in the middle of the query and it will fail. Another option is to avoid syncing with RSS too frequently, but I doubt that it is significant to run this test under TSan

v1: increase memory to 20Gi
Fixes: https://s3.amazonaws.com/clickhouse-test-reports/71406/b33be86dee3c7616e9193c339837b9e250810557/integration_tests__tsan__[6_6].html

v2: instead just disable under TSan
Fixes: https://s3.amazonaws.com/clickhouse-test-reports/71406/df68cf7362d825c3d83991fd74c391a49e73a8b1/integration_tests__tsan__[6_6].html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

- remove max_bytes_ratio_before_external_{order,group_by}_for_server
- change the way max_bytes_ratio_before_external_{order,group_by} works

Note that it is not enough to transform the ratio to bytes in executeQuery(), since in that case it would not work for merges and internal queries; plus, you would have to reset them for the Distributed engine and update them for Merge/View/...

This patch also introduces some helpers (see MemoryTrackerUtils) and adjusts the Aggregator::Params constructor to accept a Settings object instead of tons of arguments.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
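One way to read the note about executeQuery() is that the ratio has to be evaluated by the operator itself on every check, rather than converted to bytes once per query. The sketch below illustrates that idea only; `SpillPolicy`, `MemoryProbe`, and the "ratio is ignored when no limit is configured" rule (taken from the commit titles in this PR) are a hypothetical model in Python, not the actual Aggregator or MemoryTrackerUtils code.

```python
from typing import Callable, Optional, Tuple

# Hypothetical probe: returns (used_bytes, hard_limit_bytes) for the tightest
# applicable memory limit, or None when no limit is configured.
MemoryProbe = Callable[[], Optional[Tuple[int, int]]]


class SpillPolicy:
    """Decide, per processed block, whether an operator should spill to disk."""

    def __init__(self, max_bytes: int, max_bytes_ratio: float, probe: MemoryProbe):
        self.max_bytes = max_bytes              # absolute threshold, 0 disables it
        self.max_bytes_ratio = max_bytes_ratio  # ratio threshold, 0 disables it
        self.probe = probe

    def should_spill(self, operator_bytes: int) -> bool:
        if self.max_bytes > 0 and operator_bytes > self.max_bytes:
            return True
        if self.max_bytes_ratio > 0:
            sample = self.probe()
            if sample is None:
                return False                    # no limit configured: ratio is ignored
            used, limit = sample
            return used > self.max_bytes_ratio * limit
        return False


policy = SpillPolicy(0, 0.5, lambda: (6 * 2**30, 10 * 2**30))
print(policy.should_spill(operator_bytes=1))     # True: 6 GiB used > 0.5 * 10 GiB

unlimited = SpillPolicy(0, 0.5, lambda: None)
print(unlimited.should_spill(operator_bytes=1))  # False
```

Because the check lives in the operator and not in the query entry point, the same policy object would apply to any code path that runs the operator, which is the property the commit message asks for.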

azat · 2024-11-28T15:17:12Z

I've rebased on top of upstream (to fix conflicts in settings changes), but the changes had been done on top for easier incremental review

Michicosun · 2024-11-29T12:25:39Z

test_storage_rabbitmq #71049

* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20241130)
* Fix Build due to ClickHouse/ClickHouse#71406
* Fix build due to ClickHouse/ClickHouse#72460

---------

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang Chen <baibaichen@gmail.com>

…ry consumption

Make max_bytes_before_external_sort more user-friendly: previously it was the number of bytes in the sorting block for one sorting thread; now it has the same meaning as max_bytes_before_external_group_by, that is, the total limit on the whole query's memory across all threads.

After this change the files on disk can be very small (a few kilobytes), so to avoid producing too many files I've added a separate setting to control the on-disk block size: min_external_sort_block_bytes.

Note that max_bytes_ratio_before_external_sort is already based on memory consumption, not on the sorting block size (added in ClickHouse#71406).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

robot-ch-test-poll1 added the pr-improvement label (Pull request with some product improvements) Nov 3, 2024

alesapin mentioned this pull request Nov 3, 2024

Adaptive thresholds for spilling to disk #69286

Closed

azat force-pushed the automatic-external-aggregation branch 2 times, most recently from d392357 to 8b3ce12 Compare November 4, 2024 10:03

Michicosun self-assigned this Nov 4, 2024

novikd reviewed Nov 4, 2024

src/Core/Settings.cpp (resolved)

azat force-pushed the automatic-external-aggregation branch from b72e705 to 2c52da9 Compare November 4, 2024 12:48

azat changed the title ~~Automatic GROUP BY to disk (max_bytes_ratio_before_external_group_by)~~ Automatic GROUP/ORDER BY to disk Nov 5, 2024

azat force-pushed the automatic-external-aggregation branch 2 times, most recently from 19cdf22 to 2df43d4 Compare November 5, 2024 10:39

Michicosun reviewed Nov 5, 2024

src/Interpreters/Aggregator.cpp (outdated, resolved)

azat force-pushed the automatic-external-aggregation branch from 2df43d4 to a429980 Compare November 6, 2024 05:28

azat marked this pull request as draft November 6, 2024 09:38

azat force-pushed the automatic-external-aggregation branch from a429980 to b33be86 Compare November 6, 2024 10:03

azat marked this pull request as ready for review November 6, 2024 10:03

azat force-pushed the automatic-external-aggregation branch 2 times, most recently from edd5c4f to a773d3d Compare November 7, 2024 07:22

Michicosun requested changes Nov 7, 2024

azat force-pushed the automatic-external-aggregation branch from a773d3d to 9534a35 Compare November 25, 2024 17:45

azat marked this pull request as draft November 26, 2024 21:00

azat changed the title ~~Automatic GROUP/ORDER BY to disk~~ Automatic GROUP/ORDER BY to disk based on the server memory usage Nov 27, 2024

azat changed the title ~~Automatic GROUP/ORDER BY to disk based on the server memory usage~~ Automatic GROUP/ORDER BY to disk based on the memory usage Nov 27, 2024

azat added 14 commits November 28, 2024 16:01

Automatic ORDER BY to disk (max_bytes_ratio_before_external_sort)

378b6c8

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Implement max_bytes_ratio_before_external_group_by_for_server

d5a9903

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Implement max_bytes_ratio_before_external_sort_for_server

909eef4

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Cover max_bytes_ratio_before_external_{order,group_by}_for_server

02aa6ea

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Properly handle unlimited memory limit for automatic GROUP/ORDER BY

2fab184

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Respect max_bytes_ratio_before_external_sort_for_server for ORDER BY …

2d3d0bb

…storage check Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Respect max_bytes_ratio_before_external_sort_for_server in Aggregator…

520b21e

…::mergeOnBlock() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Reduce copy-paste by introducing Aggregator::writeToTemporaryFileIfNe…

ee271c2

…eded() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Move settings changes to 24.12

14e27b7

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Fix copy-paste typo for max_bytes_ratio_before_external_group_by doc

a8eaf3e

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Fix default value for group_by_overflow_mode

b0da170

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

Ignore max_bytes_ratio_before_external_{order,group_by} if memory lim…

dad435a

…it is not configured Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

azat force-pushed the automatic-external-aggregation branch from af822de to dad435a Compare November 28, 2024 15:16

Michicosun approved these changes Nov 29, 2024

Michicosun added this pull request to the merge queue Nov 29, 2024

Merged via the queue into ClickHouse:master with commit b81ee27 Nov 29, 2024

azat deleted the automatic-external-aggregation branch November 29, 2024 12:56

robot-clickhouse added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Nov 29, 2024

baibaichen added a commit to Kyligence/gluten that referenced this pull request Nov 30, 2024

Fix Build due to ClickHouse/ClickHouse#71406

f4bc087

baibaichen added a commit to Kyligence/gluten that referenced this pull request Nov 30, 2024

Fix Build due to ClickHouse/ClickHouse#71406

ab39380

azat mentioned this pull request Dec 1, 2024

Make max_bytes_before_external_sort limit depends on total query memory consumption #72598

Merged

sb230132 mentioned this pull request Dec 3, 2024

For Grace join, what is the ideal value we can set for max_bytes_in_join OR grace_hash_join_initial_buckets ? #72727

Open
