Use buffering while reading in order in queries with WHERE #64607

CurtizJ merged 17 commits into ClickHouse:master.
On the example from #40583:

```sql
CREATE TABLE logs_time_dt
(
    `time` DateTime64(9) CODEC(Delta, ZSTD(7)),
    `project` LowCardinality(String) CODEC(ZSTD(7)),
    `service` LowCardinality(String) CODEC(ZSTD(7)),
    `message` String CODEC(ZSTD(7)),
    `tags_hash` Array(UInt64) CODEC(ZSTD(7)),
    INDEX idx_message message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 3,
    INDEX idx_tags_hash tags_hash TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toStartOfHour(time)
ORDER BY (project, service, time)
SETTINGS index_granularity = 1024;

INSERT INTO logs_time_dt (time, project, service, message, tags_hash)
SELECT
    fromUnixTimestamp64Nano(toInt64(toUnixTimestamp64Nano(toDateTime64('2022-08-01', 9)) + number / 2777 * 1e9)),
    'test' AS project,
    'test' AS service,
    'foo',
    [number % 3000]
FROM system.numbers
LIMIT 60 * 1e6;
```

Before:

```sql
SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time ASC
FORMAT `Null`;
```

```
0 rows in set. Elapsed: 1.636 sec. Processed 29.26 million rows, 696.98 MB (17.88 million rows/s., 425.92 MB/s.)
Peak memory usage: 32.66 MiB.
```

After:

```sql
SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time ASC
FORMAT `Null`;
```

```
0 rows in set. Elapsed: 0.205 sec. Processed 29.26 million rows, 696.98 MB (142.46 million rows/s., 3.39 GB/s.)
Peak memory usage: 30.39 MiB.
```
```cpp
class BufferChunksTransform : public IProcessor
{
public:
    BufferChunksTransform(const Block & header_, size_t max_bytes_to_buffer_, size_t limit_);
```
It seems to me that if this optimisation is based only on the number of bytes, we will miss a lot of cases where rows are more or less wide. I think we can always buffer up to a full block, for example, and that would be a good lower limit.
Wdyt?
Yes, it probably makes sense. However, in the current implementation we always buffer at least one chunk, because we can exceed the threshold.
I meant 65K rows regardless of their size in bytes, because AFAIU we assume individual chunks to be much smaller in terms of the number of rows.
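The policy discussed above can be sketched as follows. This is a hypothetical illustration, not the actual `BufferChunksTransform` code; the names `ChunkInfo` and `ChunkBuffer` are made up. It shows the byte threshold together with the "always buffer at least one chunk" behavior mentioned in the reply; a row-count lower limit as suggested by the reviewer could be added analogously.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: buffer chunks until a byte threshold is reached,
// but always accept at least one chunk, so the threshold may be exceeded
// by the size of a single chunk.
struct ChunkInfo
{
    size_t rows;
    size_t bytes;
};

class ChunkBuffer
{
public:
    explicit ChunkBuffer(size_t max_bytes_) : max_bytes(max_bytes_) {}

    /// Returns true if the chunk was buffered, false if the buffer is full.
    bool tryAdd(const ChunkInfo & chunk)
    {
        // The first chunk is accepted even if it alone exceeds the limit.
        if (!chunks.empty() && buffered_bytes + chunk.bytes > max_bytes)
            return false;

        buffered_bytes += chunk.bytes;
        chunks.push_back(chunk);
        return true;
    }

    size_t bufferedChunks() const { return chunks.size(); }
    size_t bufferedBytes() const { return buffered_bytes; }

private:
    size_t max_bytes;
    size_t buffered_bytes = 0;
    std::vector<ChunkInfo> chunks;
};
```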
```cpp
else if (input.isFinished())
{
    output.finish();
    return Status::Finished;
}
```
I'm not sure if any problem could actually happen with the current implementation, but just for peace of mind I'd move this `if` outside of the `if (output.canPush())` condition, right at l.28.
Ok, probably makes sense.
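The reviewer's point can be illustrated with a simplified `prepare()` state machine. The `Port` and `Status` types below are toy stand-ins, not the real ClickHouse `IProcessor` interfaces: the finished input is checked before the output, so the processor terminates even when the consumer cannot accept data right now.

```cpp
// Simplified sketch of an IProcessor-style prepare(), assuming hypothetical
// Port and Status types. The finished-input check comes first, outside of
// the canPush() branch, as suggested in the review above.
enum class Status { Finished, NeedData, PortFull, Ready };

struct Port
{
    bool finished = false;   // input exhausted / output closed
    bool can_push = true;    // output is ready to accept a chunk
    bool has_data = false;   // input has a pending chunk
};

Status prepare(Port & input, Port & output)
{
    // Check for a finished input first, even if the output is blocked.
    if (input.finished && !input.has_data)
    {
        output.finished = true;  // analogous to output.finish()
        return Status::Finished;
    }

    if (!output.can_push)
        return Status::PortFull;

    if (!input.has_data)
        return Status::NeedData;

    return Status::Ready;  // a chunk can be moved from input to output
}
```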
00504_mergetree_arrays_rw.sql: #66248

Perf tests: replaceRegexp_fallback - #66185 (comment)
```cpp
/// Transform that buffers chunks from the input
/// up to a certain limit and pushes chunks to ...
```
Linked this to performance regression #66578, for anyone who cares about this improvement.



Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimized queries with `ORDER BY` primary key and a `WHERE` condition with high selectivity, by using buffering. It is controlled by the setting `read_in_order_use_buffering` (enabled by default) and can increase the memory usage of a query.

Resolves #40583
Resolves #40675
Resolves #11482
Resolves #17364
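As a usage sketch (the table and query are taken from the example above), the new setting can be toggled per query to compare the behavior; `read_in_order_use_buffering` is on by default per the changelog entry:

```sql
-- Disable the new buffering for a single query to compare:
SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time ASC
SETTINGS read_in_order_use_buffering = 0
FORMAT `Null`;
```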