-
Notifications
You must be signed in to change notification settings - Fork 8.3k
MergingSortedTransform has low performance in read in order optimization #40675
Copy link
Copy link
Closed
Closed
Copy link
Labels
Description
Describe the situation
Queries taken from #40583
CREATE TABLE logs_time_dt
(
`time` DateTime64(9) Codec(Delta, ZSTD(7)),
`project` LowCardinality(String) CODEC(ZSTD(7)),
`service` LowCardinality(String) CODEC(ZSTD(7)),
`message` String CODEC(ZSTD(7)),
`tags_hash` Array(UInt64) CODEC(ZSTD(7)),
INDEX idx_message message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 3,
INDEX idx_tags_hash tags_hash TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toStartOfHour(time)
ORDER BY (project, service, time)
SETTINGS index_granularity = 1024;
insert into logs_time_dt
(time, project, service, message, tags_hash)
select
fromUnixTimestamp64Nano(toInt64(toUnixTimestamp64Nano(toDateTime64('2022-08-01',9))+number/(2777)*1e9)),
'test' as project,
'test' as service,
'foo',
[ number % 3000 ]
from system.numbers
limit 60*1e6;
SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time
FORMAT `Null`;Query pipeline:
(Expression)
ExpressionTransform
(Sorting)
MergingSortedTransform 45 → 1
(Expression)
ExpressionTransform × 45
(Filter)
FilterTransform × 45
(ReadFromMergeTree)
MergeTreeInOrder × 45 0 → 1Performance:
0 rows in set. Elapsed: 3.425 sec. Processed 29.23 million rows, 1.11 GB (8.54 million rows/s., 324.36 MB/s.)For read in order optimization, sometime the inputs of MergingSortedTransform are already sorted. Therefore, when executing, the processor will read all chunks from input port 0, then all chunks from input port 1, and so on so far (the indices may be not correct, just understand that the processor will scan the input ports sequentially).
Try a simple fix by adding buffer for each input:
(Expression)
ExpressionTransform
(Sorting)
MergingSortedTransform 45 → 1
QueueBuffer × 45
(Expression)
ExpressionTransform × 45
(Filter)
FilterTransform × 45
(ReadFromMergeTree)
MergeTreeInOrder × 45 0 → 1Performance:
0 rows in set. Elapsed: 0.281 sec. Processed 29.23 million rows, 1.11 GB (104.21 million rows/s., 3.96 GB/s.)However, this simple fix has 2 main drawbacks:
- Cannot take advantage of limit pushdown (can fix easily)
- When the result set is big, OOM can happens, so need to add on-disk buffer.
We can discuss a solution to make read in order and normal read to have similar performance.
Reactions are currently unavailable