MergingSortedTransform has low performance in read in order optimization 

**Describe the situation**
Queries taken from #40583 

```sql
CREATE TABLE logs_time_dt
(
    `time` DateTime64(9) Codec(Delta, ZSTD(7)),
    `project` LowCardinality(String) CODEC(ZSTD(7)),
    `service` LowCardinality(String) CODEC(ZSTD(7)),
    `message` String CODEC(ZSTD(7)),
    `tags_hash` Array(UInt64) CODEC(ZSTD(7)),
    INDEX idx_message message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 3,
    INDEX idx_tags_hash tags_hash TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toStartOfHour(time)
ORDER BY (project, service, time)
SETTINGS index_granularity = 1024;

insert into logs_time_dt
(time, project, service, message, tags_hash)
select
fromUnixTimestamp64Nano(toInt64(toUnixTimestamp64Nano(toDateTime64('2022-08-01',9))+number/(2777)*1e9)),
'test' as project,
'test' as service,
'foo',
[ number % 3000 ]
from system.numbers
limit 60*1e6;

SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time
FORMAT `Null`;
```

Query pipeline:
```sql
(Expression)
ExpressionTransform
  (Sorting)
  MergingSortedTransform 45 → 1
    (Expression)
    ExpressionTransform × 45
      (Filter)
      FilterTransform × 45
        (ReadFromMergeTree)
        MergeTreeInOrder × 45 0 → 1
````

Performance:
```sql
0 rows in set. Elapsed: 3.425 sec. Processed 29.23 million rows, 1.11 GB (8.54 million rows/s., 324.36 MB/s.)
```

For read in order optimization, sometime the inputs of MergingSortedTransform are already sorted. Therefore, when executing, the processor will read all chunks from input port 0, then all chunks from input port 1, and so on so far (the indices may be not correct, just understand that the processor will scan the input ports sequentially).

Try a simple fix by adding buffer for each input:
```sql
(Expression)
ExpressionTransform
 (Sorting)
 MergingSortedTransform 45 → 1
   QueueBuffer × 45
     (Expression)
     ExpressionTransform × 45
       (Filter)
       FilterTransform × 45
         (ReadFromMergeTree)
         MergeTreeInOrder × 45 0 → 1
```
Performance:
```sql
0 rows in set. Elapsed: 0.281 sec. Processed 29.23 million rows, 1.11 GB (104.21 million rows/s., 3.96 GB/s.)
```

However, this simple fix has 2 main drawbacks:
1. Cannot take advantage of limit pushdown (can fix easily)
2. When the result set is big, OOM can happens, so need to add on-disk buffer.

We can discuss a solution to make read in order and normal read to have similar performance.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MergingSortedTransform has low performance in read in order optimization #40675

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

MergingSortedTransform has low performance in read in order optimization #40675

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions