Skip to content

Performance becomes worse for queries with LIMIT clause after #64607 #66578

@FrankChen021

Description

@FrankChen021

For the same example

CREATE TABLE logs_time_dt
(
    `time` DateTime64(9) Codec(Delta, ZSTD(7)),
    `project` LowCardinality(String) CODEC(ZSTD(7)),
    `service` LowCardinality(String) CODEC(ZSTD(7)),
    `message` String CODEC(ZSTD(7)),
    `tags_hash` Array(UInt64) CODEC(ZSTD(7)),
    INDEX idx_message message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 3,
    INDEX idx_tags_hash tags_hash TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toStartOfHour(time)
ORDER BY (project, service, time)
SETTINGS index_granularity = 1024;

insert into logs_time_dt
(time, project, service, message, tags_hash)
select
fromUnixTimestamp64Nano(toInt64(toUnixTimestamp64Nano(toDateTime64('2022-08-01',9))+number/(2777)*1e9)),
'test' as project,
'test' as service,
'foo',
[ number % 3000 ]
from system.numbers
limit 60*1e6;

If the LIMIT claus is used in the query, the performance is worse than before.

set read_in_order_use_buffering=0;

SET read_in_order_use_buffering = 0

Query id: 417916b3-ce57-4c75-871a-cc8b0b583812

Ok.

0 rows in set. Elapsed: 0.001 sec.

ip-10-169 :) SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time desc
limit 100 FORMAT `Null`; //<-----------------LIMIT clause is added

SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time DESC
LIMIT 100
FORMAT `Null`

Query id: 7e7c48b5-7190-4346-922b-30e5bba179c8

Ok.

0 rows in set. Elapsed: 0.087 sec. Processed 306.23 thousand rows, 7.53 MB (3.51 million rows/s., 86.24 MB/s.)
Peak memory usage: 24.93 MiB.

And when the optimization is turned on(which is by default now)

set read_in_order_use_buffering=1;

SET read_in_order_use_buffering = 1

Query id: 7a50f132-6198-4098-a2e1-9afc39e6d3d1

Ok.

0 rows in set. Elapsed: 0.001 sec.

ip-10 :) SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time desc
limit 100 FORMAT `Null`;

SELECT *
FROM logs_time_dt
WHERE (project = 'test') AND (service = 'test') AND has(tags_hash, 42)
ORDER BY time DESC
LIMIT 100
FORMAT `Null`

Query id: 96ed4b86-2398-4678-b293-8d08910e70b4

Ok.

0 rows in set. Elapsed: 0.154 sec. Processed 5.27 million rows, 135.31 MB (34.24 million rows/s., 878.31 MB/s.)
Peak memory usage: 58.70 MiB.

For simplicity:

BEFORE: Elapsed: 0.087 sec. Processed 306.23 thousand rows, 7.53 MB (3.51 million rows/s., 86.24 MB/s.)
Peak memory usage: 24.93 MiB.
AFTER:  Elapsed: 0.154 sec. Processed 5.27 million rows, 135.31 MB (34.24 million rows/s., 878.31 MB/s.)
Peak memory usage: 58.70 MiB.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions