Disable data skipping indexes by default for queries with FINAL#34243
Disable data skipping indexes by default for queries with FINAL#34243kitaisreal merged 1 commit intoClickHouse:masterfrom
Conversation
This patch adds use_skip_indexes_if_final setting, that is OFF by default. Since skipping data for queries with FINAL may produce incorrect result. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
e90993c to
1d19851
Compare
|
@azat could you please provide example, when using skip indexes in final can introduce different results. Maybe it is better to try fix this without setting. |
There is example in the tests, I've added few comments into it, hope it helps: -- This tests will show the difference in data with use_skip_indexes_if_final and w/o
CREATE TABLE data_02201 (
key Int,
value_max SimpleAggregateFunction(max, Int),
INDEX idx value_max TYPE minmax GRANULARITY 1
)
Engine=AggregatingMergeTree()
ORDER BY key
PARTITION BY key;
SYSTEM STOP MERGES data_02201;
INSERT INTO data_02201 SELECT number, number FROM numbers(10);
INSERT INTO data_02201 SELECT number, number+1 FROM numbers(10);
--
-- So here we have index for value_max which is SimpleAggregateFunction(max), so it depends on merging the data:
--
SELECT * FROM data_02201 FINAL WHERE value_max = 1 ORDER BY key, value_max SETTINGS use_skip_indexes=1, use_skip_indexes_if_final=0;
-- 0 1
SELECT * FROM data_02201 FINAL WHERE value_max = 1 ORDER BY key, value_max SETTINGS use_skip_indexes=1, use_skip_indexes_if_final=1;
-- 0 1
-- 1 1So the problem is kind of similar to the PREWHERE case (there is |
|
@azat make sense to introduce such setting for consistency with |
|
Hello, sorry for commenting on an already merged pull request. I just got bitten by this change of behavior where my queries using skip index got suddenly very much slower after Clickhouse upgrade. I would like to understand the conditions where the problem with using a skip index can and cannot occur. Is it only relevant for indexes where aggregation is involved? I am using a |
|
Hi @beda42 , sorry for the delay, and no problem on commenting here.
Yes, and not just aggregation, but
Your case should be fine. |
CREATE TABLE test
(
`phone` String,
`arr` Array(UInt32),
`v` UInt16,
INDEX arr_idx arr TYPE set(0) GRANULARITY 3
) ENGINE = ReplacingMergeTree(v) ORDER BY phone SETTINGS index_granularity = 8192;insert into test (phone, arr, v) values ('+123', [1, 2], 1);
insert into test (phone, arr, v) values ('+123', [1, 2, 3], 2);set use_skip_indexes_if_final = 1;select * from test final where has(arr, 3) = 0;produces which is incorrect :) so, |
|
Does it make sense to enable skipping index on column which does not change ? Like high cardinality id columns which does not change. Is it possible to selectively enable specific skipping index ? |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Apply data skipping indexes for queries with FINAL may produce incorrect result. Disable data skipping indexes by default for queries with FINAL (introduce new
use_skip_indexes_if_finalsetting and disable it by default)NOTE: maybe this should be marked as
backward incompatible, since this may decrease the performance. Thoughts?Refs: #29405 (cc @kitaisreal )
Refs: #25940 (cc @kssenii )
Refs: #8684 (cc @CurtizJ @alexey-milovidov )