Additional primary key scan after using skip index for query with FINAL clause.#78350
Conversation
|
Greetings @nickitat ! I have incorporated review feedback from #70210 and raised this PR. Regarding the suggestion to not introduce a new index type in EXPLAIN indexes = 1
SELECT count(id1)
FROM xtx
FINAL
WHERE v = 688899949
SETTINGS use_skip_indexes_if_final_exact_mode = 1, use_skip_indexes_if_final = 1
Query id: be54c2c3-393f-43fc-9794-dfd549b60545
┌─explain────────────────────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ Aggregating │
3. │ Expression (Before GROUP BY) │
4. │ Filter ((WHERE + Change column names to column identifiers)) │
5. │ ReadFromMergeTree (default.xtx) │
6. │ Indexes: │
7. │ PrimaryKey │
8. │ Condition: true │
9. │ Parts: 5/5 │
10. │ Granules: 7812502/7812502 │
11. │ Skip │
12. │ Name: secondaryidx │
13. │ Description: minmax GRANULARITY 1 │
14. │ Parts: 3/5 │
15. │ Granules: 3/7812502 │
16. │ PrimaryKeyExpand │
17. │ Description: PrimaryKeyExpandForFinal │
18. │ Parts: 5/3 │
19. │ Granules: 14/3 │
└────────────────────────────────────────────────────────────────────┘changed to EXPLAIN indexes = 1
SELECT count(id1)
FROM xtx
FINAL
WHERE v = 688899949
SETTINGS use_skip_indexes_if_final_exact_mode = 1, use_skip_indexes_if_final = 1
Query id: be54c2c3-393f-43fc-9794-dfd549b60545
┌─explain────────────────────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ Aggregating │
3. │ Expression (Before GROUP BY) │
4. │ Filter ((WHERE + Change column names to column identifiers)) │
5. │ ReadFromMergeTree (default.xtx) │
6. │ Indexes: │
7. │ PrimaryKey │
8. │ Condition: true │
9. │ Parts: 5/5 │
10. │ Granules: 7812502/7812502 │
11. │ Skip │
12. │ Name: secondaryidx │
13. │ Description: minmax GRANULARITY 1 │
14. │ Parts: 3/5 │
15. │ Granules: 3/7812502 │
16. │ Skip │
17. | Name: <internal> |
18. │ Description: Find primary keys in all parts for FINAL │
19. │ Parts: 5/3 │
20. │ Granules: 14/3 │
└────────────────────────────────────────────────────────────────────┘ |
|
There are 5 "AST fuzzer" failures and 3 other failures. I did a check of the "AST fuzzer" failures and the logs show that the new testcase in this PR is being fuzz'ed and concurrently run during the failure. But the logs also show that the new code has not been entered before the time of failure (because of various reasons). The testcase inserts 1M rows in 5 parts, maybe something there. |
|
2 Failures -
|
I'm fine with both. Let's make |
|
As for additional testing. I have two ideas:
|
|
Implemented the change in the description of EXPLAIN, now looks like - $ explain indexes=1 select id from tx FINAL where v = 33 order by id SETTINGS use_skip_indexes_if_final=1,use_skip_indexes_if_final_exact_mode=1;
EXPLAIN indexes = 1
SELECT id
FROM tx
FINAL
WHERE v = 33
ORDER BY id ASC
SETTINGS use_skip_indexes_if_final = 1, use_skip_indexes_if_final_exact_mode = 1
Query id: 4c830c90-4cc0-4afc-8d41-e869e18ee485
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression (Project names) │
2. │ Sorting (Sorting for ORDER BY) │
3. │ Expression ((Before ORDER BY + Projection)) │
4. │ Filter ((WHERE + Change column names to column identifiers)) │
5. │ ReadFromMergeTree (default.tx) │
6. │ Indexes: │
7. │ PrimaryKey │
8. │ Condition: true │
9. │ Parts: 4/4 │
10. │ Granules: 13087317/13087317 │
11. │ Skip │
12. │ Name: vx │
13. │ Description: minmax GRANULARITY 1 │
14. │ Parts: 4/4 │
15. │ Granules: 55/13087317 │
16. │ PrimaryKeyExpand │
17. │ Description: Selects all granules that intersect by PK values with the previous skip indexes selection │
18. │ Parts: 4/4 │
19. │ Granules: 492/55 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
19 rows in set. Elapsed: 10.353 sec.Testing - CREATE TABLE tx (id UInt32, v UInt32, INDEX vx v TYPE minmax) ENGINE = ReplacingMergeTree ORDER BY (id) SETTINGS index_granularity=64;
SYSTEM STOP MERGES x;
insert into tx select abs(floor(randUniform(1,100000000))),floor(randExponential(1 / 2)) from numbers(10000000);
..
insert into tx select abs(floor(randUniform(1,100000000))),floor(randExponential(1 / 2)) from numbers(10000000);Inserted millions of rows and created 13.08 million granules. Then ran queries in set Q1 and set Q2 and verified consolidated output is exactly the same. Q1 set use_skip_indexes=0;
set use_skip_indexes_if_final=0;
set use_skip_indexes_if_final_exact_mode=0;
select id from tx FINAL where v = 33 order by id;
select id from tx FINAL where v = 32 order by id;
…
///100 queries with =, >, < and different predicate valuesQ2 Verified output is exactly byte by byte same form multiple runs with varying predicate values in Q1 & Q2 Ran |
|
4 failures -
Failures unrelated to work in PR. |
|
My bad; I should have explained myself clearly. We have a way to randomize settings in tests: ClickHouse/tests/clickhouse-test Line 1089 in d8620a6 To test different combinations. Let's support new settings there. Regarding the randomized test case, let's also add it to the whole test fleet to avoid regression. |
| @@ -1,4 +1,6 @@ | |||
| -- This tests will show the difference in data with use_skip_indexes_if_final and w/o | |||
| -- Tags: no-random-settings | |||
There was a problem hiding this comment.
we better fix values for specific settings than disable randomization completely:
set use_skip_indexes_if_final=0,use_skip_indexes_if_final_exact_mode = 0;There was a problem hiding this comment.
Good point, I will correct that!
| CREATE TABLE st (id Int32, v Int32, r Int32, INDEX bfv v TYPE bloom_filter) ENGINE=ReplacingMergeTree ORDER BY (id) SETTINGS index_granularity = 64; | ||
| SYSTEM STOP MERGES st; | ||
|
|
||
| INSERT INTO st SELECT id % 9999999, if(id % 729 = 0, 4, v), 1 FROM (SELECT * FROM generateRandom('id UInt32, v UInt32', toUnixTimestamp(now())) limit 1000000) SETTINGS max_threads = 1; |
There was a problem hiding this comment.
There should be a way to know the seed to reproduce the failure. I think the simplest solution is to rework the test as a bash script, save the seed into a variable, and print it in case of failure.
There was a problem hiding this comment.
I chose toUnixTimestamp(now())) as seed to get randomness literally in every CI run. And in case of unlikely test failure tomorrow, deduce the seed from the timestamp in the failure log. I will try to write the bash script and upload back.
|
Only failure in recent CI run is |
|
Not sure what happened for cancellation in Only other failure in |
659da48
…inal_exact_mode Improvements skip index final exact mode (follow up to PR #78350)
Resolves #70292, #31411 and #34243 . Solution will make sure that correct results are returned by a
SELECTquery withFINALclause on aReplacingMergeTreetable that used a skip index. This should significantly improve performance of FINAL queries and reduce memory usage. Solution takes the initial set of PK ranges retrieved from the skip index and then finds matching PK ranges across all parts.Earlier version of this PR : #70210
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
use_skip_indexes_in_final_exact_mode. If a query on aReplacingMergeTreetable has FINAL clause, reading only table ranges based on skip indexes may produce incorrect result. This setting can ensure that correct results are returned by scanning newer parts that have overlap with primary key ranges returned by the skip index. Set to 0 to disable, 1 to enable