
Filtering by skip indexes on data reading #81526

Merged
CurtizJ merged 29 commits into ClickHouse:master from amosbird:apply-skip-index-on-reading
Aug 27, 2025

Conversation

@amosbird
Collaborator

@amosbird amosbird commented Jun 9, 2025

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support filtering data parts using skip indexes during reading to reduce unnecessary index reads. Controlled by the new setting use_skip_indexes_on_data_read (disabled by default). This addresses #75774. This includes some common groundwork shared with #81021.

An illustrative example of the optimization effect:

create table test (s String, index s_idx s type bloom_filter(0.0001) granularity 1) engine MergeTree order by () settings index_granularity = 1024;

insert into test select if(number % 1024 == 0, 'needle', randomPrintableASCII(64)) from numbers_mt(10000000);

-- 39ms
select * from test where s = 'needle' limit 1 settings max_threads = 1, use_skip_indexes_on_data_read = 0, use_query_condition_cache = 0, send_logs_level = 'test';

-- 16ms
select * from test where s = 'needle' limit 1 settings max_threads = 1, use_skip_indexes_on_data_read = 1, use_query_condition_cache = 0, send_logs_level = 'test';

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 9, 2025

Workflow [PR], commit [d5b5044]

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Jun 9, 2025
@amosbird amosbird force-pushed the apply-skip-index-on-reading branch 3 times, most recently from 60b4e9f to 77abd63 Compare June 10, 2025 05:03
@devcrafter devcrafter self-assigned this Jun 11, 2025
@amosbird amosbird force-pushed the apply-skip-index-on-reading branch from 020c7b8 to dcf2543 Compare June 12, 2025 03:51
@amosbird amosbird mentioned this pull request Jun 21, 2025
1 task
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 22, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 22, 2025

Workflow [PR], commit [3d3791c]


@amosbird amosbird force-pushed the apply-skip-index-on-reading branch from 62dd6db to 1aabfed Compare June 26, 2025 01:53
@amosbird amosbird force-pushed the apply-skip-index-on-reading branch from b23ad59 to c5936d7 Compare July 15, 2025 16:39
@amosbird
Collaborator Author

I've examined all 31 test failures and found them all incompatible with the use_skip_indexes_on_data_read setting for one of the following reasons:

  1. The number of rows to read is limited by max_rows_to_read.
  2. The index usage is ineffective: EXPLAIN with indexes = 1 shows additional granules being read.
  3. query_plan_join_swap_table cannot swap tables because the plan stage does not utilize indexes.
  4. force_data_skipping_indices fails since the plan stage omits index usage.

I've disabled use_skip_indexes_on_data_read for them explicitly.

Member

@CurtizJ CurtizJ left a comment


In general the idea is good.

Comment on lines +315 to +329
MergeTreeIndexReadResultPtr index_read_result;
if (merge_tree_index_build_context)
{
    const auto & part_ranges = merge_tree_index_build_context->read_ranges.at(task->getInfo().part_index_in_query);
    auto & remaining_marks = merge_tree_index_build_context->part_remaining_marks.at(task->getInfo().part_index_in_query).value;
    index_read_result = merge_tree_index_build_context->index_reader->getOrBuildIndexReadResult(part_ranges);

    /// Atomically subtract the number of marks this task will read from the total remaining marks. If the
    /// remaining marks after subtraction reach zero, this is the last task for the part, and we can trigger
    /// cleanup of any per-part cached resources (e.g., skip index read result).
    size_t task_marks = task->getNumMarksToRead();
    bool part_last_task = remaining_marks.fetch_sub(task_marks, std::memory_order_acq_rel) == task_marks;
    if (part_last_task)
        merge_tree_index_build_context->index_reader->clear(task->getInfo().data_part);
}
Member

@CurtizJ CurtizJ Aug 25, 2025


Now you read the index for all ranges of the part here. Is it possible to read only the ranges from the current task?

Also, now this line is incorrect because of this:

bool part_last_task = remaining_marks.fetch_sub(task_marks, std::memory_order_acq_rel) == task_marks;

Collaborator Author


Is it possible to read only the ranges from the current task?

This isn't really feasible. First, an index may span multiple granules, which makes alignment difficult. There can also be multiple indexes with different granule boundaries, which adds even more complexity. In addition, reading at the granule level is not I/O-friendly; e.g., with minmax indexes the I/O granularity becomes too small, which hurts performance. It could also introduce a lot of unnecessary random I/O.

Also, now this line is incorrect because of this:

I don't quite understand what you mean by "incorrect." The purpose of this flag is simply to allow the last reader of this part to clean up the shared index structure that was built.

Member


First, an index may span multiple granules, which makes alignment difficult. There can also be multiple indexes with different granule boundaries, which adds even more complexity

Yes, that makes sense.

It could also introduce a lot of unnecessary random IO.

I thought that since selected ranges are not contiguous, they would require a random read anyway. But indeed, for minmax and some other indexes, almost all index data may fit into the read buffer and won't cause a random read even for non-contiguous ranges.

I don't quite understand what you mean by "incorrect."

Sorry, I misunderstood one thing.

size_t offset,
Columns & res_columns) override;

bool canReadIncompleteGranules() const override { return main_reader->canReadIncompleteGranules(); }
Member


Is it important for the index reader? I'd expect canReadIncompleteGranules to return false because this method is not applicable to the index reader. Also, returning false would allow removing the dependency on main_reader.

Collaborator Author


It is important, because this behavior only takes effect in the main reader. When the index reader is used, it effectively becomes the main reader, so the flag still matters, especially for skewed wide parts.

Member


Ok, let's keep it.

@CurtizJ
Member

CurtizJ commented Aug 26, 2025

It looks like the failure in Fast Test is related. Please fix, and I'll merge the PR.

@CurtizJ CurtizJ added this pull request to the merge queue Aug 27, 2025
Merged via the queue into ClickHouse:master with commit d530e1c Aug 27, 2025
122 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 27, 2025
@EmeraldShift
Contributor

Thank you for this feature, I'm excited to see it merged! Is there any chance it can be included in 25.8?

@alexey-milovidov
Member

+1, thanks for the feature and the review, my dreams come true!

@EmeraldShift, it is just a few days late for 25.8, so it will be in 25.9.

@rschu1ze
Member

Issue for broken EXPLAIN indexes = 1: #88467

@shankar-iyer
Member

@amosbird @CurtizJ @nickitat Greetings! The previous attempt to enable use_skip_indexes_on_data_read by default was reverted: #88638. That attempt led to the identification and fixes listed here: #88504 (comment). Right now, we have 2 issues pending before this setting can be enabled by default:

  1. Cardinality estimation in join order planning - I am working on that

  2. How do we handle parallel replicas? Reference: link. Based on the current implementation, use_skip_indexes_on_data_read=1 will read entire index granules of a skip index on each replica where a part's sub-ranges are processed (see the comment above in #81526). As explained in that comment, this aspect is not an overhead for minmax indexes, as they are very small. The vector index does not need use_skip_indexes_on_data_read=1 because static index analysis is best for it. I understand that the text index has its own equivalent of index reading at runtime. That leaves the set index and the bloom filter index. Please let me know your opinion! Is it okay to read/cache the entire index on multiple replicas even if they are processing partial ranges?

There is another feature (use_skip_indexes_for_top_k: #89835) that relies on use_skip_indexes_on_data_read, so I need to enable both settings by default, planned for 26.1. That feature uses only the minmax index.

@alexey-milovidov
Member

Is it okay to read/cache the entire index on multiple replicas even if they are processing partial ranges?

Yes, it is ok.
