Support bloom_filter usage for "has" on const array#83945
shankar-iyer merged 2 commits into ClickHouse:master
I hope I haven’t bitten off more than I can chew with this PR 😅. From what I understand, has(const, column) behaves somewhat like hasAny, in that it requires creating a column on the fly. I was unsure whether I should extract a common static method for creating that column, since the logic feels duplicated. I’m also not fully certain whether the RPNElement should be set to FUNCTION_IN or FUNCTION_HAS_ANY, since both seem to be handled the same way later in mayBeTrueOnGranule. I’d really appreciate any guidance or corrections on this.
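A minimal Python sketch of the idea in this comment, assuming (as the comment suggests) that FUNCTION_IN and FUNCTION_HAS_ANY both reduce to an "any of these constants may be in the granule" check in mayBeTrueOnGranule. For a scalar column, has([c1, c2], column) selects the same rows as column IN (c1, c2), so one Bloom filter probe per constant is enough to decide whether a granule can be skipped. The BloomFilter class, the granule size, and the helper names are hypothetical and do not mirror ClickHouse's C++ implementation; NULL handling is ignored.

```python
import hashlib
from typing import Hashable, List, Sequence


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bit array (hypothetical, not ClickHouse's)."""

    def __init__(self, num_bits: int = 2048, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value: Hashable):
        # Derive num_hashes bit positions from seeded SHA-256 digests of repr(value).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, value: Hashable) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def may_contain(self, value: Hashable) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def has(array: Sequence[Hashable], value: Hashable) -> bool:
    """Row-level has(array, value) for a scalar value; same rows as value IN array (NULLs ignored)."""
    return value in array


def build_granule_filters(column: Sequence[Hashable], granule_size: int) -> List[BloomFilter]:
    """Build one Bloom filter per granule of granule_size consecutive rows (toy granularity)."""
    filters = []
    for start in range(0, len(column), granule_size):
        bf = BloomFilter()
        for value in column[start:start + granule_size]:
            bf.add(value)
        filters.append(bf)
    return filters


def granules_to_read(filters: List[BloomFilter], const_array: Sequence[Hashable]) -> List[int]:
    """Keep a granule if ANY constant may be present (the check assumed here for IN, hasAny and has(const, column))."""
    return [i for i, bf in enumerate(filters) if any(bf.may_contain(c) for c in const_array)]


column = list(range(1000))
const_array = [5, 950]
filters = build_granule_filters(column, granule_size=100)
selected = granules_to_read(filters, const_array)
matching_rows = [x for x in column if has(const_array, x)]
print(selected)       # always includes granules 0 and 9; others appear only as false positives
print(matching_rows)  # [5, 950]
```

Because a Bloom filter has no false negatives, every granule that actually contains one of the constants is kept; false positives only cost extra reads, never wrong results.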
I am reviewing the PR. To avoid test failures caused by variations in the plan output, could you please record only the lines
The 2 failures (
dd4bf83
Reverted in #84142
Changelog category (leave one):
Changelog entry
The bloom filter index is now used for conditions like has([c1, c2, ...], column), where column is not of an Array type. This improves performance for such queries, making them as efficient as the equivalent IN condition.
Motivation
This change extends bloom filter index support to a common query pattern.
Previously, to use a bloom filter on a scalar column, users had to write column IN (c1, c2). Now they can also use the has([c1, c2], column) syntax and get the same performance benefit, allowing the query to skip data granules that don't contain the relevant values.
Example use
Given a table with a bloom filter index on a non-Array column:
The following query will now efficiently use the bf_idx index to filter granules, whereas previously it would have resulted in a full table scan: