Filters on high cardinality dimensions should sometimes use dim index bitset + full scan instead of unioning bitsets of dim values #3878

@leventov

Sometimes Filters/DimFilters (Like, Regex, Bound, etc.) on dimensions of very high cardinality (thousands of values or more) end up unioning the bitsets of a significant fraction of the individual dimension values, and the resulting bitset is 50% or more full.
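A minimal sketch of the union-based strategy, for illustration only. It assumes a single-valued, dictionary-encoded column stored as an int[] of dictionary ids, and it uses java.util.BitSet as an uncompressed stand-in for the segment's compressed bitmap indexes (Druid uses Concise or Roaring bitmaps). The class and method names are hypothetical, not Druid APIs. The sizes are scaled down because every uncompressed BitSet here spans the whole row range.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/**
 * Hypothetical model of a single-valued dictionary-encoded column: each row stores
 * a dictionary id, and the inverted index holds one row bitset per dictionary value.
 */
public final class UnionFilterSketch {

  /** Builds one row bitset per dictionary value (the per-value inverted index). */
  static BitSet[] buildBitmapIndex(int[] rowValues, int cardinality) {
    BitSet[] index = new BitSet[cardinality];
    for (int v = 0; v < cardinality; v++) {
      index[v] = new BitSet(rowValues.length);
    }
    for (int row = 0; row < rowValues.length; row++) {
      index[rowValues[row]].set(row);
    }
    return index;
  }

  /** Union strategy: OR together the row bitsets of every dictionary value the filter matches. */
  static BitSet unionMatchingBitmaps(BitSet[] index, IntPredicate valueMatches) {
    BitSet result = new BitSet();
    for (int v = 0; v < index.length; v++) {
      if (valueMatches.test(v)) {
        result.or(index[v]); // one union per matching value
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int cardinality = 1_000;
    int numRows = 5_000;
    int[] rowValues = new int[numRows];
    for (int row = 0; row < numRows; row++) {
      rowValues[row] = row % cardinality; // every value appears in exactly 5 rows
    }
    BitSet[] index = buildBitmapIndex(rowValues, cardinality);
    // Filter matching every even dictionary id, i.e. half of the values.
    BitSet matchedRows = unionMatchingBitmaps(index, v -> v % 2 == 0);
    System.out.println(matchedRows.cardinality() + " of " + numRows + " rows matched");
  }
}
```

Running main prints "2500 of 5000 rows matched": 500 of the 1,000 values match, so 500 separate or() calls are needed to build a row bitset that ends up exactly 50% full.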

So instead of unioning 100k bitsets when filtering on a 200k-cardinality dimension, it would be better to build a bitset of the matching dimension value indexes (as in DimensionSelectorUtils.makeDictionaryEncodedValueMatcherGeneric()), scan all rows in the segment, and apply a simple check, "matchingDimValuesBitset.get(index)", to each row.
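A minimal sketch of the proposed strategy, under the same assumptions: a single-valued column stored as an int[] of dictionary ids, java.util.BitSet in place of Druid's bitmaps, and hypothetical names throughout. It is not the actual DimensionSelectorUtils code. preferFullScan is also a hypothetical heuristic. The fullness of the unioned bitset is unknown until the union is done, so the heuristic uses the fraction of matching dictionary values as a cheap proxy. That proxy is only exact when values are evenly distributed across rows.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public final class FullScanFilterSketch {

  /** Step 1: evaluate the predicate once per dictionary value, not once per row. */
  static BitSet matchingDimValues(int cardinality, IntPredicate valueMatches) {
    BitSet matching = new BitSet(cardinality);
    for (int v = 0; v < cardinality; v++) {
      if (valueMatches.test(v)) {
        matching.set(v);
      }
    }
    return matching;
  }

  /** Step 2: scan every row and check its dictionary id against the value bitset. */
  static BitSet scanRows(int[] rowValues, BitSet matchingDimValuesBitset) {
    BitSet matchedRows = new BitSet(rowValues.length);
    for (int row = 0; row < rowValues.length; row++) {
      if (matchingDimValuesBitset.get(rowValues[row])) {
        matchedRows.set(row);
      }
    }
    return matchedRows;
  }

  /** Hypothetical heuristic: use the full scan once enough of the dictionary matches. */
  static boolean preferFullScan(int matchingValues, int cardinality, double threshold) {
    return cardinality > 0 && (double) matchingValues / cardinality >= threshold;
  }

  public static void main(String[] args) {
    int cardinality = 200_000;
    int numRows = 1_000_000;
    int[] rowValues = new int[numRows];
    for (int row = 0; row < numRows; row++) {
      rowValues[row] = row % cardinality; // every value appears in exactly 5 rows
    }
    BitSet matching = matchingDimValues(cardinality, v -> v % 2 == 0);
    System.out.println("full scan? " + preferFullScan(matching.cardinality(), cardinality, 0.5));
    BitSet matchedRows = scanRows(rowValues, matching);
    System.out.println(matchedRows.cardinality() + " of " + numRows + " rows matched");
  }
}
```

With 100k of 200k values matching, main prints "full scan? true" and then "500000 of 1000000 rows matched". It gets there with one predicate evaluation per dictionary value and one bitset lookup per row, instead of 100k bitset unions.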
