[RFC] Use global LowCardinality dictionary for optimizations if it is small enough #72717
Open
Use case
Optimization of aggregation and JOINs over LowCardinality columns that have a small number of unique values. In these cases a LowCardinality column can usually be replaced with Enum, but that is less convenient because it requires changing the schema every time the set of possible values changes.
Describe the solution you'd like
- Build a global dictionary for `LowCardinality` columns which are suitable for optimization (are in the `GROUP BY` key or in the `ON` section of `JOIN`), up to a certain size (refuse the optimization if the dictionary becomes large). This will require reading dictionaries at a new stage of query execution: after filtering parts by primary key and before pipeline execution is started. Dictionaries can be pushed down and reused. The global dictionary can also be cached in `MergeTreeData`.
- Push the global dictionary down to `LowCardinality` serializations in data parts. Encode positions of `LowCardinality` columns with the new dictionary and set the shared dictionary on them.
- Use positions in the dictionary as keys for the hash table in aggregation or in JOIN. This will allow choosing a more optimal hash method:
  - a method with a single numeric key (often `UInt8`, which has its own optimization of aggregation) instead of the specialized `LowCardinality` method, in case of one column
  - a method with fixed numeric keys, in case of aggregation by `LowCardinality` and numeric columns
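
The three steps above can be illustrated with a small Python sketch. It is only a model of the idea, not ClickHouse code: the function names, the `MAX_GLOBAL_DICT_SIZE` threshold, and the representation of a part as a `(dictionary, positions)` pair are all assumptions made for illustration, and details of the real `LowCardinality` format (such as reserved positions) are ignored.

```python
# Hypothetical threshold: with at most 256 entries, positions fit in a UInt8.
MAX_GLOBAL_DICT_SIZE = 256


def build_global_dictionary(part_dicts, max_size=MAX_GLOBAL_DICT_SIZE):
    """Merge per-part dictionaries into one value -> position map.

    Returns None when the merged dictionary would exceed max_size,
    which models "refuse optimization if the dictionary becomes large".
    """
    index = {}
    for part_dict in part_dicts:
        for value in part_dict:
            if value not in index:
                if len(index) >= max_size:
                    return None
                index[value] = len(index)
    return index


def remap_positions(part_dict, positions, global_index):
    """Re-encode a part's local positions as positions in the global dictionary."""
    local_to_global = [global_index[value] for value in part_dict]
    return [local_to_global[p] for p in positions]


def count_by_position(parts, global_index):
    """GROUP BY with count(): the global position is the key into a flat array."""
    counts = [0] * len(global_index)
    for part_dict, positions in parts:
        for g in remap_positions(part_dict, positions, global_index):
            counts[g] += 1
    # dict preserves insertion order, so the i-th key has global position i.
    return dict(zip(global_index, counts))


# Two data parts, each with its own local dictionary and local positions.
parts = [
    (["ru", "us"], [0, 1, 1, 0]),
    (["us", "de"], [0, 0, 1]),
]
global_index = build_global_dictionary([d for d, _ in parts])
result = count_by_position(parts, global_index)
print(result)  # {'ru': 2, 'us': 4, 'de': 1}
```

Because every part shares the same position space after remapping, the aggregation never hashes or compares the string values themselves; it only indexes a small array by an integer key, which is the effect the single numeric key method is meant to achieve.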