add mechanism to control filter optimization in historical query processing#8209
add mechanism to control filter optimization in historical query processing#8209clintropolis merged 26 commits intoapache:masterfrom
Conversation
👍
I haven't read through the patch yet, so if it's not already, could you make sure this is in the javadoc for the major classes involved (probably just FilterTuning)? Something like: intentionally undocumented in user-facing docs because at this time the feature is meant for experimentation. And a comment that we'd like to document this (or whatever it turns into) in a future release. |
| @JsonProperty("bloomKFilter") BloomKFilterHolder bloomKFilterHolder, | ||
| @JsonProperty("extractionFn") ExtractionFn extractionFn | ||
| @JsonProperty("extractionFn") ExtractionFn extractionFn, | ||
| @JsonProperty("filterTuning") FilterTuning filterTuning |
There was a problem hiding this comment.
Are you missing a @JsonProperty getter for this?
There was a problem hiding this comment.
added (also a couple of other missing this)
| ); | ||
|
|
||
| @Override | ||
| default FilterTuning getManualTuning() |
There was a problem hiding this comment.
Please annotate with @Nullable.
| return new FilterTuning(useIndex, 0, Integer.MAX_VALUE); | ||
| } | ||
|
|
||
| private final Boolean useIndex; |
There was a problem hiding this comment.
Please mark these @Nullable too, just like the constructor parameters.
Or, apply the defaults you have in getUseIndex(), etc, in the constructors and make these fields primitives.
There was a problem hiding this comment.
applied defaults in constructor
| return ordering; | ||
| } | ||
|
|
||
| @JsonProperty |
There was a problem hiding this comment.
Please include @JsonInclude(JsonInclude.Include.NON_NULL) here, and in all other places FilterTuning appears as a json property, to avoid polluting jsonifications of filters with this experimental feature.
There was a problem hiding this comment.
added to DimFilter getters
|
|
||
| /** | ||
| * "Manual" {@link FilterTuning} to allow explicit control of filter optimization behavior. This will likely be | ||
| * supplied by the {@link DimFilter} that is creating this {@link Filter} |
There was a problem hiding this comment.
Is it intended that anything outside of computeTuning call this method? If not, please update the javadocs to tell people not to call it. Maybe mention that you would have made it protected, except that Filter is an interface, so you can't do that.
There was a problem hiding this comment.
this method has been removed
| FilterTuning getManualTuning(); | ||
|
|
||
| /** | ||
| * This method allows a filter to automatically compute a {@link FilterTuning} based on information it can gather |
There was a problem hiding this comment.
Similar comment to computeTuning, it seems like this might only be called by shouldUseIndex.
There was a problem hiding this comment.
Btw, what do you think about not having getManualTuning and computeTuning, and instead having a static function in Filters that provides a useful default impl for shouldUseIndex? So most Filters would do something like:
@Override
boolean shouldUseIndex(BitmapIndexSelector selector)
{
return Filters.shouldUseIndex(this, selector, filterTuning);
}It's not really more work or lines of code than overriding getManualTuning, which they'd have to do anyway with the current design. And it cuts 3 interface methods down to 1.
It would also be less spooky in terms of action-at-a-distance. It's nice to have methods defined in concrete classes rather than inherited from supertypes, since it makes it clearer to readers of the one class what is going on (without having to flip around as much between concrete impl and supertype).
There was a problem hiding this comment.
This makes sense, I think the current structure is a an artifact of my original version of this where the work was done by QueryableIndexStorageAdapter, will refactor
| @JsonProperty | ||
| public Boolean getUseIndex() | ||
| { | ||
| return useIndex != null ? useIndex : true; |
There was a problem hiding this comment.
This'll never be null, so it'd be better to return a primitive.
Same comment for the other two methods.
| } | ||
|
|
||
| @VisibleForTesting | ||
| public IntervalDimFilter( |
There was a problem hiding this comment.
In general I assume you left the old constructors in to avoid a bunch of test churn. I think that's fine but could you please double check that they aren't being used in production code? (You got the one in InDimFilter.optimize that I was thinking of when I wrote this, but there are potentially others)
There was a problem hiding this comment.
there were a few being created in calcite stuff, adjusted them to set null for tuning.
| private final BoundDimFilter boundDimFilter; | ||
| private final Comparator<String> comparator; | ||
| private final ExtractionFn extractionFn; | ||
| private final FilterTuning manualFilterTuning; |
There was a problem hiding this comment.
Sometimes the field is called manualFilterTuning and sometimes it's called filterTuning. Could you please pick one?
There was a problem hiding this comment.
I believe it is consistently called filterTuning on DimFilter and manualFilterTuning on Filter implementations. This distinction is because everything in DimFilter is manual since it was provided by the json query, where as this is explictly the user set value on Filter implementations that was supplied by the DimFilter.
There was a problem hiding this comment.
renamed all to filterTuning
| @Override | ||
| public Set<String> getRequiredColumns() | ||
| { | ||
| return dimensions.stream().map(dim -> dim.getDimension()).collect(Collectors.toSet()); |
There was a problem hiding this comment.
map(DimensionSpec::getDimension)?
|
Oh wait, sorry, my browser had an old version cached. Nevermind. |
leventov
left a comment
There was a problem hiding this comment.
Reviewed until ExpressionDimFilter.java
| private final BloomKFilter bloomKFilter; | ||
| private final HashCode hash; | ||
| private final ExtractionFn extractionFn; | ||
| private final FilterTuning filterTuning; |
There was a problem hiding this comment.
Should be @Nullable, along with the corresponding constructor parameter
There was a problem hiding this comment.
I think I've updated all DimFilter implementations to have the appropriate things annotated with @Nullable
|
|
||
| @JsonInclude(JsonInclude.Include.NON_NULL) | ||
| @JsonProperty | ||
| public FilterTuning getFilterTuning() |
There was a problem hiding this comment.
This method should be @Nullable
| BloomDimFilter that = (BloomDimFilter) o; | ||
| return dimension.equals(that.dimension) && | ||
| hash.equals(that.hash) && | ||
| Objects.equals(extractionFn, that.extractionFn) && |
There was a problem hiding this comment.
Everything about extractionFn: the field, constructor parameter, getter should be @Nullable
| druidExpression.getSimpleExtraction().getColumn(), | ||
| holder, | ||
| druidExpression.getSimpleExtraction().getExtractionFn() | ||
| druidExpression.getSimpleExtraction().getExtractionFn(), |
There was a problem hiding this comment.
Constructor annotated @VisibleForTesting doesn't mean it can't be used in production code.
There was a problem hiding this comment.
Yeah, this is true, but I was trying to avoid using these constructors in production code to minimize the chance of errors and make things more obvious.
I initially had it more like you are suggesting and was using some of the 'test' constructors where we were always passing null, but this comment suggested the way it is now and I agree, I think the production code explicitly passing in null for the filterTuning parameter makes it less ambiguous about if not having a tuning is intentional or not, without requiring a comment.
There was a problem hiding this comment.
I think it's anyway strange that there is a chaining constructor and any code is not using an existing constructor with fewer parameters.
I think it would be better to create static factory methods in this situation. Some of them may have "InTest" suffix to strongly indicate that they are not for prod code.
There was a problem hiding this comment.
I agree, but the reason I have a second set of constructors at all was so I didn't have to change hundreds of lines of test code to add a null parameter, so making a static method would sort of defeat the purpose of that because the lines would change anyway.
I do think this is worth doing, I'll add this to the ticket i create about refactoring DimFilter to also add useful test filter creation static methods, so we can also eliminate passing in null everywhere for filters without extractionFn.
There was a problem hiding this comment.
mentioned making static test object creators to decouple test filters from json schema in #8256
| return new BloomDimFilter( | ||
| virtualColumn.getOutputName(), | ||
| holder, | ||
| null, |
| private static final Joiner COMMA_JOINER = Joiner.on(", "); | ||
|
|
||
| private final List<DimensionSpec> dimensions; | ||
| private final FilterTuning filterTuning; |
There was a problem hiding this comment.
Please annotate with @Nullable everything that is needed in this class.
| public TimeseriesQueryBuilder filters(String dimensionName, String value, String... values) | ||
| { | ||
| dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values), null); | ||
| dimFilter = new InDimFilter(dimensionName, Lists.asList(value, values), null, null); |
| public SearchQueryBuilder filters(String dimensionName, String value) | ||
| { | ||
| dimFilter = new SelectorDimFilter(dimensionName, value, null); | ||
| dimFilter = new SelectorDimFilter(dimensionName, value, null, null); |
|
|
||
| @VisibleForTesting | ||
| public BloomDimFilter( | ||
| String dimension, |
There was a problem hiding this comment.
Unnecessary wrapping, also in many other newly added constructors below in this PR
| { | ||
| private final String expression; | ||
| private final Supplier<Expr> parsed; | ||
| private final FilterTuning filterTuning; |
There was a problem hiding this comment.
Please annotate with @Nullable everything that is needed in this class.
gianm
left a comment
There was a problem hiding this comment.
Just some small comments about some of the new javadocs. The rest LGTM.
| } else { | ||
| return StringUtils.format("%s = %s", dimension, hash.toString()); | ||
| } | ||
| return new DimFilterToStringBuilder().appendDimension(dimension, extractionFn) |
| * used. This method, by default, will consider a {@link FilterTuning} to make decisions about when to use an | ||
| * available index. Override this method in a {@link Filter} implementation when {@link FilterTuning} alone is not | ||
| * adequate for making this decision. | ||
| */ |
There was a problem hiding this comment.
I'd reword slightly. This method doesn't do anything by default anymore (it did in an earlier version). Maybe cut the last two sentences, and replace with:
Implementations of this methods typically consider a {@link FilterTuning} to make decisions about when to use an available index.
There was a problem hiding this comment.
Oops, forgot about this, will fix this and add javadoc for Filters.shouldUseIndex and maybe link this one to that
There was a problem hiding this comment.
Thanks, these new ones look good to me.
| } | ||
| } | ||
|
|
||
| public static boolean shouldUseIndex(Filter filter, BitmapIndexSelector indexSelector, FilterTuning filterTuning) |
There was a problem hiding this comment.
Could use a bit of javadocs as to its purpose ("standard" implementation of Filter.shouldUseIndex) and how it works (returns true if the filter supports indexes and if all underlying columns match the thresholds in filterTuning). Also that filterTuning is nullable.
There was a problem hiding this comment.
Thanks, this looks good. Marking the filterTuning parameter @Nullable would be helpful too.
| } | ||
| } | ||
|
|
||
| public static boolean shouldUseIndex(Filter filter, BitmapIndexSelector indexSelector, FilterTuning filterTuning) |
There was a problem hiding this comment.
Thanks, this looks good. Marking the filterTuning parameter @Nullable would be helpful too.
| /** | ||
| * Append dimension name OR {@link ExtractionFn#toString()} with dimension wrapped in parenthesis | ||
| */ | ||
| DimFilterToStringBuilder appendDimension(String dimension, ExtractionFn extractionFn) |
There was a problem hiding this comment.
extractionFn should be @Nullable
| /** | ||
| * Add filter tuning to {@link #builder} if tuning exists | ||
| */ | ||
| DimFilterToStringBuilder appendFilterTuning(FilterTuning tuning) |
| ); | ||
| } | ||
|
|
||
| @Override |
| @JsonProperty("dimension") String dimension, | ||
| @JsonProperty("values") Collection<String> values, | ||
| @JsonProperty("extractionFn") ExtractionFn extractionFn | ||
| @Nullable @JsonProperty("extractionFn") ExtractionFn extractionFn, |
There was a problem hiding this comment.
Optional: I think it's more semantic to put @Nullable immediately before the type so that @Nullable ExtractionFn is the "extended type" of the variable.
There was a problem hiding this comment.
agree, done to all occurrences I found in the codebase
| * Determine if a filter *should* use a bitmap index based on information collected from the supplied | ||
| * {@link BitmapIndexSelector}. This method differs from {@link #supportsBitmapIndex(BitmapIndexSelector)} in that | ||
| * the former only indicates if a bitmap index is available and {@link #getBitmapIndex(BitmapIndexSelector)} may be | ||
| * used. Implementations of this methods typically consider a {@link FilterTuning} to make decisions about when to |
There was a problem hiding this comment.
Before the sentence "Implementations of this methods...", please add
*
* If shouldUseFilter(selector) returns true, {@link #supportsBitmapIndex} must also return true when called with the
* same selector object.
*
as a separate paragraph.
| private final String dimension; | ||
| private final Bound bound; | ||
| @Nullable | ||
| private final FilterTuning filterTuning; |
There was a problem hiding this comment.
There is significant repetition between all these Filter implementations. Did you research if there is a way to cross-cut the things, e. g. by somehow providing filterTuning externally, so that each Filter doesn't have to have it as a field? Or, maybe an abstract superclass could be extracted?
Same applies to extractionFn.
There was a problem hiding this comment.
I totally agree here. Doing it externally seemed difficult because I wanted it to be per filter, but I did heavily consider making some sort of abstract class on top of DimFilter to handle extractionFn and filterTuning, but I think at this point would prefer to do as a follow-up so I will open an issue soon about refactoring DimFilter, and potentially merging Filter and DimFilter as well since there isn't a good reason for them to be split.
| ); | ||
| } else { | ||
| if (postFilter.supportsBitmapIndex(bitmapIndexSelector)) { | ||
| if (postFilter.shouldUseIndex(bitmapIndexSelector)) { |
There was a problem hiding this comment.
Please use consistent naming: either "supportsBitmapIndex" and "shouldUseBitmapIndex", or both names without "Bitmap" part.
There was a problem hiding this comment.
renamed to shouldUseBitmapIndex
| } | ||
|
|
||
| @Override | ||
| public boolean shouldUseIndex(BitmapIndexSelector bitmapIndexSelector) |
There was a problem hiding this comment.
Since shouldUseIndex related to supportsBitmapIndex and their implementations look similar, please keep them adjacent in code. It makes easier to visually validate that there are no bugs like forgetting !.
Same applies to all other classes where shouldUseIndex is implemented, and to the definition of this method in Filter interface itself.
| @Override | ||
| public Set<String> getRequiredColumns() | ||
| { | ||
| return null; |
| @Override | ||
| public Set<String> getRequiredColumns() | ||
| { | ||
| return null; |
| boundRefKey.getExtractionFn(), | ||
| boundRefKey.getComparator() | ||
| boundRefKey.getComparator(), | ||
| null |
| } | ||
|
|
||
| private final boolean useIndex; | ||
| private final int useIndexMinCardinalityThreshold; |
There was a problem hiding this comment.
I don't think it's that important here to use the same prefix for all three fields. It's not very clear what does "useIndexMinCardinalityThreshold" mean. My suggestion is "minBitmapIndexCardinalityToUse".
There was a problem hiding this comment.
changed to minCardinalityToUseBitmapIndex
| return new FilterTuning(filter.supportsBitmapIndex(selector), null, null); | ||
| } | ||
|
|
||
| private final boolean useIndex; |
There was a problem hiding this comment.
If you decide to rename Filter.shouldUseIndex() to shouldUseBitmapIndex then this field should be renamed, too, for total consistency.
| * Indicates whether this filter can return a bitmap index for filtering, based on | ||
| * the information provided by the input BitmapIndexSelector. | ||
| * Indicates whether this filter can return a bitmap index for filtering, based on the information provided by the | ||
| * input BitmapIndexSelector. |
There was a problem hiding this comment.
Please wrap BitmapIndexSelector with {@link }
| public boolean shouldUseBitmapIndex(BitmapIndexSelector selector) | ||
| { | ||
| return Filters.shouldUseIndex(this, bitmapIndexSelector, filterTuning); | ||
| return Filters.shouldUseBitmapIndex(this, selector, filterTuning); |
There was a problem hiding this comment.
It's confusing that this method doesn't just return false while supportsBitmapIndex() does.
| return cardinality >= tuning.getUseIndexMinCardinalityThreshold() | ||
| && cardinality <= tuning.getUseIndexMaxCardinalityThreshold(); | ||
| final BitmapIndex index = indexSelector.getBitmapIndex(column); | ||
| Preconditions.checkNotNull(index, "WTF?! column doesn't have a bitmap index"); |
There was a problem hiding this comment.
I suggest not to use "WTF" from now on in the project. See also #8269.
Description
This PR introduces the ability for most
DimFilterimplementations to control whether or notQueryableIndexStorageAdapterwill use a bitmap index while processing queries. I consider this to be laying the groundwork for allowingQueryableIndexStorageAdapterthe opportunity for more sophisticated decision making when deciding when to use bitmap indexes or not, but the first step of this is to enable easy experimentation so that we can shake out what good "automatic" decisions might be.This PR is related to #3878, and replaces #6633, which was a more targeted attempt at something similar.
This is done by adding a new type,
FilterTuning, which mostDimFilterimplementations will accept if specified in the filter json, which is supplied to allFilterimplementations. TheFilterinterface itself has been modified with a few new methods:getRequiredColumnswhich is analog toDimFilter.getRequiredColumnsshouldUseIndexwhichQueryableIndexStorageAdapterandFilteringOffsetcall now instead ofsupportsBitmapIndexdirectly, to determine if a bitmap index should be used.And a static default implementation of
supportsBitmapIndexsupplied byFiltersthat currently allFilterimplementations are using.FilterTuningcurrently has 3 properties:useIndexmaxCardinalityToUseBitmapIndexInteger.MAX_VALUEminCardinalityToUseBitmapIndexSince this is an advanced feature intended primarily for experimentation, I am leaving it undocumented for now.
This PR has:
For reviewers: the key changed/added classes in this PR
FilterQueryableIndexStorageAdapterFilterTuningFilterPartitionTest