Skip to content

make bitmap index usage optional for bloom filters#6633

Closed
clintropolis wants to merge 3 commits intoapache:masterfrom
clintropolis:bloom-filter-skip-index
Closed

make bitmap index usage optional for bloom filters#6633
clintropolis wants to merge 3 commits intoapache:masterfrom
clintropolis:bloom-filter-skip-index

Conversation

@clintropolis
Copy link
Member

Testing with large filters against higher cardinality dimensions yields better performance if we skip using bitmap indexes during filtering in my experiments thus far. Making the default value for the newly introduced useBitmapIndex parameter to false is making the assumption that these sorts of filters will be more useful on such higher cardinality dimensions, but I can default the value to true to preserve existing behavior if preferred, and I think optimally this value would probably be adaptive based on cardinality.

Metrics collected by continuously running identical bloom filter queries differing only in the value for useBitmapIndex for a period of time.

query/cpu/time
screen shot 2018-11-15 at 11 46 36 pm

query/time
screen shot 2018-11-16 at 12 46 27 am

query/segment/time
screen shot 2018-11-15 at 11 46 59 pm

In my tests, the effect is more dramatic when the bloom filter is combined with additional filters as part of an and filter, I will try to produce metrics to show this as well when I have the chance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to add this blank line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if you could add the @Nullable annotation to this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these spaces.

@leventov
Copy link
Member

@clintropolis is this #3878?

@nishantmonu51
Copy link
Member

I think figuring whether to use a bitmap index or not, would be bit too much for the clients sending the queries for making an optimal decision on.

Instead of this, how about adding a config for cardinality threshold in bloom filter for using the bitmap index and we only use bitmaps for columns with cardinality below the configured threshold.

Additionally, for more flexibility, we can make this threshold overridable via query context.

@leventov
Copy link
Member

leventov commented Nov 16, 2018

I agree that there should be a threshold rather than flag. However global config doesn't make a lot of sense. There should be a heuristic (in the future the heuristic could be aided with constantly sized info written in the segment format), overridable through a query context.

@fjy fjy added this to the 0.13.1 milestone Nov 19, 2018
@clintropolis
Copy link
Member Author

@clintropolis is this #3878?

I think it's trying to achieve the same goal, though this PR is very primitive and manually controlled.

I think figuring whether to use a bitmap index or not, would be bit too much for the clients sending the queries for making an optimal decision on.

Yeah, I made the parameter optional and defaulting to skipping indexes to err on the side of likely better performance, but yeah I think you're probably right that it's not useful unless the user is deeply familiar with both the data and the implications of using them or not during query processing.

I was planning on following this up with investigation on what a good threshold would be to automatically set this value, so that way at least a way to control this behavior would be out sooner than later and since I think adaptively using bitmaps or not based on cardinality is probably useful for a lot of filters beyond the bloom filter, but I think I will go ahead and begin to look into it and think about a general solution for #3878 as soon as I have a chance based on feedback so far.

@stale
Copy link

stale bot commented Apr 2, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Apr 2, 2019
@clintropolis clintropolis removed the stale label Apr 2, 2019
@stale
Copy link

stale bot commented Jun 1, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 1, 2019
@clintropolis clintropolis removed the stale label Jun 1, 2019
@stale
Copy link

stale bot commented Jul 31, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants