Skip to content

KAFKA-4850:RocksDb cannot use Bloom Filters#3048

Closed
bharatviswa504 wants to merge 3 commits into
apache:trunkfrom
bharatviswa504:KAFKA-4850
Closed

KAFKA-4850:RocksDb cannot use Bloom Filters#3048
bharatviswa504 wants to merge 3 commits into
apache:trunkfrom
bharatviswa504:KAFKA-4850

Conversation

@bharatviswa504

Copy link
Copy Markdown

Added BloomFilter to speedup rocksdb lookup.

@bharatviswa504

bharatviswa504 commented May 14, 2017

Copy link
Copy Markdown
Author

@enothereska @mjsax @guozhangwang Could you please review.

@asfbot

asfbot commented May 14, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3882/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot

asfbot commented May 14, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3871/
Test FAILed (JDK 8 and Scala 2.12).

@bharatviswa504

bharatviswa504 commented May 14, 2017

Copy link
Copy Markdown
Author

retest this please

@asfbot

asfbot commented May 14, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3884/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot

asfbot commented May 14, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3873/
Test FAILed (JDK 8 and Scala 2.12).

@bharatviswa504

bharatviswa504 commented May 14, 2017

Copy link
Copy Markdown
Author

tests passed locally.

@asfbot

asfbot commented May 14, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3895/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot

asfbot commented May 14, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3884/
Test PASSed (JDK 8 and Scala 2.12).

@bharatviswa504

Copy link
Copy Markdown
Author

The test failure on JDK 7 is happening on other PR's too.
Test failure is not related to this change.

@bharatviswa504

Copy link
Copy Markdown
Author

retest this please

@asfbot

asfbot commented May 15, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3957/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot

asfbot commented May 15, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3946/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented May 15, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/3963/
Test PASSed (JDK 7 and Scala 2.11).

@asfbot

asfbot commented May 15, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3952/
Test PASSed (JDK 8 and Scala 2.12).

@enothereska

Copy link
Copy Markdown
Contributor

Thank you. I think it looks good. I've started some system tests to see what impact this has on performance, will post link here once done. @bharatviswa504 do you know what the memory overhead might be when turning on bloom filters?

final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
tableConfig.setBlockSize(BLOCK_SIZE);
tableConfig.setFilter(new BloomFilter(10));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the 10 mean? Could you extract it to a constant so it is named?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dguy It is bits per key. Added a constant to name it.

@enothereska

Copy link
Copy Markdown
Contributor

system test passed: https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/291/console. I don't necessarily see a perf improvement, but I'm not sure the tests are designed for showing off any improvement. @bharatviswa504 any suggestions on a good test/benchmark?

@asfbot

asfbot commented May 16, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4014/
Test FAILed (JDK 7 and Scala 2.11).

@asfbot

asfbot commented May 16, 2017

Copy link
Copy Markdown

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4000/
Test PASSed (JDK 8 and Scala 2.12).

@bharatviswa504

bharatviswa504 commented May 16, 2017

Copy link
Copy Markdown
Author

@enothereska
https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#memory-usage

Do you think increasing block size will improve performance?

I don't have much details with rocksdb, let me know if I am missing something.

@dguy dguy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - though would be good to understand how this impacts memory usage, performance etc

@guozhangwang

Copy link
Copy Markdown
Contributor

My understanding is that bloom filters is only beneficial for single key-value lookups, so the current system tests may not be best fit for analyzing its impacts. More specifically, for windowed rocksdb store since we always store the window start timestamp as prefix of the key, which is 64bits, hence first 10bits bloom filter would not help much on filtering those keys; for such cases it is better to have the prefix seeking support as discussed here:

https://confluentinc.atlassian.net/wiki/display/Engineering/Discussion%3A+Non-key+KTable-KTable+Joins#Discussion:Non-keyKTable-KTableJoins-Option1:useRocksDBprefixseeking

The code itself LGTM, but I think it is better to execute a benchmark with the following settings before merging the PR:

  1. have a running aggregate in Streams with RocksDBStore (i.e. not windowed store).
  2. have another thread continuously query the rocksDB store on single key lookup.
  3. compare the query QPS with / wo this PR; expect to have non-neglectible perf diffs.

@bharatviswa504 Do you want to do this benchmark?

@mjsax mjsax added the streams label Jan 30, 2018
@mjsax mjsax mentioned this pull request Jan 14, 2019
3 tasks
@mjsax

mjsax commented Jan 15, 2019

Copy link
Copy Markdown
Member

Replaced by #6012

@mjsax mjsax closed this Jan 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants