Conversation
PR facebook#6944 transitioned `BlockIter` from using `Comparator*` to using concrete `UserComparatorWrapper` and `InternalKeyComparator`. However, adding them as instance variables to `BlockIter` was not optimal. Bloating `BlockIter` caused the `ArenaWrappedDBIter`'s arena allocator to do more heap allocations which harmed performance of `DB::NewIterator()`. This PR pushes down the concrete comparator objects to the point of usage, which forces them to be on the stack. As a result, the `BlockIter` is back to its original size prior to facebook#6944 (actually a bit smaller since there were two `Comparator*` before). Test Plan: verified our internal `DB::NewIterator()`-heavy regression test no longer reports regression.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: PR facebook#6944 transitioned `BlockIter` from using `Comparator*` to using concrete `UserComparatorWrapper` and `InternalKeyComparator`. However, adding them as instance variables to `BlockIter` was not optimal. Bloating `BlockIter` caused the `ArenaWrappedDBIter`'s arena allocator to do more heap allocations (in certain cases) which harmed performance of `DB::NewIterator()`. This PR pushes down the concrete comparator objects to the point of usage, which forces them to be on the stack. As a result, the `BlockIter` is back to its original size prior to facebook#6944 (actually a bit smaller since there were two `Comparator*` before). Pull Request resolved: facebook#7149 Test Plan: verified our internal `DB::NewIterator()`-heavy regression test no longer reports regression. Reviewed By: riversand963 Differential Revision: D22623189 Pulled By: ajkr fbshipit-source-id: f6d69accfe5de51e0bd9874a480b32b29909bab6
| int CompareCurrentKey(const Slice& other) { | ||
| if (raw_key_.IsUserKey()) { | ||
| assert(global_seqno_ == kDisableGlobalSequenceNumber); | ||
| return ucmp_wrapper_.Compare(raw_key_.GetUserKey(), other); | ||
| return ucmp().Compare(raw_key_.GetUserKey(), other); | ||
| } else if (global_seqno_ == kDisableGlobalSequenceNumber) { | ||
| return icmp_.Compare(raw_key_.GetInternalKey(), other); | ||
| return icmp().Compare(raw_key_.GetInternalKey(), other); | ||
| } | ||
| return icmp_.Compare(raw_key_.GetInternalKey(), global_seqno_, other, | ||
| kDisableGlobalSequenceNumber); | ||
| return icmp().Compare(raw_key_.GetInternalKey(), global_seqno_, other, | ||
| kDisableGlobalSequenceNumber); |
There was a problem hiding this comment.
Hi @ajkr , I found that after this PR, each time when we call a CompareCurrentKey operation a new comparator will be initiated and constructed. I did some basic profiling and found that this may be a bottleneck. I was wondering what is the reason for this change, and would it harm performance? Thanks!
There was a problem hiding this comment.
Hi @ajkr , I found that after this PR, each time when we call a CompareCurrentKey operation a new comparator will be initiated and constructed. I did some basic profiling and found that this may be a bottleneck. I was wondering what is the reason for this change, and would it harm performance? Thanks!
A quick glance at the code looks like that the DataBlockIterator could be created/initialized with the InternalKeyComparator and avoid the extra allocations.
There was a problem hiding this comment.
Thanks for your reply. I am still a little confused. Could you explain more? From what I see, DataBlockIterator still calls CompareCurrentKey in BlockIter and either icmp() or ucmp() will allocate a new comparator.
There was a problem hiding this comment.
@XinyuZeng The places that create a DataBlockIter already have the equivalent of an icmp and extract from it the "raw comparator". If this "icmp" was passed in (and stored) in the DataBlockIter, then there is no need to create an icmp. Further, inside the icmp is the UserComparator (ucmp), so there would be no need to create that either.
I did not exhaustively look to see if this approach will truly work but a quick glance looks like it should.
There was a problem hiding this comment.
@mrambacher Thanks. I get that the "raw comparator" will be saved as raw_ucmp_ and passed as pointer to become UserComparatorWrapper when calling either icmp() or ucmp(), so this step has no allocation cost. However, calling icmp() or ucmp() will create a new Comparator instance, which is a subclass of Customizable(which means some allocations of such classes are also needed to be performed). My concern is that the construction and destruction of such instances in every comparison operation may have a cost, as showed in some of my profiling.
There was a problem hiding this comment.
@XinyuZeng I was suggesting something different. Store the "Internal Comparator" in the iterator (the icmp) instead of the raw comparator. The icmp now returns this without creating anything. The ucmp returns the "user_comparator" from the InternalKeyComparator -- again without creating anything. The size of the object does not change (just storing a different pointer) and there is no overhead of creating objects on every call.
If you are not comfortable attempting this change, I can try to look into it more later in the week.
There was a problem hiding this comment.
@mrambacher Yes that is exactly what I am thinking about----I thought you were saying such logic already exists in code yesterday. I can attemp it this weekend and come back to you if I encounter any challenge since I did not commit to rocksdb before, if this change is not in a hurry. Many thanks for your explaination!
|
@XinyuZeng In block.h, make InitializeBase a "protected" function. Otherwise keep it the same. In DataBlockIter, make the constructor and Initialize take an InternalKeyComparator (and then fix up the remaining logic as mentioned earlier). Please submit this as a new PR. If you can, please report on the performance delta as well. I will look for it and review it when you submit it. Thanks for your effort! |
Summary: This PR solves the problem discussed in #7149. By storing the pointer of InternalKeyComparator as icmp_ in BlockIter, the object size remains the same. And for each call to CompareCurrentKey, there is no need to create Comparator objects. One can use icmp_ directly or use the "user_comparator" from the icmp_. Pull Request resolved: #9611 Test Plan: with #9903, ``` $ TEST_TMPDIR=/dev/shm python3.6 ../benchmark/tools/compare.py benchmarks ./db_basic_bench ../rocksdb-pr9611/db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1 --benchmark_repetitions=50 ... Comparing ./db_basic_bench to ../rocksdb-pr9611/db_basic_bench Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ... DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_pvalue 0.0001 0.0001 U Test, Repetitions: 50 vs 50 DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_mean -0.0483 -0.0483 3924 3734 3924 3734 DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_median -0.0713 -0.0713 3971 3687 3970 3687 DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_stddev -0.0342 -0.0344 225 217 225 217 DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_cv +0.0148 +0.0146 0 0 0 0 OVERALL_GEOMEAN -0.0483 -0.0483 0 0 0 0 ``` Reviewed By: akankshamahajan15 Differential Revision: D35882037 Pulled By: ajkr fbshipit-source-id: 9e5337bbad8f1239dff7aa9f6549020d599bfcdf
PR #6944 transitioned
BlockIterfrom usingComparator*to usingconcrete
UserComparatorWrapperandInternalKeyComparator. However,adding them as instance variables to
BlockIterwas not optimal.Bloating
BlockItercaused theArenaWrappedDBIter's arena allocator to do more heapallocations (in certain cases) which harmed performance of
DB::NewIterator(). This PRpushes down the concrete comparator objects to the point of usage, which
forces them to be on the stack. As a result, the
BlockIteris back toits original size prior to #6944 (actually a bit smaller since there
were two
Comparator*before).Test Plan: verified our internal
DB::NewIterator()-heavy regressiontest no longer reports meaningful regression.