Skip to content

minimize BlockIter comparator scope#7149

Closed
ajkr wants to merge 1 commit intofacebook:masterfrom
ajkr:push-down-block-iter-comparators
Closed

minimize BlockIter comparator scope#7149
ajkr wants to merge 1 commit intofacebook:masterfrom
ajkr:push-down-block-iter-comparators

Conversation

@ajkr
Copy link
Copy Markdown
Contributor

@ajkr ajkr commented Jul 20, 2020

PR #6944 transitioned BlockIter from using Comparator* to using
concrete UserComparatorWrapper and InternalKeyComparator. However,
adding them as instance variables to BlockIter was not optimal.
Bloating BlockIter caused the ArenaWrappedDBIter's arena allocator to do more heap
allocations (in certain cases) which harmed performance of DB::NewIterator(). This PR
pushes down the concrete comparator objects to the point of usage, which
forces them to be on the stack. As a result, the BlockIter is back to
its original size prior to #6944 (actually a bit smaller since there
were two Comparator* before).

Test Plan: verified our internal DB::NewIterator()-heavy regression
test no longer reports meaningful regression.

PR facebook#6944 transitioned `BlockIter` from using `Comparator*` to using
concrete `UserComparatorWrapper` and `InternalKeyComparator`. However,
adding them as instance variables to `BlockIter` was not optimal.
Bloating `BlockIter` caused the `ArenaWrappedDBIter`'s arena allocator
to do more heap
allocations which harmed performance of `DB::NewIterator()`. This PR
pushes down the concrete comparator objects to the point of usage, which
forces them to be on the stack. As a result, the `BlockIter` is back to
its original size prior to facebook#6944 (actually a bit smaller since there
were two `Comparator*` before).

Test Plan: verified our internal `DB::NewIterator()`-heavy regression
test no longer reports regression.
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from riversand963 July 20, 2020 18:04
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@ajkr merged this pull request in 643c863.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
PR facebook#6944 transitioned `BlockIter` from using `Comparator*` to using
concrete `UserComparatorWrapper` and `InternalKeyComparator`. However,
adding them as instance variables to `BlockIter` was not optimal.
Bloating `BlockIter` caused the `ArenaWrappedDBIter`'s arena allocator to do more heap
allocations (in certain cases) which harmed performance of `DB::NewIterator()`. This PR
pushes down the concrete comparator objects to the point of usage, which
forces them to be on the stack. As a result, the `BlockIter` is back to
its original size prior to facebook#6944 (actually a bit smaller since there
were two `Comparator*` before).

Pull Request resolved: facebook#7149

Test Plan:
verified our internal `DB::NewIterator()`-heavy regression
test no longer reports regression.

Reviewed By: riversand963

Differential Revision: D22623189

Pulled By: ajkr

fbshipit-source-id: f6d69accfe5de51e0bd9874a480b32b29909bab6
Comment on lines 421 to +429
int CompareCurrentKey(const Slice& other) {
if (raw_key_.IsUserKey()) {
assert(global_seqno_ == kDisableGlobalSequenceNumber);
return ucmp_wrapper_.Compare(raw_key_.GetUserKey(), other);
return ucmp().Compare(raw_key_.GetUserKey(), other);
} else if (global_seqno_ == kDisableGlobalSequenceNumber) {
return icmp_.Compare(raw_key_.GetInternalKey(), other);
return icmp().Compare(raw_key_.GetInternalKey(), other);
}
return icmp_.Compare(raw_key_.GetInternalKey(), global_seqno_, other,
kDisableGlobalSequenceNumber);
return icmp().Compare(raw_key_.GetInternalKey(), global_seqno_, other,
kDisableGlobalSequenceNumber);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajkr , I found that after this PR, each time when we call a CompareCurrentKey operation a new comparator will be initiated and constructed. I did some basic profiling and found that this may be a bottleneck. I was wondering what is the reason for this change, and would it harm performance? Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajkr , I found that after this PR, each time when we call a CompareCurrentKey operation a new comparator will be initiated and constructed. I did some basic profiling and found that this may be a bottleneck. I was wondering what is the reason for this change, and would it harm performance? Thanks!

A quick glance at the code looks like that the DataBlockIterator could be created/initialized with the InternalKeyComparator and avoid the extra allocations.

Copy link
Copy Markdown
Contributor

@XinyuZeng XinyuZeng Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reply. I am still a little confused. Could you explain more? From what I see, DataBlockIterator still calls CompareCurrentKey in BlockIter and either icmp() or ucmp() will allocate a new comparator.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XinyuZeng The places that create a DataBlockIter already have the equivalent of an icmp and extract from it the "raw comparator". If this "icmp" was passed in (and stored) in the DataBlockIter, then there is no need to create an icmp. Further, inside the icmp is the UserComparator (ucmp), so there would be no need to create that either.

I did not exhaustively look to see if this approach will truly work but a quick glance looks like it should.

Copy link
Copy Markdown
Contributor

@XinyuZeng XinyuZeng Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrambacher Thanks. I get that the "raw comparator" will be saved as raw_ucmp_ and passed as pointer to become UserComparatorWrapper when calling either icmp() or ucmp(), so this step has no allocation cost. However, calling icmp() or ucmp() will create a new Comparator instance, which is a subclass of Customizable(which means some allocations of such classes are also needed to be performed). My concern is that the construction and destruction of such instances in every comparison operation may have a cost, as showed in some of my profiling.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XinyuZeng I was suggesting something different. Store the "Internal Comparator" in the iterator (the icmp) instead of the raw comparator. The icmp now returns this without creating anything. The ucmp returns the "user_comparator" from the InternalKeyComparator -- again without creating anything. The size of the object does not change (just storing a different pointer) and there is no overhead of creating objects on every call.

If you are not comfortable attempting this change, I can try to look into it more later in the week.

Copy link
Copy Markdown
Contributor

@XinyuZeng XinyuZeng Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrambacher Yes that is exactly what I am thinking about----I thought you were saying such logic already exists in code yesterday. I can attemp it this weekend and come back to you if I encounter any challenge since I did not commit to rocksdb before, if this change is not in a hurry. Many thanks for your explaination!

@mrambacher
Copy link
Copy Markdown
Contributor

@XinyuZeng In block.h, make InitializeBase a "protected" function. Otherwise keep it the same. In DataBlockIter, make the constructor and Initialize take an InternalKeyComparator (and then fix up the remaining logic as mentioned earlier).

Please submit this as a new PR. If you can, please report on the performance delta as well. I will look for it and review it when you submit it.

Thanks for your effort!

facebook-github-bot pushed a commit that referenced this pull request May 4, 2022
Summary:
This PR solves the problem discussed in #7149. By storing the pointer of InternalKeyComparator as icmp_ in BlockIter, the object size remains the same. And for each call to CompareCurrentKey, there is no need to create Comparator objects. One can use icmp_ directly or use the "user_comparator" from the icmp_.

Pull Request resolved: #9611

Test Plan:
with #9903,

```
$ TEST_TMPDIR=/dev/shm python3.6 ../benchmark/tools/compare.py benchmarks ./db_basic_bench ../rocksdb-pr9611/db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1 --benchmark_repetitions=50
...
Comparing ./db_basic_bench to ../rocksdb-pr9611/db_basic_bench
Benchmark                                                                                                                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_pvalue                 0.0001          0.0001      U Test, Repetitions: 50 vs 50
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_mean                  -0.0483         -0.0483          3924          3734          3924          3734
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_median                -0.0713         -0.0713          3971          3687          3970          3687
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_stddev                -0.0342         -0.0344           225           217           225           217
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_cv                    +0.0148         +0.0146             0             0             0             0
OVERALL_GEOMEAN                                                                                                                                                      -0.0483         -0.0483             0             0             0             0
```

Reviewed By: akankshamahajan15

Differential Revision: D35882037

Pulled By: ajkr

fbshipit-source-id: 9e5337bbad8f1239dff7aa9f6549020d599bfcdf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants