Optimize the is_permutation family and _Hash::operator== for multicontainers#423
Conversation
…taniers slightly.
<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.
<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
* For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
* Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
* We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.
Performance results:
```
Benchmark Before (ns) After (ns) Percent Better
HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83%
HashRandomUnequal<unordered_multimap>/10 137 97 41.24%
HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98%
HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61%
HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58%
HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47%
HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51%
HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17%
HashRandomUnequal<hash_multimap>/10 138 101 36.63%
HashRandomUnequal<hash_multimap>/100 1613 1154 39.77%
HashRandomUnequal<hash_multimap>/512 10385 7178 44.68%
HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96%
HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97%
HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12%
HashRandomEqual<unordered_multimap>/1 16 9.4 70.21%
HashRandomEqual<unordered_multimap>/10 126 89.2 41.26%
HashRandomEqual<unordered_multimap>/100 1644 1133 45.10%
HashRandomEqual<unordered_multimap>/512 10532 7183 46.62%
HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45%
HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30%
HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65%
HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51%
HashRandomEqual<hash_multimap>/10 123 94.1 30.71%
HashRandomEqual<hash_multimap>/100 1645 1151 42.92%
HashRandomEqual<hash_multimap>/512 10177 7144 42.46%
HashRandomEqual<hash_multimap>/4096 172994 121381 42.52%
HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85%
HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11%
HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14%
HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98%
HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50%
HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10%
HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46%
HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49%
HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27%
HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59%
HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33%
HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10%
HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02%
HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53%
HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94%
HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84%
```
Benchmark code:
```
#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;
template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
std::minstd_rand rng(std::random_device{}());
const auto range0 = static_cast<ptrdiff_t>(state.range(0));
vector<pair<unsigned, unsigned>> testData;
testData.resize(range0 * 5);
const auto dataEnd = testData.begin() + range0;
std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
testData.clear();
std::unordered_multimap<unsigned, unsigned> b = a;
next(b.begin(), b.size() - 1)->second = 1u;
for (auto &&_ : state) {
(void)_;
assert(a != b);
}
}
BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
std::minstd_rand rng(std::random_device{}());
const auto range0 = static_cast<ptrdiff_t>(state.range(0));
vector<pair<unsigned, unsigned>> testData;
testData.resize(range0 * 5);
const auto dataEnd = testData.begin() + range0;
std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd,
std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
testData.clear();
std::unordered_multimap<unsigned, unsigned> b = a;
for (auto &&_ : state) {
(void)_;
assert(a == b);
}
}
BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
std::unordered_multimap<unsigned, unsigned> a;
std::unordered_multimap<unsigned, unsigned> b;
const auto range0 = static_cast<ptrdiff_t>(state.range(0));
for (ptrdiff_t idx = 0; idx < range0; ++idx) {
a.emplace(0, 1);
b.emplace(1, 0);
}
a.emplace(1, 0);
b.emplace(0, 1);
for (auto &&_ : state) {
(void)_;
assert(a != b);
}
}
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_MAIN();
|
I am specifically interested in more perf tests folks think I should run. |
This comment has been minimized.
This comment has been minimized.
Fixed, feel free to just edit stuff like that :P |
I started to, but was concerned you might have a dual internal PR that needs to be corrected also. |
There are no test changes needed for this so I'm going to let it be approved here before making a dual one |
Resolves microsoftGH-6, microsoftGH-38, and drive-by fixes microsoftGH-414. Constexprizes the following algorithms and enables all relevant libc++ tests: * adjacent_find * all_of * any_of * binary_search * copy * copy_backward * copy_if * copy_n * count * count_if * equal * equal_range * exchange * fill * fill_n * find * find_end * find_first_of * find_if * find_if_not * for_each * for_each_n * generate * generate_n * includes * is_heap * is_heap * is_heap_until * is_partitioned * is_permutation * is_sorted * is_sorted_until * iter_swap * lexicographical_compare * lower_bound * make_heap * merge * mismatch * move * move_backward * next_permutation * none_of * nth_element * partial_sort * partial_sort_copy * partition * partition_copy * partition_point * pop_heap * prev_permutation * push_heap * remove * remove_copy * remove_copy_if * remove_if * replace * replace_copy * replace_copy_if * replace_if * reverse_copy * revese * rotate * rotate_copy * search * search_n * set_difference * set_intersection * set_symmetric_difference * set_union * sort * sort_heap * swap * swap_ranges * transform * unique * unique_copy * upper_bound This commit also contains the contents of microsoft#423 to workaround DevCom-883631, it will be rebased and that content removed once that is merged (the other one needs to be merged first). Specific notes: `<xutility>`: * The `_Ptr_copy_cat` family are moved down next to std::copy as that is their first consumer. * The core language loop for `copy_n` is fairly long and so it was extracted into its own function, `_Copy_n_core` (note similar name schema to `_Copy_memmove`) * `reverse` was changed to use early-returns for its optimization passes; this removes the nice thing of having if constexpr not instantiate the loop. However, this form allows the loop to not be duplicated.
| // check the first elements for equivalence when !_Standard | ||
| if (_Right._Traitsobj(_Keyval, _Traits::_Kfn(*_First2))) { | ||
| return {}; | ||
| } | ||
|
|
||
| // trim matching prefixes | ||
| const size_t _LHashval = _Traitsobj(_Keyval); | ||
| const size_type _LBucket = _LHashval & _Mask; | ||
| const auto _LBucket_hi = _Vec._Mypair._Myval2._Myfirst[(_LBucket << 1) + 1]; |
There was a problem hiding this comment.
If I had sufficient coffee this time this is the main difference between the two if constexpr branches.
I you would store the pointer after _LBucket_hi you could also write
++_First1;
const bool _Left_range_end =
_First1 == _LBucket_hi_next || _Traitsobj(_Keyval, _Traits::_Kfn(*_First1));With that the only difference between the two loops (besides comments) is _LBucket_hi_next vs _Unchecked_end(). Given that you use a temporary variable anyway you could assign _Unchecked_end() to _LBucket_hi_next in the _Traits::_Standard case
Dunno if that is truly worth it
There was a problem hiding this comment.
That will do one extra pointer dereference per bucket (to find _LBucket_hi_next); but maybe we don't care about the perf of !_Standard (that is, hash_foo rather than unordered_foo) ...
CaseyCarter
left a comment
There was a problem hiding this comment.
My non-resolved comments are suggestions, so I "Approve with suggestions."
Is there anything I missed other than the |
No - I think I was confused by clicking the through the comments and observing old state which I thought was current state. |
| #endif // _HAS_CXX17 | ||
|
|
||
| // FUNCTION TEMPLATE _Count_pr | ||
| // FUNCTION TEMPLATE is_permutation |
There was a problem hiding this comment.
FYI, #279 tracks a possible improvement to is_permutation, although it isn't necessary to implement at this time.
There was a problem hiding this comment.
I do agree that looks interesting but would prefer that be a separate change (as it'll need its own perf tests and similar justification)
stl/inc/xhash
Outdated
| } | ||
|
|
||
| struct _Multi_equal_check_result { | ||
| bool _Equal_possible; |
There was a problem hiding this comment.
I am mildly uncomfortable with the lack of DMIs here; return {} initializes this bool to false but the risk of garbage-init is higher than I would like. Would it be better to add DMIs for {false} and {} for the iterator?
There was a problem hiding this comment.
In my tests all the functions returning this type were inlined so I have no objection to DMIs.
stl/inc/xhash
Outdated
|
|
||
| _NODISCARD _Multi_equal_check_result _Multi_equal_check_equal_range( | ||
| const _Hash& _Right, _Unchecked_const_iterator _First1) const { | ||
| // check equal_range of elements matching *_First1 match the equal_range of elements in _Right |
There was a problem hiding this comment.
This comment grammar is difficult to parse.
stl/inc/xhash
Outdated
| const auto _Last1 = _Unchecked_end(); | ||
| auto _First1 = _Unchecked_begin(); | ||
| while (_First1 != _Last1) { | ||
| auto _Result = _Multi_equal_check_equal_range(_Right, _First1); |
stl/inc/xutility
Outdated
| // narrowing _Iter_diff_t<_FwdIt1> to _Iter_diff_t<_FwdIt2> is OK because if the 2nd range is shorter than | ||
| // the 1st, the user already triggered UB |
There was a problem hiding this comment.
Style: I recommend saying "second" and "first" instead of aggressively abbreviating.
There was a problem hiding this comment.
I agree, though technically I didn't edit that here :P. Will fix!
stl/inc/xutility
Outdated
| for (; _First1 != _Last1 && _First2 != _Last2; ++_First1, (void) ++_First2) { | ||
| for (;;) { // trim matching prefix | ||
| if (_First1 == _Last1) { | ||
| return _First2 == _Last2; // sequences are equal |
There was a problem hiding this comment.
This comment is only true if we return true, although I suppose it is fairly obvious.
…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.
…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.
<xutility>4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implementation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.
<xhash>1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:
The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.
Performance results:
Benchmark code:
Checklist
Be sure you've read README.md and understand the scope of this repo.
If you're unsure about a box, leave it unchecked. A maintainer will help you.
_Uglyas perhttps://eel.is/c++draft/lex.name#3.1 or there are no product code changes.
verified by an STL maintainer before automated testing is enabled on GitHub,
leave this unchecked for initial submission).
members, adding virtual functions, changing whether a type is an aggregate
or trivially copyable, etc.).
the C++ Working Draft (including any cited standards), other WG21 papers
(excluding reference implementations outside of proposed standard wording),
and LWG issues as reference material. If they were derived from a project
that's already listed in NOTICE.txt, that's fine, but please mention it.
If they were derived from any other project (including Boost and libc++,
which are not yet listed in NOTICE.txt), you must mention it here,
so we can determine whether the license is compatible and what else needs
to be done.