Optimize the is_permutation family and _Hash::operator== for multicontainers by BillyONeal · Pull Request #423 · microsoft/STL

BillyONeal · 2020-01-15T21:24:02Z

<xutility>
4660: _Find_pr is a helper for is_permutation, so move it down to that area.
4684: The SHOUTY banners were attached to functions which were implementation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions.
4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized.
4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format.
4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment.
4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us.

<xhash>
1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it.
1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts.
Improvements over the old implementation:

For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we started with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large.
Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths.
We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range.

The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range.

Performance results:

Benchmark code:

#undef NDEBUG
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <assert.h>
#include <benchmark/benchmark.h>
#include <hash_map>
#include <random>
#include <stddef.h>
#include <unordered_map>
#include <utility>
#include <vector>

using namespace std;

template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    next(b.begin(), b.size() - 1)->second = 1u;
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) {
    std::minstd_rand rng(std::random_device{}());
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    vector<pair<unsigned, unsigned>> testData;
    testData.resize(range0 * 5);
    const auto dataEnd = testData.begin() + range0;
    std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; });
    std::copy(testData.begin(), dataEnd,
              std::copy(testData.begin(), dataEnd,
                        std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd))));
    std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end());
    testData.clear();
    std::unordered_multimap<unsigned, unsigned> b = a;
    for (auto &&_ : state) {
        (void)_;
        assert(a == b);
    }
}

BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000);

template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) {
    std::unordered_multimap<unsigned, unsigned> a;
    std::unordered_multimap<unsigned, unsigned> b;
    const auto range0 = static_cast<ptrdiff_t>(state.range(0));
    for (ptrdiff_t idx = 0; idx < range0; ++idx) {
        a.emplace(0, 1);
        b.emplace(1, 0);
    }

    a.emplace(1, 0);
    b.emplace(0, 1);
    for (auto &&_ : state) {
        (void)_;
        assert(a != b);
    }
}

BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);
BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000);

BENCHMARK_MAIN();

Checklist

Be sure you've read README.md and understand the scope of this repo.

If you're unsure about a box, leave it unchecked. A maintainer will help you.

Identifiers in product code changes are properly _Ugly as per
https://eel.is/c++draft/lex.name#3.1 or there are no product code changes.
The STL builds successfully and all tests have passed (must be manually
verified by an STL maintainer before automated testing is enabled on GitHub,
leave this unchecked for initial submission).
These changes introduce no known ABI breaks (adding members, renaming
members, adding virtual functions, changing whether a type is an aggregate
or trivially copyable, etc.).
These changes were written from scratch using only this repository,
the C++ Working Draft (including any cited standards), other WG21 papers
(excluding reference implementations outside of proposed standard wording),
and LWG issues as reference material. If they were derived from a project
that's already listed in NOTICE.txt, that's fine, but please mention it.
If they were derived from any other project (including Boost and libc++,
which are not yet listed in NOTICE.txt), you must mention it here,
so we can determine whether the license is compatible and what else needs
to be done.

…taniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN();

BillyONeal · 2020-01-15T21:28:50Z

I am specifically interested in more perf tests folks think I should run.

BillyONeal · 2020-01-15T21:55:14Z

The SHOUTY banners were attached to functions which were implmentation details

s/implmentation/implementation/

Fixed, feel free to just edit stuff like that :P

CaseyCarter · 2020-01-15T22:05:03Z

Fixed, feel free to just edit stuff like that :P

I started to, but was concerned you might have a dual internal PR that needs to be corrected also.

BillyONeal · 2020-01-15T22:10:15Z

Fixed, feel free to just edit stuff like that :P

I started to, but was concerned you might have a dual internal PR that needs to be corrected also.

There are no test changes needed for this so I'm going to let it be approved here before making a dual one

stl/inc/xutility

stl/inc/xhash

Resolves microsoftGH-6, microsoftGH-38, and drive-by fixes microsoftGH-414. Constexprizes the following algorithms and enables all relevant libc++ tests: * adjacent_find * all_of * any_of * binary_search * copy * copy_backward * copy_if * copy_n * count * count_if * equal * equal_range * exchange * fill * fill_n * find * find_end * find_first_of * find_if * find_if_not * for_each * for_each_n * generate * generate_n * includes * is_heap * is_heap * is_heap_until * is_partitioned * is_permutation * is_sorted * is_sorted_until * iter_swap * lexicographical_compare * lower_bound * make_heap * merge * mismatch * move * move_backward * next_permutation * none_of * nth_element * partial_sort * partial_sort_copy * partition * partition_copy * partition_point * pop_heap * prev_permutation * push_heap * remove * remove_copy * remove_copy_if * remove_if * replace * replace_copy * replace_copy_if * replace_if * reverse_copy * revese * rotate * rotate_copy * search * search_n * set_difference * set_intersection * set_symmetric_difference * set_union * sort * sort_heap * swap * swap_ranges * transform * unique * unique_copy * upper_bound This commit also contains the contents of microsoft#423 to workaround DevCom-883631, it will be rebased and that content removed once that is merged (the other one needs to be merged first). Specific notes: `<xutility>`: * The `_Ptr_copy_cat` family are moved down next to std::copy as that is their first consumer. * The core language loop for `copy_n` is fairly long and so it was extracted into its own function, `_Copy_n_core` (note similar name schema to `_Copy_memmove`) * `reverse` was changed to use early-returns for its optimization passes; this removes the nice thing of having if constexpr not instantiate the loop. However, this form allows the loop to not be duplicated.

miscco · 2020-01-17T20:25:17Z

stl/inc/xhash

+            // check the first elements for equivalence when !_Standard
+            if (_Right._Traitsobj(_Keyval, _Traits::_Kfn(*_First2))) {
+                return {};
+            }
+
+            // trim matching prefixes
+            const size_t _LHashval   = _Traitsobj(_Keyval);
+            const size_type _LBucket = _LHashval & _Mask;
+            const auto _LBucket_hi   = _Vec._Mypair._Myval2._Myfirst[(_LBucket << 1) + 1];


If I had sufficient coffee this time this is the main difference between the two if constexpr branches.

I you would store the pointer after _LBucket_hi you could also write

++_First1; const bool _Left_range_end = _First1 == _LBucket_hi_next || _Traitsobj(_Keyval, _Traits::_Kfn(*_First1));

With that the only difference between the two loops (besides comments) is _LBucket_hi_next vs _Unchecked_end(). Given that you use a temporary variable anyway you could assign _Unchecked_end() to _LBucket_hi_next in the _Traits::_Standard case

Dunno if that is truly worth it

That will do one extra pointer dereference per bucket (to find _LBucket_hi_next); but maybe we don't care about the perf of !_Standard (that is, hash_foo rather than unordered_foo) ...

Looks good, applied.

stl/inc/xutility

CaseyCarter

My non-resolved comments are suggestions, so I "Approve with suggestions."

BillyONeal · 2020-01-18T00:54:37Z

My non-resolved comments are suggestions, so I "Approve with suggestions."

Is there anything I missed other than the !_Pred thing?

CaseyCarter · 2020-01-18T01:04:00Z

Is there anything I missed other than the !_Pred thing?

No - I think I was confused by clicking the through the comments and observing old state which I thought was current state.

StephanTLavavej · 2020-01-18T02:25:22Z

stl/inc/xutility

 #endif // _HAS_CXX17

-// FUNCTION TEMPLATE _Count_pr
+// FUNCTION TEMPLATE is_permutation


FYI, #279 tracks a possible improvement to is_permutation, although it isn't necessary to implement at this time.

I do agree that looks interesting but would prefer that be a separate change (as it'll need its own perf tests and similar justification)

StephanTLavavej · 2020-01-18T02:29:58Z

stl/inc/xhash

    }

+    struct _Multi_equal_check_result {
+        bool _Equal_possible;


I am mildly uncomfortable with the lack of DMIs here; return {} initializes this bool to false but the risk of garbage-init is higher than I would like. Would it be better to add DMIs for {false} and {} for the iterator?

In my tests all the functions returning this type were inlined so I have no objection to DMIs.

StephanTLavavej · 2020-01-18T02:30:52Z

stl/inc/xhash

+
+    _NODISCARD _Multi_equal_check_result _Multi_equal_check_equal_range(
+        const _Hash& _Right, _Unchecked_const_iterator _First1) const {
+        // check equal_range of elements matching *_First1 match the equal_range of elements in _Right


This comment grammar is difficult to parse.

stl/inc/xhash

StephanTLavavej · 2020-01-18T02:37:31Z

stl/inc/xhash

+        const auto _Last1 = _Unchecked_end();
+        auto _First1      = _Unchecked_begin();
+        while (_First1 != _Last1) {
+            auto _Result = _Multi_equal_check_equal_range(_Right, _First1);


Can _Result be const?

StephanTLavavej · 2020-01-18T02:41:02Z

stl/inc/xutility

+            // narrowing _Iter_diff_t<_FwdIt1> to _Iter_diff_t<_FwdIt2> is OK because if the 2nd range is shorter than
+            // the 1st, the user already triggered UB


Style: I recommend saying "second" and "first" instead of aggressively abbreviating.

I agree, though technically I didn't edit that here :P. Will fix!

StephanTLavavej · 2020-01-18T02:42:14Z

stl/inc/xutility

-    for (; _First1 != _Last1 && _First2 != _Last2; ++_First1, (void) ++_First2) {
+    for (;;) { // trim matching prefix
+        if (_First1 == _Last1) {
+            return _First2 == _Last2; // sequences are equal


This comment is only true if we return true, although I suppose it is fairly obvious.

@miscco

…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.

@miscco

…tainers (microsoft#423) * Optimize the is_permutation family and _Hash::operator== for multicontaniers slightly. <xutility> 4660: _Find_pr is a helper for is_permutation, so move it down to that area. 4684: The SHOUTY banners were attached to functions which were implmentation details of is_permutation, so I fixed them up to say is_permutation and removed the banners for helper functions. 4711: Use if constexpr to avoid a tag dispatch call for _Trim_matching_suffixes. Optimizers will like this because they generally hate reference-to-pointer, and it also serves to workaround DevCom-883631 when this algorithm is constexprized. 4766: Indicate that we are trimming matching prefixes in this loop body, and break apart comment block that was incorrectly merged by clang-format. 4817: In the dual range forward version of the algorithm, calculate the distances concurrently to avoid wasting lots of time when the distances vary by a lot. For example, is_permutation( a forward range of length 1, a forward range of length 1'000'000 ) used to do the million increments, now it stops at 1 increment. 4862: In the dual range random-access version, avoid recalculating _Last2 when it has already been supplied to us. <xhash> 1404: Move down construction of _Bucket_hi in _Equal_range to before the first loop body using it. 1918: Added a new function to calculate equality for unordered multicontainers. We loop over the elements in the left container, find corresponding ranges in the right container, trim prefixes, then dispatch to is_permutation's helper _Check_match_counts. Improvements over the old implementation: * For standard containers, we no longer need to hash any elements from the left container; we know that we've found the "run" of equivalent elements because we *started* with an element in that container. We also never go "backwards" or multiply enumerate _Left (even for !_Standard), which improves cache use when the container becomes large. * Just like the dual range is_permutation improvement above, when the equal_ranges of the containers are of wildly varying lengths, this will stop on the shorter of the lengths. * We avoid the 3-arg is_permutation doing a linear time operation to discover _Last2 that we already had calculated in determining _Right's equal_range. The function _Multi_equal_check_equal_range tests one equal_range from the left container against the corresponding equal_range from the right container, while _Multi_equal invokes _Multi_equal_check_equal_range for each equal_range. Performance results: ``` Benchmark Before (ns) After (ns) Percent Better HashRandomUnequal<unordered_multimap>/1 18.7 11.7 59.83% HashRandomUnequal<unordered_multimap>/10 137 97 41.24% HashRandomUnequal<unordered_multimap>/100 1677 1141 46.98% HashRandomUnequal<unordered_multimap>/512 10386 7036 47.61% HashRandomUnequal<unordered_multimap>/4096 173807 119391 45.58% HashRandomUnequal<unordered_multimap>/32768 2898405 1529710 89.47% HashRandomUnequal<unordered_multimap>/100000 27441112 18602792 47.51% HashRandomUnequal<hash_multimap>/1 18.9 11.8 60.17% HashRandomUnequal<hash_multimap>/10 138 101 36.63% HashRandomUnequal<hash_multimap>/100 1613 1154 39.77% HashRandomUnequal<hash_multimap>/512 10385 7178 44.68% HashRandomUnequal<hash_multimap>/4096 171718 120115 42.96% HashRandomUnequal<hash_multimap>/32768 3352231 1510245 121.97% HashRandomUnequal<hash_multimap>/100000 26532471 19209741 38.12% HashRandomEqual<unordered_multimap>/1 16 9.4 70.21% HashRandomEqual<unordered_multimap>/10 126 89.2 41.26% HashRandomEqual<unordered_multimap>/100 1644 1133 45.10% HashRandomEqual<unordered_multimap>/512 10532 7183 46.62% HashRandomEqual<unordered_multimap>/4096 174580 120029 45.45% HashRandomEqual<unordered_multimap>/32768 3031653 1455416 108.30% HashRandomEqual<unordered_multimap>/100000 26100504 19240571 35.65% HashRandomEqual<hash_multimap>/1 15.9 9.38 69.51% HashRandomEqual<hash_multimap>/10 123 94.1 30.71% HashRandomEqual<hash_multimap>/100 1645 1151 42.92% HashRandomEqual<hash_multimap>/512 10177 7144 42.46% HashRandomEqual<hash_multimap>/4096 172994 121381 42.52% HashRandomEqual<hash_multimap>/32768 3045242 1966513 54.85% HashRandomEqual<hash_multimap>/100000 26013781 22025482 18.11% HashUnequalDifferingBuckets<unordered_multimap>/2 5.87 3.41 72.14% HashUnequalDifferingBuckets<unordered_multimap>/10 12 3.39 253.98% HashUnequalDifferingBuckets<unordered_multimap>/100 106 3.41 3008.50% HashUnequalDifferingBuckets<unordered_multimap>/512 691 3.46 19871.10% HashUnequalDifferingBuckets<unordered_multimap>/4096 6965 3.47 200620.46% HashUnequalDifferingBuckets<unordered_multimap>/32768 91451 3.46 2642992.49% HashUnequalDifferingBuckets<unordered_multimap>/100000 290430 3.52 8250752.27% HashUnequalDifferingBuckets<hash_multimap>/2 5.97 3.4 75.59% HashUnequalDifferingBuckets<hash_multimap>/10 11.8 3.54 233.33% HashUnequalDifferingBuckets<hash_multimap>/100 105 3.54 2866.10% HashUnequalDifferingBuckets<hash_multimap>/512 763 3.46 21952.02% HashUnequalDifferingBuckets<hash_multimap>/4096 6862 3.4 201723.53% HashUnequalDifferingBuckets<hash_multimap>/32768 94583 3.4 2781752.94% HashUnequalDifferingBuckets<hash_multimap>/100000 287996 3.43 8396284.84% ``` Benchmark code: ``` #undef NDEBUG #define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS #include <assert.h> #include <benchmark/benchmark.h> #include <hash_map> #include <random> #include <stddef.h> #include <unordered_map> #include <utility> #include <vector> using namespace std; template <template <class...> class MapType> void HashRandomUnequal(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0u}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; next(b.begin(), b.size() - 1)->second = 1u; for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashRandomUnequal, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomUnequal, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashRandomEqual(benchmark::State &state) { std::minstd_rand rng(std::random_device{}()); const auto range0 = static_cast<ptrdiff_t>(state.range(0)); vector<pair<unsigned, unsigned>> testData; testData.resize(range0 * 5); const auto dataEnd = testData.begin() + range0; std::generate(testData.begin(), dataEnd, [&]() { return pair<unsigned, unsigned>{rng(), 0}; }); std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, std::copy(testData.begin(), dataEnd, dataEnd)))); std::unordered_multimap<unsigned, unsigned> a(testData.begin(), testData.end()); testData.clear(); std::unordered_multimap<unsigned, unsigned> b = a; for (auto &&_ : state) { (void)_; assert(a == b); } } BENCHMARK_TEMPLATE1(HashRandomEqual, unordered_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashRandomEqual, hash_multimap)->Arg(1)->Arg(10)->Range(100, 100'000); template <template <class...> class MapType> void HashUnequalDifferingBuckets(benchmark::State &state) { std::unordered_multimap<unsigned, unsigned> a; std::unordered_multimap<unsigned, unsigned> b; const auto range0 = static_cast<ptrdiff_t>(state.range(0)); for (ptrdiff_t idx = 0; idx < range0; ++idx) { a.emplace(0, 1); b.emplace(1, 0); } a.emplace(1, 0); b.emplace(0, 1); for (auto &&_ : state) { (void)_; assert(a != b); } } BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, unordered_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_TEMPLATE1(HashUnequalDifferingBuckets, hash_multimap)->Arg(2)->Arg(10)->Range(100, 100'000); BENCHMARK_MAIN(); * Apply a bunch of code review comments from Casey. * clang-format * Apply @miscco's code deduplication idea for <xhash>. * Fix code review comments from Stephan: comments and add DMIs.

BillyONeal requested a review from a team as a code owner January 15, 2020 21:24

BillyONeal self-assigned this Jan 15, 2020

BillyONeal requested review from CaseyCarter and StephanTLavavej January 15, 2020 21:26

BillyONeal changed the title ~~Optimize the is_permutation family and _Hash::operator== for multicontainers slightly.~~ Optimize the is_permutation family and _Hash::operator== for multicontainers Jan 15, 2020

This comment has been minimized.

Sign in to view

CaseyCarter suggested changes Jan 15, 2020

View reviewed changes

BillyONeal mentioned this pull request Jan 16, 2020

Implement constexpr algorithms. #425

Merged

4 tasks

BillyONeal added 2 commits January 16, 2020 14:39

Apply a bunch of code review comments from Casey.

3d7f760

clang-format

ce1ff2d

BillyONeal requested a review from CaseyCarter January 16, 2020 22:47

miscco reviewed Jan 17, 2020

View reviewed changes

stl/inc/xutility Show resolved Hide resolved

CaseyCarter approved these changes Jan 17, 2020

View reviewed changes

CaseyCarter approved these changes Jan 18, 2020

View reviewed changes

Apply @miscco's code deduplication idea for <xhash>.

02c0d90

StephanTLavavej approved these changes Jan 18, 2020

View reviewed changes

BillyONeal added 2 commits January 18, 2020 01:55

Merge remote-tracking branch 'origin/master' into optimize_permutations

49bac6f

Fix code review comments from Stephan: comments and add DMIs.

38453b8

BillyONeal merged commit ff403e3 into microsoft:master Jan 19, 2020

BillyONeal deleted the optimize_permutations branch January 19, 2020 23:48

StephanTLavavej mentioned this pull request Feb 11, 2021

<xhash>: Fix compiler error when dllexporting a class derived from unordered_set #1639

Merged

		// narrowing _Iter_diff_t<_FwdIt1> to _Iter_diff_t<_FwdIt2> is OK because if the 2nd range is shorter than
		// the 1st, the user already triggered UB

Conversation

BillyONeal commented Jan 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

BillyONeal commented Jan 15, 2020

Uh oh!

This comment has been minimized.

BillyONeal commented Jan 15, 2020

Uh oh!

CaseyCarter commented Jan 15, 2020

Uh oh!

BillyONeal commented Jan 15, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

CaseyCarter left a comment

Choose a reason for hiding this comment

Uh oh!

BillyONeal commented Jan 18, 2020

Uh oh!

CaseyCarter commented Jan 18, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

BillyONeal commented Jan 15, 2020 •

edited

Loading