Implement P0040R3 parallel specialized memory algorithms#3145
StephanTLavavej merged 33 commits into microsoft:main
Conversation
Parallelized algorithms:
* destroy
* destroy_n
* uninitialized_default_construct
* uninitialized_default_construct_n
* uninitialized_value_construct
* uninitialized_value_construct_n

Non-parallelized algorithms:
* uninitialized_copy
* uninitialized_copy_n
* uninitialized_fill
* uninitialized_fill_n
* uninitialized_move
* uninitialized_move_n
Why can't we just ignore it?
Do we want to have some benchmark to prove that parallelizing works?
- PARALLEL
- const _NoThrowFwdIt _First
- consistent `_Value_type`
- iteration on `_UFirst` (fixing copy-pasta)
```cpp
struct test_case_uninitialized_value_construct_unwrap_parallel {
    template <class ExecutionPolicy>
    void operator()(const size_t testSize, const ExecutionPolicy& exec) {
        auto vec = vector<int>(testSize, bad_int);
```
To other reviewers: I'm not complaining about the use of AAA here, because I appreciate how it increases consistency in these test cases. I prefer `auto x = f(args...);` and `auto y = T(args...);` to `auto x = f(args...);` and `T y(args...);`. YMMV. (Also on 208.) (No change requested.)
I'm (reluctantly) okay with this rationale.
```cpp
for (; _Count > 0; --_Count, (void) ++_UFirst) {
    _STD _Construct_in_place(*_UFirst);
}
_STD _Seek_wrapped(_First, _UFirst);
```
No change requested: I observe that we could `_STD _Seek_wrapped(_First, _Operation._Basis._Populate(_Operation._Team, _UFirst));` above (right side, line 5186), then exhaust parallelism resources and fall through to here, where we `_Seek_wrapped` again. However, this is correct because `_Seek_wrapped` is a "seek to"/assignment, not a "seek by", and there are no other uses of `_First` that could be damaged by its having been unexpectedly updated.
```cpp
unique_ptr<T, deallocating_only_deleter<T>> up{al.allocate(n), deallocating_only_deleter<T>{n}};
for (size_t i = 0; i != n; ++i) {
    allocator_traits<allocator<T>>::construct(al, up.get() + i);
}
```
No change requested: it seems unusual that an array of elements is being stored in `unique_ptr<T, DEL>` instead of `unique_ptr<T[], DEL>`. However, I am unsure of the implications of making such a change (some are positive; `unique_ptr<T[]>` disables conversions that are bogus for arrays). This is only test code, so it's allowed to be somewhat squirrelly. 🐿️
Changes requested in Nov 2022 have been made, purr!
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

I pushed a conflict-free merge with

Thanks for truly finishing C++17! 😻 💯 🎉
Fixes #525.
I decided to parallelize the following algorithms:
* destroy
* destroy_n
* uninitialized_default_construct
* uninitialized_default_construct_n
* uninitialized_value_construct
* uninitialized_value_construct_n

because IMO they are similar enough to for_each and for_each_n.

The following algorithms are currently not parallelized:
* uninitialized_copy
* uninitialized_copy_n
* uninitialized_fill
* uninitialized_fill_n
* uninitialized_move
* uninitialized_move_n

because the corresponding normal algorithms in <algorithm> are also not parallelized. I doubt whether the copy/move/fill families are really not worth parallelizing, but perhaps we should address this in another issue.