-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
While investigating #3244, I found something curious - count() was affected by that correctness bug only in C++20/23 mode when vector iterators are used (see https://godbolt.org/z/dcTscbeTn ). This is because there's a performance bug in C++14/17 mode.
There are 4 uses of _Within_limits in the STL (affected by said correctness bug, which is irrelevant for the purposes of this performance bug). Each use is preceded by checking _Vector_alg_in_find_is_safe.
- 💚
ranges::_Count_fn::_Count_unchecked:Lines 442 to 453 in e28f956
template <class _It, class _Se, class _Ty, class _Pj> _NODISCARD static constexpr iter_difference_t<_It> _Count_unchecked( _It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj) { _STL_INTERNAL_STATIC_ASSERT(input_iterator<_It>); _STL_INTERNAL_STATIC_ASSERT(sentinel_for<_Se, _It>); _STL_INTERNAL_STATIC_ASSERT(indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>); #if _USE_STD_VECTOR_ALGORITHMS if constexpr (is_same_v<_Pj, identity> && _Vector_alg_in_find_is_safe<_It, _Ty> && sized_sentinel_for<_Se, _It>) { if (!_STD is_constant_evaluated()) { if (!_Within_limits(_First, _Val)) { - 💚
std::_Find_unchecked:Lines 5603 to 5612 in e28f956
template <class _InIt, class _Ty> _NODISCARD _CONSTEXPR20 _InIt _Find_unchecked(_InIt _First, const _InIt _Last, const _Ty& _Val) { // find first matching _Val; choose optimization // activate optimization for contiguous iterators to most scalar types (possibly const-qualified) if constexpr (_Vector_alg_in_find_is_safe<_InIt, _Ty>) { #if _HAS_CXX20 if (!_STD is_constant_evaluated()) #endif // _HAS_CXX20 { if (!_Within_limits(_First, _Val)) { - 💚
ranges::_Find_unchecked:Lines 5674 to 5681 in e28f956
template <input_iterator _It, sentinel_for<_It> _Se, class _Ty, class _Pj = identity> requires indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*> _NODISCARD constexpr _It _Find_unchecked(_It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj = {}) { constexpr bool _Is_sized = sized_sentinel_for<_Se, _It>; if constexpr (_Vector_alg_in_find_is_safe<_It, _Ty> && _Sized_or_unreachable_sentinel_for<_Se, _It> && same_as<_Pj, identity>) { if (!_STD is_constant_evaluated()) { if (!_Within_limits(_First, _Val)) { - 🐞
std::count:Lines 5779 to 5795 in e28f956
_EXPORT_STD template <class _InIt, class _Ty> _NODISCARD _CONSTEXPR20 _Iter_diff_t<_InIt> count(const _InIt _First, const _InIt _Last, const _Ty& _Val) { // count elements that match _Val _Adl_verify_range(_First, _Last); if constexpr (_Is_vb_iterator<_InIt> && is_same_v<_Ty, bool>) { return _Count_vbool(_First, _Last, _Val); } else { auto _UFirst = _Get_unwrapped(_First); const auto _ULast = _Get_unwrapped(_Last); #if _USE_STD_VECTOR_ALGORITHMS if constexpr (_Vector_alg_in_find_is_safe<_InIt, _Ty>) { #if _HAS_CXX20 if (!_STD is_constant_evaluated()) #endif // _HAS_CXX20 { if (!_Within_limits(_UFirst, _Val)) {
The performance bug 🐞 is that std::count is templated on a wrapped iterator _InIt (in my example, a vector iterator), and we check _Vector_alg_in_find_is_safe<_InIt, _Ty> with that wrapped iterator. However, the following code correctly uses the unwrapped iterator _UFirst (with both _Within_limits and __std_count_trivial).
The 14/17 versus 20/23 dependency is because _Vector_alg_in_find_is_safe uses _Iterator_is_contiguous:
Lines 5591 to 5593 in e28f956
| template <class _Iter, class _Ty, class _Elem = _Iter_value_t<_Iter>> | |
| _INLINE_VAR constexpr bool _Vector_alg_in_find_is_safe = // Can we activate the vector algorithms for find/count? | |
| _Iterator_is_contiguous<_Iter> // The iterator must be contiguous so we can get raw pointers. |
In C++20/23 mode, it's powered by concepts and can detect arbitrary contiguous iterators, compensating for the perf bug:
Lines 4226 to 4229 in e28f956
| #ifdef __cpp_lib_concepts | |
| // When concepts are available, we can detect arbitrary contiguous iterators. | |
| template <class _Iter> | |
| inline constexpr bool _Iterator_is_contiguous = contiguous_iterator<_Iter>; |
But in C++14/17 mode, it can only detect pointers, so the perf bug is revealed:
Lines 4236 to 4239 in e28f956
| #else // ^^^ defined(__cpp_lib_concepts) ^^^ / vvv !defined(__cpp_lib_concepts) vvv | |
| // When concepts aren't available, we can detect pointers. (Iterators should be unwrapped before using this.) | |
| template <class _Iter> | |
| _INLINE_VAR constexpr bool _Iterator_is_contiguous = is_pointer_v<_Iter>; |
As the comment says: "(Iterators should be unwrapped before using this.)"