Skip to content

<algorithm>: count() is missing vectorization opportunities in C++14/17 #3245

@StephanTLavavej

Description

@StephanTLavavej

While investigating #3244, I found something curious - count() was affected by that correctness bug only in C++20/23 mode when vector iterators are used (see https://godbolt.org/z/dcTscbeTn ). This is because there's a performance bug in C++14/17 mode.

There are 4 uses of _Within_limits in the STL (affected by said correctness bug, which is irrelevant for the purposes of this performance bug). Each use is preceded by checking _Vector_alg_in_find_is_safe.

  • 💚 ranges::_Count_fn::_Count_unchecked:

    STL/stl/inc/algorithm

    Lines 442 to 453 in e28f956

    template <class _It, class _Se, class _Ty, class _Pj>
    _NODISCARD static constexpr iter_difference_t<_It> _Count_unchecked(
    _It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj) {
    _STL_INTERNAL_STATIC_ASSERT(input_iterator<_It>);
    _STL_INTERNAL_STATIC_ASSERT(sentinel_for<_Se, _It>);
    _STL_INTERNAL_STATIC_ASSERT(indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>);
    #if _USE_STD_VECTOR_ALGORITHMS
    if constexpr (is_same_v<_Pj, identity> && _Vector_alg_in_find_is_safe<_It, _Ty>
    && sized_sentinel_for<_Se, _It>) {
    if (!_STD is_constant_evaluated()) {
    if (!_Within_limits(_First, _Val)) {
  • 💚 std::_Find_unchecked:

    STL/stl/inc/xutility

    Lines 5603 to 5612 in e28f956

    template <class _InIt, class _Ty>
    _NODISCARD _CONSTEXPR20 _InIt _Find_unchecked(_InIt _First, const _InIt _Last, const _Ty& _Val) {
    // find first matching _Val; choose optimization
    // activate optimization for contiguous iterators to most scalar types (possibly const-qualified)
    if constexpr (_Vector_alg_in_find_is_safe<_InIt, _Ty>) {
    #if _HAS_CXX20
    if (!_STD is_constant_evaluated())
    #endif // _HAS_CXX20
    {
    if (!_Within_limits(_First, _Val)) {
  • 💚 ranges::_Find_unchecked:

    STL/stl/inc/xutility

    Lines 5674 to 5681 in e28f956

    template <input_iterator _It, sentinel_for<_It> _Se, class _Ty, class _Pj = identity>
    requires indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>
    _NODISCARD constexpr _It _Find_unchecked(_It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj = {}) {
    constexpr bool _Is_sized = sized_sentinel_for<_Se, _It>;
    if constexpr (_Vector_alg_in_find_is_safe<_It, _Ty> && _Sized_or_unreachable_sentinel_for<_Se, _It>
    && same_as<_Pj, identity>) {
    if (!_STD is_constant_evaluated()) {
    if (!_Within_limits(_First, _Val)) {
  • 🐞 std::count:

    STL/stl/inc/xutility

    Lines 5779 to 5795 in e28f956

    _EXPORT_STD template <class _InIt, class _Ty>
    _NODISCARD _CONSTEXPR20 _Iter_diff_t<_InIt> count(const _InIt _First, const _InIt _Last, const _Ty& _Val) {
    // count elements that match _Val
    _Adl_verify_range(_First, _Last);
    if constexpr (_Is_vb_iterator<_InIt> && is_same_v<_Ty, bool>) {
    return _Count_vbool(_First, _Last, _Val);
    } else {
    auto _UFirst = _Get_unwrapped(_First);
    const auto _ULast = _Get_unwrapped(_Last);
    #if _USE_STD_VECTOR_ALGORITHMS
    if constexpr (_Vector_alg_in_find_is_safe<_InIt, _Ty>) {
    #if _HAS_CXX20
    if (!_STD is_constant_evaluated())
    #endif // _HAS_CXX20
    {
    if (!_Within_limits(_UFirst, _Val)) {

The performance bug 🐞 is that std::count is templated on a wrapped iterator _InIt (in my example, a vector iterator), and we check _Vector_alg_in_find_is_safe<_InIt, _Ty> with that wrapped iterator. However, the following code correctly uses the unwrapped iterator _UFirst (with both _Within_limits and __std_count_trivial).

The 14/17 versus 20/23 dependency is because _Vector_alg_in_find_is_safe uses _Iterator_is_contiguous:

STL/stl/inc/xutility

Lines 5591 to 5593 in e28f956

template <class _Iter, class _Ty, class _Elem = _Iter_value_t<_Iter>>
_INLINE_VAR constexpr bool _Vector_alg_in_find_is_safe = // Can we activate the vector algorithms for find/count?
_Iterator_is_contiguous<_Iter> // The iterator must be contiguous so we can get raw pointers.

In C++20/23 mode, it's powered by concepts and can detect arbitrary contiguous iterators, compensating for the perf bug:

STL/stl/inc/xutility

Lines 4226 to 4229 in e28f956

#ifdef __cpp_lib_concepts
// When concepts are available, we can detect arbitrary contiguous iterators.
template <class _Iter>
inline constexpr bool _Iterator_is_contiguous = contiguous_iterator<_Iter>;

But in C++14/17 mode, it can only detect pointers, so the perf bug is revealed:

STL/stl/inc/xutility

Lines 4236 to 4239 in e28f956

#else // ^^^ defined(__cpp_lib_concepts) ^^^ / vvv !defined(__cpp_lib_concepts) vvv
// When concepts aren't available, we can detect pointers. (Iterators should be unwrapped before using this.)
template <class _Iter>
_INLINE_VAR constexpr bool _Iterator_is_contiguous = is_pointer_v<_Iter>;

As the comment says: "(Iterators should be unwrapped before using this.)"

Metadata

Metadata

Assignees

No one assigned

    Labels

    fixedSomething works now, yay!performanceMust go faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions