@pauldreik
Collaborator

@pauldreik pauldreik commented Dec 3, 2025

This makes most of simdutf constexpr. It relies on the if consteval feature from C++23, and the constexpr functionality is available through the span API.

See issue #865

Overall description

The scalar implementation has been slightly modified to be able to run as constexpr. The scalar implementation is already tested and proven. It is also differentially fuzzed against the other implementations. It was therefore better to "promote it" to constexpr instead of writing a new, untested implementation.

It is best described through an example. Here is a changed function from the public API, where the choice between the constexpr version and the normal, performant code is made at compile time:

#if SIMDUTF_SPAN
constexpr bool
validate_utf8(const detail::input_span_of_byte_like auto &input) noexcept {
  #if SIMDUTF_CPLUSPLUS23
  if consteval {
    // at compile time - performance is not that important
    return scalar::utf8::validate(
        detail::constexpr_cast_ptr<uint8_t>(input.data()), input.size());
  } else
  #endif
  {
    // at runtime, same blazing performance as earlier!
    return validate_utf8(reinterpret_cast<const char *>(input.data()),
                         input.size());
  }
}
#endif // SIMDUTF_SPAN

It can be seen that the normal, SIMD-accelerated path works just as before. During constant evaluation, the scalar implementation is used. The scalar implementation looks like this:

template <class BytePtr>
simdutf_constexpr23 simdutf_warn_unused bool validate(BytePtr data,
                                                      size_t len) noexcept {
  uint64_t pos = 0;
  while (pos < len) {
    uint64_t next_pos;
#if SIMDUTF_CPLUSPLUS23
    if !consteval
#endif
    { // check if the next 16 bytes are ascii (for performance)
      // (use of memcpy, forbidden during constant evaluation)
    }
    // normal scalar code (elided here)
  }
  return true;
}

So the scalar implementation mostly behaves as before, but when it is evaluated at compile time, the optimizations that rely on memcpy and reinterpret_cast (neither of which is allowed during constant evaluation) are skipped.
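To make the pattern concrete, here is a minimal, self-contained sketch of the same idea applied to a toy ASCII check. It is not simdutf code: demo_is_ascii, DEMO_CONSTEXPR23 and DEMO_HAS_IF_CONSTEVAL are hypothetical names, and the standard feature-test macro __cpp_if_consteval is assumed as a stand-in for SIMDUTF_CPLUSPLUS23.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: the feature-test macro stands in for SIMDUTF_CPLUSPLUS23.
#if defined(__cpp_if_consteval) && __cpp_if_consteval >= 202106L
#define DEMO_HAS_IF_CONSTEVAL 1
#define DEMO_CONSTEXPR23 constexpr
#else
#define DEMO_HAS_IF_CONSTEVAL 0
#define DEMO_CONSTEXPR23 inline
#endif

// Hypothetical helper (not simdutf code): true if every byte is below 0x80.
DEMO_CONSTEXPR23 bool demo_is_ascii(const unsigned char *data,
                                    std::size_t len) noexcept {
  std::size_t pos = 0;
#if DEMO_HAS_IF_CONSTEVAL
  if !consteval
#endif
  {
    // Runtime-only fast path: inspect 8 bytes at a time through memcpy,
    // which cannot be used during constant evaluation.
    for (; pos + 8 <= len; pos += 8) {
      std::uint64_t word;
      std::memcpy(&word, data + pos, sizeof(word));
      if ((word & 0x8080808080808080ULL) != 0) {
        return false;
      }
    }
  }
  // Plain byte loop: handles the tail at runtime and every byte at compile time.
  for (; pos < len; ++pos) {
    if (data[pos] >= 0x80) {
      return false;
    }
  }
  return true;
}

#if DEMO_HAS_IF_CONSTEVAL
constexpr unsigned char demo_text[] = {'s', 'i', 'm', 'd', 'u', 't', 'f', '!', '!'};
static_assert(demo_is_ascii(demo_text, sizeof(demo_text)), "compile-time path");
#endif

// Runtime usage: the same function, now going through the memcpy fast path.
inline void demo_is_ascii_usage() {
  const unsigned char ok[] = {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'};
  const unsigned char bad[] = {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 0xC3, 0xA9};
  assert(demo_is_ascii(ok, sizeof(ok)));
  assert(!demo_is_ascii(bad, sizeof(bad)));
}
```

Before C++23 the sketch degrades to an ordinary inline function, which mirrors how the guarded if !consteval above simply disappears when SIMDUTF_CPLUSPLUS23 is not set.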

Header reorganization

Making the scalar implementations available in the header required quite a bit of code/file reorganization. It makes the header bigger, but it does not seem to hurt compilation time.

Compile time performance

As measured by the added scripts/compilation_benchmark.py, which does a release build and takes the median of three runs:

  • before, C++11: median compile is 20.326s
  • before, C++23: median compile is 22.828s
  • now, C++11: median compile is 18.538s
  • now, C++23: median compile is 23.471s

So there is a slight increase in compilation time for C++23 (about 3%), while the C++11 build is roughly 9% faster than before.

Runtime performance

The runtime is not expected to change at all: the added dispatch should already be optimized away at compile time.

What is not constexpr?

  • The autodetect functions are still not constexpr. I ran out of Christmas vacation and those were not so easy to reorganize.
  • The atomic base64 functions: it does not make sense to make these constexpr.
  • The dynamic dispatch (implementation selection) is only used at runtime.

@lemire
Member

lemire commented Dec 4, 2025

This is great, brilliant even!

Yes, having distinct implementations is not great (makes testing more difficult and so forth).

Also, I do not know how many changes are needed for the scalar implementation to be able to run at constexpr time (apart from the code organization).

Hmmm... what do you think could be a blocker? They are non-allocating in general, so they should be straightforward to turn into constexpr. There are a few instances of memcpy, but we know how to handle this (bit_cast?).

The scalar implementation may however not be ideal, I presume compile time performance may benefit from a simpler implementation.

Fair point. But maybe we can narrow it down to a few functions that are too messy in the scalar namespace.

So maybe we could do something like this...

  1. Have scalar/constexpr in (public) header files, in some new namespace (like simdutf_constexpr or something).
  2. Most of them would be taken out of the scalar namespace, so we would partially empty the src/scalar functions, moving them to new headers.
  3. In specific cases, we would have new, simpler functions for the constexpr version. We would add only a few of those, so they would be easy to test.

@pauldreik force-pushed the simdutf_is_constexpr branch from 8b93956 to a5f203d on December 4, 2025 16:39

@pauldreik
Collaborator Author

pauldreik commented Dec 4, 2025

I made partial progress:

  • I moved the src/scalar directory to include/simdutf/scalar, and that seems to keep existing things working with very minor changes.
  • I invoked the simdutf::scalar::utf8::validate() function from within the if consteval block in simdutf.h. This is where the problems started.
  • Because simdutf::scalar::utf8::validate uses reinterpret_cast internally (fully legal in non-constexpr code), a call to it is not a constant expression and fails during constant evaluation. See https://eel.is/c++draft/expr.const#10.15
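As a small illustration of the restriction in the last bullet, the sketch below uses a hypothetical toy function (demo_byte_sum, not part of simdutf). The pointer reinterpret_cast is left commented out because evaluating it would disqualify the call from being a constant expression. Converting each element by value with static_cast is allowed and gives the same unsigned view of the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy function: sums the bytes of a char buffer as unsigned values.
constexpr std::uint32_t demo_byte_sum(const char *buf,
                                      std::size_t len) noexcept {
  // const auto *data = reinterpret_cast<const std::uint8_t *>(buf);
  // ^ evaluating this cast would make the call unusable in a constant
  //   expression, even though it is fine at runtime.
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < len; ++i) {
    // Converting the value (not the pointer) is allowed at compile time.
    sum += static_cast<std::uint8_t>(buf[i]);
  }
  return sum;
}

// 'a' + 'b' + 'c' == 97 + 98 + 99
static_assert(demo_byte_sum("abc", 3) == 294, "evaluated at compile time");

// Runtime usage: bytes above 0x7F are read as unsigned, whatever the
// signedness of plain char on the platform.
inline void demo_byte_sum_usage() {
  const char high[] = {'\xFF', '\x01'};
  assert(demo_byte_sum(high, 2) == 256);
}
```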

I was able to work around this by making a std::vector with a copy of the data:

  const uint8_t *data{nullptr};
  std::vector<uint8_t> tmp(buf, buf + len);
  if consteval {
    data = tmp.data();
  } else {
    data = reinterpret_cast<const uint8_t *>(buf);
  }

This kind of works, but I am going to do some more experimentation.
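For illustration only, the workaround can be sketched as a self-contained function in which the copy is made solely on the compile-time branch, so the runtime path does not allocate. This is a variation on the snippet above, not the code from the branch. The names demo_has_high_bit, DEMO_CONSTEXPR23 and DEMO_COMPILE_TIME_OK are hypothetical, and the sketch assumes a standard library with a constexpr std::vector, detected through __cpp_lib_constexpr_vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: both if consteval and a constexpr std::vector are required.
#if defined(__cpp_if_consteval) && defined(__cpp_lib_constexpr_vector)
#define DEMO_CONSTEXPR23 constexpr
#define DEMO_COMPILE_TIME_OK 1
#else
#define DEMO_CONSTEXPR23 inline
#define DEMO_COMPILE_TIME_OK 0
#endif

// Hypothetical stand-in for a scalar routine that expects uint8_t input.
DEMO_CONSTEXPR23 bool demo_has_high_bit(const std::uint8_t *data,
                                        std::size_t len) noexcept {
  for (std::size_t i = 0; i < len; ++i) {
    if ((data[i] & 0x80) != 0) {
      return true;
    }
  }
  return false;
}

// char overload: copies at compile time, casts at runtime.
DEMO_CONSTEXPR23 bool demo_has_high_bit(const char *buf, std::size_t len) {
#if DEMO_COMPILE_TIME_OK
  if consteval {
    // Compile time only: a temporary uint8_t copy avoids reinterpret_cast.
    std::vector<std::uint8_t> tmp(buf, buf + len);
    return demo_has_high_bit(tmp.data(), tmp.size());
  }
#endif
  // Runtime: no copy, just reinterpret the bytes.
  return demo_has_high_bit(reinterpret_cast<const std::uint8_t *>(buf), len);
}

#if DEMO_COMPILE_TIME_OK
static_assert(!demo_has_high_bit("plain ascii", 11), "ASCII only");
static_assert(demo_has_high_bit("caf\xC3\xA9", 5), "contains UTF-8 bytes");
#endif

// Runtime usage of the same overload.
inline void demo_has_high_bit_usage() {
  assert(!demo_has_high_bit("abc", 3));
  assert(demo_has_high_bit("caf\xC3\xA9", 5));
}
```

The cost of this approach is an allocation and a copy during constant evaluation, which is presumably why a wrapper that avoids the copy is more attractive.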

@lemire
Member

lemire commented Dec 5, 2025

@pauldreik Are you sure ?

https://godbolt.org/z/EY45GWY6o

@lemire
Member

lemire commented Dec 5, 2025

Gemini says:

Reference in N4950

Section 7.7, "Constant expressions," paragraph 5, item (5.15) states that an expression E is not a core constant expression if the evaluation of a potential result involves a:

"(5.15) a reinterpret_cast (7.6.1.10);"

@pauldreik
Collaborator Author

@pauldreik Are you sure ?

https://godbolt.org/z/EY45GWY6o

Yes, you also have to use the function to see it: https://godbolt.org/z/7GeKb7r76

@lemire
Member

lemire commented Dec 5, 2025

@pauldreik Using AI and a bit of C++ knowledge, I wrote a wrapper that seems to work:

https://godbolt.org/z/n86945sa4

The idea goes like this:

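#include <bit>     // std::bit_cast
#include <compare> // defaulted operator<=>
#include <cstddef> // std::ptrdiff_t
#include <cstdint> // uint8_t

template <typename to, typename from>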
    requires (sizeof(to) == sizeof(from))
struct constexpr_ptr {
    from* p;

    constexpr constexpr_ptr() noexcept = default;
    constexpr constexpr_ptr(from* ptr) noexcept : p(ptr) {}

    constexpr to operator*() const noexcept { return std::bit_cast<to>(*p); }

    constexpr constexpr_ptr& operator++() noexcept { ++p; return *this; }
    constexpr constexpr_ptr operator++(int) noexcept { auto old = *this; ++p; return old; }

    constexpr constexpr_ptr& operator--() noexcept { --p; return *this; }
    constexpr constexpr_ptr operator--(int) noexcept { auto old = *this; --p; return old; }

    constexpr constexpr_ptr& operator+=(std::ptrdiff_t n) noexcept { p += n; return *this; }
    constexpr constexpr_ptr& operator-=(std::ptrdiff_t n) noexcept { p -= n; return *this; }

    constexpr constexpr_ptr operator+(std::ptrdiff_t n) const noexcept { return p + n; }
    constexpr constexpr_ptr operator-(std::ptrdiff_t n) const noexcept { return p - n; }

    constexpr std::ptrdiff_t operator-(const constexpr_ptr& o) const noexcept { return p - o.p; }

    constexpr to operator[](std::ptrdiff_t n) const noexcept { return std::bit_cast<to>(*(p + n)); }

    constexpr auto operator<=>(const constexpr_ptr&) const noexcept = default;
};

template <typename to, typename from>
constexpr constexpr_ptr<to, from> constexpr_cast_ptr(from* p) noexcept {
    return p;
}

Then using it like so:

constexpr char g(const uint8_t* c) {
    return constexpr_cast_ptr<char>(c)[0];
}

This seems to work.
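To show how such a wrapper could plug into a scalar routine that is templated on its pointer type, here is a reduced sketch. All names (demo_byte_view, demo_count_continuation_bytes) are hypothetical. The view is read-only and supports only operator[], and it uses static_cast in place of std::bit_cast so that it also compiles before C++20, which is sufficient for integral byte-like types but is a swapped-in technique rather than the wrapper above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reduced, read-only stand-in for the wrapper (hypothetical name).
// Assumption: static_cast replaces std::bit_cast, which is enough for
// integral byte-like element types.
template <typename To, typename From> struct demo_byte_view {
  static_assert(sizeof(To) == sizeof(From), "element sizes must match");
  const From *p;
  constexpr To operator[](std::size_t n) const noexcept {
    return static_cast<To>(p[n]);
  }
};

// Generic scalar routine: accepts a raw pointer or the view, as long as
// data[i] yields something convertible to a byte.
template <class BytePtr>
constexpr std::size_t demo_count_continuation_bytes(BytePtr data,
                                                    std::size_t len) noexcept {
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; ++i) {
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    if ((static_cast<std::uint8_t>(data[i]) & 0xC0) == 0x80) {
      ++count;
    }
  }
  return count;
}

// Compile time: view a char array as uint8_t without reinterpret_cast.
constexpr char demo_utf8[] = {'a', '\xC3', '\xA9', 'b'};
static_assert(demo_count_continuation_bytes(
                  demo_byte_view<std::uint8_t, char>{demo_utf8}, 4) == 1,
              "one continuation byte");

// Runtime: the same routine with a plain pointer.
inline void demo_byte_view_usage() {
  const unsigned char euro[] = {0xE2, 0x82, 0xAC};
  assert(demo_count_continuation_bytes(euro, sizeof(euro)) == 2);
}
```

Templating the routine on the pointer type keeps a single implementation for both paths, which matches the template <class BytePtr> signature of the scalar validate function in the pull request description.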

I wonder why this is not in the std library?

(When I indicate that it is made by AI, it is understood that it is not to be trusted. Just a demo.)

@lemire
Member

lemire commented Dec 5, 2025

@pauldreik I think that such an abstraction should have zero (runtime) cost, so we could use it for all our casts.

@pauldreik force-pushed the simdutf_is_constexpr branch from a5f203d to 7ee5f7e on December 6, 2025 08:26

@pauldreik
Collaborator Author

@pauldreik Using AI and a bit of C++ knowledge, I wrote a wrapper that seems to work: https://godbolt.org/z/n86945sa4

Interesting! I will give it a try.

@pauldreik force-pushed the simdutf_is_constexpr branch 6 times, most recently from cd0d06e to dd18395 on December 7, 2025 21:42

@pauldreik
Collaborator Author

I have now implemented constexpr support for a few randomly selected functions and I think it definitely is doable. I also think the changes needed are quite reasonable, and the runtime performance should not be affected at all.

It would be nice to hear from someone who would actually benefit from this feature!

@pauldreik
Collaborator Author

I am working on this (slowly). It is going in the right direction!

@pauldreik force-pushed the simdutf_is_constexpr branch 9 times, most recently from a20c0de to 9fae3ef on December 18, 2025 20:35
@pauldreik force-pushed the simdutf_is_constexpr branch from 2efaa98 to edcace4 on January 1, 2026 09:18
@pauldreik merged commit 448e32c into master on Jan 1, 2026
105 checks passed
@pauldreik deleted the simdutf_is_constexpr branch on January 1, 2026 11:45