Use division by 100 in to_string for integers #5691
StephanTLavavej merged 29 commits into microsoft:main
672f1db to 7ea6121
The benchmark shows a minor speedup:

| Benchmark | 9 digits | 8 digits | Speedup for 8 digits |
|---|---|---|---|
| `internal_integer_to_buff<char, uint64_t, 20.0, 12.0>` | 9.46 ns | 8.47 ns | 1.12 |
| `internal_integer_to_buff<wchar_t, uint64_t, 20.0, 12.0>` | 8.40 ns | 8.13 ns | 1.03 |
| `integer_to_string<uint64_t, 20.0, 12.0>` | 19.7 ns | 18.4 ns | 1.07 |
| `integer_to_string<int64_t, 20.0, 12.0>` | 20.9 ns | 19.6 ns | 1.07 |

This helps a little more:

| Benchmark | 4 loop | special | Speedup |
|---|---|---|---|
| `internal_integer_to_buff<char, uint64_t, 20.0, 12.0>` | 8.49 ns | 7.97 ns | 1.07 |
| `internal_integer_to_buff<wchar_t, uint64_t, 20.0, 12.0>` | 8.15 ns | 7.79 ns | 1.05 |
| `integer_to_string<uint64_t, 20.0, 12.0>` | 18.5 ns | 18.0 ns | 1.03 |
| `integer_to_string<int64_t, 20.0, 12.0>` | 19.6 ns | 19.3 ns | 1.02 |
Thanks! I pushed major changes, please double-check.
Looks good. 3708bbf might change codegen, but I assume it is fine, as you have benchmarked (at least for
Yeah, I checked and the differences between your codegen and mine appeared to be lost in the noise.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
I've pushed commits to fix a bug found by internal AI code review. The
constexpr auto max_val = static_cast<double>(numeric_limits<T>::max());
if (dbl <= max_val) {
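As pointed out by @statementreply on Discord, for `uint64_t` the value of `max_val` is rounded (2^64 − 1 is not exactly representable as a `double`), so the check should be `dbl < max_val`.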
Thanks, I can fix this in a followup.
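The rounding issue in this thread can be reproduced in isolation. The sketch below is not the benchmark's code: `uint64_max_as_double` and `fits_in_uint64` are hypothetical helpers, and it assumes the usual round-to-nearest conversion from integer to `double` (the C++ standard leaves the direction implementation-defined). Under that assumption the converted maximum equals 2^64, which is one past the largest `uint64_t`, so a `<=` comparison would accept a value that cannot be converted back safely.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: the value that `max_val` takes for uint64_t.
// 2^64 - 1 needs 64 significant bits, a double has 53, so it gets rounded.
inline double uint64_max_as_double() {
    return static_cast<double>(std::numeric_limits<std::uint64_t>::max());
}

// Hypothetical helper: a range check that stays correct despite the rounding.
// 2^digits (here 2^64) is a power of two, hence exactly representable.
inline bool fits_in_uint64(double dbl) {
    const double limit = std::ldexp(1.0, std::numeric_limits<std::uint64_t>::digits);
    return dbl >= 0.0 && dbl < limit; // strict comparison excludes 2^64 itself
}
```

With round-to-nearest, `fits_in_uint64(uint64_max_as_double())` is `false`, while the original `dbl <= max_val` test would have let that value through.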
💯 💯 💯
⚙️ Optimization
Resolves #3857. Divides by 100 instead of by 10, as proposed.
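As a rough illustration of the idea (not the STL's actual code), dividing by 100 lets each iteration emit two digits from a lookup table, which halves the number of divisions. The names `uint_to_buff_by_100` and `to_string_by_100` and the table layout are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: writes the decimal digits of `value` backwards,
// ending just before `rnext`, and returns a pointer to the first digit.
inline char* uint_to_buff_by_100(char* rnext, std::uint64_t value) {
    // "00", "01", ..., "99" stored back to back; entry n starts at index 2 * n.
    static constexpr char digit_pairs[] =
        "00010203040506070809"
        "10111213141516171819"
        "20212223242526272829"
        "30313233343536373839"
        "40414243444546474849"
        "50515253545556575859"
        "60616263646566676869"
        "70717273747576777879"
        "80818283848586878889"
        "90919293949596979899";

    while (value >= 100) {
        const auto idx = static_cast<unsigned>(value % 100) * 2;
        value /= 100;
        *--rnext = digit_pairs[idx + 1]; // ones digit of the pair
        *--rnext = digit_pairs[idx];     // tens digit of the pair
    }
    if (value >= 10) {
        const auto idx = static_cast<unsigned>(value) * 2;
        *--rnext = digit_pairs[idx + 1];
        *--rnext = digit_pairs[idx];
    } else {
        *--rnext = static_cast<char>('0' + value);
    }
    return rnext;
}

// Hypothetical wrapper: formats into a stack buffer, then copies to a string.
inline std::string to_string_by_100(std::uint64_t value) {
    char buff[20]; // a uint64_t has at most 20 decimal digits
    char* const last = buff + 20;
    char* const first = uint_to_buff_by_100(last, value);
    return std::string(first, last);
}
```

The wrapper deliberately mirrors the "generate digits into a buffer, then copy into the string" shape discussed in the results, so the digit loop and the copy can be timed separately.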
There's a similar place in `to_chars`, skipped for now.
🏁 Benchmark
Large and small numbers, like numbers naturally seen when counting things.
Generated via log-normal distribution, as @statementreply suggested.
Picked some arbitrary parameters, to approximately fit in the integer ranges.
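A minimal sketch of how such input data could be generated, assuming the `20.0, 12.0` in the benchmark names are the log-normal parameters (that reading is a guess). `generate_lognormal`, its seed, and the rejection of out-of-range samples are illustrative choices, not the benchmark's actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical generator: draws log-normally distributed values and keeps
// those that fit in T, so the result mixes small and large magnitudes.
template <class T>
std::vector<T> generate_lognormal(std::size_t count, double m, double s, unsigned seed = 1729) {
    std::mt19937_64 gen(seed);
    std::lognormal_distribution<double> dist(m, s);

    // 2^digits is exactly representable, unlike numeric_limits<T>::max() for 64-bit T.
    const double limit = std::ldexp(1.0, std::numeric_limits<T>::digits);

    std::vector<T> out;
    out.reserve(count);
    while (out.size() < count) {
        const double d = std::floor(dist(gen)); // samples are positive
        if (d < limit) {                        // reject values that do not fit
            out.push_back(static_cast<T>(d));
        }
    }
    return out;
}
```

For example, `generate_lognormal<std::uint64_t>(1000, 20.0, 12.0)` yields 1000 values whose digit counts vary widely, which exercises both the short and long paths of the conversion.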
Also benchmarked `std::_UIntegral_to_buff` separately to see how much the optimization helps on its own, avoiding the #1024 limitation.
⏱️ Benchmark results
i5-1235U P cores:
i5-1235U E cores:
🥉 Results interpretation
I'm not even sure if this is worth doing.
Allocating the string and copying the result there takes roughly half of the time, so the effect of micro-optimizing digit generation is small.
However, the internal function seems to show an improvement. This suggests that the #1024 improvement would help here. It could be that the performance is limited by failed store-to-load forwarding, as individual character stores are followed by a bulk memcpy; in that case, the improvement may be somewhat negated by a longer stall.