Use division by 100 in to_string for integers #5691

Merged
StephanTLavavej merged 29 commits into microsoft:main from AlexGuteniev:integers
Jan 8, 2026

Conversation

@AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented Aug 23, 2025

⚙️ Optimization

Resolves #3857. Divides by 100 instead of by 10, as proposed.

There's a similar place in to_chars, skipped for now.

🏁 Benchmark

Large and small numbers, like numbers naturally seen when counting things.
Generated via log-normal distribution, as @statementreply suggested.
Picked some arbitrary parameters, to approximately fit in the integer ranges.

Also benchmarked std::_UIntegral_to_buff separately to see how much the optimization helps on its own, avoiding the #1024 limitation.

⏱️ Benchmark results

i5-1235U P cores:

Benchmark                                        |  Before |   After | Speedup
-------------------------------------------------|---------|---------|--------
`internal_integer_to_buff<uint8_t, 2.5, 1.5>`    | 2.30 ns | 3.42 ns | 0.67
`internal_integer_to_buff<uint16_t, 5.0, 3.0>`   | 3.70 ns | 2.64 ns | 1.40
`internal_integer_to_buff<uint32_t, 10.0, 6.0>`  | 4.69 ns | 2.86 ns | 1.64
`internal_integer_to_buff<uint64_t, 20.0, 12.0>` | 10.5 ns | 5.29 ns | 1.98
`integer_to_string<uint8_t, 2.5, 1.5>`           | 5.87 ns | 5.44 ns | 1.08
`integer_to_string<uint16_t, 5.0, 3.0>`          | 6.79 ns | 6.32 ns | 1.07
`integer_to_string<uint32_t, 10.0, 6.0>`         | 8.11 ns | 7.28 ns | 1.11
`integer_to_string<uint64_t, 20.0, 12.0>`        | 14.5 ns | 14.2 ns | 1.02
`integer_to_string<int8_t, 2.5, 1.5>`            | 6.64 ns | 5.96 ns | 1.11
`integer_to_string<int16_t, 5.0, 3.0>`           | 6.23 ns | 5.88 ns | 1.06
`integer_to_string<int32_t, 10.0, 6.0>`          | 7.58 ns | 6.33 ns | 1.20
`integer_to_string<int64_t, 20.0, 12.0>`         | 17.8 ns | 18.8 ns | 0.95

i5-1235U E cores:

Benchmark                                        |  Before |   After | Speedup
-------------------------------------------------|---------|---------|--------
`internal_integer_to_buff<uint8_t, 2.5, 1.5>`    | 4.14 ns | 4.79 ns | 0.86
`internal_integer_to_buff<uint16_t, 5.0, 3.0>`   | 8.08 ns | 4.76 ns | 1.70
`internal_integer_to_buff<uint32_t, 10.0, 6.0>`  | 11.4 ns | 5.41 ns | 2.11
`internal_integer_to_buff<uint64_t, 20.0, 12.0>` | 23.8 ns | 13.9 ns | 1.71
`integer_to_string<uint8_t, 2.5, 1.5>`           | 17.2 ns | 12.7 ns | 1.35
`integer_to_string<uint16_t, 5.0, 3.0>`          | 17.1 ns | 13.6 ns | 1.26
`integer_to_string<uint32_t, 10.0, 6.0>`         | 18.3 ns | 14.0 ns | 1.31
`integer_to_string<uint64_t, 20.0, 12.0>`        | 36.6 ns | 29.4 ns | 1.24
`integer_to_string<int8_t, 2.5, 1.5>`            | 17.8 ns | 12.0 ns | 1.48
`integer_to_string<int16_t, 5.0, 3.0>`           | 20.0 ns | 13.4 ns | 1.49
`integer_to_string<int32_t, 10.0, 6.0>`          | 21.5 ns | 15.1 ns | 1.42
`integer_to_string<int64_t, 20.0, 12.0>`         | 39.7 ns | 35.0 ns | 1.13

🥉 Results interpretation

I'm not even sure if this is worth doing.

Allocating the string and copying the result there takes roughly half of the time, so the effect of micro-optimization in digits generation is small.

However, the internal function seems to show improvement. This looks like an indication that the #1024 improvement would help here. It could be that the performance is limited due to failed store-to-load forwarding, as individual character stores are followed by a bulk memcpy; in this case, the improvement may be somewhat negated by a longer stall.
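
To make the two-stage pattern described above concrete, here is a simplified sketch (an illustration, not the library's implementation): digits are stored one character at a time into a local buffer, and the result is then copied into the string in bulk. The function name `to_string_two_stage` is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Hypothetical illustration of "generate into a buffer, then copy".
std::string to_string_two_stage(std::uint32_t value) {
    char buff[10]; // the largest uint32_t value has 10 decimal digits
    char* const end = buff + sizeof(buff);
    char* rnext = end;
    do {
        // Stage 1: narrow, one-byte stores into the stack buffer.
        *--rnext = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    // Stage 2: constructing the string reads the same bytes back with a
    // bulk copy, right after they were written byte by byte.
    return std::string(rnext, end);
}
```

The second stage is the part a faster digit loop cannot shorten, which is consistent with the smaller speedups in the `integer_to_string` rows compared to the `internal_integer_to_buff` rows.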

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner August 23, 2025 19:42
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Aug 23, 2025
@StephanTLavavej StephanTLavavej added performance Must go faster decision needed We need to choose something before working on this labels Aug 24, 2025
@StephanTLavavej StephanTLavavej self-assigned this Aug 24, 2025
@AlexGuteniev AlexGuteniev force-pushed the integers branch 2 times, most recently from 672f1db to 7ea6121 Compare August 25, 2025 13:09
@StephanTLavavej StephanTLavavej removed their assignment Nov 14, 2025
@StephanTLavavej StephanTLavavej removed the decision needed We need to choose something before working on this label Dec 7, 2025
AlexGuteniev and others added 11 commits December 19, 2025 18:16
The benchmark shows a minor speedup:

Benchmark                                                 | 9 digits | 8 digits | Speedup for 8 digits
----------------------------------------------------------|----------|----------|---------------------
`internal_integer_to_buff<char, uint64_t, 20.0, 12.0>`    |  9.46 ns |  8.47 ns | 1.12
`internal_integer_to_buff<wchar_t, uint64_t, 20.0, 12.0>` |  8.40 ns |  8.13 ns | 1.03
`integer_to_string<uint64_t, 20.0, 12.0>`                 |  19.7 ns |  18.4 ns | 1.07
`integer_to_string<int64_t, 20.0, 12.0>`                  |  20.9 ns |  19.6 ns | 1.07

This helps a little more:

Benchmark                                                 |  4 loop |  special | Speedup
----------------------------------------------------------|---------|----------|--------
`internal_integer_to_buff<char, uint64_t, 20.0, 12.0>`    | 8.49 ns |  7.97 ns | 1.07
`internal_integer_to_buff<wchar_t, uint64_t, 20.0, 12.0>` | 8.15 ns |  7.79 ns | 1.05
`integer_to_string<uint64_t, 20.0, 12.0>`                 | 18.5 ns |  18.0 ns | 1.03
`integer_to_string<int64_t, 20.0, 12.0>`                  | 19.6 ns |  19.3 ns | 1.02
@StephanTLavavej
Member

Thanks! I pushed major changes, please double-check.

@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jan 7, 2026
@AlexGuteniev
Contributor Author

Looks good. 3708bbf might change codegen, but I assume it is fine, as you have benchmarked (at least for wchar_t and on x86).

@StephanTLavavej
Member

Yeah, I checked and the differences between your codegen and mine appeared to be lost in the noise.

@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Jan 7, 2026
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej
Member

I've pushed commits to fix a bug found by internal AI code review. The lognormal_distribution was generating out-of-range values up to 4% of the time, triggering UB when we static_cast from double to integer. I've replaced this with a retry loop, so that we preserve the distribution's behavior, without generating the maximum value unusually often (which would happen if we just used std::clamp()). This carefully uses floor() to actually generate the maximum value some of the time. Needless to say, my code was written without any AI assistance.
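
A rough sketch of such a retry loop, under stated assumptions: the helper name `lognormal_in_range` is hypothetical, and the exclusive bound of `2^digits` (that is, `max() + 1`, which a `double` represents exactly) is a choice made for this sketch rather than the check used in the PR.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical helper: draws from dist until the floored value fits in T.
template <class T, class Engine>
T lognormal_in_range(Engine& gen, std::lognormal_distribution<double>& dist) {
    // Assumption of this sketch: compare against 2^digits == max() + 1,
    // which is exactly representable as a double for every integer type.
    const double limit = std::ldexp(1.0, std::numeric_limits<T>::digits);
    for (;;) {
        // floor() maps [max, max + 1) onto max, so the maximum stays reachable
        // wherever a double can represent it.
        const double dbl = std::floor(dist(gen));
        if (dbl < limit) {
            return static_cast<T>(dbl); // in range, so the cast is well-defined
        }
        // Out of range (including infinity): discard and draw again instead
        // of clamping, which would pile extra probability onto max().
    }
}
```

Rejected samples are simply redrawn, so the accepted values follow the original distribution truncated to the type's range. The loop assumes the distribution parameters leave a reasonable fraction of samples in range; otherwise it could spin for a long time.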

Comment on lines +25 to +26
constexpr auto max_val = static_cast<double>(numeric_limits<T>::max());
if (dbl <= max_val) {
Contributor

@MattStephanson MattStephanson Jan 8, 2026

As pointed out by @statementreply on Discord, for uint64_t this rounds to $2^{64}$, so to be rigorous you would need to either special-case that or make the test dbl < max_val.

Member

Thanks, I can fix this in a followup.
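
The rounding issue raised in this thread can be illustrated with a small sketch. It assumes IEEE 754 `double` with round-to-nearest conversion, and the helper names `fits_le` and `fits_lt` are hypothetical, not part of the PR.

```cpp
#include <cstdint>
#include <limits>

// Mirrors the reviewed check: dbl <= max_val.
template <class T>
bool fits_le(double dbl) {
    return dbl <= static_cast<double>(std::numeric_limits<T>::max());
}

// The stricter variant suggested in the review: dbl < max_val.
template <class T>
bool fits_lt(double dbl) {
    return dbl < static_cast<double>(std::numeric_limits<T>::max());
}

// 2^64 is one past the largest uint64_t value. A double cannot hold
// 2^64 - 1 exactly, so converting max() to double rounds up to 2^64.
const double two_pow_64 = 18446744073709551616.0;

// fits_le<std::uint64_t>(two_pow_64) is true, although casting 2^64 to
// uint64_t would be out of range; fits_lt<std::uint64_t>(two_pow_64) is false.
```

For narrower types such as `uint32_t` the maximum converts exactly, so `<=` is safe there; the strict comparison trades away the exact maximum in exchange for one uniform test.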

@StephanTLavavej StephanTLavavej merged commit 641410d into microsoft:main Jan 8, 2026
45 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Jan 8, 2026
@StephanTLavavej
Copy link
Member

💯 💯 💯

@AlexGuteniev AlexGuteniev deleted the integers branch January 8, 2026 16:56
AlexGuteniev added a commit to AlexGuteniev/STL that referenced this pull request Jan 11, 2026
Labels

performance Must go faster

Development

Successfully merging this pull request may close these issues.

<string>: to_string() for integers could be faster

3 participants