Skip to content

Optimize integral ToString#32413

Closed
ts2do wants to merge 7 commits intodotnet:masterfrom
ts2do:IntFormat
Closed

Optimize integral ToString#32413
ts2do wants to merge 7 commits intodotnet:masterfrom
ts2do:IntFormat

Conversation

@ts2do
Copy link
Contributor

@ts2do ts2do commented Feb 16, 2020

  • Make the fast path for Number.FormatXX & Number.TryFormatXX inlineable
  • Make parameterless ToString in SByte, Int16, Int32, and Int64 directly invoke the default formatting method
  • Consolidate Number.FormatX & Number.TryFormatX code

I'm including the benchmark results below for only Perf_Int32.ToString and Perf_Int32.ToStringHex, as I think they adequately demonstrate the impact of the changes.

Method value Mean Error StdDev Median Min Max Ratio MannWhitney(3%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
ToString -2147483648 34.125 ns 0.7468 ns 0.6985 ns 34.113 ns 33.247 ns 35.474 ns 0.70 Faster 0.02 0.0115 - - 48 B
ToString -2147483648 48.705 ns 0.7564 ns 0.6316 ns 48.588 ns 47.896 ns 50.102 ns 1.00 Base 0.00 0.0114 - - 48 B
ToStringHex -2147483648 35.181 ns 0.4066 ns 0.3804 ns 35.009 ns 34.728 ns 35.933 ns 0.99 Same 0.02 0.0095 - - 40 B
ToStringHex -2147483648 35.632 ns 0.6563 ns 0.6139 ns 35.581 ns 34.989 ns 37.150 ns 1.00 Base 0.00 0.0095 - - 40 B
ToString 4 5.201 ns 0.0570 ns 0.0476 ns 5.206 ns 5.135 ns 5.299 ns 0.29 Faster 0.00 - - - -
ToString 4 17.646 ns 0.1348 ns 0.1261 ns 17.583 ns 17.526 ns 17.893 ns 1.00 Base 0.00 - - - -
ToStringHex 4 27.612 ns 0.2678 ns 0.2505 ns 27.541 ns 27.353 ns 28.199 ns 0.97 Same 0.02 0.0056 - - 24 B
ToStringHex 4 28.421 ns 0.6172 ns 0.5471 ns 28.326 ns 27.846 ns 29.929 ns 1.00 Base 0.00 0.0057 - - 24 B
ToString 12345 15.465 ns 0.1663 ns 0.1556 ns 15.385 ns 15.315 ns 15.748 ns 0.52 Faster 0.01 0.0076 - - 32 B
ToString 12345 29.487 ns 0.2029 ns 0.1695 ns 29.410 ns 29.254 ns 29.862 ns 1.00 Base 0.00 0.0076 - - 32 B
ToStringHex 12345 31.218 ns 0.3360 ns 0.3142 ns 31.222 ns 30.850 ns 31.865 ns 1.00 Same 0.01 0.0076 - - 32 B
ToStringHex 12345 31.151 ns 0.2888 ns 0.2702 ns 31.029 ns 30.847 ns 31.697 ns 1.00 Base 0.00 0.0076 - - 32 B
ToString 2147483647 24.278 ns 0.4594 ns 0.3836 ns 24.226 ns 23.914 ns 25.299 ns 0.64 Faster 0.01 0.0114 - - 48 B
ToString 2147483647 38.018 ns 0.2509 ns 0.2224 ns 37.950 ns 37.712 ns 38.460 ns 1.00 Base 0.00 0.0114 - - 48 B
ToStringHex 2147483647 35.360 ns 0.2893 ns 0.2565 ns 35.377 ns 35.014 ns 35.849 ns 0.94 Faster 0.01 0.0095 - - 40 B
ToStringHex 2147483647 37.461 ns 0.1867 ns 0.1458 ns 37.461 ns 37.243 ns 37.690 ns 1.00 Base 0.00 0.0095 - - 40 B

- Make the fast path for Number.FormatXX & Number.TryFormatXX inlineable
- Make parameterless ToString in SByte, Int16, Int32, and Int64 directly invoke the default formatting method
- Consolidate Number.FormatX & Number.TryFormatX code
@jkotas jkotas added tenet-performance Performance related issue area-System.Runtime labels Feb 17, 2020

public static unsafe string FormatInt32(int value, ReadOnlySpan<char> format, IFormatProvider? provider)
/// <summary>
/// This interface lets us use shared implementations (FormatInt and TryFormatInt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true the code in the steady state is going to be almost identical. There is extra overhead in the size of runtime generics control structures. My back of the envelope calculation says that it is 10kB - 20kB (once all integer formatting routines are used).

@marek-safar How do you feel about the trade-offs in this implementation for mobile and web-assembly (footprint-sensitive targets)?

The technique that can be used to eliminate the generics overhead, while keeping the code shared, is generating the code using .tt template. We use it for SIMD types: https://github.com/dotnet/runtime/tree/master/src/libraries/System.Private.CoreLib/src/System/Numerics . It does not look as pretty, but gets the job done. Thoughts about whether it would be worth doing it in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formating code is already casing troubles when linking, introducing more interface abstraction will make the matter worse. It's not a large amount of code but it's quite likely we'll need to redo the formating code to be linker friendly.

Although the numbers look good for JIT we should not overoptimize for JIT and make e.g. AOT slower. What is the perf impact on AOT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formating code is already causing troubles when linking

Can you share details?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have precise data at hand as we are still working on the tooling but the fact is that the formatting code is a large chunk of any app (minimal amount code is linked out for formatting)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I thought you were saying it's worse on core than it was previously on mono; that's not the case, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the exact data to confirm that but my guess would be that it should not differ significantly between mono and netcore (in a negative or positive way) at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Private.CoreLib.dll R2R image size is 9,203,200 before this change and 9,213,440 after this change (Win x64 release). This size delta is not the entire overhead - there is also extra private memory at runtime. It would be nice to reduce it.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (module the open question about whether the generics footprint is acceptable).

Thank you for doing this! The improvements for the default case are very nice.

@jkotas jkotas requested a review from stephentoub February 18, 2020 06:30
Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the perf improvement and the code reduction are nice. I agree it looks good subject to a decision around the generic expansions.

@jkotas
Copy link
Member

jkotas commented Feb 18, 2020

There are really two independent changes in this PR. It may be a good idea to split this into 2 PRs to make forward progress:

  • Change that introduces the Int32ToDecStr and Int64ToDecStr methods and improves the throughput of the default case. This one is pretty straightforward.
  • Change that reduces source code-duplication. This one we may need to iterate on some.

@stephentoub
Copy link
Member

It may be a good idea to split this into 2 PRs to make forward progress

👍

@jkotas
Copy link
Member

jkotas commented Mar 25, 2020

We are fine with the code duplication here. Adding generics to save a couple hundred lines is not a good tradeoff for the runtime footprint as discussed above. Thank you for giving this a try!

@jkotas jkotas closed this Mar 25, 2020
@ts2do ts2do deleted the IntFormat branch April 1, 2020 00:33
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants