Reduce allocations in string.Normalize#34774
Conversation
|
Tagging @tarekgh, @safern as an area owner |
src/libraries/Common/src/Interop/Unix/System.Globalization.Native/Interop.Normalization.cs
Outdated
Show resolved
Hide resolved
This one is 4096 (bytes, not chars) I assume we should pick a number that is suitable everywhere since we generally do not know how much stack we have - and presumably it's the same number on all platforms for simplicity. @stephentoub are you comfortable with 4096? 1024? |
|
We've generally tried to not go above 1K. 4K feels like a lot, especially on macOS where the default stack size is lower than other platforms. In some cases there's a natural upper-bound based on the scenario, and we've been a bit more liberal in those cases. In others where it's arbitrary, we've talked about just standardizing on a size that we use everywhere, e.g. 256 bytes or 128 chars. I would still like us to go through and do such a standardization where appropriate, we've just not prioritized it. |
src/libraries/Common/src/Interop/Unix/System.Globalization.Native/Interop.Normalization.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/Interop/Unix/System.Globalization.Native/Interop.Normalization.cs
Outdated
Show resolved
Hide resolved
It should be a new API that will do the right thing for the given context and platform without your worrying about it: #25423 |
src/libraries/Common/src/Interop/Unix/System.Globalization.Native/Interop.Normalization.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Windows.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/Interop/Unix/System.Globalization.Native/Interop.Normalization.cs
Outdated
Show resolved
Hide resolved
| if (toReturn != null) | ||
| { | ||
| return new string(buf, 0, realLen); | ||
| ArrayPool<char>.Shared.Return(toReturn); |
There was a problem hiding this comment.
Heads up: we generally don't return buffers to the shared pool inside finally blocks. The sole exception is where we're in 100% control of every single code path that might be invoked inside the preceding try.
There was a problem hiding this comment.
Heads up: we generally don't return buffers to the shared pool inside finally blocks. The sole exception is where we're in 100% control of every single code path that might be invoked inside the preceding try.
We're inconsistent on this, e.g.
runtime/src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs
Lines 114 to 130 in 26b6e4e
There was a problem hiding this comment.
I know, but we should try to get it under check for new code (like this). The GitHub UI is misbehaving for me so I can't see the rest of the code. So this usage might not be problematic at all but somebody should double check.
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Windows.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Unix.cs
Outdated
Show resolved
Hide resolved
|
Added a check to return the original string if normalization didn't change anything (which should be the common-case as well) |
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Windows.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/Normalization.Unix.cs
Outdated
Show resolved
Hide resolved
| throw new ArgumentException(SR.Argument_InvalidNormalizationForm, nameof(normalizationForm)); | ||
| realLength = Interop.Normaliz.NormalizeString(normalizationForm, pInput, strInput.Length, pDest, buffer.Length); | ||
| } | ||
| int lastError = Marshal.GetLastWin32Error(); |
There was a problem hiding this comment.
Is this valid even if realLength > 0?
There was a problem hiding this comment.
(I see the above comment now)
stephentoub
left a comment
There was a problem hiding this comment.
One question about comparing to Interop.BOOL.TRUE. Otherwise LGTM.
While inherently an allocatey API (
string Normalize(string)), we can avoid allocating the temporary char[] buffer for the P/Invoke.These char[] allocations account for ~2% of allocated bytes when using HttpClient with a non-ascii Uri in my simple test. The allocated result (even if unchanged) another ~1.5%.
Is there a common stackalloc threshold we should agree upon across runtime? Highest I've seen so far is 1024 chars. I used 512 here as that is what IdnMapping uses as well.