Remove String IsASCII flag from the object header#24400
Remove String IsASCII flag from the object header#24400EgorBo wants to merge 1 commit intodotnet:masterfrom
Conversation
|
@EgorBo what is the motivation for this change? |
|
@danmosemsft dotnet/corefx#37358 (comment) and #24373 so while this change can make some benchmarks a bit slower (e.g. if they repeatedly call Replace for the same string over and over - in the real world strings tend to be more unique) it also makes code more portable (as you can see I removed #if CORECLR from the code that is "shared" with other runtimes including Mono). |
|
Also as I said I am not sure it should be removed at all, I did it mostly as an investigation 🙂 and if we want it to stay I wonder why we don't use it on Windows - on Windows it uses pinvoke into kernel32 FastStringOrdinal (https://docs.microsoft.com/en-us/windows/desktop/api/libloaderapi/nf-libloaderapi-findstringordinal) so currently this logic is useless on Windows if I am not mistaken |
|
|
||
| #if CORECLR | ||
| if (_isAsciiEqualityOrdinal && CanUseAsciiOrdinalForOptions(options) && source.IsFastSort() && target.IsFastSort()) | ||
| if (_isAsciiEqualityOrdinal && CanUseAsciiOrdinalForOptions(options) && source.IsAscii() && target.IsAscii()) |
There was a problem hiding this comment.
I guess this code can be optimized - we can omit IsAscii and just run IndexOf and figure out if it's not "IsAscii" on the fly.
There was a problem hiding this comment.
Looks Windows implementation don't have this optimization. we can try just remove the whole optimization and compare the perf with Windows.
checking IsAscii only here is incorrect in Linux as you can run into issues when having characters like hyphens.
In reply to: 281011011 [](ancestors = 281011011)
|
I'm mostly worried about removing the IndexOf optimization resulting in performance regressions. If the flag is to be kept then there's a lot of places where it can be propagated in various operations (eg. Substring of FastSort string is guaranteed to be FastSort). |
And make some benchmarks faster. This change needs perf numbers.
Correct. Whether or not this is worth doing should be supported by perf measurements. This optimization was used when removing use of this flag in other places, but not all of them.
The culture specific operations are not used that frequently, and they are a lot slower than a ordinal operations by design. I do not think it is a good idea to spread peanut butter overhead and complexity everywhere for something that is slow already. |
|
We should try to remove the ASCII case optimization and have the perf numbers. Using IsAscii only is wrong especially when string has some characters like hyphen or single quotes. |
|
Superseded by #26759 |
Not sure if this PR should be merged but at least it's a nice diff to understand how this feature works.
I even made a schema (click to zoom in):
This control bit in the object header signals if the string is checked that it doesn't contain any high char (>0x80), see this comment from syncblk.cpp:
Currently this flag works only on .Unix systems and can be checked via
String.IsFastSortinternal call used by internal IndexOf for the invariant cultures and for exampleString.Replace(string,string,Invariant)uses it.so when we do:
if we call
Replacefor the first time it scansstrfor the "high chars" and sets the bit to 0x80 (no high chars) or to 0x40 (has high chars) - see IsFastSort (IsFastSort is called from here)so next time we call
Replacefor this string the call will be a little bit faster (sinceIsFastSortwill be O(1) instead of O(N)).BTW, if we want to keep this flag why string literals don't have it set by default?
BTW2, if a string is checked that it doesn't contain any high char and has this flag set to 0x80 and we then modify the string via e.g.:
The
IsAsciiflag will remain and the comparison logic will be broken (I know, modifying strings this way is not a good idea but anyway).cc: @jkotas @filipnavara