Conversation
|
CC @jkotas @stephentoub @ellismg @danmosemsft @weshaggard @Petermarcu @janvorli |
| { | ||
| if (CultureData.InvariantMode) | ||
| { | ||
| // work ordinal, then all charcaters are normalized in this mode |
There was a problem hiding this comment.
typo "charcaters " in a few places
src/mscorlib/src/System/CLRConfig.cs
Outdated
| internal class CLRConfig | ||
| { | ||
|
|
||
| #if CORECLR |
There was a problem hiding this comment.
This should not be needed - CORECLR is always defined.
There was a problem hiding this comment.
I have the #else case to avoid compiling the code for non CORECLR. I may be missing something here.
There was a problem hiding this comment.
Do you mean CoreRT? I do not think we will want any of CLRConfig.cs in CoreRT.
There was a problem hiding this comment.
Unless this code may one day be shared with CoreRT, it will never get compiled without CORECLR defined. It's defined in System.Private.CoreLib.csproj
| return false; | ||
| } | ||
|
|
||
| static int32_t g_icuPresent = 1; |
There was a problem hiding this comment.
It may not actually matter, but did you consider reversing this? i.e. starting it at 0 == not present, and then setting it to 1 only once ICU has been detected and all of the function pointers appropriately initialized?
|
@dotnet-bot help |
|
|
||
| protected override bool ReleaseHandle() | ||
| { | ||
| if (CultureData.InvariantMode) |
There was a problem hiding this comment.
In what situation would ReleaseHandle be called but find InvariantMode == true?
There was a problem hiding this comment.
This is not really needed now. I had the initial implementation which was keeping the sort handle but now I don't initialize it at all so it is not needed now. I'll convert this to assert instead
| public partial class CompareInfo | ||
| { | ||
| internal static unsafe int InvariantIndexOfOrdinal(string source, string value, int startIndex, int count, bool ignoreCase) | ||
| { |
There was a problem hiding this comment.
Presumably these arguments have all already been validated?
There was a problem hiding this comment.
yes. but I can add some asserts for it
| fixed (char* pSource = source) fixed (char* pValue = value) | ||
| { | ||
| char* pSrc = &pSource[startIndex]; | ||
| int index = InvariantFindStringOrdinal(pSrc, count, pValue, value.Length, true, ignoreCase); |
There was a problem hiding this comment.
Nit: can you use named arguments for things like the true here and false below? From context I'm assuming it's about first vs last, but it'd be easier to read if it was clear from the call site.
There was a problem hiding this comment.
sure. it is good idea anyway :-)
|
@dotnet-bot test Ubuntu x64 corefx_baseline |
| for (ctrValue = 1; ctrValue < valueCount; ctrValue++) | ||
| { | ||
| sourceChar = InvariantToUpper(source[ctrSource + ctrValue]); | ||
| valueChar = InvariantToUpper(value[ctrValue]); |
There was a problem hiding this comment.
We already have routines for doing this kind of thing (though I don't know if we have it for this exact thing), used for ASCII-only performance paths. Can we look at reusing where possible? That would help to a) ensure the fallback is using well-tested code paths, b) any optimizations to one would apply to the other, and c) for places where we don't currently have ASCII-only fast paths, we might be able to leverage this work to add some.
There was a problem hiding this comment.
Related to that, how are we thinking about the performance of this invariant mode? Presumably one of the scenarios for it is apps that don't want to carry the ICU data around because they care about being lean, in which case throughput may also be of importance?
There was a problem hiding this comment.
I have already used the existing current optimized implementation in the functionality like string comparisons. but for IndexOf, looks we don't have a version for that. in short, I used whatever exist and provided the missing. Please point me to any implementation I should used if I missed it.
For the performance I agree in general. for the areas we can optimize it more we can look at it and do whatever needed there. I think the current implementation is reasonable for this check in and we can follow up for the optimization later. is this ok?
| } | ||
| } | ||
|
|
||
| private static unsafe int InvariantFindStringOrdinal(char* source, int sourceCount, char* value, int valueCount, bool start, bool ignoreCase) |
There was a problem hiding this comment.
What does it mean that we use both "Invariant" and "Ordinal" in the name? Do they connote different things?
There was a problem hiding this comment.
I'll remove the word Ordinal to reduce the confusion. it was meant in the invariant mode we do ordinal comparison.
| { | ||
| if (CultureData.InvariantMode) | ||
| { | ||
| // work ordinal, then all charcaters are normalized in this mode |
There was a problem hiding this comment.
I do not understand what is work ordinal trying to say.
| { | ||
| public static bool IsNormalized(this string strInput, NormalizationForm normalizationForm) | ||
| { | ||
| if (CultureData.InvariantMode) |
There was a problem hiding this comment.
This check should be after ValidateArguments
|
|
||
| public static string Normalize(this string strInput, NormalizationForm normalizationForm) | ||
| { | ||
| if (CultureData.InvariantMode) |
|
|
||
| public static unsafe bool IsSortable(string text) | ||
| { | ||
| if (CultureData.InvariantMode) |
There was a problem hiding this comment.
This should be after the argument validation at least; or even better in the common worked method.
There was a problem hiding this comment.
will move it after the validation. there is no common working method. we have forked implementation for different OS's
|
|
||
| protected override bool ReleaseHandle() | ||
| { | ||
| if (CultureData.InvariantMode) |
There was a problem hiding this comment.
How can we get here for the invariant mode?
There was a problem hiding this comment.
Yes you are right. I have converted this check to assert instead.
| { | ||
| internal partial class CultureData | ||
| { | ||
| private static bool s_invariantGlobalizationMode = InitGlobalizationInvariantMode(); |
There was a problem hiding this comment.
This should be initialized explicitly during startup, not using implicit constructor. The implicit constructor as written will turn into two checks each time. 1. have we initialized already? 2. What the value is? So these checks are twice as expensive than what they need to be.
Or even better - we may want to cache the flag in a field in the objects with hot paths like TextInfo, and check the instance field there. Instance fields are faster to access than static fields.
There was a problem hiding this comment.
Since corelib is crossgen'd, I assume the normal tricks around static readonly bools becoming JIT-time consts wouldn't apply, right?
There was a problem hiding this comment.
Right, that's why it would be best to cache the bool in instance fields where possible - instance fields are the fastest to access.
There was a problem hiding this comment.
sorry I am not clear about that. I am using this bool static in many static methods. if I cache it in instance field how I am going to use it in such static methods? could you please clarify more what you mean here?
There was a problem hiding this comment.
Right, this would not work everywhere. But it would work for the string manipulation methods on TextInfo or CompareInfo that are probably the most impacted ones.
Feel free to ignore this suggestion if you do not think it is a good idea. It would be useful to pick a few of the leanest paths that the check is getting added to and measure perf impact of this change on them.
| pResult[j] = pSource[j]; | ||
| } | ||
|
|
||
| pResult[i] = (Char)(pSource[i] & ~0x20); |
There was a problem hiding this comment.
lower case char
and above also
| _sortHandle); | ||
| } | ||
|
|
||
| if (0 == ret) |
There was a problem hiding this comment.
I don't see anything in the coding guidelines about Yoda conditionals but I think we avoid them.
There was a problem hiding this comment.
I didn't change this code. also I am not sure why this bad to use?
There was a problem hiding this comment.
As you know in native code people often write constant == variable (aka Yoda conditionals) out of habit so that if they mistype constant = variable it will cause a compile error. In C# though there is no implicit conversion in most cases to boolean so eg for integers there is no danger of successfully compiling constant = variable. So in code bases I"ve worked on, typically it's written variable == constant for readability.
Just a nit and I don't see it in the style guidelines although it does seem largely followed here.
| if (!invariantEnabled && Interop.GlobalizationInterop.ICUPresent() == 0) | ||
| { | ||
| string message = "Couldn't find a valid ICU package installed on the system. " + | ||
| "Set the configuration flag System.Globalization.Invariant to true if want to run with no globalization support."; |
There was a problem hiding this comment.
if want to -> if you want to
|
|
||
| if (CultureData.InvariantMode) | ||
| { | ||
| while (length != 0) |
There was a problem hiding this comment.
Is the sole difference between this section and the one above it that this one doesn't limit itself to just ASCII chars, so that we don't call to the OS in invariant mode? Consider a) adding a comment, and b) moving this out into a separate function.
| if (CultureData.InvariantMode) | ||
| { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Rather than having these guards in every function, would it be possible to, for example, implement a special CultureInfo for invariant all in managed code with functions like you have, and then if ICU isn't present / the config switch is thrown, just return that invariant CulutureInfo from anything that can give back a CultureInfo? Maybe that's not feasible, but it seems like it could help to consolidate the logic better.
There was a problem hiding this comment.
I have spent some time thinking about how we can implement the whole thing here. I have looked at the alternatives. for CultureInfo we already doing almost your idea mentioned here. but for other classes like CompareInfo/TextInfo I preferred the current implementation. To make it like what you suggested, I'll have to subclass CompareInfo/TextInfo and override all methods and then return such objects to the callers of such properties (like CultureInfo.CompareInfo, CultureInfo.TextInfo) . I dismissed this idea because a) we are not returning CompareInfo as we used to but we are returning a subclass of CompareInfo b) I was trying to avoid the unknown perf cost c) this will restrict us if we decided to expose a constructors for such classes in the future d) we'll have to use most of the existing code in CompareInfo anyway. we can refactor the code for this one though.
| length--; | ||
| } | ||
|
|
||
| if (CultureData.InvariantMode) |
There was a problem hiding this comment.
is it necessary to duplicate this block? can't the condition above be changed from
while (length != 0 && (*a <= 0x80) && (*b <= 0x80))
to
while (length != 0 && ((*a <= 0x80) && (*b <= 0x80)) || CultureData.InvariantMode)
There was a problem hiding this comment.
I didn't do that for the perf reason. this will add extra check with every character.
There was a problem hiding this comment.
What about
byte comparator = CultureData.InvariantMode ? 0xFF : 0x80;
while (length != 0 && (*a <= comparator) && (*b <= comparator))
{that would avoid the duplication al,so.
There was a problem hiding this comment.
Or a separate function as @stephentoub suggests. It's just confusing two have two blocks like this IMO.
There was a problem hiding this comment.
I think this is a good idea. I'll apply it. I'll use 0xFFFF (not 0xFF)
There was a problem hiding this comment.
Yes thanks I meant 0xFFFF :) ... this should get a good comment, though, as I had to think about it.
|
How are we testing this? |
|
What is your thinking about testing that we don't accidentally load ICU when we aren't supposed to? It would be easy to introduce a code path a month from now that does this. |
|
For the testing question, I have already created a test which testing almost all the globalization functionality when having the config switch is on. The test expect different values that if we have the switch is off. I have been testing it on Windows and Linux. also I have tested on system having ICU installed and not having the ICU installed with and without the config switch. |
I guess what I'm wondering is how easy it would be for a dev 6 months from now to accidentally introduce a load of ICU in a non-ICU codepath and not notice. Unless we either have something in the product that notices it loads (if that's even possible) or we regularly run our tests on a box without ICU available, as you've done manually, we probably wouldn't notice. |
|
@danmosemsft I understand your point. we need to be more robust to guard against mistakes in the future. you are talking about ICU here but we should think about Windows too which I am seeing them as same. I think the best here is we create a static analysis tool that can scan the whole code and ensure all paths to the pinvokes are guarded. |
| return LastIndexOfCore(source, value, startIndex, count, options); | ||
| } | ||
|
|
||
| internal unsafe int LastIndexOfOrdinal(string source, string value, int startIndex, int count, bool ignoreCase) |
| { | ||
| internal sealed partial class GlobalizationMode | ||
| { | ||
| internal static bool GetGlobalizationInvariantMode() |
| { | ||
| internal sealed partial class GlobalizationMode | ||
| { | ||
| internal static bool GetGlobalizationInvariantMode() |
|
LGTM modulo few nits. |
| internal sealed partial class GlobalizationMode | ||
| { | ||
| private const string c_InvariantModeConfigSwitch = "System.Globalization.Invariant"; | ||
| private static bool s_invariantMode = GetGlobalizationInvariantMode(); |
There was a problem hiding this comment.
Or even internal static bool Invariant { get; } = GetGlobalizationInvariantMode();
|
@jkotas thanks for your review and the feedback. I have one question. one of your comments you asked to remove the #if CORECLR. my question is, when these changes get merged with the corert, I think this will break. right? GlobalizationMode.Invariant get initialized by the QCall which is not in coreRT. so I think we'll need to add the #if CORECLR in the initialization to avoid the QCall calling in corert. |
|
test OSX10.12 x64 Checked Build and Test please |
I expect that we are not going to share this file with CoreRT - this file will be CoreCLR specific. There will be CoreRT specific version of it that reads it directly from AppContext and has potentially other CoreRT specific tweaks. |
|
Also, could you please work @russellhadley and @erozenfeld to figure out the plan how to get this integrated with the linker that they are working on? This can make standalone apps that opt into invariant globalization smaller; and it can be also used to test that you got all paths to ICU covered with the condition - the linker should tree shake all calls to ICU. |
I was just wondering about the calls of GlobalizationMode.Invarian in many places inside the globalization code. so corert should have the definition of GlobalizationMode or otherwise the code will not compile. I may be missing something but anyway I'll go ahead and merge the changes here and we can handle any break if we get any. |
|
test OSX10.12 x64 Checked Build and Test please |
|
I'll work with @russellhadley and @erozenfeld as you suggested too. |
|
Yes, we should be able to deadcode the appropriate code paths if the invariant flags is know at publish time. @erozenfeld can you make sure that we have a backlog item for this? |
|
@erozenfeld please let me know if you need anything from my side. I'll be happy to help. |
* Invariant Globalization Work * Convert the testing Exceptions to asserts * Remove un-needed comment * Fix typos * Fix unrelated typo * Address the PR feedback * More feedback addressing * More feedback addressing * Fix Linux break * More feedback addressing * cleanup
Partial port of dotnet/coreclr#10264 Fixes #4640
Partial port of dotnet#10264 Fixes dotnet#4640 Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Partial port of dotnet/coreclr#10264 Fixes #4640 Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Partial port of dotnet/coreclr#10264 Fixes #4640 Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Partial port of dotnet/coreclr#10264 Fixes #4640 Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Partial port of dotnet/coreclr#10264 Fixes #4640 Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com> Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
This change is to introduce the config switch to control using the OS globalization or just run in the invariant mode.