Skip to content

Conversation

@jackhorton
Copy link
Contributor

Fixes #421
Fixes #4526

Also fixes a previously untracked? bug where String.prototype.toLocale{Upper|Lower}Case.call(null) would print "null" rather than throwing a TypeError.

while (i < inStrLim)
// Fast path for one character strings
char16 inChar = pThis->GetSz()[0];
char16 oneCharAttempt[2] = { inChar, 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this buffer? Does the TryChangeStringLinguisticCaseInPlace function expect a null terminated string rather than a length-specified string?

*pErrorOut = TranslateUErrorCode(errorCode);
return -1;
}
AssertMsg(resultStringLength > 0, "u_strToCase must return required destString length");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assert hit if you try to change the case of the empty string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nevermind, I see there's a check earlier on that the source length is greater than 0; I'm not sure I see where the caller enforces that though

char16 *outStr = builder.DangerousGetWritableBuffer();
ApiError error = ApiError::NoError;
// pre-flight to get the length required, as it may be longer than the original string
charcount_t requiredStringLength = ChangeStringLinguisticCase<toUpper, useInvariant>(pThis->GetSz(), pThis->GetLength(), nullptr, 0, &error);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So some API implementation details, here... ICU's version of this API allows you to pass in a non-null buffer for pre-flighting, in case you happen to provide one long enough. Intl takes advantage of this, where we will basically try a function once with a buffer of a guessed length, and if that fails, allocate a new buffer using the actual length returned from the first attempt. We could do that here, where our initial guess could be to assume that the cased string will be as long as the non-cased string, which will be true in lots of cases. However, the Windows/no-ICU version of this function doesn't support that functionality, because if the buffer isn't long enough, it returns a required length of 0 to indicate failure. So, to be safe, we always pass nullptr here. Its one thing to implement this functionality here separately for ICU and no-ICU, but does that make the ChangeStringLinguisticCase API too complicated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to say open another issue to track this as a possible optimization but for now it's fine.


return ToLocaleCaseHelper(args[0], false, scriptContext);
JavascriptString * pThis = nullptr;
GetThisStringArgument(args, scriptContext, _u("String.prototype.toLocaleUpperCase"), &pThis);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toLocaleUpperCase [](start = 72, length = 17)

These labels are mixed up between the two functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

return ToCaseCore<toUpper, false>(pThis);
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra blank line below this function


int32 ChangeStringLinguisticCase(CaseFlags caseFlags, const char16* sourceString, uint32 sourceLength, char16* destString, uint32 destLength, ApiError* pErrorOut)
template<bool toUpper, bool useInvariant>
charcount_t ChangeStringLinguisticCase(const char16* sourceString, charcount_t sourceLength, char16* destString, charcount_t destLength, ApiError* pErrorOut)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linguistic [](start = 32, length = 10)

how do we handle the langtag for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POSIX doesn't have any idea of locale for this implementation, but this is only used in the --no-icu case, which as far as I know is anything goes in terms of spec compliance. This didn't handle system vs root locale before, either.

}
}

uint32 ChangeStringCaseInPlace(CaseFlags caseFlags, char16* stringToChange, uint32 bufferLength)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChangeStringCaseInPlace [](start = 15, length = 23)

is this not needed anymore? I think this was an optimization for ASCII strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it because we couldn't guarantee in-place conversion for any arbitrary string (any ICU implementation takes a src and dest buffer, and if characters can expand from source to dest, doing conversions in place using the same buffer for both args would produce incorrect results). I suppose I can add this back and fail for non-ASCII?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am still against this function because in reality, this is only called when converting JavascriptStrings, which are immutable and can't be changed in place regardless. So, unless we can come up with a case where we are casing a string from native code that isn't a JavascriptString, I think this function just adds more complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. If it regresses perf in a scenario we are tracking we can optimize later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth submitting an IB run with this change?

assert.areEqual("\u03B1\u0345\u03C2", "\u0391\u0345\u03A3".toLowerCase(), "Expecting sigma preceded by Greek upper-case alpha (0391), combining Greek ypogegrammeni (0345)");
assert.areEqual("a\u03C2\u0345", "A\u03A3\u0345".toLowerCase(), "Expecting sigma followed by combining Greek ypogegrammeni (0345)");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} [](start = 8, length = 1)

Should add a test for the proper error thrown for the String.prototype.toLocale{Upper|Lower}Case.call(null) case


if (WScript.Platform.INTL_LIBRARY === "icu") {
assert.areEqual("\u0069\u0307", "\u0130".toLowerCase(), "Expecting Latin upper-case i with dot above");
assert.areEqual("a\u03C2", "A\u03A3".toLowerCase(), "Expecting sigma preceded by Latin upper-case a");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upper [](start = 103, length = 5)

lower-case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were added previously. I could try to go through and add them all again, but I imagine they were auto-generated?

if (WScript.Platform.INTL_LIBRARY === "icu") {
assert.areEqual("\u0069\u0307", "\u0130".toLowerCase(), "Expecting Latin upper-case i with dot above");
assert.areEqual("a\u03C2", "A\u03A3".toLowerCase(), "Expecting sigma preceded by Latin upper-case a");
assert.areEqual("\uD835\uDCA2\u03C2", "\uD835\uDCA2\u03A3".toLowerCase(), "Expecting sigma preceded by mathematical script capital g (d835 dca2 = 1d4a2)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capital [](start = 139, length = 7)

again lower case because you're using .toLowerCase?

Probably just check over these comments again.

// Win32's version of toUpperCase does not support growing strings in the uppercase operation (see CharUpperBuff)
return;
}
assert.areEqual("\u0053\u0053", "\u00DF".toUpperCase(), "Expecting Latin lower-case sharp s");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower-case [](start = 85, length = 10)

you used toUpperCase so this comment should reflect that

assert.areEqual("\u1F0D\u0399", "\u1F85".toUpperCase(), "Expecting Greek lower-case alpha with dasia and oxia and ypogegrammeni");
assert.areEqual("\u1F0E\u0399", "\u1F86".toUpperCase(), "Expecting Greek lower-case alpha with psili and perispomeni and ypogegrammeni");
assert.areEqual("\u1F0F\u0399", "\u1F87".toUpperCase(), "Expecting Greek lower-case alpha with dasia and perispomeni and ypogegrammeni");
assert.areEqual("\u1F08\u0399", "\u1F88".toUpperCase(), "Expecting Greek upper-case alpha with psili and prosgegrammeni");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some mixing of lowercase and uppercase but all operations are toUpperCase

{
name: "Edge cases",
body() {
assert.areEqual("", "".toUpperCase(), "Empty string should return itself (toUpperCase)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toUpperCase [](start = 35, length = 11)

Did you add any tests for toLocale{Upper|Lower}Case ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should probably go in a different file.

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits and test comment consistency issues. Otherwise LGTM

Copy link
Contributor

@sethbrenith sethbrenith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jackhorton
Copy link
Contributor Author

@dilijev Unless you feel particularly strongly about it, I feel like I should leave the message issues in the Unicode test as is and open a bug to regenerate that test for Unicode 10 soon.

@dilijev
Copy link
Contributor

dilijev commented Mar 19, 2018

@jackhorton that's fine if there's a task to track. Keeping in mind the test is a bit misleading at the moment (especially if something fails could cause confusion).

We both have Unicode 10 update work to do :P

@chakrabot chakrabot merged commit 2335226 into chakra-core:master Mar 20, 2018
chakrabot pushed a commit that referenced this pull request Mar 20, 2018
…ctions to allow expanding-length strings

Merge pull request #4832 from jackhorton:icu/tocase

Fixes #421
Fixes #4526

Also fixes a previously untracked? bug where String.prototype.toLocale{Upper|Lower}Case.call(null) would print "null" rather than throwing a TypeError.
@jackhorton jackhorton deleted the icu/tocase branch April 30, 2018 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

5 participants