Work around CLR unicode bug with classification of U+2028#9519
Work around CLR unicode bug with classification of U+2028#9519gafter wants to merge 1 commit intodotnet:future-stabilizationfrom
Conversation
|
LGTM |
|
CC @ellismg |
|
@gafter future-stabilization is basically an ask mode branch. This doesn't intuitively seem like an ask mode issue. Am I missing a case? |
|
Why did you assign the issue to 2.0 (Preview)? Isn't that the milestone for this preview? |
|
@gafter ah okay. That's my mistake. |
| // U+2028 "LINE SEPARATOR" should be classified by CharUnicodeInfo.GetUnicodeCategory as | ||
| // UnicodeCategory.LineSeparator but due to a bug is incorrectly categorized as | ||
| // UnicodeCategory.Format. See https://github.com/dotnet/coreclr/issues/3542 | ||
| replaceWith = "\\u2028"; |
There was a problem hiding this comment.
There are a lot of calls to CharUnicodeInfo.GetUnicodeCategory in our codebase and some of those sites can still suffer from the bug in this API. For example (in src\Compilers\Core\Portable\UnicodeCharacterUtilities.cs):
UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(ch);
return IsLetterChar(cat)
|| IsDecimalDigitChar(cat)
|| IsConnectingChar(cat)
|| IsCombiningChar(cat)
|| IsFormattingChar(cat);
Instead of patching this particular call site in this manner, we should consider creating a wrapper for CharUnicodeInfo.GetUnicodeCategory, which would make necessary adjustments to the value it returns and simply change all call sites to use that wrapper.
|
Moving to the future branch. |
Fixes #9484
@dotnet/roslyn-compiler Please review