srutzky (Contributor) commented Jun 28, 2019

1. Remove erroneous note regarding `\U` being used for specifying surrogate pairs. That note was patently false given that a) specifying a surrogate pair results in a compiler error, and b) specifying any valid code point / UTF-32 code unit returns the correct Unicode character for that code point.
    * Even if the original author meant "supplementary characters" instead of "surrogate pairs", that would still be incorrect, as the `\U` escape can also be used for BMP characters.
    * Runnable example code on [Ideone](https://ideone.com/deoylQ) showing that a valid code point (U+1F47E) works via `\U0001F47E`, and that its surrogate pair via `\UD83DDC7E` does not.
    * In creating the test noted above, I found a bug in the Mono C# compiler, so I submitted that here:  
       "\U" Unicode escape sequence for strings accepts invalid value instead of raising error (mono/mono#15456)
    * Runnable example code on [Ideone](https://ideone.com/jpVxL4) showing that an invalid code point (U+110000) raises an exception.

2. Correctly indicated that `\U` is for a 4-byte UTF-32 value (a full code point), and `\u` is for a 2-byte UTF-16 code unit.

3. Show the pattern _and_ an example to be more readable / helpful. Please note that `\U00nnnnnn` has two permanent zeros and only 6 user-supplied hex digits. This is not only completely honest (since those first two digits can only ever be zeros), it also removes any possibility of interpreting the 8 hex digits as a surrogate pair (which can never start with two zeros), hence reducing confusion.

4. Properly formatted escape sequences as inline code.

5. Added warning about using the `\x` escape with fewer than 4 hex digits. For more info on this, please see:
     [Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)](https://sqlquantumleap.wordpress.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/#csharp)
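
The arithmetic behind points 1 to 3 can be checked with a short sketch. It is written in Python purely to verify the numbers; it is not the C# compiler and says nothing about how the compiler is implemented. The helper names `to_surrogate_pair` and `is_valid_utf32` are hypothetical, introduced only for this illustration.

```python
# Illustrative sketch only: checks the code point arithmetic discussed above.
# Helper names are hypothetical and not part of any C# or .NET API.

MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


def is_valid_utf32(value):
    """True if value is in the Unicode range and is not itself a surrogate."""
    return 0 <= value <= MAX_CODE_POINT and not 0xD800 <= value <= 0xDFFF


high, low = to_surrogate_pair(0x1F47E)
print(f"U+1F47E -> {high:04X} {low:04X}")  # U+1F47E -> D83D DC7E
print(is_valid_utf32(0x0001F47E))          # True: a valid code point
print(is_valid_utf32(0xD83DDC7E))          # False: the surrogate pair read as one 32-bit value
print(is_valid_utf32(0x00110000))          # False: one past the maximum
print(f"{MAX_CODE_POINT:08X}")             # 0010FFFF: the first two of 8 hex digits are always 00
```

Read as a single 32-bit value, `D83DDC7E` is far above `0010FFFF`, which is why it cannot be a valid `\U` argument, and the maximum code point itself shows why the first two of the eight hex digits can only ever be zeros.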
srutzky requested a review from BillWagner as a code owner June 28, 2019 13:42
srutzky changed the title from "Fix and improve Unicode escape sequence info" to "Fix and improve Unicode escape sequence info (C#)" Jun 28, 2019
BillWagner (Member) left a comment

Thank you for adding these clarifying comments @srutzky
We appreciate it.

I’ve reviewed the changes, and I’ll :shipit: now.

Thanks again!

BillWagner merged commit 9b6f355 into dotnet:master Jul 1, 2019
srutzky (Contributor, Author) commented Jul 1, 2019

@BillWagner You are welcome.

I forgot to mention that this update has a companion F# update: #13168
