@srutzky srutzky commented Jul 5, 2019

"Literals" page

In my previous update I seem to have added an extra 0 to the 0010FFFF at the end of the first paragraph in the Remarks section.

"Strings" page

  1. Added 5 missing sequences: \a, \f, \v, \x, and \DDD. The following example code was run in LINQPad 5 (Language = "F# Program"):

    printfn "Decimal (NOT Octal) \\DDD requires 3 digits: TAB\9TAB\09TAB\009TAB";
    printfn "\\DDD notation is ISO-8859-1 (U+0000 - U+00FF): {\128-\129-\144-\152-\160-\161}";
    printfn "CHAR for \\DDD = (DDD %% 256); Max = \\999 (U+00E7): {\365-\621-\6210-\176-\100-\999-\1000}";
    printfn "---------------------";
    printfn "\\x only works with two hex digits: TAB\x9TAB\x090TAB";
    printfn "\\x is ISO-8859-1: 0x80 = \x80, 0x81 = \x81, 0x90 = \x90, 0x9A = \x9A, 0x9F = \x9F";
    printfn "\\x is _not_ creating UTF-8: \xE0\xBC\x82"; // UTF-8 bytes for U+0F02
    printfn "---------------------";
    printfn "Test \\a: \a";
    printfn "Test \\f: \f";
    printfn "Test \\v: \v";

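    They can also be found in the source code on GitHub:

      * Defined here:
          * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lex.fsl#L209
      * Processed here:
          * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L142
          * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L179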

  2. Broke \u and \U sequences into separate entries.

  3. Provided range and an example for each Unicode character sequence.

  4. Added "Important" note regarding \DDD being decimal, not octal, notation.

  5. Added note regarding \DDD and \xHH effectively being ISO-8859-1 (which matches the first 256 Unicode code points), including a link to the Wikipedia article for ISO-8859-1.

  6. NOTE: I changed the Xs to Hs in the \u and \U sequences. With the \x sequence added, \xXX would be hard to read, and I wanted every sequence to use the same placeholder for a hex digit. "H" for hex also distinguishes these from the Ds used in the newly added \DDD sequence. If anyone feels strongly that it should remain X, it can be changed back.
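
As a minimal sketch of the behavior described in items 4 and 5 (separate from the LINQPad test run above), the following shows that \DDD is read as a decimal code point and that \DDD, \xHH, and \uHHHH can all name the same character. The binding names are hypothetical, and the values are kept within 000 - 255 so the example does not depend on how out-of-range trigraphs are handled.

```fsharp
// Hypothetical binding names; values stay within 000 - 255 on purpose.

// \DDD is decimal: \065 is code point 65 ('A').
// If it were octal, 065 would be 53, which is the character '5'.
let fromTrigraph = "\065"
printfn "%s (%d)" fromTrigraph (int fromTrigraph.[0])   // A (65)

// \xHH takes exactly two hex digits and yields one character
// with that code point, not a raw byte.
let fromHex = "\xE7"
printfn "%d char, code point %d" fromHex.Length (int fromHex.[0])   // 1 char, code point 231

// Decimal 231 = 0xE7 = U+00E7, so all three notations agree.
printfn "%b" ("\231" = "\xE7" && "\xE7" = "\u00E7")   // true
```

Because each escape produces a single UTF-16 code unit in the range U+0000 - U+00FF, chaining several \x escapes does not assemble a UTF-8 byte sequence.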

Please see Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters) for more details.
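
To illustrate why \u and \U are listed as separate entries, here is a small sketch, assuming the ranges given above (\uHHHH for U+0000 - U+FFFF, \UHHHHHHHH for up to U+0010FFFF). The binding names are hypothetical; `Char.ConvertToUtf32` and `Char.IsHighSurrogate` are standard .NET methods.

```fsharp
open System

// \uHHHH: four hex digits, one UTF-16 code unit.
let cedilla = "\u00E7"

// \UHHHHHHHH: eight hex digits; a supplementary character such as
// U+1F47D is stored as a surrogate pair (two UTF-16 code units).
let alien = "\U0001F47D"

printfn "%d" cedilla.Length                         // 1
printfn "%d" alien.Length                           // 2
printfn "%b" (Char.IsHighSurrogate alien.[0])       // true
printfn "%X" (Char.ConvertToUtf32(alien, 0))        // 1F47D
```

The string length counts UTF-16 code units, so a single supplementary character reports a length of 2 even though it is one code point.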

srutzky added 2 commits July 2, 2019 22:18
In my previous update I seem to have added an extra `0` to the `0010FFFF` at the end of the first paragraph in the **Remarks** section.
1.  Added 5 missing sequences: `\a`, `\f`, `\v`, `\x`, and `\DDD`. The following example code was run in LINQPad 5 (Language = "F# Program"):

```fsharp
printfn "Decimal (NOT Octal) \\DDD requires 3 digits: TAB\9TAB\09TAB\009TAB";
printfn "\\DDD notation is ISO-8859-1 (U+0000 - U+00FF): {\128-\129-\144-\152-\160-\161}";
printfn "CHAR for \\DDD = (DDD %% 256); Max = \\999 (U+00E7): {\365-\621-\6210-\176-\100-\999-\1000}";
printfn "---------------------";
printfn "\\x only works with two hex digits: TAB\x9TAB\x090TAB";
printfn "\\x is ISO-8859-1: 0x80 = \x80, 0x81 = \x81, 0x90 = \x90, 0x9A = \x9A, 0x9F = \x9F";
printfn "\\x is _not_ creating UTF-8: \xE0\xBC\x82"; // UTF-8 bytes for U+0F02
printfn "---------------------";
printfn "Test \\a: \a";
printfn "Test \\f: \f";
printfn "Test \\v: \v";
```

It can also be found in the source code on GitHub:

* Defined here:
    * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lex.fsl#L209
* Processed here:
    * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L142
    * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L179

2. Broke `\u` and `\U` sequences into separate entries.

3. Provided range and an example for each Unicode character sequence.

4. Added "Important" note regarding `\DDD` being decimal, not octal, notation.

5. Added note regarding `\DDD` and `\xHH` effectively being ISO-8859-1 (which matches the first 256 Unicode code points).

6. **NOTE:** I changed the `X`s to `H`s in the `\u` and `\U` sequences. With the `\x` sequence added, `\xXX` would be hard to read, and I wanted every sequence to use the same placeholder for a hex digit. "H" for hex also distinguishes these from the `D`s used in the newly added `\DDD` sequence.

Please see https://sqlquantumleap.com/2019/06/26/unicode-escape-sequences-across-various-languages-and-platforms-including-supplementary-characters/#fsharp for more details.
@srutzky srutzky requested a review from cartermp as a code owner July 5, 2019 21:17

@cartermp cartermp left a comment

Thanks @srutzky! I definitely love the alien character, too 😄

@cartermp cartermp merged commit bcf6465 into dotnet:master Jul 5, 2019
@mairaw mairaw added this to the July 2019 milestone Jul 6, 2019
@cartermp cartermp mentioned this pull request Jul 8, 2019