More String escape sequence improvements (F#) #13257

srutzky · 2019-07-05T21:17:59Z

"Literals" page

In my previous update I seem to have added an extra `0` to the `0010FFFF` at the end of the first paragraph in the **Remarks** section.

"Strings" page

1. Added 5 missing sequences: `\a`, `\f`, `\v`, `\x`, and `\DDD`. The following example code ran in LINQPad 5 (Language = "F# Program"):

   ```fsharp
   printfn "Decimal (NOT Octal) \\DDD requires 3 digits: TAB\9TAB\09TAB\009TAB";
   printfn "\\DDD notation is ISO-8859-1 (U+0000 - U+00FF): {\128-\129-\144-\152-\160-\161}";
   printfn "CHAR for \\DDD = (DDD %% 256); Max = \\999 (U+00E7): {\365-\621-\6210-\176-\100-\999-\1000}";
   printfn "---------------------";
   printfn "\\x only works with two hex digits: TAB\x9TAB\x090TAB";
   printfn "\\x is ISO-8859-1: 0x80 = \x80, 0x81 = \x81, 0x90 = \x90, 0x9A = \x9A, 0x9F = \x9F";
   printfn "\\x is _not_ creating UTF-8: \xE0\xBC\x82"; // UTF-8 bytes for U+0F02
   printfn "---------------------";
   printfn "Test \\a: \a";
   printfn "Test \\f: \f";
   printfn "Test \\v: \v";
   ```

   The sequences can also be found in the source code on GitHub:
   * Defined here:
     * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lex.fsl#L209
   * Processed here:
     * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L142
     * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L179
2. Broke the `\u` and `\U` sequences into separate entries.
3. Provided the range and an example for each Unicode character sequence.
4. Added an "Important" note regarding `\DDD` being decimal, not octal, notation.
5. Added a note regarding `\DDD` and `\xx` effectively being ISO-8859-1 (which is the first 256 Unicode code points), including a link to the Wikipedia article for ISO-8859-1.
6. **NOTE:** I changed the `X`s into `H`s for the `\u` and `\U` sequences because I added the `\x` sequence and did not want to have `\xXX`, which I feel is less readable. I also wanted all of the sequences to be consistent about what represents a hex digit ("H" meaning hex also helps distinguish it from the `D`s used for the newly added `\DDD` sequence). If anyone feels strongly that it should remain as `X`, then it can be changed back.

Please see [Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)](https://sqlquantumleap.com/2019/06/26/unicode-escape-sequences-across-various-languages-and-platforms-including-supplementary-characters/#fsharp) for more details.
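
For anyone who wants to check the decimal and ISO-8859-1 notes without reading control characters off the console, here is a minimal sketch that prints the code units each literal produces. `describe` is a hypothetical helper written only for this illustration, not part of the docs change. The values are deliberately kept at 255 or below, on the assumption that handling of larger `\DDD` values may differ between compiler versions.

```fsharp
// Hypothetical helper (not part of the docs change): lists the UTF-16
// code units of a string in U+HHHH form.
let describe (s: string) =
    s
    |> Seq.map (fun c -> sprintf "U+%04X" (int c))
    |> String.concat " "

// \DDD is decimal: \065 is code point 65 ('A').
// Read as octal, 065 would be decimal 53 ('5') instead.
let trigraph = "\065"

// Each \xHH escape yields one character in the U+0000 - U+00FF range,
// so the three UTF-8 bytes of U+0F02 stay three separate characters.
let hexEscapes = "\xE0\xBC\x82"

// The same character written with the \u escape is a single code unit.
let unicodeEscape = "\u0F02"

printfn "%s" (describe trigraph)      // U+0041
printfn "%s" (describe hexEscapes)    // U+00E0 U+00BC U+0082
printfn "%s" (describe unicodeEscape) // U+0F02
```

The second line of output is the point of the `\x` note: the escapes are taken as ISO-8859-1 code points, not decoded together as a UTF-8 byte sequence.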

cartermp

Thanks @srutzky! I definitely love the alien character, too 😄

srutzky added 2 commits July 2, 2019 22:18

Minor fix literals page

6cccead

In my previous update I seem to have added an extra `0` to the `0010FFFF` at the end of the first paragraph in the **Remarks** section.

srutzky requested a review from cartermp as a code owner July 5, 2019 21:17

cartermp approved these changes Jul 5, 2019

cartermp merged commit bcf6465 into dotnet:master Jul 5, 2019

mairaw assigned srutzky Jul 6, 2019

mairaw added this to the July 2019 milestone Jul 6, 2019

cartermp mentioned this pull request Jul 8, 2019

Unicode literal range #13271

Closed
