See the discussion here. The salient conclusion is this:
Escapes continue to work the way they do now: \x always inserts a single byte and \u always inserts a sequence of bytes encoding a Unicode character. Literals are turned into String objects according to the following simple check:
ASCIIString if all bytes are < 0x80;
UTF8String if any bytes are ≥ 0x80.
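The check above can be sketched in a few lines. This is a language-neutral illustration in Python, not the actual implementation; `literal_type` is a hypothetical helper name, and it returns the type names as plain strings.

```python
def literal_type(data: bytes) -> str:
    # Hypothetical helper: name the string type the rule above would choose
    # for a literal whose bytes, after escape processing, are `data`.
    if all(b < 0x80 for b in data):
        return "ASCIIString"
    # Any byte at or above 0x80 selects UTF8String. No validity check is
    # performed, so invalid UTF-8 also lands here.
    return "UTF8String"


print(literal_type(b"hello"))                 # ASCIIString
print(literal_type("héllo".encode("utf-8")))  # UTF8String
print(literal_type(b"\xff"))                  # UTF8String
```

Note that the rule looks only at byte values, which is why a lone 0xFF byte is still classified as a UTF8String.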
If you want to use \x escapes with values at or above 0x80 to generate invalid UTF-8, that's your business. We can also introduce a Latin1"..." form that uses the Latin-1 encoding to store code points up to U+FF in an efficient one-byte-per-character form. Finally, the b"..." macro-defined string form can let you use characters and escapes (both \x and \u) to generate byte arrays.
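To make the difference between the two escapes concrete, here is a rough Python model. The helper names `x_escape`, `u_escape`, and `is_valid_utf8` are made up for illustration, and the sketch assumes that \u emits the UTF-8 encoding of the code point.

```python
def x_escape(value: int) -> bytes:
    # Model of a \x escape: always exactly one byte, whatever its value.
    return bytes([value])


def u_escape(code_point: int) -> bytes:
    # Model of a \u escape: the UTF-8 byte sequence for the code point
    # (an assumption of this sketch).
    return chr(code_point).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(x_escape(0xE9))                 # b'\xe9'
print(u_escape(0xE9))                 # b'\xc3\xa9'
print(is_valid_utf8(x_escape(0xE9)))  # False
print("\u00e9".encode("latin-1"))     # b'\xe9'
```

The single byte 0xE9 is not valid UTF-8 on its own, but it is exactly how Latin-1 stores U+00E9, which is the one-byte-per-character layout a Latin1"..." form would use.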
We can safely and quickly concatenate ASCIIStrings with each other, with UTF8Strings, or with Latin1Strings. Mixing UTF8Strings and Latin1Strings, however, requires transcoding the Latin1Strings to UTF-8. This will not occur with string literals, since they will always be ASCIIStrings or UTF8Strings.
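A minimal sketch of these concatenation rules, again in Python and under stated assumptions: strings are modelled as a (bytes, encoding) pair, the encoding tags "ascii", "utf8", and "latin1" are invented for this example, and `concat` and `latin1_to_utf8` are hypothetical names.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    # Each Latin-1 byte is its own code point (U+00 to U+FF), so decoding
    # and re-encoding as UTF-8 is a complete transcoding.
    return data.decode("latin-1").encode("utf-8")


def concat(a: bytes, a_enc: str, b: bytes, b_enc: str):
    # Same encoding, or one side pure ASCII: bytes can be joined directly,
    # because ASCII bytes mean the same thing in all three encodings.
    if a_enc == b_enc:
        return a + b, a_enc
    if a_enc == "ascii":
        return a + b, b_enc
    if b_enc == "ascii":
        return a + b, a_enc
    # Remaining case is a UTF-8 / Latin-1 mix: transcode the Latin-1 side.
    if a_enc == "latin1":
        a = latin1_to_utf8(a)
    if b_enc == "latin1":
        b = latin1_to_utf8(b)
    return a + b, "utf8"


print(concat(b"abc", "ascii", b"\xe9", "latin1"))     # (b'abc\xe9', 'latin1')
print(concat(b"\xc3\xa9", "utf8", b"\xe9", "latin1"))  # (b'\xc3\xa9\xc3\xa9', 'utf8')
```

Only the last branch copies and rewrites data; every other combination is a plain byte append.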