See also #184
We add language support for a new platform type, Utf8String (dotnet/corefxlab#2350). This name is tentative and subject to a decision by the corert team. For now we use the name Utf8String as a placeholder for whatever name it ends up being.
Section numbers below refer to the ECMA version of the specification.
The following sections are proposed to be added to the specification:
9.2.N Utf8String (in section Types)
The type System.Utf8String is a sealed class type that inherits directly from object. In the remainder of the spec, we use the name Utf8String to refer to this specific type. Instances of Utf8String represent Unicode strings stored internally using the Unicode UTF-8 encoding (https://en.wikipedia.org/wiki/UTF-8).
11.2.N Utf8String conversion (in section Implicit Conversions)
An implicit conversion exists from a constant expression of type string to the type Utf8String. This conversion produces a null value if the expression's value is null. Otherwise the conversion produces an instance of Utf8String that represents the same sequence of Unicode code points. It is a compile-time error if the characters of the string constant cannot be represented as a valid Unicode UTF-8 sequence. This would occur, for example, if the input string constant contains unmatched surrogates. The result of the conversion is a constant expression of type Utf8String.
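The representability check described above can be approximated today with the existing System.Text.UTF8Encoding class in strict mode. The following is only a sketch of the rule, not the compiler's implementation; the class name Utf8Validation and the method IsRepresentableAsUtf8 are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text;

static class Utf8Validation
{
    // Strict encoder: no byte-order mark, and throw on invalid input
    // instead of substituting U+FFFD.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Returns true if `value` could be converted under the proposed rule:
    // null converts to null, and any other string must encode as valid UTF-8.
    public static bool IsRepresentableAsUtf8(string value)
    {
        if (value is null) return true;
        try
        {
            Strict.GetBytes(value);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // Raised for unmatched surrogates, for example.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsRepresentableAsUtf8("caf\u00E9"));     // well-formed text
        Console.WriteLine(IsRepresentableAsUtf8("\uD83D\uDE00"));  // matched surrogate pair
        Console.WriteLine(IsRepresentableAsUtf8("abc\uD83D"));     // unmatched high surrogate
    }
}
```

Under the proposal, the third literal is the kind of constant that would be rejected at compile time when converted to Utf8String.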
Concatenation
The following addition (no pun intended) is made to 12.9.5 Addition operator:
Utf8String concatenation:
System.Utf8String operator +(System.Utf8String x, System.Utf8String y);
System.Utf8String operator +(System.Utf8String x, object y);
System.Utf8String operator +(object x, System.Utf8String y);
These overloads of the binary + operator perform Utf8String concatenation. If an operand of Utf8String concatenation is null, an empty Utf8String is substituted. Otherwise, any non-Utf8String operand is converted to its Utf8String representation by invoking the virtual ToString method inherited from type object and then encoding the result as a Utf8String. If ToString returns null, an empty Utf8String is substituted. If the string returned by ToString is not representable as a Utf8String, a System.ArgumentException is thrown.
The result of the Utf8String concatenation operator is a Utf8String that consists of the characters of the left operand followed by the characters of the right operand. The Utf8String concatenation operator never returns a null value. A System.OutOfMemoryException may be thrown if there is not enough memory available to allocate the resulting Utf8String.
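The operand rules above can be modeled with ordinary library code. The sketch below is an assumption-laden illustration, not the proposed implementation: it uses byte[] as a stand-in for Utf8String (which does not exist in shipped libraries), and the names Utf8ConcatSketch, ToUtf8, and Concat are hypothetical.

```csharp
using System;
using System.Text;

static class Utf8ConcatSketch
{
    // Strict encoder: throws on input that is not valid UTF-16.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Converts one operand to UTF-8 bytes following the proposed rules.
    private static byte[] ToUtf8(object operand)
    {
        if (operand is null) return Array.Empty<byte>();  // null operand: empty value
        if (operand is byte[] utf8) return utf8;          // stand-in for a Utf8String operand
        string text = operand.ToString();
        if (text is null) return Array.Empty<byte>();     // ToString returned null: empty value
        try
        {
            return Strict.GetBytes(text);
        }
        catch (EncoderFallbackException e)
        {
            // The proposal specifies System.ArgumentException for this case.
            throw new ArgumentException("Operand is not representable as UTF-8.", e);
        }
    }

    // Left operand's bytes followed by the right operand's bytes; never returns null.
    public static byte[] Concat(object x, object y)
    {
        byte[] left = ToUtf8(x);
        byte[] right = ToUtf8(y);
        byte[] result = new byte[left.Length + right.Length];
        Buffer.BlockCopy(left, 0, result, 0, left.Length);
        Buffer.BlockCopy(right, 0, result, left.Length, right.Length);
        return result;
    }

    static void Main()
    {
        byte[] prefix = Encoding.UTF8.GetBytes("n = ");
        Console.WriteLine(Encoding.UTF8.GetString(Concat(prefix, 42)));
        Console.WriteLine(Concat(null, null).Length);
    }
}
```

Concatenating at the byte level is safe here because joining two valid UTF-8 sequences always yields a valid UTF-8 sequence.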
Constant Expressions
The following changes are made to 12.20 Constant expressions:
Change this sentence:
If a constant expression is a reference type, it must be the string type, a default value expression (§12.7.15) for some reference type, or the value of the expression must be null.
to this:
If a constant expression is a reference type, it must be the string type, the Utf8String type, a default value expression (§12.7.15) for some reference type, or the value of the expression must be null.
We add the Utf8String conversion to the set of conversions permitted in a constant expression.
Open Issues
Concatenation
String concatenation may be somewhat problematic. A + operation between a Utf8String and a string would be ambiguous due to the presence of the following two operators:
Utf8String operator +(Utf8String x, object y);
string operator +(object x, string y);
It isn't clear what semantics are desired. Do we need concatenation for Utf8String values?
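The ambiguity can be reproduced with ordinary method overloads that have the same parameter shapes, since binary operator overload resolution applies the same better-function rules. This is a sketch only: U8 is a hypothetical stand-in for Utf8String, and Plus is a hypothetical method standing in for the two predefined operators. (Declaring a user-defined operator + on the stand-in class would not show the problem, because user-defined operator candidates, when any are applicable, take precedence over predefined ones.)

```csharp
using System;

// Hypothetical stand-in for Utf8String.
sealed class U8 { }

static class AmbiguityDemo
{
    // Same parameter shapes as the two operators in question.
    public static string Plus(U8 x, object y) => "Utf8String overload";
    public static string Plus(object x, string y) => "string overload";

    static void Main()
    {
        var u = new U8();

        // Plus(u, "text");
        // Uncommenting the call above gives error CS0121 (ambiguous call):
        // the first overload is better for argument 1, the second for argument 2.

        // An explicit cast on either operand removes the ambiguity.
        Console.WriteLine(Plus(u, (object)"text"));
        Console.WriteLine(Plus((object)u, "text"));
    }
}
```

Whichever semantics are chosen, the specification would need a tie-breaking rule or the user would need a cast like the ones shown.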
Interpolation
There is no easy way to use interpolation to get a Utf8String value. One approach would be to define a new interpolated string conversion from an interpolated string to the type Utf8String. That would permit us to issue a compile-time error if the format string contains unmatched surrogates.