lower interpolated strings to string concatenation where possible#26612
lower interpolated strings to string concatenation where possible#26612jcouv merged 5 commits intodotnet:masterfrom
Conversation
|
This PR requires tests. Please include at least one test with a sufficient number of interpolations such that the optimizations for string concatenation kick in. #Resolved |
|
Please implement similar optimizations for VB. |
I assume that the cases mentioned above are good test cases? Where do you want the tests to go? Should they be "black box", i.e. validate that the interpolated string results in the correct value, or "white box", i.e. validate the IL being generated? I can write such tests after next week (leaving for the Build conference tonight). #Resolved |
|
Please validate both the IL and resultes. The kind of test I’m thinking of would end up using a string builder due to optimizing the string concatenation. #Resolved |
| { | ||
| // this is one of the expression holes | ||
| if (fillin.HasErrors || | ||
| fillin.Value.Type?.SpecialType != SpecialType.System_String || |
There was a problem hiding this comment.
can't we box other types e.g. int and pass to Concat(object, object) etc? #ByDesign
There was a problem hiding this comment.
@alrz I think the purpose of this PR is to get the camel's nose into the tent. #Resolved
There was a problem hiding this comment.
I suppose that's a no -- due to custom formatting semantics. camel's nose shall not get into the tent. #Resolved
There was a problem hiding this comment.
What I meant was doing this for strings is the camel's nose in the tent. Once this optimization is done, if we can demonstrate that it would be correct to do the optimization for other types in the future, the optimization could be extended in the future. #Resolved
|
The PR description lists a number of examples with corresponding generated IL. Those would all make excellent tests. In reply to: 386603036 [](ancestors = 386603036) |
|
@KrisVandermotten Just to be clear, we're waiting for you to add tests. #Resolved |
|
@dotnet-bot retest windows_release_vs-integration_prtest please #Resolved |
|
@gafter Tests have been added and are running fine. Not sure why windows_release_vs-integration_prtest is failing, but it seems unrelated. @jcouv I did not add the tests to #Resolved |
|
@jaredpar Could you try removing the Blocked tag? GitHub didn't work properly when I tried. #Resolved |
So you're blocked removing the blocked tag? ;) done #Resolved |
| "); | ||
|
|
||
| comp.VerifyDiagnostics(); | ||
| comp.VerifyIL("Test.M6", @" |
There was a problem hiding this comment.
Tagging @AlekseyTs as FYI. I'm not sure if it was the case before, but interpolated strings can now introduce some small control flow (null-coalescing operator introduced during lowering). #Resolved
| Console.WriteLine($""a: {a}, b: {b}""); | ||
| Console.WriteLine($""{{{a}}}""); | ||
| Console.WriteLine(""a:"" + $"" {a}""); | ||
| } |
There was a problem hiding this comment.
} [](start = 4, length = 1)
Consider testing nested string interpolation $"{$"...."}" too. #Closed
|
@dotnet-bot retest windows_debug_spanish_unit32_prtest please |
| for (int i = 0; i < formatLength; i++) | ||
| { | ||
| char c = s[i]; | ||
| builder.Builder.Append(c); |
There was a problem hiding this comment.
builder.Builder [](start = 16, length = 15)
Consider extracting local for builder.Builder. #Closed
There was a problem hiding this comment.
Should that be done in the existing MakeInterpolatedStringFormat too? #Closed
There was a problem hiding this comment.
| } | ||
| } | ||
| else | ||
| { |
There was a problem hiding this comment.
{ [](start = 20, length = 1)
Consider asserting that the part is a BoundLiteral and ConstantValue is not null (ie. a string literal) #Closed
There was a problem hiding this comment.
There was a problem hiding this comment.
The existing lowering to string.Format doesn't have that assert. Should it be added there too? #Closed
There was a problem hiding this comment.
|
|
||
| if (part.ConstantValue != null) | ||
| { | ||
| if (part.ConstantValue.StringValue == null) |
There was a problem hiding this comment.
== [](start = 63, length = 2)
Perhaps use is null (no need to call string.operator==) #Closed
There was a problem hiding this comment.
Doesn't the compiler generate the exact same code in both cases, i.e. a brtrue.s instruction? #Closed
There was a problem hiding this comment.
| { | ||
| if (part.ConstantValue.StringValue == null) | ||
| { | ||
| return _factory.StringLiteral(""); |
There was a problem hiding this comment.
_factory.StringLiteral("") [](start = 39, length = 26)
It seems like _factory.StringLiteral("") always creates instances. @gafter Do you think it's worth it to use a static singleton here? #Resolved
There was a problem hiding this comment.
No, this is fine. It is easier to debug if the bound nodes have the right syntax node associated with them, which the factory helps with.
In reply to: 190383830 [](ancestors = 190383830)
| { | ||
| i++; | ||
| var part = node.Parts[i]; | ||
| if (part is BoundStringInsert fillin) |
There was a problem hiding this comment.
I didn't understand why the treatment is different in the case where length != 1 than it was in the case where length == 1.
I'd expect we can extract the logic from the length == 1 into a method, then we can call the method in both cases and just check what's returned (if it's just a StringLiteral, in the case where length == 1 then we can return).
If that works out, then I suspect the check on length == 1 will become a very local choice, rather than a containing if statement.
Something like:
foreach (var part in node.Parts)
{
var loweredPart = LowerPart(part);
if (length == 1 && /* loweredPart is a string literal */) { return loweredPart; }
result = ...
}
``` #ResolvedThere was a problem hiding this comment.
The reason why the length == 1 case is different, is that it requires additional processing (e.g. lowering to a null coalescing expression) to ensure the resulting string is not null. Such processing is not required when length > 1, because string concatenation allready ensures it. I will add a comment to clarify that.
I did consider merging the loops.
Because of the additonal requirements for the length == 1 case, such a LowerPart method would need two parameters: LowerPart(BoundExpression part, bool isOnlyPart). While the loop duplication is not ideal, such a flag parameter isn't either. The parameter isOnlyPart is indeed tested locally in LowerPart, but has no local meaning. It becomes magic. And if you create two LowerPart variants, you might as well have the two loops.
As for the loop itself:
bool isOnlyPart = length == 1;
foreach (var part in node.Parts)
{
var loweredPart = LowerPart(part, isOnlyPart);
if (isOnlyPart && loweredPart is BoundLiteral)
{
Debug.Assert(/* loweredPart is a non-null string*/);
return loweredPart;
}
result = ...
}
We can of course skip the if statement, and have the string literal go through another lowering pass like any other result. From a functional point of view, that doesn't hurt.
With or without the if statement, having a specialized loop for length == 1 is more efficient.
#Resolved
There was a problem hiding this comment.
To clarify: the additional processing to ensure that the result is not null is required when length == 1, but forbidden when length > 1. The reason for that is that string concatenation does not flatten ((a + b) ?? "") + c to a + b + c, and the same is true for a + ((b + c) ?? "").
However, it just occurred to me that this problem can be fixed quite easily :-). I'll see what I can do. #Resolved
There was a problem hiding this comment.
Done. #Resolved
| return true; | ||
| } | ||
|
|
||
| private static string Unescape(string s) |
There was a problem hiding this comment.
I suspect this method can be re-used in VisitInterpolatedString (LocalRewriter_StringInterpolation.cs line 78) #Closed
There was a problem hiding this comment.
No it cannot. String.Format requires '{' to be escaped as "{{" and '}' to be escaped as "}}". #Closed
There was a problem hiding this comment.
I'm sorry, I don't see the difference between this method and the code I referred to.
var builder = PooledStringBuilder.GetInstance();
...
int formatLength = formatText.Length;
for (int i = 0; i < formatLength; i++)
{
char c = formatText[i];
builder.Builder.Append(c);
if ((c == '{' || c == '}') && (i + 1) < formatLength && formatText[i + 1] == c)
{
i++;
}
}
... builder.ToStringAndFree() ...In reply to: 190505462 [](ancestors = 190505462)
There was a problem hiding this comment.
I'm confused. What code are you referring too?
Or are you saying that line 86:
formatString.Builder.Append(part.ConstantValue.StringValue);
does the same thing as:
int formatLength = s.Length;
for (int i = 0; i < formatLength; i++)
{
char c = s[i];
stringBuilder.Append(c);
if ((c == '{' || c == '}') && (i + 1) < formatLength && s[i + 1] == c)
{
i++;
}
}
?
It's not the same. When lowering an interpolation to string.Format, any "{{" must remain "{{", because string.Format needs them to be escaped. When lowering to string concatenation, they need to be unescaped to "{". #Closed
There was a problem hiding this comment.
My bad, I was looking at a local copy of the file before your change. Sorry for the confusion.
In reply to: 191042562 [](ancestors = 191042562)
jcouv
left a comment
There was a problem hiding this comment.
Done with review pass (iteration 3).
|
I've looked at the tests and I believe the cases I was interested in are tested (e.g. calling many-argument versions of |
| return _factory.StringLiteral(builder.ToStringAndFree()); | ||
|
|
||
| result = _factory.Coalesce(result, _factory.StringLiteral("")); | ||
| } |
There was a problem hiding this comment.
I don't think we should do this unless there was no string concatenation. #Resolved
There was a problem hiding this comment.
It is not functionally wrong to do this, even if there was a concatenation, because we remove it again in the lowering of the coalescing operator. I noticed this technique is used in lot's of places. (For example, the lowering of a non-binary string concatenation builds lot's of intermediary structures that are broken down again when lowering a higher level.) So I used the same technique here to clean up the code considerably. Also, notice that the unit tests all still pass, nothing has changed there.
But you are correct, a simple if (length == 1) avoids it in most cases.
#Resolved
|
Now that this is going in, would it also make sense to accept such literals as a constant in the language (with an additional requirement of every interpolation to be a constant) so we can use them in attributes etc? #Resolved |
|
@alrz Are you asking a language question in the compiler repository? Shame on you! #Resolved |
|
|
||
| // string concatenation is never null. | ||
| // interpolated string lowering may introduce redundant null coalescing, which we have to remove. | ||
| if (IsStringConcat(rewrittenLeft)) |
There was a problem hiding this comment.
Is this still needed? #Resolved
There was a problem hiding this comment.
Yes, for the case $"{a + b}" (where a + b is a string). #Resolved
jcouv
left a comment
There was a problem hiding this comment.
Done with review pass (iteration 5).
|
@gafter for a second look. Thanks |
|
Approved |
|
We're all set to merge. @KrisVandermotten Thanks a lot for the contribution. |
|
IT'S HAPPENING! now I don't feel guilty every time I use string interpolation. |
|
@jcouv Go ahead, squash and merge. |

This PR handles #22594 and is a follow up for #22595.
Customer scenario
The user uses string interpolation where concatenation could be used.
Risk
This PR modifies code generation of interpolated strings, but only in the case where all fill-ins are strings without alignment or format specifiers.
Performance impact
Code generation for interpolated strings will be somewhat slower. The generated code will be significantly faster in the case where all fill-ins in the interpolated string are strings, without alignment or format specifiers. In all other cases, the new implementation has one additional loop over the parts with no memory allocations in it, fairly simple logic and early exit as soon it is determined that the code gen optimization cannot be done.
Given the following declarations:
With
/o, the following methods compile to the following IL: