Skip to content

lower interpolated strings to string concatenation where possible#26612

Merged
jcouv merged 5 commits intodotnet:masterfrom
KrisVandermotten:interpolation2
Jun 1, 2018
Merged

lower interpolated strings to string concatenation where possible#26612
jcouv merged 5 commits intodotnet:masterfrom
KrisVandermotten:interpolation2

Conversation

@KrisVandermotten
Copy link
Copy Markdown
Contributor

This PR handles #22594 and is a follow up for #22595.

Customer scenario

The user uses string interpolation where concatenation could be used.

Risk

This PR modifies code generation of interpolated strings, but only in the case where all fill-ins are strings without alignment or format specifiers.

Performance impact

Code generation for interpolated strings will be somewhat slower. The generated code will be significantly faster in the case where all fill-ins in the interpolated string are strings, without alignment or format specifiers. In all other cases, the new implementation has one additional loop over the parts with no memory allocations in it, fairly simple logic and early exit as soon it is determined that the code gen optimization cannot be done.

Given the following declarations:

const string constantabc = "abc";
const string constantnull = null;

With /o, the following methods compile to the following IL:

string M1() => $"";

IL_0000: ldstr ""
IL_0005: ret

string M2() => $"abc";

IL_0000: ldstr "abc"
IL_0005: ret

string M3() => $"{constantabc}";

IL_0000: ldstr "abc"
IL_0005: ret

string M4() => $"{constantnull}";

IL_0000: ldstr ""
IL_0005: ret

string M5() => $"{constantabc}{constantnull}";

IL_0000: ldstr "abc"
IL_0005: ret

string M6(string a) => $"{a}";

IL_0000: ldarg.1
IL_0001: dup
IL_0002: brtrue.s IL_000a
IL_0004: pop
IL_0005: ldstr ""
IL_000a: ret

string M7(string a) => $"a: {a}";

IL_0000: ldstr "a: "
IL_0005: ldarg.1
IL_0006: call string [mscorlib]System.String::Concat(string, string)
IL_000b: ret

string M8(string a, string b) => $"{a + b}";

IL_0000: ldarg.1
IL_0001: ldarg.2
IL_0002: call string [mscorlib]System.String::Concat(string, string)
IL_0007: ret

string M9(string a, string b) => $"{a}{b}";

IL_0000: ldarg.1
IL_0001: ldarg.2
IL_0002: call string [mscorlib]System.String::Concat(string, string)
IL_0007: ret

string M10(string a, string b) => $"a: {a}, b: {b}";

IL_0000: ldstr "a: "
IL_0005: ldarg.1
IL_0006: ldstr ", b: "
IL_000b: ldarg.2
IL_000c: call string [mscorlib]System.String::Concat(string, string, string, string)
IL_0011: ret

string M11(object a) => $"{a}";

IL_0000: ldstr "{0}"
IL_0005: ldarg.1
IL_0006: call string [mscorlib]System.String::Format(string, object)
IL_000b: ret

string M12(object a) => $"a: {a}";

IL_0000: ldstr "a: {0}"
IL_0005: ldarg.1
IL_0006: call string [mscorlib]System.String::Format(string, object)
IL_000b: ret

string M13(string a) => $"{{{a}}}";

IL_0000: ldstr "{"
IL_0005: ldarg.1
IL_0006: ldstr "}"
IL_000b: call string [mscorlib]System.String::Concat(string, string, string)
IL_0010: ret

string M14(string a) => "a:" + $" {a}";

IL_0000: ldstr "a: "
IL_0005: ldarg.1
IL_0006: call string [mscorlib]System.String::Concat(string, string)
IL_000b: ret

@KrisVandermotten KrisVandermotten requested a review from a team as a code owner May 3, 2018 22:57
@jcouv jcouv added this to the 15.8 milestone May 3, 2018
@etbyrd etbyrd added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label May 3, 2018
@gafter
Copy link
Copy Markdown
Member

gafter commented May 4, 2018

This PR requires tests. Please include at least one test with a sufficient number of interpolations such that the optimizations for string concatenation kick in. #Resolved

@AlekseyTs
Copy link
Copy Markdown
Contributor

Please implement similar optimizations for VB.

@KrisVandermotten
Copy link
Copy Markdown
Contributor Author

KrisVandermotten commented May 4, 2018

@gafter

This PR requires tests. Please include at least one test with a sufficient number of interpolations such that the optimizations for string concatenation kick in.

I assume that the cases mentioned above are good test cases? Where do you want the tests to go? Should they be "black box", i.e. validate that the interpolated string results in the correct value, or "white box", i.e. validate the IL being generated?

I can write such tests after next week (leaving for the Build conference tonight). #Resolved

@gafter
Copy link
Copy Markdown
Member

gafter commented May 4, 2018

Please validate both the IL and resultes. The kind of test I’m thinking of would end up using a string builder due to optimizing the string concatenation. #Resolved

{
// this is one of the expression holes
if (fillin.HasErrors ||
fillin.Value.Type?.SpecialType != SpecialType.System_String ||
Copy link
Copy Markdown
Member

@alrz alrz May 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we box other types e.g. int and pass to Concat(object, object) etc? #ByDesign

Copy link
Copy Markdown
Member

@gafter gafter May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alrz I think the purpose of this PR is to get the camel's nose into the tent. #Resolved

Copy link
Copy Markdown
Member

@gafter gafter May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
#Resolved

Copy link
Copy Markdown
Member

@alrz alrz May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that's a no -- due to custom formatting semantics. camel's nose shall not get into the tent. #Resolved

Copy link
Copy Markdown
Member

@gafter gafter May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was doing this for strings is the camel's nose in the tent. Once this optimization is done, if we can demonstrate that it would be correct to do the optimization for other types in the future, the optimization could be extended in the future. #Resolved

@jcouv
Copy link
Copy Markdown
Member

jcouv commented May 15, 2018

The PR description lists a number of examples with corresponding generated IL. Those would all make excellent tests.
InterpolationTests.cs would be a fine place to add such tests.


In reply to: 386603036 [](ancestors = 386603036)

@gafter
Copy link
Copy Markdown
Member

gafter commented May 16, 2018

@KrisVandermotten Just to be clear, we're waiting for you to add tests. #Resolved

@KrisVandermotten
Copy link
Copy Markdown
Contributor Author

KrisVandermotten commented May 18, 2018

@dotnet-bot retest windows_release_vs-integration_prtest please #Resolved

@KrisVandermotten
Copy link
Copy Markdown
Contributor Author

KrisVandermotten commented May 18, 2018

@gafter Tests have been added and are running fine. Not sure why windows_release_vs-integration_prtest is failing, but it seems unrelated.

@jcouv I did not add the tests to InterpolationTests.cs. Those tests are purely semantic. They continued to run just fine with the new optimization. I added the new IL-verifying tests in a new file CodeGenInterpolatedString.cs.

#Resolved

@jcouv
Copy link
Copy Markdown
Member

jcouv commented May 18, 2018

@jaredpar Could you try removing the Blocked tag? GitHub didn't work properly when I tried. #Resolved

@jaredpar jaredpar removed the Blocked label May 18, 2018
@jaredpar
Copy link
Copy Markdown
Member

jaredpar commented May 18, 2018

Could you try removing the Blocked tag? GitHub didn't work properly when I tried.

So you're blocked removing the blocked tag? ;)

done #Resolved

");

comp.VerifyDiagnostics();
comp.VerifyIL("Test.M6", @"
Copy link
Copy Markdown
Member

@jcouv jcouv May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @AlekseyTs as FYI. I'm not sure if it was the case before, but interpolated strings can now introduce some small control flow (null-coalescing operator introduced during lowering). #Resolved

Console.WriteLine($""a: {a}, b: {b}"");
Console.WriteLine($""{{{a}}}"");
Console.WriteLine(""a:"" + $"" {a}"");
}
Copy link
Copy Markdown
Member

@jcouv jcouv May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} [](start = 4, length = 1)

Consider testing nested string interpolation $"{$"...."}" too. #Closed

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcouv Done #Closed

@KrisVandermotten
Copy link
Copy Markdown
Contributor Author

KrisVandermotten commented May 23, 2018

@dotnet-bot retest windows_debug_spanish_unit32_prtest please
#Resolved

for (int i = 0; i < formatLength; i++)
{
char c = s[i];
builder.Builder.Append(c);
Copy link
Copy Markdown
Member

@jcouv jcouv May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder.Builder [](start = 16, length = 15)

Consider extracting local for builder.Builder. #Closed

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be done in the existing MakeInterpolatedStringFormat too? #Closed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that'd be good. Thanks


In reply to: 190506216 [](ancestors = 190506216)

}
}
else
{
Copy link
Copy Markdown
Member

@jcouv jcouv May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 20, length = 1)

Consider asserting that the part is a BoundLiteral and ConstantValue is not null (ie. a string literal) #Closed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree... please do that.


In reply to: 190382117 [](ancestors = 190382117)

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing lowering to string.Format doesn't have that assert. Should it be added there too? #Closed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be good. Thanks


In reply to: 190494041 [](ancestors = 190494041)


if (part.ConstantValue != null)
{
if (part.ConstantValue.StringValue == null)
Copy link
Copy Markdown
Member

@jcouv jcouv May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== [](start = 63, length = 2)

Perhaps use is null (no need to call string.operator==) #Closed

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the compiler generate the exact same code in both cases, i.e. a brtrue.s instruction? #Closed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. Never mind then :-)


In reply to: 190485587 [](ancestors = 190485587)

{
if (part.ConstantValue.StringValue == null)
{
return _factory.StringLiteral("");
Copy link
Copy Markdown
Member

@jcouv jcouv May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_factory.StringLiteral("") [](start = 39, length = 26)

It seems like _factory.StringLiteral("") always creates instances. @gafter Do you think it's worth it to use a static singleton here? #Resolved

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is fine. It is easier to debug if the bound nodes have the right syntax node associated with them, which the factory helps with.


In reply to: 190383830 [](ancestors = 190383830)

{
i++;
var part = node.Parts[i];
if (part is BoundStringInsert fillin)
Copy link
Copy Markdown
Member

@jcouv jcouv May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why the treatment is different in the case where length != 1 than it was in the case where length == 1.

I'd expect we can extract the logic from the length == 1 into a method, then we can call the method in both cases and just check what's returned (if it's just a StringLiteral, in the case where length == 1 then we can return).
If that works out, then I suspect the check on length == 1 will become a very local choice, rather than a containing if statement.

Something like:

foreach (var part in node.Parts)
{
    var loweredPart = LowerPart(part);
    if (length == 1 && /* loweredPart is a string literal */) { return loweredPart; }
    result = ...
}
``` #Resolved

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why the length == 1 case is different, is that it requires additional processing (e.g. lowering to a null coalescing expression) to ensure the resulting string is not null. Such processing is not required when length > 1, because string concatenation allready ensures it. I will add a comment to clarify that.

I did consider merging the loops.

Because of the additonal requirements for the length == 1 case, such a LowerPart method would need two parameters: LowerPart(BoundExpression part, bool isOnlyPart). While the loop duplication is not ideal, such a flag parameter isn't either. The parameter isOnlyPart is indeed tested locally in LowerPart, but has no local meaning. It becomes magic. And if you create two LowerPart variants, you might as well have the two loops.

As for the loop itself:

bool isOnlyPart = length == 1;
foreach (var part in node.Parts)
{
    var loweredPart = LowerPart(part, isOnlyPart);
    if (isOnlyPart && loweredPart is BoundLiteral)
    {
        Debug.Assert(/* loweredPart is a non-null string*/);
        return loweredPart;
    }

    result = ...
}

We can of course skip the if statement, and have the string literal go through another lowering pass like any other result. From a functional point of view, that doesn't hurt.

With or without the if statement, having a specialized loop for length == 1 is more efficient.
#Resolved

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: the additional processing to ensure that the result is not null is required when length == 1, but forbidden when length > 1. The reason for that is that string concatenation does not flatten ((a + b) ?? "") + c to a + b + c, and the same is true for a + ((b + c) ?? "").

However, it just occurred to me that this problem can be fixed quite easily :-). I'll see what I can do. #Resolved

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Resolved

return true;
}

private static string Unescape(string s)
Copy link
Copy Markdown
Member

@jcouv jcouv May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this method can be re-used in VisitInterpolatedString (LocalRewriter_StringInterpolation.cs line 78) #Closed

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it cannot. String.Format requires '{' to be escaped as "{{" and '}' to be escaped as "}}". #Closed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I don't see the difference between this method and the code I referred to.

                var builder = PooledStringBuilder.GetInstance();
                ...
                int formatLength = formatText.Length;
                for (int i = 0; i < formatLength; i++)
                {
                    char c = formatText[i];
                    builder.Builder.Append(c);
                    if ((c == '{' || c == '}') && (i + 1) < formatLength && formatText[i + 1] == c)
                    {
                        i++;
                    }
                }
                ... builder.ToStringAndFree() ...

In reply to: 190505462 [](ancestors = 190505462)

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. What code are you referring too?

Or are you saying that line 86:

 formatString.Builder.Append(part.ConstantValue.StringValue);

does the same thing as:

        int formatLength = s.Length;
        for (int i = 0; i < formatLength; i++)
        {
            char c = s[i];
            stringBuilder.Append(c);
            if ((c == '{' || c == '}') && (i + 1) < formatLength && s[i + 1] == c)
            {
                i++;
            }
        }

?

It's not the same. When lowering an interpolation to string.Format, any "{{" must remain "{{", because string.Format needs them to be escaped. When lowering to string concatenation, they need to be unescaped to "{". #Closed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I was looking at a local copy of the file before your change. Sorry for the confusion.


In reply to: 191042562 [](ancestors = 191042562)

Copy link
Copy Markdown
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with review pass (iteration 3).

@gafter
Copy link
Copy Markdown
Member

gafter commented May 23, 2018

I've looked at the tests and I believe the cases I was interested in are tested (e.g. calling many-argument versions of string.Concat). #Resolved

return _factory.StringLiteral(builder.ToStringAndFree());

result = _factory.Coalesce(result, _factory.StringLiteral(""));
}
Copy link
Copy Markdown
Member

@gafter gafter May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this unless there was no string concatenation. #Resolved

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not functionally wrong to do this, even if there was a concatenation, because we remove it again in the lowering of the coalescing operator. I noticed this technique is used in lot's of places. (For example, the lowering of a non-binary string concatenation builds lot's of intermediary structures that are broken down again when lowering a higher level.) So I used the same technique here to clean up the code considerably. Also, notice that the unit tests all still pass, nothing has changed there.

But you are correct, a simple if (length == 1) avoids it in most cases.
#Resolved

@alrz
Copy link
Copy Markdown
Member

alrz commented May 27, 2018

Now that this is going in, would it also make sense to accept such literals as a constant in the language (with an additional requirement of every interpolation to be a constant) so we can use them in attributes etc? #Resolved

@gafter
Copy link
Copy Markdown
Member

gafter commented May 28, 2018

@alrz Are you asking a language question in the compiler repository? Shame on you! #Resolved


// string concatenation is never null.
// interpolated string lowering may introduce redundant null coalescing, which we have to remove.
if (IsStringConcat(rewrittenLeft))
Copy link
Copy Markdown
Member

@jcouv jcouv May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? #Resolved

Copy link
Copy Markdown
Contributor Author

@KrisVandermotten KrisVandermotten May 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the case $"{a + b}" (where a + b is a string). #Resolved

Copy link
Copy Markdown
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with review pass (iteration 5).

Copy link
Copy Markdown
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks

@jcouv
Copy link
Copy Markdown
Member

jcouv commented May 31, 2018

@gafter for a second look. Thanks

@jcouv jcouv self-assigned this May 31, 2018
Copy link
Copy Markdown
Member

@gafter gafter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jaredpar
Copy link
Copy Markdown
Member

jaredpar commented Jun 1, 2018

Approved

@jcouv
Copy link
Copy Markdown
Member

jcouv commented Jun 1, 2018

We're all set to merge.

@KrisVandermotten Thanks a lot for the contribution.
Is it ok with you if we "Squash and merge" this PR? It makes for a cleaner git history and it is the convention for the compiler team. Thanks

@alrz
Copy link
Copy Markdown
Member

alrz commented Jun 1, 2018

IT'S HAPPENING! now I don't feel guilty every time I use string interpolation.

@KrisVandermotten
Copy link
Copy Markdown
Contributor Author

@jcouv Go ahead, squash and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Approved to merge Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants