Skip to content

Collection expressions: avoid intermediate List<T> if spread elements have known length#69875

Closed
cston wants to merge 12 commits intodotnet:features/CollectionLiteralsfrom
cston:known-length
Closed

Collection expressions: avoid intermediate List<T> if spread elements have known length#69875
cston wants to merge 12 commits intodotnet:features/CollectionLiteralsfrom
cston:known-length

Conversation

@cston
Copy link
Contributor

@cston cston commented Sep 10, 2023

If all spread elements are countable and the collection satisfies certain heuristics, calculate the expected length of the resulting collection and:

  • If the target is T[], Span<T>, or ReadOnlySpan<T>, allocate the array or span at the expected length to avoid an intermediate buffer.
  • If the target is List<T>, set the capacity to avoid resizing as items are added.

@ghost ghost added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Sep 10, 2023
@cston cston force-pushed the known-length branch 3 times, most recently from ca89062 to 2a7a073 Compare September 11, 2023 22:04
@cston cston marked this pull request as ready for review September 11, 2023 22:09
@cston cston requested a review from a team as a code owner September 11, 2023 22:09
Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done review pass. Haven't looked at the tests in depth yet.

}
else if ((collectionTypeKind == CollectionExpressionTypeKind.ListInterface && isListInterfaceThatRequiresList(targetType)) ||
elements.Any(e => e is BoundCollectionExpressionSpreadElement)) // https://github.com/dotnet/roslyn/issues/68785: Avoid intermediate List<T> if all spread elements have Length property.
node.GetKnownLength(out _) is null)
Copy link
Member

@333fred 333fred Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider naming this parameter for clarity. #Resolved

<Field Name="EnumeratorInfoOpt" Type="ForEachEnumeratorInfo?"/>
<!-- Collection Length or Count property value. -->
<Field Name="LengthOrCount" Type="BoundExpression?" SkipInVisitor="true" Null="allow"/>
<!-- Collection element placeholder. -->
Copy link
Member

@333fred 333fred Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me from reading these comments what the difference between this and ExpressionPlaceholder is. Consider elaborating why we have both and what the difference between them is. #Resolved

Copy link
Contributor Author

@cston cston Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpressionPlaceholder represents the entire collection and ElementPlaceholder represents the current item from the collection. Added a comment where ElementPlaceholder is used.

public override BoundNode? VisitCollectionExpressionSpreadElement(BoundCollectionExpressionSpreadElement node)
{
base.VisitCollectionExpressionSpreadElement(node);
SetResultType(node, default);
Copy link
Member

@333fred 333fred Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be default? Can it ever actually be null? #ByDesign

Copy link
Contributor Author

@cston cston Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it ever actually be null?

Is the question: Can BoundCollectionExpressionSpreadElement.Type be null? With this PR, the type of this expression is always null since the type is not used.

It looks like we use the same SetResultType(node, default) for BoundThrowExpression, which seems like a similar case where the BoundExpression.Type is not meaningful.

var initialization = new BoundArrayInitialization(
syntax,
isInferred: false,
elements.SelectAsArray(e => VisitExpression(e)));
Copy link
Member

@333fred 333fred Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this static, and passing through this as a parameter, so the lambda can be cached. #Resolved

@cston cston requested a review from a team September 15, 2023 15:35
@RikkiGibson RikkiGibson self-assigned this Sep 15, 2023
Copy link
Member

@RikkiGibson RikkiGibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR LGTM but it sounds like LDM may need to discuss the exact strategy in use here before we merge.

<!-- Type is not significant for this node type; always null -->
<Field Name="Type" Type="TypeSymbol?" Override="true" Null="always"/>
<!-- Collection being spread. -->
<Field Name="Expression" Type="BoundExpression"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Operand" might be a good name here

@@ -4753,30 +4753,39 @@ BoundExpression bindSpreadElement(SpreadElementSyntax syntax, BindingDiagnosticB
return new BoundCollectionExpressionSpreadElement(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am beginning to feel that "spread element" is too ambiguous with the "iterator element" of the spread, the "collection element" of the collection expression, etc. It is making it more difficult to spec and implement the feature. No need to change in this PR, but I'd like to re-examine whether we can make things easier for ourselves by adopting a term like "spread operator" here.

diagnostics.Add(syntax.Expression, useSiteInfo);
expression = ConvertForEachCollection(expression, conversion, collectionType, diagnostics);
var elementPlaceholder = new BoundValuePlaceholder(syntax.Expression, enumeratorInfo.ElementType);
var convertedExpression = ConvertForEachCollection(expressionPlaceholder, conversion, collectionType, diagnostics);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This converts the spread operand to the collection type we found in its EnumeratorInfo. In most cases this really converts the spread operand to the same type, except with arrays, where some wrapping is occurring (reference conv to IEnumerable). Possibly in MQ we could simplify some of this.

{
return element.Update(
BindToNaturalType(element.Expression, BindingDiagnosticBag.Discarded, reportNoTargetType: false),
expressionPlaceholder: element.ExpressionPlaceholder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected the placeholder would always be null in this path. Consider passing null explicitly to make that clear.


<Node Name="BoundCollectionExpressionSpreadExpressionPlaceholder" Base="BoundValuePlaceholderBase"/>

<Node Name="BoundCollectionExpressionSpreadElement" Base="BoundExpression">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"BoundCollectionExpressionSpreadOperator" might be a better name here

<Field Name="AddElementPlaceholder" Type="BoundValuePlaceholder?" SkipInVisitor="true" Null="allow"/>
<Field Name="AddMethodInvocation" Type="BoundStatement?" SkipInVisitor="true" Null="allow"/>
<!-- Statement executed for each collection element. -->
<Field Name="IteratorBody" Type="BoundStatement?" SkipInVisitor="true" Null="allow"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally found it slightly surprising that some of these constructs are synthesized in binding. I figured we would just get symbols for Add method, Length/Count methods, etc., and include them on this node, but actually constructing the statements which add elements to the resulting collection expression would be done in lowering. However, I don't feel strongly enough about the decision to suggest any change here.

BoundExpression array;

switch (node.GetKnownLength())
if (node.GetKnownLength(hasSpreadElements: out _) is null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this factoring causing us to have to get the known length multiple times? Does that matter?


RemovePlaceholderReplacement(addElementPlaceholder);
// Rewrite expressions into temporaries.
foreach (var element in elements)
Copy link
Member

@RikkiGibson RikkiGibson Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I feel about this. Given we are creating an array I am not sure it is fine to create potentially very large numbers of temporaries just because a spread element (with runtime-known length) is present. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, for this release, I would be absolutely fine with having an intermediate List for these cases. I could easily see a code generator wanting to write something like

public static readonly string[] arr1 = ["bunch", "of", "values", "like", "thousands"];
public static readonly string[] arr2 = [..arr1, "even", "more", "values"];

In this specific case, maybe blowing the stack on a static constructor is less likely, but still maybe possible.

Also, I think in some cases compiler can probably determine that a spread operator like ..arr1 is non-side-effecting to evaluate, e.g. we can read Length before evaluating the elements in-order without a problem, and optimize accordingly. I would def feel comfortable saying that if your Length property itself is side effecting, then bad things may happen, sorry.

Copy link
Contributor Author

@cston cston Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to only create temporaries up to the last spread element, and with a small maximum number of temporaries. That will still cover cases such as [..s, e1, e2, <more>] and [..s1, ..s2] in particular.

{
var rewrittenExpression = RewriteCollectionExpressionElementExpression(elements[i]);
BoundAssignmentOperator assignmentToTemp;
BoundLocal temp = _factory.StoreToTemp(rewrittenExpression, out assignmentToTemp, isKnownToReferToTempIfReferenceType: true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we shouldn't require a temporary for an element that has a constant value.

Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good. Would like some investigation of the TryBindLengthOrCount question before signing off though.

{
Binder.Error(diagnostics, ErrorCode.ERR_BadAttributeArgument, node.Syntax);
attrHasErrors = true;
return new TypedConstant(spread.Expression.Type, TypedConstantKind.Error, null);
Copy link
Member

@333fred 333fred Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the naming the last argument. #Resolved

BoundExpression? lengthOrCount;
if (!TryBindLengthOrCount(syntax.Expression, expressionPlaceholder, out lengthOrCount, diagnostics))
{
lengthOrCount = null;
Copy link
Member

@333fred 333fred Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need a temp diagnostics bag that we add to diagnostics only if we're in the true path here. It looks like this will add use-site diagnostics from Length or Count, if any are found, even if it then returns false some some other reason. #Resolved

Copy link
Contributor Author

@cston cston Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've added a test, SpreadElement_LengthUseSiteError, and I think we probably want to report the use-site errors even if TryBindLengthOrCount returns false because the reason it returned false might be due to the same issue that resulted in the use-site error. This is consistent with how TryBindLengthOrCount is used for patterns.

internal partial class BoundCollectionExpressionBase
{
internal int? GetKnownLength()
internal bool HasSpreadElements(out int numberIncludingLastSpread, out bool hasKnownLength)
Copy link
Member

@333fred 333fred Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a doc comment with an example of what IncludingLastSpread looks like would be helpful for future maintainability here. #Resolved

@cston
Copy link
Contributor Author

cston commented Sep 29, 2023

@333fred, @RikkiGibson, please review the latest commit which uses List<T>..ctor(int capacity), thanks.

collectionType.OriginalDefinition.Equals(_compilation.GetWellKnownType(WellKnownType.System_Collections_Generic_List_T)))
{
// List<ElementType> list = new(N + s1.Length + ...);
var constructor = ((MethodSymbol)_factory.WellKnownMember(WellKnownMember.System_Collections_Generic_List_T__ctorInt32)).AsMember((NamedTypeSymbol)collectionType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to look to see if this constructor exists? and only use it if so? or is this ok to just do as is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the documentation, this constructor is available on .NET Framework 2.0 and later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the compiler doesn't crash when this member is missing, I'm satisfied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're testing the missing constructor case in KnownLength_List_MissingConstructor().

@cston
Copy link
Contributor Author

cston commented Oct 2, 2023

@333fred, @RikkiGibson, please review the latest commit which uses List<T>..ctor(int capacity), thanks.

Never mind, I've moved the latest commit to the next PR #70197 instead.

@RikkiGibson
Copy link
Member

Is this PR ready to merge? @cston @jaredpar

@cston
Copy link
Contributor Author

cston commented Oct 4, 2023

Is this PR ready to merge? @cston @jaredpar

The changes in the PR have two approvals.

@jaredpar
Copy link
Member

jaredpar commented Oct 4, 2023

No, have to go through qb mode on these. Waiting for them to all be double reviewed beforetaking through QB

@cston cston changed the base branch from release/dev17.8 to features/CollectionLiterals October 5, 2023 16:50
@jcouv
Copy link
Member

jcouv commented Oct 5, 2023

Already merged elsewhere

@jcouv jcouv closed this Oct 5, 2023
@cston
Copy link
Contributor Author

cston commented Oct 5, 2023

Merged as part of #70197.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Compilers Feature - Collection Expressions untriaged Issues and PRs which have not yet been triaged by a lead

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants