Collection expressions: avoid intermediate List<T> if spread elements have known length#69875
Collection expressions: avoid intermediate List<T> if spread elements have known length#69875cston wants to merge 12 commits intodotnet:features/CollectionLiteralsfrom
Conversation
ca89062 to
2a7a073
Compare
… have known length
src/Compilers/CSharp/Portable/BoundTree/BoundCollectionExpression.cs
Outdated
Show resolved
Hide resolved
333fred
left a comment
There was a problem hiding this comment.
Done review pass. Haven't looked at the tests in depth yet.
| } | ||
| else if ((collectionTypeKind == CollectionExpressionTypeKind.ListInterface && isListInterfaceThatRequiresList(targetType)) || | ||
| elements.Any(e => e is BoundCollectionExpressionSpreadElement)) // https://github.com/dotnet/roslyn/issues/68785: Avoid intermediate List<T> if all spread elements have Length property. | ||
| node.GetKnownLength(out _) is null) |
There was a problem hiding this comment.
Consider naming this parameter for clarity. #Resolved
| <Field Name="EnumeratorInfoOpt" Type="ForEachEnumeratorInfo?"/> | ||
| <!-- Collection Length or Count property value. --> | ||
| <Field Name="LengthOrCount" Type="BoundExpression?" SkipInVisitor="true" Null="allow"/> | ||
| <!-- Collection element placeholder. --> |
There was a problem hiding this comment.
It's unclear to me from reading these comments what the difference between this and ExpressionPlaceholder is. Consider elaborating why we have both and what the difference between them is. #Resolved
There was a problem hiding this comment.
ExpressionPlaceholder represents the entire collection and ElementPlaceholder represents the current item from the collection. Added a comment where ElementPlaceholder is used.
| public override BoundNode? VisitCollectionExpressionSpreadElement(BoundCollectionExpressionSpreadElement node) | ||
| { | ||
| base.VisitCollectionExpressionSpreadElement(node); | ||
| SetResultType(node, default); |
There was a problem hiding this comment.
Why would this be default? Can it ever actually be null? #ByDesign
There was a problem hiding this comment.
Can it ever actually be
null?
Is the question: Can BoundCollectionExpressionSpreadElement.Type be null? With this PR, the type of this expression is always null since the type is not used.
It looks like we use the same SetResultType(node, default) for BoundThrowExpression, which seems like a similar case where the BoundExpression.Type is not meaningful.
| var initialization = new BoundArrayInitialization( | ||
| syntax, | ||
| isInferred: false, | ||
| elements.SelectAsArray(e => VisitExpression(e))); |
There was a problem hiding this comment.
Consider making this static, and passing through this as a parameter, so the lambda can be cached. #Resolved
RikkiGibson
left a comment
There was a problem hiding this comment.
The PR LGTM but it sounds like LDM may need to discuss the exact strategy in use here before we merge.
| <!-- Type is not significant for this node type; always null --> | ||
| <Field Name="Type" Type="TypeSymbol?" Override="true" Null="always"/> | ||
| <!-- Collection being spread. --> | ||
| <Field Name="Expression" Type="BoundExpression"/> |
There was a problem hiding this comment.
"Operand" might be a good name here
| @@ -4753,30 +4753,39 @@ BoundExpression bindSpreadElement(SpreadElementSyntax syntax, BindingDiagnosticB | |||
| return new BoundCollectionExpressionSpreadElement( | |||
There was a problem hiding this comment.
I am beginning to feel that "spread element" is too ambiguous with the "iterator element" of the spread, the "collection element" of the collection expression, etc. It is making it more difficult to spec and implement the feature. No need to change in this PR, but I'd like to re-examine whether we can make things easier for ourselves by adopting a term like "spread operator" here.
| diagnostics.Add(syntax.Expression, useSiteInfo); | ||
| expression = ConvertForEachCollection(expression, conversion, collectionType, diagnostics); | ||
| var elementPlaceholder = new BoundValuePlaceholder(syntax.Expression, enumeratorInfo.ElementType); | ||
| var convertedExpression = ConvertForEachCollection(expressionPlaceholder, conversion, collectionType, diagnostics); |
There was a problem hiding this comment.
This converts the spread operand to the collection type we found in its EnumeratorInfo. In most cases this really converts the spread operand to the same type, except with arrays, where some wrapping is occurring (reference conv to IEnumerable). Possibly in MQ we could simplify some of this.
| { | ||
| return element.Update( | ||
| BindToNaturalType(element.Expression, BindingDiagnosticBag.Discarded, reportNoTargetType: false), | ||
| expressionPlaceholder: element.ExpressionPlaceholder, |
There was a problem hiding this comment.
I expected the placeholder would always be null in this path. Consider passing null explicitly to make that clear.
|
|
||
| <Node Name="BoundCollectionExpressionSpreadExpressionPlaceholder" Base="BoundValuePlaceholderBase"/> | ||
|
|
||
| <Node Name="BoundCollectionExpressionSpreadElement" Base="BoundExpression"> |
There was a problem hiding this comment.
"BoundCollectionExpressionSpreadOperator" might be a better name here
| <Field Name="AddElementPlaceholder" Type="BoundValuePlaceholder?" SkipInVisitor="true" Null="allow"/> | ||
| <Field Name="AddMethodInvocation" Type="BoundStatement?" SkipInVisitor="true" Null="allow"/> | ||
| <!-- Statement executed for each collection element. --> | ||
| <Field Name="IteratorBody" Type="BoundStatement?" SkipInVisitor="true" Null="allow"/> |
There was a problem hiding this comment.
I personally found it slightly surprising that some of these constructs are synthesized in binding. I figured we would just get symbols for Add method, Length/Count methods, etc., and include them on this node, but actually constructing the statements which add elements to the resulting collection expression would be done in lowering. However, I don't feel strongly enough about the decision to suggest any change here.
| BoundExpression array; | ||
|
|
||
| switch (node.GetKnownLength()) | ||
| if (node.GetKnownLength(hasSpreadElements: out _) is null) |
There was a problem hiding this comment.
Is this factoring causing us to have to get the known length multiple times? Does that matter?
|
|
||
| RemovePlaceholderReplacement(addElementPlaceholder); | ||
| // Rewrite expressions into temporaries. | ||
| foreach (var element in elements) |
There was a problem hiding this comment.
Not sure how I feel about this. Given we are creating an array I am not sure it is fine to create potentially very large numbers of temporaries just because a spread element (with runtime-known length) is present. #Resolved
There was a problem hiding this comment.
Basically, for this release, I would be absolutely fine with having an intermediate List for these cases. I could easily see a code generator wanting to write something like
public static readonly string[] arr1 = ["bunch", "of", "values", "like", "thousands"];
public static readonly string[] arr2 = [..arr1, "even", "more", "values"];In this specific case, maybe blowing the stack on a static constructor is less likely, but still maybe possible.
Also, I think in some cases compiler can probably determine that a spread operator like ..arr1 is non-side-effecting to evaluate, e.g. we can read Length before evaluating the elements in-order without a problem, and optimize accordingly. I would def feel comfortable saying that if your Length property itself is side effecting, then bad things may happen, sorry.
There was a problem hiding this comment.
Updated to only create temporaries up to the last spread element, and with a small maximum number of temporaries. That will still cover cases such as [..s, e1, e2, <more>] and [..s1, ..s2] in particular.
| { | ||
| var rewrittenExpression = RewriteCollectionExpressionElementExpression(elements[i]); | ||
| BoundAssignmentOperator assignmentToTemp; | ||
| BoundLocal temp = _factory.StoreToTemp(rewrittenExpression, out assignmentToTemp, isKnownToReferToTempIfReferenceType: true); |
There was a problem hiding this comment.
Ideally, we shouldn't require a temporary for an element that has a constant value.
333fred
left a comment
There was a problem hiding this comment.
Overall looking good. Would like some investigation of the TryBindLengthOrCount question before signing off though.
| { | ||
| Binder.Error(diagnostics, ErrorCode.ERR_BadAttributeArgument, node.Syntax); | ||
| attrHasErrors = true; | ||
| return new TypedConstant(spread.Expression.Type, TypedConstantKind.Error, null); |
There was a problem hiding this comment.
Consider the naming the last argument. #Resolved
| BoundExpression? lengthOrCount; | ||
| if (!TryBindLengthOrCount(syntax.Expression, expressionPlaceholder, out lengthOrCount, diagnostics)) | ||
| { | ||
| lengthOrCount = null; |
There was a problem hiding this comment.
We probably need a temp diagnostics bag that we add to diagnostics only if we're in the true path here. It looks like this will add use-site diagnostics from Length or Count, if any are found, even if it then returns false some some other reason. #Resolved
There was a problem hiding this comment.
Good catch. I've added a test, SpreadElement_LengthUseSiteError, and I think we probably want to report the use-site errors even if TryBindLengthOrCount returns false because the reason it returned false might be due to the same issue that resulted in the use-site error. This is consistent with how TryBindLengthOrCount is used for patterns.
| internal partial class BoundCollectionExpressionBase | ||
| { | ||
| internal int? GetKnownLength() | ||
| internal bool HasSpreadElements(out int numberIncludingLastSpread, out bool hasKnownLength) |
There was a problem hiding this comment.
I think a doc comment with an example of what IncludingLastSpread looks like would be helpful for future maintainability here. #Resolved
|
@333fred, @RikkiGibson, please review the latest commit which uses |
| collectionType.OriginalDefinition.Equals(_compilation.GetWellKnownType(WellKnownType.System_Collections_Generic_List_T))) | ||
| { | ||
| // List<ElementType> list = new(N + s1.Length + ...); | ||
| var constructor = ((MethodSymbol)_factory.WellKnownMember(WellKnownMember.System_Collections_Generic_List_T__ctorInt32)).AsMember((NamedTypeSymbol)collectionType); |
There was a problem hiding this comment.
do you need to look to see if this constructor exists? and only use it if so? or is this ok to just do as is?
There was a problem hiding this comment.
From the documentation, this constructor is available on .NET Framework 2.0 and later.
There was a problem hiding this comment.
If the compiler doesn't crash when this member is missing, I'm satisfied
There was a problem hiding this comment.
We're testing the missing constructor case in KnownLength_List_MissingConstructor().
Never mind, I've moved the latest commit to the next PR #70197 instead. |
|
No, have to go through qb mode on these. Waiting for them to all be double reviewed beforetaking through QB |
|
Already merged elsewhere |
|
Merged as part of #70197. |
If all spread elements are countable and the collection satisfies certain heuristics, calculate the expected length of the resulting collection and:
T[],Span<T>, orReadOnlySpan<T>, allocate the array or span at the expected length to avoid an intermediate buffer.List<T>, set the capacity to avoid resizing as items are added.