Specialize the single-selector overload of SelectMany.#13942
Specialize the single-selector overload of SelectMany.#13942VSadov merged 8 commits intodotnet:masterfrom
Conversation
There was a problem hiding this comment.
I would have preferred to keep this SelectManyIterator, but the compiler complains because there are other SelectManyIterator members.
There was a problem hiding this comment.
TODO: Add tests similar to the ones in Concat to make sure checked is used.
There was a problem hiding this comment.
Are you going to add these tests in this PR?
There was a problem hiding this comment.
Should we set _subEnumerator to null here? That would help in the case where the subsequent GetEnumerator() call throws, at which point if we didn't null this out we'd be holding on to a disposed enumerator.
What about the regressions for iteration and ToArray? |
stephentoub
left a comment
There was a problem hiding this comment.
Other than my few questions/comments, LGTM. Thanks!
I believe those are just random fluctuations in the results... most of them appear to be when the sub-collection length is really small, like 1-3, and edit: I forgot to add, most of the regressions are like 5-10% and inconsistent. |
|
LGTM Also was interesting to learn about #6892. Nice change too. Normally I would not like an idea of exposing internal buffers to strangers, but indeed, |
I'm not convinced that there aren't real regressions in some cases. Consider the common case where the enumerable returned for each iteration isn't an ICollection. Won't List.AddRange with its current implementation end up doing more work than if each element were Add'd individually? |
There was a problem hiding this comment.
_subEnumerator may not be null here, and need to be disposed too, if the enumeration is jumped out of.
There was a problem hiding this comment.
Good point. I'll add a dispose for that too.
|
@stephentoub Ah, you make a good point. Looks like how it's currently implemented we'll end up in a loop calling |
There was a problem hiding this comment.
I believe that if previously an exception came out of here, such as from the MoveNext, the instance would be Dispose'd immediately; now it requires Dispose to be called on the enumerator explicitly. Are we ok with that @VSadov?
There was a problem hiding this comment.
Doesn't this happen every time we replace a yield-based iterator with a class?
There was a problem hiding this comment.
@VSadov, can you comment on whether you think this is an acceptable change when you have time? Thanks.
There was a problem hiding this comment.
It is a common practice in LINQ. See, for example implementation of SelectEnumerableIterator
Since client can stop enumerating at any time, voluntarily or due to a fault, the client needs to be able to dispose the entire internal state of the iterator. Thus we create a composite Dispose.
We could stop just at that, but we often, as a good effort, dispose on state transitions as well. That is mostly to release resources early and does not need to be hardened for faulting scenarios.
|
Resolved merge conflicts |
|
@stephentoub Responding to your comments from earlier:
My tests didn't actually hit that branch since each subcollection was an array. However, I modified them slightly to remove the |
|
LGTM |
Specialize the single-selector overload of SelectMany. Commit migrated from dotnet/corefx@5bf69d1
e.SelectMany(i => i)is a very popular option for flattening a list of enumerables: http://stackoverflow.com/questions/1590723/flatten-list-in-linqThis PR optimizes
SelectManycalls followed byToArrayorToList, leading to a substantial (~40%) speedup. Instead of looping through each item of the projected sequence, yield returning it from the iterator, and adding it to the list/LargeArrayBuilder, we simply callAddRange.Performance test: https://github.com/jamesqo/Dotnet/blob/05daf42403b615942706a7e9be32c8b22db80e67/Program.cs Results: https://gist.github.com/jamesqo/635e6e49b3ceb6e9527161f3472188f7
You may notice that there are a lot of regressions (substantially more allocations happening) for
ToList. That puzzled me too, until I found out the version of S.P.CoreLib I was using did not include this change; in my tests, a buffer was being allocated every timeList.AddRangewas called. Still, however, the new implementation was faster.I've also added more testing for these changes in the PR.
cc @JonHanna @VSadov @stephentoub