Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Optimize Enumerable.SkipLast() for IPartition and IList#41342

Merged
stephentoub merged 5 commits intodotnet:masterfrom
timandy:opt-linq
Oct 23, 2019
Merged

Optimize Enumerable.SkipLast() for IPartition and IList#41342
stephentoub merged 5 commits intodotnet:masterfrom
timandy:opt-linq

Conversation

@timandy
Copy link
Contributor

@timandy timandy commented Sep 26, 2019

Enumerable.SkipLast() can be cast to Enumerable.Take() when source type is IPartition or IList.

performance test see #40014

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 26, 2019

Can you provide some pre and post performance figures using BenachmarkDotNet? There have been a number of suggestions to change the behaviour of linq query methods which were rejected because they degrade the performance of the common case in favour of an edge case,

@timandy
Copy link
Contributor Author

timandy commented Sep 26, 2019

Can you provide some pre and post performance figures using BenachmarkDotNet? There have been a number of suggestions to change the behaviour of linq query methods which were rejected because they degrade the performance of the common case in favour of an edge case,

@Wraith2 Performance test is in the issue #40014. Should I move the testcase to here?

@timandy
Copy link
Contributor Author

timandy commented Sep 26, 2019

    [MemoryDiagnoser]
    public class SkipLastPerf
    {
        private IEnumerable<int> data;

        [Params(10, 100, 1000)]
        public int COUNT;

        [GlobalSetup]
        public void Setup()
        {
            List<int> tmp = new List<int>(COUNT);
            for (int i = 0; i < COUNT; i++)
            {
                tmp.Add(i);
            }

            data = tmp;
        }

        [Benchmark]
        public int SkipLast()
        {
            return data.SkipLast(5).Sum();
        }

        [Benchmark(Baseline = true)]
        public int SkipLastOld()
        {
            return data.SkipLastOld(5).Sum();
        }
    }
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.915 (1803/April2018Update/Redstone4)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
Frequency=1945313 Hz, Resolution=514.0561 ns, Timer=TSC
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT

Method COUNT Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
SkipLast 10 71.08 ns 1.352 ns 1.198 ns 0.32 0.0114 - - 48 B
SkipLastOld 10 223.76 ns 4.636 ns 4.761 ns 1.00 0.0591 - - 248 B
SkipLast 100 806.39 ns 6.829 ns 6.053 ns 0.42 0.0114 - - 48 B
SkipLastOld 100 1,924.12 ns 35.878 ns 31.805 ns 1.00 0.0572 - - 248 B
SkipLast 1000 8,304.96 ns 88.786 ns 78.707 ns 0.43 - - - 48 B
SkipLastOld 1000 19,396.12 ns 138.287 ns 115.476 ns 1.00 0.0305 - - 248 B

@safern safern requested a review from maryamariyan September 26, 2019 18:59
@maryamariyan
Copy link

Thank you @timandy. The benchmarks test results you provided show improvements. LGTM but also would like to get @stephentoub 's feedback on this.

@adamsitnik would you recommend benchmark tests like these get included in the dotnet benchmark repo?

Copy link

@maryamariyan maryamariyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
cc: @stephentoub

@stephentoub
Copy link
Member

For the benchmarks, the tests shared show the optimized case. Can you share results when the input doesn't hit the optimizations, namely when the input isn't an IList or an IPartition? For example, your setup does this:

        [GlobalSetup]
        public void Setup()
        {
            List<int> tmp = new List<int>(COUNT);
            for (int i = 0; i < COUNT; i++)
            {
                tmp.Add(i);
            }

            data = tmp;
        }

What are the results of your test if you instead do this:

        [GlobalSetup]
        public void Setup()
        {
            data = GetInput(COUNT);
        }

        private static IEnumerable<int> GetInput(int count)
        {
            for (int i = 0; i < count; i++) yield return i;
        }

?

Thanks.

@timandy
Copy link
Contributor Author

timandy commented Sep 27, 2019

@stephentoub I test the code with your testcase, The result is

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.915 (1803/April2018Update/Redstone4)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
Frequency=1945313 Hz, Resolution=514.0561 ns, Timer=TSC
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT

Method COUNT Mean Error StdDev Median Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
SkipLast 10 238.4 ns 6.197 ns 16.325 ns 231.7 ns 1.20 0.12 0.0591 - - 248 B
SkipLastOld 10 212.9 ns 2.897 ns 2.709 ns 213.0 ns 1.00 0.00 0.0591 - - 248 B
SkipLast 100 1,759.0 ns 28.236 ns 22.045 ns 1,759.6 ns 1.01 0.02 0.0591 - - 248 B
SkipLastOld 100 1,736.1 ns 33.219 ns 32.626 ns 1,728.5 ns 1.00 0.00 0.0591 - - 248 B
SkipLast 1000 16,974.7 ns 333.448 ns 356.785 ns 16,836.1 ns 1.05 0.03 0.0305 - - 248 B
SkipLastOld 1000 16,286.4 ns 195.489 ns 182.860 ns 16,311.9 ns 1.00 0.00 0.0305 - - 248 B

@stephentoub
Copy link
Member

stephentoub commented Sep 27, 2019

test the code with your testcase

Thanks for sharing the results.

Do we have any evidence around the typical type and length of input used with SkipLast? As expected, this demonstrates a nice improvement for the optimized case, but we pay for that with a non-trivial regression (20%) in the case of a short regular enumerable.

@stephentoub
Copy link
Member

Other than my question in #41342 (comment), LGTM.

@timandy
Copy link
Contributor Author

timandy commented Sep 29, 2019

@stephentoub I rerun the testcase several times, and wait the programe exit without do anything on my computer. About 5%-6% slower than before optimization.

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.915 (1803/April2018Update/Redstone4)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
Frequency=1945313 Hz, Resolution=514.0561 ns, Timer=TSC
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT

Method COUNT Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
SkipLast 10 236.5 ns 2.883 ns 2.555 ns 1.05 0.01 0.0591 - - 248 B
SkipLastOld 10 224.5 ns 2.305 ns 2.156 ns 1.00 0.00 0.0591 - - 248 B
SkipLast 100 1,845.4 ns 34.071 ns 40.560 ns 1.01 0.03 0.0591 - - 248 B
SkipLastOld 100 1,841.8 ns 28.275 ns 26.448 ns 1.00 0.00 0.0591 - - 248 B
SkipLast 1000 17,342.4 ns 198.178 ns 185.376 ns 1.02 0.02 0.0305 - - 248 B
SkipLastOld 1000 16,953.2 ns 184.808 ns 172.869 ns 1.00 0.00 0.0305 - - 248 B

@timandy
Copy link
Contributor Author

timandy commented Oct 9, 2019

@maryamariyan What should I do next?

@maryamariyan
Copy link

maryamariyan commented Oct 9, 2019

Thanks a lot for the PR @timandy.
Overall the results show the optimized case is faster and that is great. But the issue is with the regressed cases: If the regressed cases for this optimization are also frequently used then merging this PR would be a problem.

In other words, it would be hard to merge the PR if there is no evidence demonstrating that for the "most used cases" we have not made a regression. (The question posed in @stephentoub 's last comment needs to be answered before we can confidently merge the PR).

@timandy
Copy link
Contributor Author

timandy commented Oct 9, 2019

I think it's worthy to use 5% loss of the performance swap 60% performance improvement. And the time of one option for small amount is very very short.

@stephentoub
Copy link
Member

I think it's worthy to use 5% loss of the performance swap 60% performance improvement. And the time of one option for small amount is very very short.

Waving my hands, if the 5% regression is for a case that's > 12x as common as the 60% improvement, this is a net loss. What evidence do we have about how common these cases are? That's what I'm trying to understand.

@safern
Copy link
Member

safern commented Oct 22, 2019

@timandy are you planning on following up on this wrt the question asked above?

@stephentoub
Copy link
Member

Ok. I just looked at what we do for Enumerable.Skip, Take, and TakeLast, and for all of those we check both of these interfaces. Given that, it seems reasonable to be consistent and complete this one with those two interface as well. I'd like to avoid another PR that does the same in other operators without compelling data about why it's worthwhile. Thanks.

@stephentoub stephentoub merged commit bdd3397 into dotnet:master Oct 23, 2019
@timandy timandy deleted the opt-linq branch November 28, 2019 01:08
@karelz karelz added this to the 5.0 milestone Dec 19, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…x#41342)

* Optimize Enumerable.SkipLast() for IPartition and IList

* Convert to conditional expression

* Add parameter name

* Revert "Convert to conditional expression"

This reverts commit 7c7908fb

* Format conditional expression


Commit migrated from dotnet/corefx@bdd3397
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants