Skip to content

Optimize windowed #809

@james-wasson

Description

@james-wasson

Intro

I have been working towards increasing the performance of the windowed method and have made some good headway. Main performance gains come from using built in itertools methods and using islice to "fast forward" the windows.

At low list sizes the performance does take a small hit but I did try to minimize that as much as possible.
As the list size, n, or step get larger the benefits become clearer gaining approximately a 25-50% performance gain.

Testing

Test:

from more_itertools import windowed, consume
%timeit -n 100000 consume(windowed([1]*10, 3, step=1))
%timeit -n 100000 consume(windowed([1]*10, 3, step=2))
%timeit -n 100000 consume(windowed([1]*10, 3, step=4))
%timeit -n 100000 consume(windowed([1]*10, 3, step=6))
print()
%timeit -n 100000 consume(windowed([1]*20, 5, step=1))
%timeit -n 100000 consume(windowed([1]*20, 5, step=2))
%timeit -n 100000 consume(windowed([1]*20, 5, step=4))
%timeit -n 100000 consume(windowed([1]*20, 5, step=8))
%timeit -n 100000 consume(windowed([1]*20, 5, step=12))
print()
%timeit -n 100000 consume(windowed([1]*40, 10, step=1))
%timeit -n 100000 consume(windowed([1]*40, 10, step=4))
%timeit -n 100000 consume(windowed([1]*40, 10, step=7))
%timeit -n 100000 consume(windowed([1]*40, 10, step=9))
%timeit -n 100000 consume(windowed([1]*40, 10, step=15))
%timeit -n 100000 consume(windowed([1]*40, 10, step=25))
print()
%timeit -n 100000 consume(windowed([1]*80, 20, step=1))
%timeit -n 100000 consume(windowed([1]*80, 20, step=6))
%timeit -n 100000 consume(windowed([1]*80, 20, step=12))
%timeit -n 100000 consume(windowed([1]*80, 20, step=19))
%timeit -n 100000 consume(windowed([1]*80, 20, step=25))
%timeit -n 100000 consume(windowed([1]*80, 20, step=35))
print()
%timeit -n 10000 consume(windowed([1]*1000, 30, step=1))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=2))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=3))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=4))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=8))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=12))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=18))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=25))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=40))
%timeit -n 10000 consume(windowed([1]*1000, 30, step=80))

Before:

2.19 µs ± 18.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.01 µs ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.85 µs ± 19.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.65 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

3.39 µs ± 27 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.87 µs ± 36.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.47 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.26 µs ± 21.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.1 µs ± 32.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

6.08 µs ± 63.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.93 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.58 µs ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.44 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.05 µs ± 26.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.97 µs ± 36.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

14 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
6.25 µs ± 73.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.39 µs ± 43.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.31 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.07 µs ± 47.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

227 µs ± 901 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
135 µs ± 762 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
106 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
92.3 µs ± 669 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
67.7 µs ± 957 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
60.3 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
53.6 µs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
50.7 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
47.5 µs ± 556 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
45.3 µs ± 487 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

After:

2.43 µs ± 15.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.16 µs ± 40.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.46 µs ± 8.91 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

3.34 µs ± 48 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.78 µs ± 46.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.42 µs ± 6.42 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.54 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.34 µs ± 34.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

5.56 µs ± 42.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.4 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.03 µs ± 88.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.99 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.65 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.44 µs ± 11 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

12.4 µs ± 70.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.86 µs ± 37.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.2 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.19 µs ± 22.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.53 µs ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.23 µs ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

204 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
112 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
82.2 µs ± 661 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
67.4 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
44.5 µs ± 486 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
37.8 µs ± 534 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
33.3 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
29.8 µs ± 471 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
15.5 µs ± 314 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
10.8 µs ± 89.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions