Skip to content

Padded Improvement #804

@james-wasson

Description

@james-wasson

Intro

I have been working towards increasing the performance of the padded method and have made some good headway. Main performance gains come from using built in itertools methods and using generators in a more bulk friendly way.

There are a few caveats to this PR and I would like some advice on how to proceed and what would be considered best practice for this project. I will explain in more detail down below.

Performance Gains Where next_multiple is False

When next_multiple is set to False this PR is a strict improvement over the current method with no drawbacks I can see.

Test 1 - Iterable length of 10-80 with a low, medium, and high n value

Test:

from more_itertools import padded, consume
import math
for i in range(1, 5):
    iter_len = 5 * pow(2, i)
    ns = (math.floor(iter_len * 0.20), math.floor(iter_len * 0.50), math.floor(iter_len * 0.80))
    print(f'Iter Length {iter_len} with n={ns} and next_multiple=False')
    for n in ns:
        %timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=False))

Before:

Iter Length 10 with n=(2, 5, 8) and next_multiple=False
1.11 µs ± 3.91 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.1 µs ± 9.13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(4, 10, 16) and next_multiple=False
1.53 µs ± 7.69 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.54 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.52 µs ± 8.74 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(8, 20, 32) and next_multiple=False
2.38 µs ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.37 µs ± 47.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.37 µs ± 39.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(16, 40, 64) and next_multiple=False
4.07 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.03 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.98 µs ± 49.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

Iter Length 10 with n=(2, 5, 8) and next_multiple=False
941 ns ± 31.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
935 ns ± 5.26 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
944 ns ± 3.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(4, 10, 16) and next_multiple=False
984 ns ± 5.47 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.04 µs ± 25.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(8, 20, 32) and next_multiple=False
1.13 µs ± 46.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.19 µs ± 40.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.17 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(16, 40, 64) and next_multiple=False
1.3 µs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.43 µs ± 58.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.47 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Test 2 - Iterable length of 10-80 with very low n values

Test:

from more_itertools import padded, consume
import math
for i in range(1, 5):
    iter_len = 5 * pow(2, i)
    ns = (2, 4, 6, 8, 10, 12, 15)
    print(f'Iter Length {iter_len} with n={ns} and next_multiple=False')
    for n in ns:
        %timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=False))

Before:

Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.11 µs ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.1 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.09 µs ± 7.09 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.21 µs ± 33 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.26 µs ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.54 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 54.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 6.63 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.52 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
2.35 µs ± 38 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.34 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.38 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.34 µs ± 22.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.35 µs ± 60.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.38 µs ± 52.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.34 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
3.96 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.95 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.96 µs ± 9.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.07 µs ± 165 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.99 µs ± 59.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.02 µs ± 88.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.98 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
928 ns ± 3.81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
931 ns ± 3.21 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
948 ns ± 14.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
951 ns ± 5.94 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
952 ns ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
993 ns ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.02 µs ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.04 µs ± 73.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
994 ns ± 1.64 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1 µs ± 16.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 29.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 3.77 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.03 µs ± 5.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.06 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.07 µs ± 5.28 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.08 µs ± 7.72 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.08 µs ± 2.13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.13 µs ± 33 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.15 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.23 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 33.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.26 µs ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.3 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.33 µs ± 69.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.33 µs ± 74.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Test 3 - Iterable length of 10000 with a high n value

Test:

from more_itertools import padded, consume
%timeit -n 100000 consume(padded(range(1, 10000), n=107, next_multiple=False))

Before:

521 µs ± 4.43 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

99.6 µs ± 305 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Performance Gains Where next_multiple is True

When next_multiple is set to True there are some problems with speed at very low iterable lengths (less than 10 items) and n values (less than around 9) that cause some performance degradation. I could put a switch to use the old method here to keep the performance the same but I feel that doing that may introduce unexpected bugs and hurt maintainability. I would welcome feedback here.

Test 1 - Iterable length of 10-80 with a low, medium, and high n value

Test:

from more_itertools import padded, consume
import math
for i in range(1, 5):
    iter_len = 5 * pow(2, i)
    ns = (math.floor(iter_len * 0.20), math.floor(iter_len * 0.50), math.floor(iter_len * 0.80))
    print(f'Iter Length {iter_len} with n={ns} and next_multiple=True')
    for n in ns:
        %timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=True))

Before:

Iter Length 10 with n=(2, 5, 8) and next_multiple=True
1.14 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 9.17 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.33 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(4, 10, 16) and next_multiple=True
1.55 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.58 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.92 µs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(8, 20, 32) and next_multiple=True
2.4 µs ± 69.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.37 µs ± 31.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.01 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(16, 40, 64) and next_multiple=True
3.99 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.97 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.32 µs ± 46.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

Iter Length 10 with n=(2, 5, 8) and next_multiple=True
1.9 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.43 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.5 µs ± 38.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(4, 10, 16) and next_multiple=True
2.02 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.67 µs ± 55.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(8, 20, 32) and next_multiple=True
2.2 µs ± 70.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.69 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.86 µs ± 19.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(16, 40, 64) and next_multiple=True
2.54 µs ± 58.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.01 µs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.33 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Test 2 - Iterable length of 10-80 with very low n values

Test:

from more_itertools import padded, consume
import math
for i in range(1, 5):
    iter_len = 5 * pow(2, i)
    ns = (2, 4, 6, 8, 10, 12, 15)
    print(f'Iter Length {iter_len} with n={ns} and next_multiple=True')
    for n in ns:
        %timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=True))

Before:

Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
1.13 µs ± 29 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.4 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.14 µs ± 21.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.24 µs ± 29.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.32 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
1.54 µs ± 7.65 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 5.58 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.73 µs ± 49.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.7 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.58 µs ± 44.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.71 µs ± 7.69 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.87 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
2.4 µs ± 55.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.36 µs ± 22.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.5 µs ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.38 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.35 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.61 µs ± 7.85 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.54 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
4 µs ± 59.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.96 µs ± 8.11 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.26 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.09 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.01 µs ± 44 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.17 µs ± 92.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.34 µs ± 78.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
1.89 µs ± 6.36 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.63 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.46 µs ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.51 µs ± 32.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.24 µs ± 18 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.31 µs ± 26 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.32 µs ± 50.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
2.79 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.95 µs ± 9.97 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.91 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.7 µs ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.51 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.58 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
4.43 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.9 µs ± 26.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.56 µs ± 98.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.18 µs ± 33.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.97 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.07 µs ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.88 µs ± 16.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
7.86 µs ± 86.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.76 µs ± 57.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.94 µs ± 67.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.22 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.89 µs ± 5.58 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.86 µs ± 44.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.74 µs ± 37.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Test 3 - Iterable length of 10000 with a high n value

Test:

from more_itertools import padded, consume
%timeit -n 100000 consume(padded(range(1, 10000), n=107, next_multiple=True))

Before:

528 µs ± 3.05 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

157 µs ± 444 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions