-
Notifications
You must be signed in to change notification settings - Fork 311
Padded Improvement #804
Description
Intro
I have been working towards increasing the performance of the padded method and have made some good headway. Main performance gains come from using built in itertools methods and using generators in a more bulk friendly way.
There are a few caveats to this PR and I would like some advice on how to proceed and what would be considered best practice for this project. I will explain in more detail down below.
Performance Gains Where next_multiple is False
When next_multiple is set to False this PR is a strict improvement over the current method with no drawbacks I can see.
Test 1 - Iterable length of 10-80 with a low, medium, and high n value
Test:
from more_itertools import padded, consume
import math
for i in range(1, 5):
iter_len = 5 * pow(2, i)
ns = (math.floor(iter_len * 0.20), math.floor(iter_len * 0.50), math.floor(iter_len * 0.80))
print(f'Iter Length {iter_len} with n={ns} and next_multiple=False')
for n in ns:
%timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=False))Before:
Iter Length 10 with n=(2, 5, 8) and next_multiple=False
1.11 µs ± 3.91 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.1 µs ± 9.13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(4, 10, 16) and next_multiple=False
1.53 µs ± 7.69 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.54 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.52 µs ± 8.74 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(8, 20, 32) and next_multiple=False
2.38 µs ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.37 µs ± 47.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.37 µs ± 39.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(16, 40, 64) and next_multiple=False
4.07 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.03 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.98 µs ± 49.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
Iter Length 10 with n=(2, 5, 8) and next_multiple=False
941 ns ± 31.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
935 ns ± 5.26 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
944 ns ± 3.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(4, 10, 16) and next_multiple=False
984 ns ± 5.47 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.04 µs ± 25.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(8, 20, 32) and next_multiple=False
1.13 µs ± 46.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.19 µs ± 40.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.17 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(16, 40, 64) and next_multiple=False
1.3 µs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.43 µs ± 58.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.47 µs ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Test 2 - Iterable length of 10-80 with very low n values
Test:
from more_itertools import padded, consume
import math
for i in range(1, 5):
iter_len = 5 * pow(2, i)
ns = (2, 4, 6, 8, 10, 12, 15)
print(f'Iter Length {iter_len} with n={ns} and next_multiple=False')
for n in ns:
%timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=False))Before:
Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.11 µs ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.1 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.09 µs ± 7.09 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.21 µs ± 33 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.26 µs ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.54 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 54.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 6.63 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.52 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
2.35 µs ± 38 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.34 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.38 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.34 µs ± 22.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.35 µs ± 60.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.38 µs ± 52.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.34 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
3.96 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.95 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.96 µs ± 9.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.07 µs ± 165 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.99 µs ± 59.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.02 µs ± 88.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.98 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
928 ns ± 3.81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
931 ns ± 3.21 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
948 ns ± 14.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
951 ns ± 5.94 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
952 ns ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
993 ns ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.02 µs ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.04 µs ± 73.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
994 ns ± 1.64 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1 µs ± 16.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 29.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.01 µs ± 3.77 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.03 µs ± 5.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.06 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.07 µs ± 5.28 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.08 µs ± 7.72 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.08 µs ± 2.13 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.13 µs ± 33 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.15 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=False
1.23 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 33.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.26 µs ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.3 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.33 µs ± 69.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.33 µs ± 74.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Test 3 - Iterable length of 10000 with a high n value
Test:
from more_itertools import padded, consume
%timeit -n 100000 consume(padded(range(1, 10000), n=107, next_multiple=False))Before:
521 µs ± 4.43 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
99.6 µs ± 305 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Performance Gains Where next_multiple is True
When next_multiple is set to True there are some problems with speed at very low iterable lengths (less than 10 items) and n values (less than around 9) that cause some performance degradation. I could put a switch to use the old method here to keep the performance the same but I feel that doing that may introduce unexpected bugs and hurt maintainability. I would welcome feedback here.
Test 1 - Iterable length of 10-80 with a low, medium, and high n value
Test:
from more_itertools import padded, consume
import math
for i in range(1, 5):
iter_len = 5 * pow(2, i)
ns = (math.floor(iter_len * 0.20), math.floor(iter_len * 0.50), math.floor(iter_len * 0.80))
print(f'Iter Length {iter_len} with n={ns} and next_multiple=True')
for n in ns:
%timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=True))Before:
Iter Length 10 with n=(2, 5, 8) and next_multiple=True
1.14 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.11 µs ± 9.17 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.33 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(4, 10, 16) and next_multiple=True
1.55 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.58 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.92 µs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(8, 20, 32) and next_multiple=True
2.4 µs ± 69.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.37 µs ± 31.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.01 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(16, 40, 64) and next_multiple=True
3.99 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.97 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.32 µs ± 46.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
Iter Length 10 with n=(2, 5, 8) and next_multiple=True
1.9 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.43 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.5 µs ± 38.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(4, 10, 16) and next_multiple=True
2.02 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.53 µs ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.67 µs ± 55.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(8, 20, 32) and next_multiple=True
2.2 µs ± 70.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.69 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.86 µs ± 19.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(16, 40, 64) and next_multiple=True
2.54 µs ± 58.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.01 µs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.33 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Test 2 - Iterable length of 10-80 with very low n values
Test:
from more_itertools import padded, consume
import math
for i in range(1, 5):
iter_len = 5 * pow(2, i)
ns = (2, 4, 6, 8, 10, 12, 15)
print(f'Iter Length {iter_len} with n={ns} and next_multiple=True')
for n in ns:
%timeit -n 100000 consume(padded(range(iter_len), n=n, next_multiple=True))Before:
Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
1.13 µs ± 29 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.25 µs ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.4 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.14 µs ± 21.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.24 µs ± 29.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.32 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
1.54 µs ± 7.65 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 5.58 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.73 µs ± 49.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.7 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.58 µs ± 44.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.71 µs ± 7.69 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.87 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
2.4 µs ± 55.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.36 µs ± 22.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.5 µs ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.38 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.35 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.61 µs ± 7.85 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.54 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
4 µs ± 59.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.96 µs ± 8.11 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.26 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.09 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.01 µs ± 44 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.17 µs ± 92.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.34 µs ± 78.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
Iter Length 10 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
1.89 µs ± 6.36 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.63 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.46 µs ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.51 µs ± 32.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.24 µs ± 18 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.31 µs ± 26 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.32 µs ± 50.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 20 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
2.79 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.95 µs ± 9.97 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.91 µs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.7 µs ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.51 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.55 µs ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.58 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 40 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
4.43 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.9 µs ± 26.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.56 µs ± 98.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.18 µs ± 33.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.97 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.07 µs ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.88 µs ± 16.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Iter Length 80 with n=(2, 4, 6, 8, 10, 12, 15) and next_multiple=True
7.86 µs ± 86.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.76 µs ± 57.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.94 µs ± 67.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.22 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.89 µs ± 5.58 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.86 µs ± 44.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
2.74 µs ± 37.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Test 3 - Iterable length of 10000 with a high n value
Test:
from more_itertools import padded, consume
%timeit -n 100000 consume(padded(range(1, 10000), n=107, next_multiple=True))Before:
528 µs ± 3.05 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
157 µs ± 444 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)