-
-
Notifications
You must be signed in to change notification settings - Fork 34.2k
Closed as not planned
Labels
performancePerformance or resource usagePerformance or resource usage
Description
Performance Improvement
Using enumerate(...) in for loops are slow and there are at least +250 enumerate's in CPython
Let's take for example this for loop with enumerate:
Lines 116 to 118 in 5f660e8
| for i, c in enumerate(s1): | |
| if c != s2[i]: | |
| return s1[:i] |
If you remove enumerate, this method will run 5% to 7% faster:
i = 0
for c in s1:
if c != s2[i]:
return s1[:i]
i += 1Complete performance test script, basically I copied twice the original commonprefix(m) and then I modified the last one:
import os
import time
# From file CPython/Lib/genericpath.py
# Return the longest prefix of all list elements.
def old_commonprefix(m):
"Given a list of pathnames, returns the longest common leading component"
if not m: return ''
# Some people pass in a list of pathname parts to operate in an OS-agnostic
# fashion; don't try to translate in that case as that's an abuse of the
# API and they are already doing what they need to be OS-agnostic and so
# they most likely won't be using an os.PathLike object in the sublists.
if not isinstance(m[0], (list, tuple)):
m = tuple(map(os.fspath, m))
s1 = min(m)
s2 = max(m)
for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]
return s1
# Return the longest prefix of all list elements.
def commonprefix(m):
"Given a list of pathnames, returns the longest common leading component"
if not m: return ''
# Some people pass in a list of pathname parts to operate in an OS-agnostic
# fashion; don't try to translate in that case as that's an abuse of the
# API and they are already doing what they need to be OS-agnostic and so
# they most likely won't be using an os.PathLike object in the sublists.
if not isinstance(m[0], (list, tuple)):
m = tuple(map(os.fspath, m))
s1 = min(m)
s2 = max(m)
i = 0
for c in s1:
if c != s2[i]:
return s1[:i]
i += 1
return s1
def perfomance(func):
start = time.time()
for _ in range(1000000):
func(["/usr/lib", "/usr/local/lib", "/usr/local/bin", "/usr/bin", "/usr/local/share"])
end = time.time()
return end - start
old_perfomance = perfomance(old_commonprefix)
new_perfomance = perfomance(commonprefix)
perc = (old_perfomance / new_perfomance) * 100
print(f"with enumerate: {old_perfomance} secs")
print(f"without enumerate: {new_perfomance} secs")
print(f"New version is {perc:.2f}% faster")You will get this performance improvement:
with enumerate: 0.8863015174865723 secs
without enumerate: 0.8414063453674316 secs
New version is 105.34% faster
Running this commands, you can count and find all for loops with enumerate in /Lib for example
$ grep -r "in enumerate" ./Lib | wc -l
$ grep -rl "in enumerate" ./LibIf this is useful for CPython, I can create specific issues+PR replacing enumerate(...)'s.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
performancePerformance or resource usagePerformance or resource usage