Skip to content

enumerate's are slow #121380

@LimaBD

Description

@LimaBD

Performance Improvement

Using enumerate(...) in for loops are slow and there are at least +250 enumerate's in CPython

Let's take for example this for loop with enumerate:

cpython/Lib/genericpath.py

Lines 116 to 118 in 5f660e8

for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]

If you remove enumerate, this method will run 5% to 7% faster:

    i = 0
    for c in s1:
        if c != s2[i]:
            return s1[:i]
        i += 1

Complete performance test script, basically I copied twice the original commonprefix(m) and then I modified the last one:

import os
import time

# From file CPython/Lib/genericpath.py

# Return the longest prefix of all list elements.
def old_commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    # Some people pass in a list of pathname parts to operate in an OS-agnostic
    # fashion; don't try to translate in that case as that's an abuse of the
    # API and they are already doing what they need to be OS-agnostic and so
    # they most likely won't be using an os.PathLike object in the sublists.
    if not isinstance(m[0], (list, tuple)):
        m = tuple(map(os.fspath, m))
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    # Some people pass in a list of pathname parts to operate in an OS-agnostic
    # fashion; don't try to translate in that case as that's an abuse of the
    # API and they are already doing what they need to be OS-agnostic and so
    # they most likely won't be using an os.PathLike object in the sublists.
    if not isinstance(m[0], (list, tuple)):
        m = tuple(map(os.fspath, m))
    s1 = min(m)
    s2 = max(m)
    i = 0
    for c in s1:
        if c != s2[i]:
            return s1[:i]
        i += 1
    return s1

def perfomance(func):
    start = time.time()
    for _ in range(1000000):
        func(["/usr/lib", "/usr/local/lib", "/usr/local/bin", "/usr/bin", "/usr/local/share"])
    end = time.time()
    return end - start

old_perfomance = perfomance(old_commonprefix)
new_perfomance = perfomance(commonprefix)
perc = (old_perfomance / new_perfomance) * 100

print(f"with enumerate: {old_perfomance} secs")
print(f"without enumerate: {new_perfomance} secs")
print(f"New version is {perc:.2f}% faster")

You will get this performance improvement:

with enumerate: 0.8863015174865723 secs
without enumerate: 0.8414063453674316 secs
New version is 105.34% faster

Running this commands, you can count and find all for loops with enumerate in /Lib for example

$ grep -r "in enumerate" ./Lib | wc -l
$ grep -rl "in enumerate" ./Lib

If this is useful for CPython, I can create specific issues+PR replacing enumerate(...)'s.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    performancePerformance or resource usage

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions