Speed up zip_broadcast #737

Closed
Masynchin wants to merge 14 commits into more-itertools:master from Masynchin:zip_broadcast

Conversation

@Masynchin
Contributor

@Masynchin Masynchin commented Jul 23, 2023

Issue reference

No Issues 😎

Changes

Changed the logic inside zip_broadcast: it now converts scalars to iterables, so that it can just zip them all together.

Simple benchmark:

  • Before:
Result: 37.9 ms
  • After:
Result: 3.21 ms
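The core idea can be sketched roughly like this (a hypothetical simplification of the non-strict case, not the actual PR code; the real function also handles the all-scalars case, which this sketch does not):

```python
from itertools import repeat

def zip_broadcast_sketch(*objects, scalar_types=(str, bytes)):
    # Treat non-iterables (and str/bytes by default) as scalars.
    def is_scalar(obj):
        if scalar_types and isinstance(obj, scalar_types):
            return True
        try:
            iter(obj)
        except TypeError:
            return True
        return False

    # Wrap each scalar in repeat() so a single zip() can drive iteration;
    # zip() stops as soon as any real iterable is exhausted.
    iterables = [repeat(obj) if is_scalar(obj) else obj for obj in objects]
    return zip(*iterables)

print(list(zip_broadcast_sketch(1, [2, 3], 'ab')))
# [(1, 2, 'ab'), (1, 3, 'ab')]
```

This is what makes the speed-up possible: the inner loop is a plain C-level zip() instead of per-element Python logic.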

@Masynchin
Contributor Author

It is almost exactly the same as the original implementation (0926631), but passes the new test cases introduced after #561 and #565.

@kalekundert
Contributor

I don't think this implementation works in the following case:

>>> from more_itertools import zip_broadcast
>>> list(zip_broadcast(1, [1,2], strict=True))
[(1, 1), (1, 2)]

The problem is that repeat(1) will not be considered same size as [1,2], so the strict=True check will fail. It should succeed though, because the scalars are meant to be treated as iterables of the same length as their peers. See #543 for more discussion of this issue.

Unfortunately, it seems like this case is not covered by the tests. I feel bad about that; this is definitely a case that I should've included in #565. I realize now that there are no tests for cases in which zip_broadcast(..., strict=True) does anything except raise an exception, so we should definitely add some.

Maybe a way to improve the performance of zip_broadcast() would be to make custom wrappers for the scalars and the iterables. The iterable wrappers would detect when the iterables become exhausted, then notify the scalar wrappers. The scalar wrappers would behave like repeat() initially, but would stop upon receiving the aforementioned notification. Not totally sure if this would avoid the issues from #561, but it might be worth thinking about.

@bbayles
Collaborator

bbayles commented Jul 25, 2023

I'll take a PR with more tests, please!

@Masynchin
Contributor Author

@kalekundert I see your optimization in #740. I was thinking about a different approach and came up with this idea (but no drafts or PoC yet).

zip_broadcast uses _zip_equal if strict:

zipper = _zip_equal if strict else zip

which falls back to _zip_equal_generator when any of the passed iterables has no length, which in this PR's version is the case for the "scalar" arguments, since they are in fact repeat(scalar):

# If any one of the iterables didn't have a length, start reading
# them until one runs out.
except TypeError:
    return _zip_equal_generator(iterables)

So the issue is that repeat never stops: when any "real" iterable is exhausted, zip_longest yields a combo of the scalar and _marker. Because of that, an UnequalIterablesError is raised, which we don't want.

def _zip_equal_generator(iterables):
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            if val is _marker:
                raise UnequalIterablesError()
        yield combo

What if we write another version of _zip_equal and _zip_equal_generator specifically for zip_broadcast? I'm thinking of something like this pseudocode:

def _zip_equal_generator(n, *iterables):
    for combo in zip_longest(*iterables, fillvalue=_marker):
        match combo.count(_marker):
            case 0: yield combo
            case n: break
            case _: raise UnequalIterablesError()

Where n is the count of "real" iterables. I have added a check on the marker count: if it equals n, that means all real iterables stopped at the same length, which handles the case you described in your first reply.

What do you think? (Sorry for my bad English)

@kalekundert
Contributor

That seems like a good idea to me. I can't think of any reason why it wouldn't work, and it should be just as fast as your original implementation.

Also, just to be clear, I didn't mean for #740 to preclude more substantial optimizations like this. I just noticed a small inefficiency, and it was easy to fix.

@Masynchin
Contributor Author

I didn't mean for #740 to preclude more substantial optimizations like this. I just noticed a small inefficiency, and it was easy to fix.

Your optimization is absolutely fine; I didn't mean any offense by it. Hope the other ideas work out too!

@bbayles
Collaborator

bbayles commented Jul 25, 2023

For both branches, could you please make sure that you've merged in master and gotten the new tests?

@Masynchin
Contributor Author

For both branches, could you please make sure that you've merged in master and gotten the new tests?

Yep, done!

@Masynchin
Contributor Author

@kalekundert I have implemented the optimization above; it seems to pass both the old tests and the new ones you added in #739. I would appreciate your help reviewing it.

@Masynchin
Contributor Author

I just copy-pasted the _zip_equal/_zip_equal_generator implementations, slightly rewrote them, and added a _scalars suffix. I haven't checked whether we can optimize them specifically for scalars; I only made sure the tests pass.

@kalekundert
Contributor

I only have a few very minor comments. Most of them pertain more to _zip_equal() than to your PR, but I included them anyways just because they were things I noticed while reading the code. Feel free to ignore:

  • I think it's probably worth avoiding the duplication of _zip_equal(), e.g. by giving the existing _zip_equal() a keyword-only zip_equal_generator argument that defaults to the standard behavior, but can be changed in cases like this.
  • Alternatively, it might make sense to not use _zip_equal() at all. Most of the time there will be at least one scalar argument, so _zip_equal() will just end up calling _zip_equal_generator() anyways. The extra complexity/code duplication doesn't really seem worth it. This code path also involves raising and catching an exception, which is a relatively expensive operation, but I don't really think the difference would be noticeable here.
  • I'd be tempted to define _marker as a local variable, both in your version of _zip_equal_generator() and the existing version. Not only would this completely eliminate the possibility of this value appearing in a user-provided iterable (which is admittedly a paranoid concern), it would also be ≈5% faster for iterables with more than ≈100 elements due to local variable lookups being more efficient than global ones (see below).
  • The for val in combo: loop in _zip_equal_generator() could be replaced with if _marker in combo:. That would be easier to understand, and ≈10% faster (see below).
  • The for/else logic in _zip_equal() seems unnecessarily complex. Why not just raise the exception immediately, instead of breaking and raising it outside the loop? Of course, this has nothing to do with this PR.

Other than that, everything looks good to me. I assume this version still runs ≈10x faster than the current implementation?


Speed tests for minor optimizations mentioned above:

from itertools import zip_longest
from more_itertools import consume

_marker = object()

# Stub so the snippet runs standalone (never raised here: the
# iterables all have equal lengths).
class UnequalIterablesError(Exception):
    pass

args = [
    range(100_000),
    range(100_000),
    range(100_000),
]

def z1(iterables):
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            if val is _marker:
                raise UnequalIterablesError()
        yield combo

def z2(iterables):
    _marker = object()
    for combo in zip_longest(*iterables, fillvalue=_marker):
        for val in combo:
            if val is _marker:
                raise UnequalIterablesError()
        yield combo

def z3(iterables):
    for combo in zip_longest(*iterables, fillvalue=_marker):
        if _marker in combo:
            raise UnequalIterablesError()
        yield combo

def z4(iterables):
    _marker = object()
    for combo in zip_longest(*iterables, fillvalue=_marker):
        if _marker in combo:
            raise UnequalIterablesError()
        yield combo

import timeit

for f in [z1, z2, z3, z4]:
    print(
        timeit.timeit(
            stmt='consume(f(args))',
            globals=dict(f=f) | globals(),
            number=100,
        )
    )

Output:

2.199860891967546
2.0907241030363366
2.0204972539795563
1.9232789399684407

@Masynchin
Contributor Author

Masynchin commented Jul 26, 2023

  • I think it's probably worth avoiding the duplication of _zip_equal(), e.g. by giving the existing _zip_equal() a keyword-only zip_equal_generator argument that defaults to the standard behavior, but can be changed in cases like this.

I can do something like this:

def zip_broadcast(*objects, scalar_types=(str, bytes), strict=False):
    ...
    if strict:
+        yield from _zip_equal(
+            *iterables, gen=partial(
+                _zip_equal_generator_scalars, iterables_count
+            )
+        )
    else:
        yield from zip(*iterables)


+def _zip_equal(*iterables, gen=_zip_equal_generator):
    ...
    except TypeError:
+        return gen(iterables)


def _zip_equal_generator_scalars(n, iterables):
    ...

It reverts _zip_equal to its previous positional arguments (no need to provide n), with a new gen keyword. Does the partial part look good? Would it be better with a lambda, or should we rewrite this and choose a different approach?

  • This code path also involves raising and catching an exception, which is a relatively expensive operation, but I don't really think the difference would be noticeable here.

It is also raised when a regular iterable of infinite/undefined length occurs, so how can this check be eliminated?

@kalekundert
Contributor

kalekundert commented Jul 26, 2023

  • Yeah, that's exactly what I had in mind. And I think partial() is the right way to go.
  • What I meant was something like this:
    def zip_broadcast(*objects, scalar_types=(str, bytes), strict=False):
        ...
        if strict:
    +        # It would also make sense to just put the for loop directly here, with no extra function call.
    +        yield from _zip_equal_generator_scalars(iterables, iterables_count)
        else:
            yield from zip(*iterables)
    Basically, just don't call _zip_equal() at all. The only downside is that you end up using zip_longest() instead of zip() in the case where all the iterables have lengths that are the same. But this will not often be the case.

Don't take my suggestions too seriously; I'm not sure they're all good ideas.

@Masynchin
Contributor Author

  • I'd be tempted to define _marker as a local variable, both in your version of _zip_equal_generator() and the existing version. Not only would this completely eliminate the possibility of this value appearing in a user-provided iterable (which is admittedly a paranoid concern), it would also be ≈5% faster for iterables with more than ≈100 elements due to local variable lookups being more efficient than global ones (see below).
  • The for val in combo: loop in _zip_equal_generator() could be replaced with if _marker in combo:. That would be easier to understand, and ≈10% faster (see below).
  • The for/else logic in _zip_equal() seems unnecessarily complex. Why not just raise the exception immediately, instead of breaking and raising it outside the loop? Of course, this has nothing to do with this PR.

Good spots! Maybe I shouldn't optimize _zip_equal_generator in this PR, so that you can add that optimization and all the others in an optimization-specific PR.

@Masynchin
Contributor Author

Basically, just don't call _zip_equal() at all. The only downside is that you end up using zip_longest() instead of zip() in the case where all the iterables have lengths that are the same. But this will not often be the case.

I'm not sure they're all good ideas.

Feel free to try; I invited you to collaborate on my fork, so you can push your commits to this PR directly. I'll go along with whatever you decide about adding or not adding a change such as the above.

@Masynchin
Contributor Author

@bbayles I have resolved the conflicts; is this planned to be merged?

@pochmann
Contributor

I think about something like this pseudocode:

def _zip_equal_generator(n, *iterables):
    for combo in zip_longest(*iterables, fillvalue=_marker):
        match combo.count(_marker):
            case 0: yield combo
            case n: break
            case _: raise UnequalIterablesError()

That's not safe: it miscounts if there's a non-marker object that compares equal to the marker. Demo:

from unittest.mock import ANY

_marker = object()
for combo in zip('foo', [1, ANY, 3]):
    print(combo.count(_marker))

Output:

0
1
0

@Masynchin
Contributor Author

@pochmann if it treats ANY as _marker, shouldn't it also fail in the current implementation? I can't test this right now.

@Masynchin
Contributor Author

@pochmann if it treats ANY as _marker, shouldn't it also fail in the current implementation? I can't test this right now.

Oh, is it because the current one uses an `is` comparison while this PR uses count, which uses __eq__ under the hood? If so, can it be fixed with something like sum(1 for o in combo if o is _marker)?

@pochmann
Contributor

pochmann commented Jul 30, 2023

Yes, counting with is would be correct.

Btw I think I also optimized this, but can't find it right now... Maybe I dismissed it because the current implementation is simpler (especially after prefilling the scalars, which I had also done). But mine might be simpler than the new suggestion. I might try again...

@pochmann
Contributor

2.199860891967546
2.0907241030363366
2.0204972539795563
1.9232789399684407

What Python version did you use? I think globals got faster in the last few versions. And are those results stable? (I.e., you ran it multiple times and always got very similar results?)

@pochmann
Contributor

pochmann commented Jul 30, 2023

@Masynchin About your current proposal: I'd rather do it like this, without the extra functions:

def zip_broadcast(*objects, scalar_types=(str, bytes), strict=False):
    ...
    iterables = [repeat(obj) if is_scalar(obj) else obj for obj in objects]

    if not strict:
        yield from zip(*iterables)
        return

    if lengths of the non-scalars are all the same:
        yield from zip(*iterables)
        return
    (or raise UnequalIterablesError if that's the case)

    for combo in zip_longest(*iterables, fillvalue=_marker):
        ...

Advantages:

  • Less code.
  • Higher chance of the are-all-lengths-equal check succeeding, as it doesn't include the repeated scalars in that check.
  • Faster last case (the for combo in case) as it avoids one generator layer. (The other cases could also avoid their generator layer, by making zip_broadcast not be a generator.)
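For concreteness, the strict path discussed in this thread might look something like the following (a hypothetical, simplified sketch, not the PR's actual code; it uses identity-based marker counting, treats a marker count equal to the number of real iterables as a clean stop, and ignores the all-scalars edge case):

```python
from itertools import repeat, zip_longest

_marker = object()

class UnequalIterablesError(ValueError):
    pass

def zip_broadcast_strict_sketch(*objects, scalar_types=(str, bytes)):
    # Illustrative names; the real function and its helpers differ.
    def is_scalar(obj):
        if scalar_types and isinstance(obj, scalar_types):
            return True
        try:
            iter(obj)
        except TypeError:
            return True
        return False

    n_real = sum(1 for obj in objects if not is_scalar(obj))
    iterables = [repeat(obj) if is_scalar(obj) else obj for obj in objects]

    for combo in zip_longest(*iterables, fillvalue=_marker):
        # Count sentinels by identity, not equality.
        markers = sum(1 for v in combo if v is _marker)
        if markers == 0:
            yield combo
        elif markers == n_real:
            # All real iterables ended together; only repeats remain.
            return
        else:
            raise UnequalIterablesError()
```

With this structure, `list(zip_broadcast_strict_sketch(1, [1, 2]))` broadcasts the scalar correctly, while unequal real iterables still raise.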

@Masynchin
Contributor Author

That's not safe. Miscounts if there's a non-marker object that equals the marker

Fixed and added as a test

@Masynchin
Contributor Author

Also, what should I do about flake8 in CI? Same as #742 (comment)

@bbayles
Collaborator

bbayles commented Jul 31, 2023

Merge in the master branch for the flake8 issue, if you could please.

@Masynchin
Contributor Author

I found that in this PR UnequalIterablesError may be raised with an incorrect iterable index. Let me fix that before the merge

@kalekundert
Contributor

2.199860891967546
2.0907241030363366
2.0204972539795563
1.9232789399684407

What Python version did you use? I think globals got faster in the last few versions. And are those results stable? (I.e., you ran it multiple times and always got very similar results?)

I used version 3.10.0, and I ran it a bunch of times. IIRC, it's not a stable effect when there are only ≈10 items per iterable, but it's very stable by the time you get to ≈1000. That said, this is a really micro optimization, and I just pointed it out because I noticed it.

@Masynchin
Contributor Author

I found that in this PR UnequalIterablesError may be raised with an incorrect iterable index. Let me fix that before the merge

Fixed

I'd rather do it like this, without the extra functions

Done

@pochmann
Contributor

pochmann commented Aug 1, 2023

I found that in this PR UnequalIterablesError may be raised with an incorrect iterable index

Can you tell what that was, give an example? I don't see it, and the fix commit changes a lot of code.


-iterables_count = sum(1 for obj in objects if not is_scalar(obj))
+iterables = list(filterfalse(is_scalar, objects))
+iterables_count = ilen(iterables)
Contributor


It's a list, just use len. And I don't think you need the count in a variable: you only use it once or twice. The first time you can just check the list instead, and the second time you can just call len there.

Contributor Author


It's a list, just use len.

My bad, I had filterfalse in mind while adding ilen.

...and the second time you can just call len there.

elif markers == iterables_count:

Should I use len here? It would be more performant to check the length once rather than on every step of the for-loop.

Contributor


Should I use len here?

Yes.

It would be more performant if I check the length once and not every step of for-loop

No. You get there at most once.

Contributor Author


You get there at most once.

Am I missing something?

Left side: code with a debug print where the length evaluation would happen; right side: a run of the proposed code with 10 debug prints

Contributor

@pochmann pochmann Aug 1, 2023


Your print is in the wrong place: you only get to the len call if the if condition is false. Prepend the print to the elif instead: elif print(...) or condition:.

Contributor Author


You are right: if the if condition is false, the for-loop terminates in either the elif or the else branch. Thanks for catching that!

@Masynchin
Contributor Author

I found that in this PR UnequalIterableError may be raised with incorrect iterable index

Can you tell what that was, give an example?

Consider consume(zip_broadcast(0, [1], [1, 2], strict=True)). There is one scalar and two real iterables. The current implementation checks the lengths of these iterables, ignoring the indexes of the scalars. This PR now does the same, and raises:

UnequalIterablesError: Iterables have different lengths: index 0 has length 1; index 1 has length 2

Before the fix it would be index 1 has length 1; index 2 has length 2. I think it was a bug in my implementation, but if you count it as a feature, I can revert it.

@pochmann
Contributor

pochmann commented Aug 1, 2023

Before the fix it would be index 1 has length 1; index 2 has length 2.

Ah, ok. Not sure which is better, indexes referring to all arguments or just to iterables.

Wasn't it index 0 has length 1; index 2 has length 2, though? (The index 0 is hardcoded.) That would really be wrong.

@Masynchin
Contributor Author

Wasn't it index 0 has length 1; index 2 has length 2, though?

I just reran consume(zip_broadcast(0, [1], [1, 2], strict=True)) on the commit before the fix, and it raises UnequalIterablesError: Iterables have different lengths without any length indexes 😬

@Masynchin
Contributor Author

I ran this benchmark on Python 3.11.3:

import timeit
from more_itertools import consume, zip_broadcast

N = 1_000
G = globals()

t1 = timeit.timeit("consume(zip_broadcast(1, 2, [1] * 100_000))", number=N, globals=G)
t2 = timeit.timeit("consume(zip_broadcast(1, 2, [1] * 200_000, [2, 3] * 100_000))", number=N, globals=G)
t3 = timeit.timeit("consume(zip_broadcast(1, 2, [1] * 100_000, strict=True))", number=N, globals=G)

print(t1, t2, t3, sep="\n")

On master (266ebdc) and this PR (2dd6fe2), here are the results:

  • Master
17.083783166483045
40.894872582517564
21.31543183233589
  • PR
2.304440625011921
5.877900958992541
2.28413129132241

Almost a 10x speed-up. There are no unresolved questions; can we merge it?

@pochmann
Contributor

pochmann commented Aug 5, 2023

I think none of those tests your slow case; can you test that as well?

@bbayles
Collaborator

bbayles commented Aug 31, 2023

Is there still a test to add here?

@Masynchin
Contributor Author

Is there still a test to add here?

After #739 was merged and I tweaked this PR to pass its tests, no other problems were found. The only thing requested is regression benchmarks, but I am currently too busy with other stuff. My thought is that it can only regress if the caller provides only one iterable and N scalars. I would be happy if anyone could verify this.

@bbayles
Collaborator

bbayles commented Jan 5, 2026

I think nobody is eager to finish this off, so closing. Thanks for the contribution nonetheless!

@bbayles bbayles closed this Jan 5, 2026