Optimize foldl/foreach for zip(arrays...), CartesianIndices, etc. by tkf · Pull Request #35036 · JuliaLang/julia

tkf · 2020-03-07T07:23:49Z

This PR implements a performance optimization for foldl on CartesianIndices and product by executing them as nested loops rather than invoking their custom iterate function in a single loop. Building on this optimization, it is easy to add similar optimizations for other functions such as foldl(_, zip(arrays...)) and foreach(_, arrays...). For more context, see #9080 (comment), #9080 (comment), and #15648 (comment).
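To illustrate the idea, here is a minimal sketch, not the PR's actual implementation. `nested_foldl` is a hypothetical helper restricted to two dimensions: it reads the per-dimension ranges from the `indices` field and folds with explicit nested loops, then the result is compared with the generic `foldl`.

```julia
# Hypothetical helper for illustration only (not the code in this PR):
# fold over a 2-D CartesianIndices with nested loops instead of `iterate`.
function nested_foldl(op, init, iter::CartesianIndices{2})
    rows, cols = iter.indices      # one range per dimension
    acc = init
    for j in cols                  # outer loop: slowest-varying dimension
        for i in rows              # inner loop: fastest-varying dimension
            acc = op(acc, CartesianIndex(i, j))
        end
    end
    return acc
end

A = reshape(collect(1.0:12.0), 3, 4)
add_at(s, I) = s + A[I]            # accumulate the element at index I

total_nested = nested_foldl(add_at, 0.0, CartesianIndices(A))
total_foldl = foldl(add_at, CartesianIndices(A); init = 0.0)
```

Both versions visit the indices in the same column-major order, so they should agree exactly; the difference is only in the loop structure the compiler gets to see.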

Since we already have automatic iterator-to-transducer conversions (#33526), iterator comprehensions wrapping product, i.e., anything of the form

(f(x, y, z) for x in xs, y in ys, z in zs if p(x, y, z))

can automatically get some performance boost.
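As a small sketch of what "wrapping product" means here (the values and the names `total_gen`, `total_product`, and `total_loops` are made up for illustration), the same filtered sum can be written as a generator, as an explicit `Iterators.filter` over `Iterators.product`, and as hand-written nested loops:

```julia
xs, ys = 1:3, 1:4

# Generator form: a filter over the Cartesian product of xs and ys.
total_gen = sum(x * y for x in xs, y in ys if x != y)

# The same computation spelled out with Iterators.product and Iterators.filter.
pairs_iter = Iterators.filter(t -> t[1] != t[2], Iterators.product(xs, ys))
total_product = sum(t -> t[1] * t[2], pairs_iter)

# Hand-written nested loops, the shape the optimization aims for.
function total_loops(xs, ys)
    s = 0
    for y in ys, x in xs   # x is the inner (fastest-varying) loop
        if x != y
            s += x * y
        end
    end
    return s
end
```

All three forms compute the same value; only the first two go through the iterator machinery that this PR targets.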

I think this PR also addresses issue #9080.

Benchmarks (issue #9080)

using BenchmarkTools

function sumcart_manual(A::AbstractMatrix)
    s = 0.0
    @inbounds for j = 1:size(A,2), i = 1:size(A,1)
        s += A[i,j]
    end
    s
end

function sumcart_iter(A)
    s = 0.0
    @inbounds for I in CartesianIndices(size(A))
        s += A[I]
    end
    s
end

function sumcart_foldl(A)
    foldl(CartesianIndices(size(A)); init=0.0) do s, I
        @inbounds s + A[I]
    end
end

A = rand(10^4, 10^4);
@btime sumcart_manual($A);  # 126.509 ms (0 allocations: 0 bytes)
@btime sumcart_iter($A);    # 145.124 ms (0 allocations: 0 bytes)
@btime sumcart_foldl($A);   # 125.753 ms (0 allocations: 0 bytes)

Some more benchmarks

using BenchmarkTools
suite = BenchmarkGroup()
suite["sum(x == y for x in xs, y in ys)"] = @benchmarkable sum(
    x == y for x in (1, 2, 3, 4, 5, 6, 7, 8), y in $(rand(1:50, 10^3))
)
suite["sum(x * y for (x, y) in zip(A, transpose(B)))"] = @benchmarkable sum(
    x * y for (x, y) in zip($(rand(100, 100)), transpose($(rand(100, 100))))
)
suite["copyto!(A, transpose(B))"] = @benchmarkable begin
    A = $(zeros(100, 100))
    B = transpose($(rand(100, 100)))
    foreach(eachindex(A, B)) do I
        @inbounds A[I] = B[I]
    end
end

Before (1.5.0-DEV.416):

  "sum(x * y for (x, y) in zip(A, transpose(B)))" => Trial(19.145 μs)
  "sum(x == y for x in xs, y in ys)" => Trial(5.190 μs)
  "copyto!(A, transpose(B))" => Trial(15.972 μs)

After:

  "sum(x * y for (x, y) in zip(A, transpose(B)))" => Trial(10.593 μs)
  "sum(x == y for x in xs, y in ys)" => Trial(616.000 ns)
  "copyto!(A, transpose(B))" => Trial(5.733 μs)

timholy · 2020-03-07T11:42:36Z

Very nice addition! I also agree with your choice of language: it "addresses" #9080 but is not a fix for it. We want to be able to support both paradigms efficiently. My guess is the same transformation, unwrapping the loops, needs to be applied to the iterator version, but as you say it's a much harder transformation to do automatically and might require some compiler magic. So it's great to have this.

vchuravy · 2020-03-07T13:59:18Z

base/multidimensional.jl

    first(iter::CartesianIndices) = CartesianIndex(map(first, iter.indices))
    last(iter::CartesianIndices)  = CartesianIndex(map(last, iter.indices))

+    # Use nested for-loop in `foldl` as it is much faster than `iterate`:


... as it preserves LLVM's ability to vectorize.

E.g., don't just state that it is faster, but say why.

tkf · 2020-03-10

Is 37e5971 enough?

vchuravy · 2020-03-07T14:01:19Z

base/reduce.jl

-struct _InitialValue end
+@inline function _foldl_impl(op::OP, init, array::AbstractArray) where {OP}
+    if IndexStyle(array) isa IndexLinear
+        return invoke(_foldl_impl, Tuple{Any,Any,Any}, op, init, array)


That we have to use invoke here is a code smell for me. Why can't this be done with a dispatch on the IndexStyle?

tkf · 2020-03-10

I simply renamed the default implementation in a56c371. I could wrap it in a new function that takes an IndexStyle argument, but I think this is simpler. What do you think?

The reason I don't think IndexStyle is a good approach here is that this trait is insufficient for supporting more complex collections such as sparse arrays.
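For readers unfamiliar with the alternative raised in the review, dispatching on `IndexStyle` could look roughly like the sketch below. The names `myfoldl` and `_myfoldl` are hypothetical and this is not the code in the PR; it only shows the trait-dispatch pattern, and it shares the limitation noted above: the trait distinguishes linear from Cartesian indexing but says nothing about structures such as sparse arrays.

```julia
# Hypothetical names; a sketch of trait dispatch, not the PR's implementation.
myfoldl(op, init, A::AbstractArray) = _myfoldl(IndexStyle(A), op, init, A)

# Arrays with fast linear indexing: a single flat loop.
function _myfoldl(::IndexLinear, op, init, A)
    acc = init
    for i in eachindex(A)
        acc = op(acc, A[i])
    end
    return acc
end

# Matrices with Cartesian indexing: explicit nested loops.
function _myfoldl(::IndexCartesian, op, init, A::AbstractMatrix)
    acc = init
    for j in axes(A, 2), i in axes(A, 1)
        acc = op(acc, A[i, j])
    end
    return acc
end

A = reshape(collect(1:12), 3, 4)   # plain Array: IndexLinear
V = view(A, 1:2, 2:3)              # non-contiguous view: IndexCartesian

sum_A = myfoldl(+, 0, A)
sum_V = myfoldl(+, 0, V)
```

The single entry point selects a method from the trait value, so no `invoke` is needed; whether that is preferable to renaming the default implementation is the judgment call discussed in this thread.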

vchuravy · 2021-05-14T16:54:51Z

bump, I think this shouldn't need to wait until we figure out the generic compiler work.

vtjnash · 2021-11-13T05:32:53Z

Is this still needed? I see the linked issues are closed now, not a few months after Tim sounded as though he despaired of seeing #9080 solved.

DilumAluthge · 2026-02-09T03:12:14Z

This PR has had no activity for more than four years, so I'll close it as stale.

If somebody is interested in taking over this PR and continuing to work on it, let me know.

Optimize foldl/foreach for zip(arrays...), CartesianIndices, etc.

71a12d2

vchuravy reviewed Mar 7, 2020

timholy mentioned this pull request Mar 9, 2020

Code generation and cartesian iteration #9080

Closed

tkf added 3 commits March 9, 2020 22:42

Move _InitialValue out of foldl && mapfoldl section

98493a5

More specific comment for foldl on CartesianIndices

37e5971

Avoid using invoke

a56c371

tkf mentioned this pull request Mar 11, 2020

Add a compiler pass for iterate(::CartesianIndices) #35074

Closed

tkf mentioned this pull request May 19, 2020

50% performance regression in map! #35914

Open

kimikage mentioned this pull request Feb 19, 2021

Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

Closed

DilumAluthge closed this Feb 9, 2026

tkf wants to merge 4 commits into JuliaLang:master from tkf:foldl-cartesian