Skip to content

RFC: clean up map and collect with iterator traits#15123

Closed
JeffBezanson wants to merge 27 commits intojb/generatorsfrom
jb/itertraits
Closed

RFC: clean up map and collect with iterator traits#15123
JeffBezanson wants to merge 27 commits intojb/generatorsfrom
jb/itertraits

Conversation

@JeffBezanson
Copy link
Copy Markdown
Member

This is a proposal for cleaning up our iteration, mapping, and collecting. Between copy[!], collect, map, and comprehensions we had subtly different behaviors for similar operations. We also had some repeated code and inefficient generic fallbacks.

This change replaces all that with collect of various iterators. There are four core algorithms for this, based on whether an iterator has a known length/shape and whether it has a known element type. To select the right algorithm, I added some HolyTraits for iterators. I seem to recall iterator traits being discussed before, but I can't find a reference.

The biggest immediate benefit is that generic map is faster and better-typed.
Before:

julia> @time map(tuple, 11:2000000, enumerate(11:2000000));
  2.745751 seconds (18.00 M allocations: 505.257 MB, 29.04% gc time)

giving a Vector{Any}.
After:

julia> @time map(tuple, 11:2000000, enumerate(11:2000000));
  0.570696 seconds (6.00 M allocations: 198.356 MB, 11.51% gc time)

giving a Vector{Tuple{Int64,Tuple{Int64,Int64}}}.
This is achieved with the following glorious definition(s):

spread(f) = (args)->f(args...)
map(f, iters...) = collect(Generator(spread(f),zip(iters...)))

This definition is almost as fast as existing array functions:

julia> A = rand(1000,1000);

julia> @time A+A;
  0.001964 seconds (9 allocations: 7.630 MB)

julia> @time map(+,A,A);
  0.004792 seconds (11 allocations: 7.630 MB)

Hopefully this gap can be closed.

A major issue here is iterator shape preservation. Some iterators just wrap others and so should obviously preserve their shape, like Generator. Others are less obvious. Since zipped iteration is so common when working with arrays, I made zip shape-preserving. product is currently 1d only, but could for example concatenate the shapes of its component iterators.

@JeffBezanson JeffBezanson changed the title RFC: clean up map and collect RFC: clean up map and collect with iterator traits Feb 17, 2016
@tbreloff
Copy link
Copy Markdown

+1. Does this mean we could have a correctly typed unzip as well?

On Wednesday, February 17, 2016, Jeff Bezanson notifications@github.com
wrote:

This is a proposal for cleaning up our iteration, mapping, and collecting.
Between copy[!], collect, map, and comprehensions we had subtly different
behaviors for similar operations. We also had some repeated code and
inefficient generic fallbacks.

This change replaces all that with collect of various iterators. There
are four core algorithms for this, based on whether an iterator has a known
length/shape and whether it has a known element type. To select the right
algorithm, I added some HolyTraits for iterators. I seem to recall iterator
traits being discussed before, but I can't find a reference.

The biggest immediate benefit is that generic map is faster and
better-typed.
Before:

julia> @time map(tuple, 11:2000000, enumerate(11:2000000));
2.745751 seconds (18.00 M allocations: 505.257 MB, 29.04% gc time)

giving a Vector{Any}.
After:

julia> @time map(tuple, 11:2000000, enumerate(11:2000000));
0.570696 seconds (6.00 M allocations: 198.356 MB, 11.51% gc time)

giving a Vector{Tuple{Int64,Tuple{Int64,Int64}}}.
This is achieved with the following glorious definition(s):

spread(f) = (args)->f(args...)
map(f, iters...) = collect(Generator(spread(f),zip(iters...)))

This definition is almost as fast as existing array functions:

julia> A = rand(1000,1000);

julia> @time A+A;
0.001964 seconds (9 allocations: 7.630 MB)

julia> @time map(+,A,A);
0.004792 seconds (11 allocations: 7.630 MB)

Hopefully this gap can be closed.

A major issue here is iterator shape preservation. Some iterators just
wrap others and so should obviously preserve their shape, like Generator.
Others are less obvious. Since zipped iteration is so common when working
with arrays, I made zip shape-preserving. product is currently 1d only,

but could for example concatenate the shapes of its component iterators.

You can view, comment on, or merge this pull request online at:

#15123
Commit Summary

  • make map and collect more general and uniform by adding iterator
    traits

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#15123.

Fixes #14002

The primary problem was the lack of a space at the end of the regexp,
which didn't match the output of the shell process (does have a space).
This meant that emacs sometimes placed the point one character before
the end of the buffer, which prevented M-p from working.

This new regexp also allows for other prompts, such as "shell> "
yuyichao and others added 3 commits February 18, 2016 13:00
julia-mode: Add space to prompt regexp
Reenable test of condition number estimates of sparse matrices
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.