
ENH: add np.broadcast_to and reimplement np.broadcast_arrays#5371

Merged
charris merged 1 commit into numpy:master from shoyer:broadcast_to
Feb 27, 2015

Conversation

@shoyer
Member

@shoyer shoyer commented Dec 15, 2014

Per the mailing list discussion, I have implemented a new function broadcast_to that broadcasts an array to a given shape according to numpy's broadcasting rules.
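
A minimal sketch of the proposed behavior (matching the function as eventually merged):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.broadcast_to(x, (2, 3))  # a view; no data is copied
print(y)
# [[1 2 3]
#  [1 2 3]]
```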

The new array iterator has functionality that could make several of these functions into almost one liners (thank you @jaimefrio, @seberg and @mwiebe for showing how), although we do have to do some extra work to preserve subclasses.

The first commit (eventually to be squashed away) shows how I would implement these without relying heavily on nditer. It also includes the broadcast_shape function (no longer necessary for the implementation of broadcast_arrays), which almost, but not quite, worked as a one-liner (it failed for size 0 arrays). We could revive it if others think it would serve a useful purpose.

Question: should we allow for broadcasting with negative numbers in the shape? This seems strange but it does work (sort of like reshape) with the current implementation and I suppose someone could find it useful. On the other hand, numbers less than -1 are also supported, which may just be a bug with nditer?

In [1]: import numpy as np

In [2]: x = np.array([1, 2, 3])

# possibly useful
In [3]: np.broadcast_to(x, (-1, -1))
Out[3]: array([[1, 2, 3]])

# probably not useful
In [4]: np.broadcast_to(x, (-10,))
Out[4]: array([1, 2, 3])

(I did not yet add tests for the behavior with negative sizes.)

@seberg
Member

seberg commented Dec 15, 2014

I am a bit worried that someone might use broadcast_arrays with more than 32 arguments; then it would be a regression. I am not really worried about the new functions.

@shoyer
Member Author

shoyer commented Dec 15, 2014

@seberg In fact, the test failure I'm getting here is for exactly that reason. So I guess I'll switch broadcast_arrays back to the more verbose implementation.

@seberg
Member

seberg commented Dec 16, 2014

Since it is very simple, and if the speed difference is big, we could think of adding a fast path. Though maybe it is a bad idea considering all the funny little things with subclasses (at least if subok=True), and the general maintenance overhead (and if just that, we would need to duplicate all tests)...

@mhvk
Contributor

mhvk commented Dec 16, 2014

Thanks for keeping the subclasses! (after all the time @seberg and I spent on it...). One question (which I had earlier myself but didn't address): is the calculation of the shape much better/faster than using np.broadcast? If so, should that be adapted to use the routine here? If not, could np.broadcast be used here?

@njsmith
Member

njsmith commented Dec 16, 2014

I wouldn't bother trying to mess around with fast paths until someone finds
a situation where broadcast_arrays is actually a bottleneck (which seems
kinda unlikely to me honestly). Let's save fastpath complexity for the
cases where it matters.

I think the idea is that we want to steer users away from np.broadcast b/c
the way it returns an iterator is confusing and unnecessary as compared to
broadcast_arrays. So better if we avoid depending on it in new code.
(Should we formally deprecate np.broadcast?)

I don't think there's much point in adapting np.broadcast to use the new
code either -- it'd be pretty complicated.


Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@mhvk
Contributor

mhvk commented Dec 16, 2014

Hmm, I've just recommended np.broadcast twice in the context of astropy (e.g., liberfa/erfa-wrapping-experiments#9 (comment)) to replace something very similar to broadcast_shape here, as it was quite significantly faster (and it somewhat mattered, since we were calling C code and quickly became dominated by setup overhead). Indeed,

a = np.ones((3,1,3,4))
b = np.ones((3,1,4))
c = np.ones((3,1,1,4))
%timeit broadcast_shape(a,b,c)
# 100000 loops, best of 3: 9.58 µs per loop
%timeit np.broadcast(a,b,c).shape
# 1000000 loops, best of 3: 764 ns per loop

@njsmith
Member

njsmith commented Dec 16, 2014

Hooray for data! If it does matter then it matters.

...looking at that thread, it sounds like the overall operation you're talking about is 4.something ms? Does ~9 µs matter?


@mhvk
Contributor

mhvk commented Dec 16, 2014

The above said, in most cases I think people would indeed just want the final shape, not the broadcast object returned by np.broadcast. This might suggest exposing broadcast_shape.

Also a small note: again in the context of astropy, yet another implementation of the same thing could not be replaced by np.broadcast because it needed to pass in a known shape which did not have an array yet. It might make broadcast_shape more general if tuples were interpreted as shape rather than as 1-d arrays.

@mhvk
Contributor

mhvk commented Dec 16, 2014

@njsmith - the 4 ms was for a 10000-element array. But for the simpler functions and fewer elements it does matter. We did quite elaborate timing tests; e.g., astropy/astropy#3141 (comment). For small arrays, total time is 20 µs, so, yes, the additional 10 µs does matter a little.

@njsmith
Member

njsmith commented Dec 16, 2014

I think that can be interpreted as a request for my suggestion (1) in this
email?
http://thread.gmane.org/gmane.comp.python.numeric.general/59443/focus=59494

(Well, that talks about broadcast_arrays, but the generalization to
broadcast_shape is obvious.)


@mhvk
Contributor

mhvk commented Dec 16, 2014

Yes, looks (very) similar...

@njsmith
Member

njsmith commented Dec 16, 2014

Does that 20 µs matter though? (Genuine question, not trying to denigrate
your use case. Is this something where you perform the 20 µs operation a
million times in a loop so it adds up?)


@shoyer
Member Author

shoyer commented Dec 16, 2014

For the purposes of discussion, I just pushed a fast path for broadcast_shape that works for fewer than 32 arguments (without new tests yet).

Interestingly, I needed to use the old iterator, though. This implementation with the new iterator almost works:

def broadcast_shape(*args):
    return np.nditer(args, flags=['multi_index', 'zerosize_ok']).shape

However, it fails with ValueError: Iterator is past the end if any of the arrays have size zero. That isn't a problem for np.broadcast(...).shape.
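
To illustrate that last point, a quick check (a sketch, not from the PR) showing that np.broadcast handles zero-sized operands where the nditer one-liner failed:

```python
import numpy as np

# np.broadcast copes with size-0 arrays where the nditer one-liner raised
shape = np.broadcast(np.empty((0, 1)), np.empty((1, 5))).shape
print(shape)  # (0, 5)
```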

We can also add a fast path for broadcast_arrays, too, if desired (I have the original implementation in my first commit).

The other option is to keep a single code path for broadcast_arrays by porting it to Cython. That would probably make up the 10x speed difference. It would also let us treat tuples as shapes in broadcast_shape without an appreciable slowdown.

@mhvk
Contributor

mhvk commented Dec 17, 2014

@shoyer, how about just using np.broadcast(...).shape? Since it is already written in C, it should be hard to beat and avoids duplication of code.
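
For reference, np.broadcast computes the common shape in C without materializing any output arrays:

```python
import numpy as np

# the broadcast object only tracks shapes; no result array is allocated
shape = np.broadcast(np.ones((3, 1)), np.ones((1, 4))).shape
print(shape)  # (3, 4)
```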

@mhvk
Contributor

mhvk commented Dec 17, 2014

@njsmith - the erfa routines underlie some of the coordinate and time transformations we do in astronomy, so are quite basic, and are used a lot. That said, if one really were calling them lots of times, it would probably be possible to optimize them by using larger arrays instead. Anyway, for that particular PR I mostly felt it was good to reduce many lines to one -- we try hard not to duplicate numpy functionality. But within numpy of course, it is good to think what is the best approach.

@shoyer
Member Author

shoyer commented Dec 17, 2014

@mhvk I would use np.broadcast(...).shape only, but that imposes the 32 argument limit -- which apparently would break the existing API for broadcast_arrays.

@juliantaylor
Contributor

Or one could fix the 32-argument limit of the iterator; someone want to finish this:
https://github.com/juliantaylor/numpy/tree/no-max-args (turns out fixing einsum is tricky)

@seberg
Member

seberg commented Dec 17, 2014

Well, the current code is slow for small arrays; whether this matters in any real-world application, I don't know. I have used np.broadcast(...).shape too, possibly even within the numpy source, but I agree that using the broadcast iterator from python is probably almost never of any use at all; or you are such an advanced user that you could just as well mess with nditer. On the other hand, I would like to remind everyone that broadcast_arrays has its own problems, in the sense that 0-stride arrays can easily confuse new users who are not aware of their behaviour.
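
The 0-stride pitfall mentioned here is easy to demonstrate (a sketch; later numpy releases discourage or disallow writing to these results for exactly this reason):

```python
import numpy as np

a, b = np.broadcast_arrays(np.arange(3), np.zeros((2, 3)))
print(a.strides[0])                  # 0: both rows alias the same memory
print(np.shares_memory(a[0], a[1]))  # True, so one write would show up in every row
```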

@mhvk
Contributor

mhvk commented Dec 17, 2014

OK, fixing @juliantaylor's tree would obviously be best, but in the meantime how about the following, which is both fast and does not break the API?

def broadcast_shape2(*args):
    if len(args) <= 32:
        return np.broadcast(*args).shape

    b = np.broadcast(*args[:32])
    for i in range(32, len(args), 31):
        b = np.broadcast(b, *args[i:min(i+31, len(args))])
    return b.shape

a = [np.ones((4,1)) for i in range(50)]
%timeit broadcast_shape(*a)
# 10000 loops, best of 3: 80 µs per loop
%timeit broadcast_shape2(*a)
# 100000 loops, best of 3: 11.8 µs per loop

(here, broadcast_shape is the implementation of the current PR, as in my earlier trials).

@jaimefrio
Member

I glanced over your code, and you have a few variable-sized arrays here and there, e.g. in npyiter_get_common_dtype in nditer_constr.c, that I am afraid MSVC would frown upon.

In any case, if the trouble is with einsum, can this not be partially
fixed, starting with the nditer?

I still have trouble imagining a real world application where the 32
argument limit is a real limitation...

Jaime


Member

I think this is what np.asanyarray does.

Contributor

But note that subok is a variable here, which can be True or False.

core/numeric/asanyarray is defined to

    return array(a, dtype, copy=False, order=order, subok=True)

Member

And that's why you should read all the code before commenting... Sorry for the noise.

@shoyer
Member Author

shoyer commented Dec 17, 2014

@mhvk Good idea for simplifying broadcast_shape -- I just pushed it in the latest commit.

I like the idea of making broadcast_shape handle tuples as shapes, but even doing [isinstance(arg, tuple) for arg in args] slows down broadcast_shape by 70% or so.

@mhvk
Contributor

mhvk commented Dec 17, 2014

@shoyer - I had forgotten how forgiving python is with indices outside of a list's range... nice. Seems OK to leave the tuple-as-shape thing for another PR (we would also need to be careful that it wouldn't break the API).

@shoyer
Member Author

shoyer commented Dec 17, 2014

@mhvk In that case, we should not expose broadcast_shape in the public API, because it currently casts the input to ndarrays with np.asarray, so (2, 3) gets converted into an ndarray with shape (2,).
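
That is, with the helper as written, a tuple is silently interpreted as data rather than as a shape:

```python
import numpy as np

# np.asarray turns the tuple (2, 3) into a length-2 1-d array,
# so it would contribute shape (2,) instead of acting as a target shape (2, 3)
print(np.asarray((2, 3)).shape)  # (2,)
```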

@mhvk
Contributor

mhvk commented Dec 17, 2014

@shoyer - agreed that for now it is best not to expose it, as it currently is mostly a workaround for the 32-argument limit of np.broadcast. But very useful here!

@njsmith
Member

njsmith commented Dec 18, 2014

Instead of handling tuples as shapes, let's put them in a separate kwarg.
isinstance is almost always a code smell.

@shoyer
Member Author

shoyer commented Dec 18, 2014

Anyone have opinions on my question from the first post: should we allow broadcasting with negative numbers in the shape?

Given that nditer supports it and that it could be useful, my inclination is to document it and add tests to verify that it works as expected.

@njsmith
Member

njsmith commented Dec 18, 2014

The fact that after 30 seconds thought I can't figure out what negative
numbers would mean in this situation makes me inclined not to support it.

For reshape you know what the total array size is, so an ndim array has
ndim-1 degrees of freedom in its shape. Here the final array size could be
anything, so... I don't get it.

I'm generally against supporting things just because we can. That way lies
confusion and difficult maintenance and annoying deprecation cycles.

@shoyer
Member Author

shoyer commented Dec 18, 2014

@njsmith In the context of nditer, apparently negative numbers in the shape mean: size of the original dimension (if it's present) or 1 otherwise. So, for example, np.broadcast_to(x, (-1, -1, -1)) ends up being roughly equivalent to np.atleast_3d, except it follows broadcasting rules, so the extra dimensions always end up inserted at the front.
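
(For what it's worth, the broadcast_to that ultimately shipped rejects negative sizes outright, so the nditer semantics described here never became part of the public API; a quick check against a released numpy:)

```python
import numpy as np

# released numpy raises rather than adopting nditer's negative-size semantics
try:
    np.broadcast_to(np.array([1, 2, 3]), (-1, -1))
except ValueError:
    print("negative sizes rejected")
```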

Member

Probably too late, just a quick idea: what if we made the view readonly by default?

Member Author

This does seem like a good idea. Would you just add an argument readonly=True?

I suppose we'd need to default to readonly=False on np.broadcast_arrays to ensure backwards compatibility for now.

Member

Yeah, the readonly=False default for broadcast_arrays is what makes me wonder if that idea is a bit late. But on the other hand, I cannot find a reason why it would not be a good default at least here.

@charris
Member

charris commented Feb 15, 2015

We are >< that close to finishing this...

@charris
Member

charris commented Feb 15, 2015

The readonly flag can be set separately if needed?

@seberg
Member

seberg commented Feb 15, 2015

Yeah, sorry for delaying things in the first place. It can be added later, but if we say it makes sense, I would think that it probably makes sense to make it default to True (for the new function only, unfortunately). And in that case we need to decide before the release, but we can just make a blocker issue so we don't forget it.

@njsmith
Member

njsmith commented Feb 15, 2015

How about we just default it to readonly now (which does seem safer), and then if we decide to change our minds later, oh well, it's easy, because going from readonly to readwrite is backwards compatible.

@matthew-brett
Contributor

Readonly default sounds reasonable to me.

@shoyer
Member Author

shoyer commented Feb 20, 2015

Should we even include a readonly keyword argument? readonly=False seems so niche that we might as well insist users set the writeable flag directly.

The only justification I can see for readonly is as a temporary argument to ease the transition to read-only output at some point in the future. For example, we could default to readonly=False in broadcast_arrays, and then issue a warning if it hasn't been set to True.
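
Setting the flag directly is indeed a one-liner on the user side; a sketch assuming the readonly-by-default broadcast_to that was eventually merged, where users needing a writable result take an explicit copy:

```python
import numpy as np

y = np.broadcast_to(np.arange(3), (2, 3))
print(y.flags.writeable)  # False: the broadcast view is read-only

z = y.copy()              # a copy owns its own memory and is writable
z[0, 0] = 99
```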

@seberg
Member

seberg commented Feb 20, 2015

Maybe that makes sense. I am still wondering if there is some nice way to deprecate broadcast_arrays being writeable, but I can't think of one easily; or can we have a "writeable but warn" flag without much hassle?

@njsmith
Member

njsmith commented Feb 20, 2015

We already implemented writeable-but-warn for diagonal and for some record
array thing. So it's a trivial matter of just using it, or maybe
resurrecting it from recent git history.

And leaving out the readonly flag altogether for the new API sounds fine to me.

@seberg
Member

seberg commented Feb 24, 2015

OK, let's finally move this along. Since doing a deprecation for broadcast_arrays is, I guess, a bit beyond this PR (though if you are happy to help with it, please do!), we don't need to do it here. But there seems to be agreement that there is no reason to have a writable output for now.

Could you make the output readonly (and hack broadcast_arrays to not change for now)? That would be great! I think it might be easiest to make a private function with a readonly keyword argument for this purpose, so that the argument exists but the user doesn't need to know about it.

Then we can finally put this in. Again, I am sorry for delaying it all the time....

@shoyer
Member Author

shoyer commented Feb 24, 2015

@seberg Sounds good, I'll get on that.

@shoyer
Member Author

shoyer commented Feb 25, 2015

OK, I have pushed another commit that makes the result of broadcast_to readonly, while keeping the results of broadcast_arrays writeable.

I have not added the warn-on-write flag for broadcast_arrays for two reasons:

  1. The recent debate about breaking backwards compatibility for np.diagonal is still unresolved. This issue feels somewhat similar.
  2. As far as I can tell from the original PR (Clean up diagonal #280), the warn-on-write flag can currently only be set from the C-side. I don't really know C, so I'm not in a good position to add the appropriate hack to let it be set from the Python side.

@seberg
Member

seberg commented Feb 25, 2015

OK, great. I will look at it later (hopefully); if I don't, wake me up some time ;). I think we are much safer here than with diagonal, to be honest, though that might be a misconception, because I simply can't see anyone writing to a non-contiguous multi-dimensional array without risking insane bugs ;).

Member

I think it might be safer to do:

if readonly:
    result.flags.writeable = False

Can you add a test (if not done already) that a readonly input array works? I imagine this might throw an error for broadcast_arrays if the input array is already readonly; or, if it does not throw an error, it might unset an existing readonly flag (I am not sure about the logic, but I think we cannot change the flag if the parent is readonly).

Member Author

In fact, I tried exactly this logic as my first solution. It turns out it didn't work, as I discovered when I pushed it to Travis: https://travis-ci.org/numpy/numpy/jobs/52098107

I'll add a test for preserving the readonly nature of readonly input.

Member Author

Done. See the latest commit.

Member

Are we sure that test did not rely on buggy behaviour in the first place? ;)

Member

OK, that looks like it should be good. But I will need to understand why the first approach did not work (you don't have to check that, though; I will later). There is a fishy smell there for me.

Member Author

The issue was that nditer was making even writeable input readonly. That's why I added op_flags=['readonly'] in the latest commit, to clarify what nditer is doing (I couldn't get this to work when I tried another flag like 'readwrite').

Member

OK, it makes sense to me that that causes problems. I guess 'readwrite' would throw an error if the input is readonly, so this should be the best way.

@mhvk mhvk mentioned this pull request Feb 25, 2015
@charris
Member

charris commented Feb 26, 2015

Sounds like this is ready. I'll merge this evening if no one complains or beats me to it.

@charris
Member

charris commented Feb 27, 2015

@shoyer Could you squash the commits into one?

Per the mailing list discussion [1], I have implemented a new function
`broadcast_to` that broadcasts an array to a given shape according to
numpy's broadcasting rules.

[1] http://mail.scipy.org/pipermail/numpy-discussion/2014-December/071796.html
charris added a commit that referenced this pull request Feb 27, 2015
ENH: add np.broadcast_to and reimplement np.broadcast_arrays
@charris charris merged commit 06af991 into numpy:master Feb 27, 2015
@charris
Member

charris commented Feb 27, 2015

Thanks @shoyer.

@seberg
Member

seberg commented Feb 27, 2015

Thanks a lot!

@richardotis

Leaving this here in case someone needs to support older versions of numpy. I timed it against the version of broadcast_to that @shoyer included in xray. It's much less efficient, of course, and involves a memory allocation, but for many applications the difference is negligible.

def naive_broadcast_to(a, shape):
    "Broadcast an array to a specified shape. Returns a view."
    return np.broadcast_arrays(a, np.empty(shape, dtype=bool))[0]

a = np.random.rand(100, 1, 100)
np.testing.assert_array_equal(naive_broadcast_to(a, (100, 1000, 100)),
                              broadcast_to(a, (100, 1000, 100)))
%timeit naive_broadcast_to(a, (100, 10000, 100))
# 10000 loops, best of 3: 56.7 µs per loop
%timeit broadcast_to(a, (100, 10000, 100))
# The slowest run took 4.38 times longer than the fastest. This could mean
# that an intermediate result is being cached
# 100000 loops, best of 3: 7.69 µs per loop
