
ENH: add np.block to improve upon np.bmat#8886

Merged
charris merged 9 commits into numpy:master from eric-wieser:np-block
Apr 21, 2017

Conversation


@eric-wieser eric-wieser commented Apr 3, 2017

This continues where #7768 left off (which continues where #5057 left off). The first commit is just a rebase of #7768 onto master, with the release notes moved to the latest release.

Changes since then:

  • 2D arrays are not special. This does not auto-promote results to 2D.
  • Works for ND arrays
  • No longer accepts *args, for consistency with stack and concatenate

The semantics are essentially:

  1. Normalize the list of lists to be of uniform depth
  2. Reshape all the list elements to be at least as many dimensions as the list depth (ENH: Added atleast_nd #7804 ?)
  3. Concatenate from the innermost list to the outermost, working backwards from the last dimension - elements can have extra leading dimensions
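A minimal sketch of these steps, using hypothetical block names and plain concatenate calls to mirror what np.block does for a 2D case:

```python
import numpy as np

# Four hypothetical blocks of a 2x2 block matrix
a = 2 * np.eye(2)           # shape (2, 2)
b = np.zeros((2, 3))        # shape (2, 3)
c = np.ones((1, 2))         # shape (1, 2)
d = np.full((1, 3), 7)      # shape (1, 3)

m = np.block([[a, b],
              [c, d]])      # shape (3, 5)

# The same result, spelled out as the steps describe: concatenate the
# innermost lists along the last axis, then the outer list along the
# second-to-last axis.
rows = [np.concatenate([a, b], axis=-1),
        np.concatenate([c, d], axis=-1)]
m2 = np.concatenate(rows, axis=-2)
assert (m == m2).all()
```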

Proposed restrictions

Things that we should decide on before merging:

  • Make tuples forbidden, require that inputs be lists of lists. This frees us up to let tuples mean something else in future (perhaps records, like they do in np.array with structured dtypes).
  • Require that list depth match explicitly: make np.block([[a, b], c]) illegal, must be spelt np.block([[a, b], [c]]) (currently these are equivalent - this change would make step 1 a check, not a normalization). This makes things look more like the np.array constructor, but forces users to be more verbose

Future Enhancements

Moved to #8899

Contributor:

Extra newline?

Contributor:

This can probably be replaced with np.array(x, subok=True, copy=False, ndmin=ndim).
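For reference, `ndmin` prepends length-1 axes until the result has at least that many dimensions, which is the promotion step this suggestion targets (a quick sketch, not the PR's actual code):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
# subok=True preserves ndarray subclasses; copy=False avoids a copy when
# none is needed; ndmin pads with leading length-1 axes.
a = np.array(x, subok=True, copy=False, ndmin=4)
# (2, 3) -> (1, 1, 2, 3)
assert a.shape == (1, 1, 2, 3)
```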

Contributor:

Just as an FYI, I am still not sure what all the review-related buttons do, so please forgive any mess I create because of that.

Contributor:

While I know that most of the numpy devs prefer not to use try-except in their code, I think it has some value here because it prevents you from allocating an unnecessary list. Something like

try:
    return max(xs)
except ValueError:
    return 0
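(Aside: since Python 3.4, the built-in `max` accepts a `default` keyword that returns the given value for an empty iterable, which sidesteps the try-except entirely while leaving unrelated exceptions from the iterable untouched:)

```python
xs = []
assert max(xs, default=0) == 0   # empty iterable -> default, no exception

xs = [3, 1, 2]
assert max(xs, default=0) == 3   # default is ignored when xs is non-empty
```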

Member Author:

I'd prefer not to, because that would also catch exceptions from evaluating x.__next__

Contributor:

Good point. Comment rescinded.

@madphysicist (Contributor) commented Apr 4, 2017:

I would be interested to see a case similar to this, but where `two` spans multiple rows, the reason being that none of the input arrays are actually 2D here. Something like

one = np.array([[1, 1, 1]])
two = np.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
three = np.array([[3, 3, 3]])
four = np.array([[4, 4, 4]])
...
result = block([[[one], [three], [four], two], [five, six], zeros])
expected = np.array([[1, 1, 1, 2, 2, 2],
                     [3, 3, 3, 2, 2, 2],
                     [4, 4, 4, 2, 2, 2],
                     [5, 6, 6, 6, 6, 6],
                     [0, 0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0, 0]])

Contributor:

I suspect that the notation for result is only an approximation here, but that's all the more reason to test something like that.

Member Author:

I don't think such a concatenation is possible in one invocation of block. I think that would be written as

np.block([
    [
        np.block([
           [one], [three], [four]
        ]),
        two
    ],
    [five, six],
    zeros
])

Member Author:

I've added a test with a nested invocation

Contributor:

Lowercase (global comment)

Member Author:

Again, convention here seems to be uppercase = 2D. Not sure that's a good convention, but we are being consistent

Contributor:

Lowercase variable names

Member Author:

Copied from the previous PR. I think the convention we're going for here is to use uppercase letters for 2D inputs?

Member:

We generally use lower case letters, although some old code and new code written by people less familiar with numpy conventions use caps. I suppose that comes from text books. Lower case is preferred for variables, CamelCase for classes, lower case for modules.

Member Author:

I've renamed these to a_2d. The fact that I had to justify the capitals suggests that they failed at conveying the meaning they intended, so let's go with something more obvious

Contributor:

For good measure, I would recommend adding a 3D case, but there is no end to that rabbit hole, so maybe it's better to leave it out.

Member Author:

Done

Contributor:

I would like to see a test that checks type conversion. Something like

def test_different_types(self):
    # (sketch -- assumes numpy.testing.assert_equal is imported, as in
    # the existing test modules)
    a = np.array([1, 2, 3], dtype=np.int_)
    b = np.array([4.0, 5.0, 6.0], dtype=np.float_)
    result = np.block([a, b])
    assert_equal(result, np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
    assert result.dtype == np.float_

Member Author:

I guess we'd need this if we were to no longer write block in terms of concatenate

Contributor:

Right. I was thinking of the last item on your enhancements list:

Preallocate the result array to save redundant copies

Contributor:

I feel like this needs an example; I have no clue what MATLAB's square bracket notation is, and thus what block does.

Member Author:

Added.

Contributor:

Love this example. Particularly after our discussion. Is there a test for this as well? Just familiarizing myself with this PR now. So if I overlooked it, sorry about that.

Contributor:

There is a test that covers an even more complex situation.

@charris charris added this to the 1.13.0 release milestone Apr 4, 2017
@charris (Member) commented Apr 4, 2017:

Does this broadcast arrays/scalars?

Contributor:

It is odd that there is no way to do this without the explicit nested call, but I suppose you are right, this would be a 3D array if you did [[[[one], [three], [four]], two], ...

@eric-wieser (Member Author) commented Apr 4, 2017:

That would be 4D, as that is how deep the brackets nest.

What you seem to be asking for there is a separate syntax where [ alternately means "stack vertically" and "stack horizontally". As an ND generalisation, you could introduce [ meaning "stack in the next dimension" and ( meaning "stack in the previous dimension" - but that would be pretty cryptic.

Contributor:

I agree that the current syntax is best for now. It would be possible to avoid the nesting pretty easily in 2D (like MATLAB's ; vs ,), but it would not generalize well to higher dimensions. Concatenating along successive axes really does seem like the optimal approach here.

Member Author:

I'm pretty sure that matlab provides no way to do this either - you still have to nest [

Contributor:

All the more reason to keep things as they are :)

@eric-wieser (Member Author):

@charris: No, this does not broadcast at all, as concatenate does not. All it does is promote dimensionality, so that (1, 2) is the same as (1, 1, 1, 2).

As I mention in the front-matter though, broadcasting would be desirable in future, but I think most of the utility is already there without it.
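A short sketch of that dimension promotion with hypothetical shapes: the lower-dimensional element gains leading length-1 axes so the pieces concatenate along the last axis, with no broadcasting of the other axes:

```python
import numpy as np

a = np.ones((1, 2))
b = np.ones((1, 1, 1, 2))

# `a` is promoted to shape (1, 1, 1, 2) by prepending length-1 axes;
# the two arrays then concatenate along the last axis.
out = np.block([a, b])
assert out.shape == (1, 1, 1, 4)
```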

@eric-wieser (Member Author):

Can I have some feedback about the restrictions I proposed adding regarding list nesting and forbidding tuples?

@madphysicist (Contributor):

@eric-wieser The list nesting restriction seems reasonable. It never hurts to make things explicit, especially if you plan on adding possibilities.

Why do you want to restrict tuples? Personally, that's how I prefer to pass in a sequence of arrays, so I would be against that particular restriction.

Also, as far as your latest proposed enhancement goes (an array object containing other arrays), perhaps you could just treat an ndarray input with dtype=object the same as list and tuple (e.g. by expanding is_element)?

@eric-wieser (Member Author) commented Apr 5, 2017:

Why do you want to restrict tuples?

  • Deliberately make np.block look similar to the usual invocation of np.array

  • Leave room for future expansion. For instance, structured arrays:

    dt = np.dtype([('x', int), ('y', int)])
    A = np.array([(1, 2), (3, 4)], dtype=dt)
    
    # now
B = np.block([A, np.array((5, 6), dtype=dt)])
    
    # possibly in future
    B = np.block([A, (5, 6)])  # requires us to not "use up" the `()` syntax
    B = np.block([A, (5, 6)], dtype=dt)  # alternative option with a flag to disable that syntax
                                         # may still surprise the user, but not incompatible

perhaps you could just treat an ndarray input with dtype=object the same as list and tuple

This would have bad effects on actually trying to concatenate object arrays. Also, I'd like this code path to forbid the kind of stacking that produces:

AABB
CCCC

@madphysicist (Contributor):

I don't think item 1 is a particularly good reason, but item 2 is pretty convincing.

Another item you may want to add to the expansion list is adding an explicit dtype parameter as your example suggests. That would probably go hand-in-hand with implementing preallocation and moving away from concatenate.

@eric-wieser (Member Author) commented Apr 5, 2017:

That would probably go hand-in-hand with implementing preallocation and moving away from concatenate.

I think the correct path here is to add a dtype argument to concatenate, but that's for another time.

Regarding preallocation, I think that would be best solved with an out argument in concatenate

@mhvk (Contributor) commented Apr 5, 2017:

@eric-wieser - this is a very nice improvement on what was already a nice idea. I have only skimmed the code, but I find the notation without matching levels of [] very unclear, so would agree with your proposed restriction that levels of [ should match (this is also consistent with np.array -- e.g., np.array([[1,2],3]) returns a 2-item object array). To me, it also makes sense to forbid tuples for now, and I think your analogy of np.array is a good reason in itself (and anyway, it can be enabled later if needed, but would not be possible to revert).

Finally, I find your example in the release notes clearer than some of the examples in the docstring, so would suggest repeating it there.

@eric-wieser (Member Author):

@mhvk I've committed a version that allows toggling the bracket-matching with a flag. I could make some cleanup by removing that flag.

Do you want me to go ahead and enforce that always?

@mhvk (Contributor) commented Apr 5, 2017:

Thinking about it more, I can see how [[a,b], c] is still reasonably obvious, as it reads "stack one set, now stack the next axis with another", but oddly I find [c, [a,b]] more mind-jarring. In contrast, just reading every bracket pair as "now go one dimension further out" seems much simpler.
E.g., [[[a1]], [[a2]]] seems quite a bit better than [a1, [[a2]]].

Overall, I'd just go for simplicity and remove the toggle: one obvious way to do it and all that.

@eric-wieser (Member Author) commented Apr 5, 2017:

oddly I find [c, [a,b]] more mind-jarring

Try np.block([a, b, c, [[[ [[[ ]]] ]]] ]) out for size then, which means concatenate(map(atleast7d, [a, b, c]), axis=-7). I think another argument against this input being allowed is that you have to read right to the end to understand what's going on - rather than just to the first element

Overall, I'd just go for simplicity and remove the toggle: one obvious way to do it and all that

I agree with this - I only kept it because it was there in the original PR.

@eric-wieser (Member Author):

Huge implementation overhaul here. It's gotten rather functional, in order to avoid writing the same recursion over and over again.

@shoyer (Member) commented Apr 5, 2017:

A couple high level comments:

  • This looks very nice now -- like something I might actually use if I can wrap my head around it!
  • Documentation should mention how `block` uses broadcasting to expand array dimensions.
  • +1 for only treating list in a special way. NumPy could use less type overloading.

@eric-wieser (Member Author) commented Apr 5, 2017:

Something like this then? ;)

how block uses broadcasting to expand array dimensions.

It does not.

I did think I covered this with "Element shapes must match along the appropriate axes (without broadcasting)", but perhaps it needs to be more obvious.

+1 for only treating list in a special way

Things get pretty weird if I only treat list in this way - namely, suddenly tuple gets treated like ndarray! (because we just throw asanyarray at everything else)

This has some somewhat useful consequences:

>>> np.block([
...    ([1],
...     [1]), ([1,2,3],
...            [4,5,6])      # () guards these from np.block, and passes them
                             # to the array constructor instead
... ])
array([[1, 1, 2, 3],
       [1, 4, 5, 6]])

@shoyer (Member) commented Apr 5, 2017:

how block uses broadcasting to expand array dimensions.

It's not broadcasting per se, but it sure looks a lot like broadcasting in the way it supports promoting scalars to higher dimensions.

Things get pretty weird if I only treat list in this way - namely, suddenly tuple gets treated like ndarray! (because we just throw asanyarray at everything else)

OK, this is pretty confusing...

@eric-wieser (Member Author) commented Apr 5, 2017:

Questions regarding that confusing thing:

  1. Is this useful?
  2. Is this an argument for using tuple instead of list, to make the ndarray ctors more obvious?
    >>> np.block((  # was previously ([
    ...    [[1],
    ...     [1]], [[1,2,3],
    ...            [4,5,6]]      # now [] guards instead of ()
    ... ))  # was previously ])
  3. Should I just throw TypeError: implicit conversion from tuple to ndarray not allowed within block if anyone tries to pull this on us, because it's super-confusing?

@eric-wieser (Member Author):

@charris: I was hoping to add a default grid=True parameter first - with the reasoning that adding it as an afterthought would be a breaking change. But maybe that shouldn't be a concern, as if it did turn out to be worthwhile, it wouldn't be a difficult deprecation cycle

But yes, I'll squash the history down in some form, and I definitely intend to preserve @sotte's commit.

@charris (Member) commented Apr 20, 2017:

I don't know how entangled the various commits are, but with so many it might be easiest to soft reset to after @sotte's commit and then commit the five files in some reasonable order.

@eric-wieser (Member Author):

@charris: I'll work something out - my main issue with rebases is that something (typically Sublime Text) ends up sitting on a file handle, which messes up a git rebase -i, requiring me to start over.

@charris (Member) commented Apr 20, 2017:

@eric-wieser You work on windows, yes? The virtue of a soft reset is that all the files are there in their current state and you just need to commit them (again).

@eric-wieser (Member Author) commented Apr 20, 2017:

Indeed, soft resetting has some advantages, although I'd then have to rewrite my commit messages rather than just combine/edit them. Don't worry about it, I'll work something out

Anyway, are you -1 on putting grid=True in this release (eric-wieser#2)?

@charris (Member) commented Apr 20, 2017:

I think the grid keyword can be added later if the default is False. That default was also @madphysicist's suggestion, but I don't really feel strongly either way. Might be good to hear from @mhvk and @shoyer on that.

@eric-wieser (Member Author):

Relatedly, it might be handy to add np.b_[...] == np.block([...]), as suggested in #8899

@sotte (Contributor) commented Apr 20, 2017:

I just want to say that I'm very happy that someone picked this up! Thanks guys!

sotte and others added 2 commits April 20, 2017 21:30

  • Add a block function to the current stacking functions vstack, hstack, stack. It is similar to MATLAB's square bracket stacking functionality for block matrices.
  • This changes the API to not have any special cases for 2d arrays, and accept a single input for clarity.
  • Refactor common loop, improve error messages.
  • This:
      * Avoids having more than one way to do things (don't treat them like lists)
      * Avoids confusing behavior (don't treat them like ndarrays)
@eric-wieser (Member Author) commented Apr 20, 2017:

@charris: Trimmed off 7 commits, and reordered them to put fixups to @sotte's code nearer the beginning (where possible) - trying to keep those separate from my changes, without creating too much work for myself. I've verified the final commit matches the previous one exactly, but I haven't done too much to check that the intermediate commits are still somewhat valid

Short enough now?

@mhvk (Contributor) commented Apr 20, 2017:

My sense would be to keep the PR as is (which is very nice), and leave improvements (#8899) to 1.14, hopefully informed by feedback.

@eric-wieser (Member Author):

@mhvk: Do we want to slip a deprecation of np.bmat into 1.13 to direct people to np.block, to try and increase awareness and therefore feedback?

@mhvk (Contributor) commented Apr 20, 2017:

Seems a bit much to deprecate bmat -- unless we're also deprecating matrix more generally...

@charris (Member) commented Apr 21, 2017:

If there is no more feedback, I'll put this in tomorrow.

@charris charris merged commit f1b3f00 into numpy:master Apr 21, 2017
@charris (Member) commented Apr 21, 2017:

Thanks Eric.
