ENH: add np.block to improve upon np.bmat #8886
Conversation
numpy/core/shape_base.py
Outdated
This can probably be replaced with np.array(x, subok=True, copy=False, ndmin=ndim).
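For reference, a quick sketch of what that suggested call does (the variable names here are illustrative, not from the PR):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# subok=True preserves ndarray subclasses, copy=False avoids copying
# when a view suffices, and ndmin=3 prepends axes until the result is 3-D.
y = np.array(x, subok=True, copy=False, ndmin=3)

print(y.shape)                 # (1, 2, 3)
print(np.shares_memory(x, y))  # True - only a view was created
```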
Just as an FYI, I am still not sure what all the review-related buttons do, so please forgive any mess I create because of that.
numpy/core/shape_base.py
Outdated
While I know that most of the numpy devs prefer not to use try-except in their code, I think it has some value here because it prevents you from allocating an unnecessary list. Something like
try:
    return max(xs)
except ValueError:
    return 0
I'd prefer not to, because that would also catch exceptions from evaluating x.__next__
Good point. Comment rescinded.
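For what it's worth, Python 3.4+ offers a third option that avoids both the temporary list and the overly broad except clause (a sketch, not what the PR uses; the function name is made up):

```python
def max_or_zero(xs):
    # The default kwarg handles the empty-iterable case without try/except,
    # so ValueErrors raised while evaluating elements still propagate.
    return max(xs, default=0)

print(max_or_zero(iter([])))              # 0
print(max_or_zero(n for n in [3, 1, 2]))  # 3
```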
numpy/core/tests/test_shape_base.py
Outdated
I would be interested to see a case similar to this, but where two spans multiple rows. The reason being that none of the input arrays are actually 2D here. Something like
one = np.array([[1, 1, 1]])
two = np.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
three = np.array([[3, 3, 3]])
four = np.array([[4, 4, 4]])
...
result = block([[[one], [three], [four], two], [five, six], zeros])
expected = np.array([[1, 1, 1, 2, 2, 2],
                     [3, 3, 3, 2, 2, 2],
                     [4, 4, 4, 2, 2, 2],
                     [5, 6, 6, 6, 6, 6],
                     [0, 0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0, 0]])
I suspect that the notation for result is only an approximation here, but that's all the more reason to test something like that.
I don't think such a concatenation is possible in one invocation of block. I think that would be written as
np.block([
    [
        np.block([
            [one], [three], [four]
        ]),
        two,
    ],
    [five, six],
    zeros,
])
I've added a test with a nested invocation
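Filling in plausible values for the variables above (the shapes of five, six and zeros are my assumptions, not taken from the test suite; zeros is wrapped in a list so every row has the same nesting depth), the nested invocation works out as follows:

```python
import numpy as np

one   = np.array([[1, 1, 1]])
two   = np.array([[2, 2, 2]] * 3)
three = np.array([[3, 3, 3]])
four  = np.array([[4, 4, 4]])
five  = np.array([[5, 5, 5]])
six   = np.array([[6, 6, 6]])
zeros = np.zeros((2, 6), dtype=int)

result = np.block([
    [np.block([[one], [three], [four]]), two],  # inner call stacks vertically
    [five, six],
    [zeros],
])
print(result.shape)  # (6, 6)
```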
numpy/core/tests/test_shape_base.py
Outdated
Lowercase (global comment)
Again, convention here seems to be uppercase = 2D. Not sure that's a good convention, but we are being consistent
numpy/core/tests/test_shape_base.py
Outdated
Lowercase variable names
Copied from the previous PR. I think the convention we're going for here is to use uppercase letters for 2D inputs?
We generally use lower case letters, although some old code and new code written by people less familiar with numpy conventions use caps. I suppose that comes from text books. Lower case is preferred for variables, CamelCase for classes, lower case for modules.
I've renamed these to a_2d. The fact that I had to justify the capitals suggests that they failed to convey the meaning they intended, so let's go with something more obvious
numpy/core/tests/test_shape_base.py
Outdated
For good measure, I would recommend adding a 3D case, but there is no end to that rabbit hole, so maybe it's better to leave it out.
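For the record, a minimal 3D case isn't too bad (my own construction, not from the PR's tests): with depth-3 nesting, np.block concatenates along the last three axes.

```python
import numpy as np

# eight (1, 1, 1) blocks arranged in a 2x2x2 nested list
blocks = [[[np.full((1, 1, 1), 4 * i + 2 * j + k)
            for k in range(2)]    # innermost lists: last axis
           for j in range(2)]     # middle lists: second-to-last axis
          for i in range(2)]      # outer list: third-to-last axis

r = np.block(blocks)
print(r.shape)  # (2, 2, 2)
# r[i, j, k] == 4*i + 2*j + k, i.e. np.arange(8).reshape(2, 2, 2)
```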
numpy/core/tests/test_shape_base.py
Outdated
I would like to see a test that checks type conversion. Something like
def test_different_types(self):
    a = np.array([1, 2, 3], dtype=np.int_)
    b = np.array([4.0, 5.0, 6.0], dtype=np.float_)
    result = np.block([a, b])
    # assert result is [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    # assert output dtype is np.float_
I guess we'd need this if we were to no longer write block in terms of concatenate
Right. I was thinking of the last item on your enhancements list:
Preallocate the result array to save redundant copies
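As long as block is written in terms of concatenate, the usual promotion rules apply. Roughly what the suggested test would check, using int64/float64 rather than the since-removed np.int_/np.float_ spellings:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = np.array([4.0, 5.0, 6.0], dtype=np.float64)

result = np.block([a, b])
print(result)        # [1. 2. 3. 4. 5. 6.]
print(result.dtype)  # float64 - int64 is promoted, as with concatenate
```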
doc/release/1.13.0-notes.rst
Outdated
I feel like this needs an example; I have no clue what MATLAB's square bracket notation is and thus what block does.
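For readers in the same position: MATLAB's bracket notation builds block matrices, e.g. [A B; C D]. The equivalent with np.block looks roughly like this (my own example, in the spirit of the one later added to the release notes):

```python
import numpy as np

A = 2 * np.eye(2)
B = 3 * np.eye(3)

# MATLAB: [A zeros(2,3); ones(3,2) B]
M = np.block([[A,               np.zeros((2, 3))],
              [np.ones((3, 2)), B               ]])
print(M.shape)  # (5, 5)
```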
doc/release/1.13.0-notes.rst
Outdated
Love this example. Particularly after our discussion. Is there a test for this as well? Just familiarizing myself with this PR now. So if I overlooked it, sorry about that.
There is a test that covers an even more complex situation.
Does this broadcast arrays/scalars?
numpy/core/tests/test_shape_base.py
Outdated
It is odd that there is no way to do this without the explicit nested call, but I suppose you are right, this would be a 3D array if you did [[[[one], [three], [four]], two], ...
That would be 4D, as that is how deep the brackets nest.
What you seem to be asking for there is a separate syntax where [ alternately means "stack vertically" or "stack horizontally". As an ND generalisation, you could introduce [ meaning "stack in the next dimension" and ( meaning "stack in the previous dimension" - but that would be pretty cryptic.
I agree that the current syntax is best for now. It would be possible to avoid the nesting pretty easily in 2D (like MATLAB's ; vs ,), but it would not generalize well to higher dimensions. Concatenating along successive axes really does seem like the optimal approach here.
I'm pretty sure that matlab provides no way to do this either - you still have to nest [
All the more reason to keep things as they are :)
@charris: No, this does not broadcast at all. As I mention in the front-matter though, broadcasting would be desirable in future, but I think most of the utility is already there without it.
Can I have some feedback about the restrictions I proposed adding regarding list nesting and forbidding tuples?
@eric-wieser The list nesting restriction seems reasonable. It never hurts to make things explicit, especially if you plan on adding possibilities. Why do you want to restrict tuples? Personally, that's how I prefer to pass in a sequence of arrays, so I would be against that particular restriction. Also, as far as your latest proposed enhancement goes (an array object containing other arrays), perhaps you could just treat an object array as a nested list?
This would have bad effects on actually trying to concatenate object arrays. Also, I'd like this code path to forbid the kind of stacking that produces object arrays.
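For the record, the proposal to forbid tuples did make it into released NumPy - np.block rejects them with a TypeError rather than treating them as rows (the exact message varies by version):

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

try:
    np.block((a, b))  # tuple instead of list
except TypeError as exc:
    print("rejected:", exc)
```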
I don't think item 1 is a particularly good reason, but item 2 is pretty convincing. Another item you may want to add to the expansion list is adding an explicit dtype parameter, as your example suggests. That would probably go hand-in-hand with implementing preallocation and moving away from concatenate.
I think the correct path here is to add a dtype argument. Regarding preallocation, I think that would be best solved with an out argument.
@eric-wieser - this is a very nice improvement on what was already a nice idea. I have only skimmed the code, but I find the notation without matching levels of brackets confusing. Finally, I find your example in the release notes clearer than some of the examples in the docstring, so would suggest repeating it there.
@mhvk I've committed a version that allows toggling the bracket-matching with a flag. I could clean things up a bit by removing that flag. Do you want me to go ahead and enforce that always?
Thinking about it more, I can see arguments either way. Overall, I'd just go for simplicity and remove the toggle: one obvious way to do it and all that.
I agree with this - I only kept it because it was there in the original PR.
Huge implementation overhaul here. It's gotten rather functional in order to avoid writing the same recursion over and over again.
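To illustrate the shape of that recursion, here is a simplified sketch of the algorithm described in the PR front-matter - not the actual implementation; it assumes uniform nesting depth and skips all error checking, and the function names are mine:

```python
import numpy as np

def _depth(x):
    # depth of the nested-list structure; bare arrays/scalars are depth 0
    return 1 + max(_depth(a) for a in x) if isinstance(x, list) else 0

def block_sketch(arrays):
    d = _depth(arrays)

    def rec(x, level):
        if isinstance(x, list):
            parts = [rec(a, level + 1) for a in x]
            # the outermost list concatenates along the earliest used axis,
            # the innermost along the last axis
            return np.concatenate(parts, axis=-(d - level))
        return np.array(x, ndmin=d)  # pad leading dimensions

    return rec(arrays, 0)
```

With a 2x2 block matrix, block_sketch([[A, B], [C, D]]) reproduces np.block([[A, B], [C, D]]): the inner lists hstack and the outer list vstacks.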
A couple of high-level comments:
Something like this then? ;)
I did think I covered this with "Element shapes must match along the appropriate axes (without broadcasting)".
Things get pretty weird if I only treat lists as defining block structure. This has some somewhat useful consequences:

>>> np.block([
...     ([1],
...      [1]), ([1, 2, 3],
...             [4, 5, 6])  # () guards these from np.block, and passes
...                         # them to the array constructor instead
... ])
array([[1, 1, 2, 3],
       [1, 4, 5, 6]])
It's not broadcasting per se, but it sure looks a lot like broadcasting in the way it supports promoting scalars to higher dimensions.
OK, this is pretty confusing...
Questions regarding that confusing thing:
@charris: I was hoping to add a grid argument with a sensible default first. But yes, I'll squash the history down in some form, and I definitely intend to preserve @sotte's commit.
I don't know how entangled the various commits are, but with so many it might be easiest to soft reset to after @sotte's commit and then commit the five files in some reasonable order.
@charris: I'll work something out - my main issue with rebases is that something (typically Sublime Text) ends up sitting on a file handle, which messes up the rebase.
@eric-wieser You work on windows, yes? The virtue of a soft reset is that all the files are there in their current state and you just need to commit them (again).
Indeed, soft resetting has some advantages, although I'd then have to rewrite my commit messages rather than just combine/edit them. Don't worry about it, I'll work something out. Anyway, are you -1 on putting the grid keyword in?
I think the grid keyword can be added later if the default is compatible with the current behavior.
Relatedly, it might be handy to add out as an argument.
I just want to say that I'm very happy that someone picked this up! Thanks guys!
Add a block function alongside the current stacking functions vstack, hstack and stack. It is similar to MATLAB's square-bracket stacking functionality for block matrices.
As requested in numpy#7768
Based on feedback in numpy#7768
This changes the API to not have any special cases for 2d arrays, and accept a single input for clarity
Refactor common loop, improve error messages
This:
* Avoids having more than one way to do things (don't treat them like lists)
* Avoids confusing behavior (don't treat them like ndarrays)
@charris: Trimmed off 7 commits, and reordered them to put fixups to @sotte's code nearer the beginning (where possible), trying to keep those separate from my changes without creating too much work for myself. I've verified the final commit matches the previous one exactly, but I haven't done too much to check that the intermediate commits are still somewhat valid. Short enough now?
My sense would be to keep the PR as is (which is very nice), and leave improvements (#8899) to 1.14, hopefully informed by feedback.
@mhvk: Do we want to slip a deprecation of np.bmat into this release as well?
Seems a bit much to deprecate np.bmat right away.
If there is no more feedback, I'll put this in tomorrow.
Thanks Eric.
This continues where #7768 left off (which continues where #5057 left off). The first commit is just a rebase of #7768 onto master, with the release notes moved to the latest release.
Changes since then:
* Takes a single nested list argument rather than *args, for consistency with stack and concatenate

The semantics are essentially:

1. Normalize the list of lists to be of uniform depth
2. reshape all the list elements to be at least as many dimensions as the list depth (ENH: Added atleast_nd #7804 ?)
3. concatenate from the innermost list to the outermost, working backwards from the last dimension - elements can have extra leading dimensions

Proposed restrictions
Things that we should decide on before merging:

1. Forbid tuples, so that they can mean something else in future (perhaps records, like they do in np.array with structured dtypes).
2. Make np.block([[a, b], c]) illegal - it must be spelt np.block([[a, b], [c]]) (currently these are equivalent; this change would make step 1 a check, not a normalization). This makes things look more like the np.array constructor, but forces users to be more verbose.

Future Enhancements
Moved to #8899