Add argmax_param "axis" to maximise output along the specified axis #3069
|
Wouldn't axis = 1 make more sense as a default? By convention in caffe, axis = 0 is the dimension that indexes across images in a batch. Typically, axis = 1 is the "channel" that separates classes. |
c1da114 to b40a237
|
To take the argmax of a classification output of shape (10,1000,1,1), with 10 images and 1000 classes, you could set either axis = 0 or axis = 1. The result would be the same, but in the first case the bottom blob is treated as a flattened array per image, and in the second the width and height of the blob are taken into account. |
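The equivalence described in this comment can be illustrated with NumPy. This is only a sketch of the two interpretations (flattening each image versus reducing along the channel axis), not the layer's C++ code; the array names and the random test data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((10, 1000, 1, 1))  # 10 images, 1000 classes

# Interpretation 1: flatten each image and take one argmax per image.
flat = scores.reshape(scores.shape[0], -1).argmax(axis=1)   # shape (10,)

# Interpretation 2: argmax along the channel axis, keeping H and W.
along_channels = scores.argmax(axis=1)                      # shape (10, 1, 1)

# With H = W = 1 both give the same class index for every image.
same_for_1x1 = np.array_equal(flat, along_channels.reshape(-1))

# With a real spatial extent the two interpretations differ in shape.
seg_scores = rng.random((2, 3, 4, 4))
flat_seg = seg_scores.reshape(2, -1).argmax(axis=1)         # shape (2,)
per_pixel = seg_scores.argmax(axis=1)                       # shape (2, 4, 4)
```

The second case is the one that matters for segmentation output, where one label per pixel is wanted rather than one per image.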
|
OK I see what you've done. This is a great improvement to the layer, but I think the convention you've chosen for the

If axis > 0, then your proposed layer is computing what you expect (argmax along that axis). If axis == 0, then the layer reverts to the old behavior (which is not actually computing argmax along axis 0).

I think it's unintuitive and surprising to do it this way. For example, I would expect that for a (N, C) shape input, computing argmax along axis 0 should give an output shape of (top_k, C). I would be surprised to discover that it has actually given me an output of shape (N, top_k) and this would seem like a bug to me.

The current ArgmaxLayer didn't get updated during the 4D --> ND conversion in #1970, and has extra unnecessary dimensions of size 1 in the output. Also, we have easy efficient reshaping as of #2217. So the automatic reshaping that you get with this layer isn't really worth trying to preserve.

I think a better way to preserve backwards compatibility would be: if

Or, you could have a flag (e.g. |
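The shape expectation in this comment can be checked with a small NumPy sketch. The helper `top_k_indices` is hypothetical and written only for this illustration; it is not part of Caffe, and it assumes ties are broken by a stable sort.

```python
import numpy as np

def top_k_indices(x, top_k, axis):
    """Indices of the top_k largest entries along `axis`, largest first.

    Hypothetical helper for illustration; not Caffe's implementation.
    """
    order = np.argsort(-x, axis=axis, kind="stable")
    return np.take(order, np.arange(top_k), axis=axis)

x = np.arange(12, dtype=np.float32).reshape(4, 3)  # N = 4, C = 3

along_axis0 = top_k_indices(x, top_k=2, axis=0)  # shape (top_k, C) = (2, 3)
along_axis1 = top_k_indices(x, top_k=2, axis=1)  # shape (N, top_k) = (4, 2)
```

Under this reading, the reduced axis is the one whose size becomes top_k, which is the behavior the reviewer describes as the intuitive one.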
b40a237 to 9d59979
|
I see your point. Basically I was already using axis = 0 as if no axis was specified, so I changed that to use has_axis instead. Now one can even argmax across multiple images using axis = 0. Perhaps this might be useful for someone, but it is definitely more intuitive. Thank you! Sometimes you do not see the wood for the trees when you are in the middle of something... |
9d59979 to f95bf35
src/caffe/layers/argmax_layer.cpp
This should check bottom[0]->num_axes() for generality now that blobs are N-D.
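A minimal sketch of what checking against the number of axes could look like, written in Python for illustration rather than as the layer's C++ code. The function name `canonical_axis_index` is an assumption for this example; it resolves a possibly negative axis against `num_axes` instead of a fixed 4.

```python
def canonical_axis_index(axis, num_axes):
    """Map a possibly negative axis into [0, num_axes).

    Hypothetical helper for illustration; raises if the axis does not
    exist for a blob with `num_axes` axes.
    """
    if not -num_axes <= axis < num_axes:
        raise ValueError(
            f"axis {axis} out of range for a blob with {num_axes} axes")
    return axis + num_axes if axis < 0 else axis

shape_4d = (10, 1000, 1, 1)
shape_2d = (10, 1000)

last_axis_4d = canonical_axis_index(-1, len(shape_4d))  # resolves to 3
last_axis_2d = canonical_axis_index(-1, len(shape_2d))  # resolves to 1
```

The same axis value can be valid for a 4-D blob and invalid for a 2-D one, which is why the bound has to come from the blob itself.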
|
Thanks for contributing this @timmeinhardt and thanks for guiding the development @seanbell! This looks good in general but I'd like to see it done in N-D instead of the fixed 4 dimensions since it's nearly there. You could look at |
f95bf35 to f9cad0a
f9cad0a to def3d3c
|
I adjusted my commits, and it should now work for N-D blobs as well. I also updated the documentation. I did not know that you had made this update to the blob architecture 👍 |
Add argmax_param "axis" to maximise output along the specified axis
|
Thanks @timmeinhardt ! |
|
@timmeinhardt could you please check ArgMaxLayer::Reshape? I'm getting an assertion when I run the caffe unit tests with debug lvl == 2. In this test, the num_axes of bottom[0] is 2 and the value of has_axis_ is false. |
This PR adds an option to the ArgMaxLayer that lets the user specify an axis along which the layer should maximise the output. The layer then works similarly to e.g. numpy.argmax, which might be useful for deploying segmentation networks.
The default option is axis = 0, which computes the argmax of the flattened bottom blob per image, so existing deployments of the layer work as expected. Negative indexing is also possible, but the resolved axis value must lie between 0 and 3.
If axis != 0 and out_max_val is set to true, the layer outputs max_val instead of max_ind. Specifying an axis and outputting max_val and max_ind at the same time is not possible due to the general architecture of a blob.
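The segmentation use case and the out_max_val switch can be sketched with NumPy. This is an illustration under stated assumptions, not the layer's implementation: the function `argmax_along_axis` is hypothetical, it handles only top_k = 1, and it keeps the reduced axis with size 1.

```python
import numpy as np

def argmax_along_axis(bottom, axis, out_max_val=False):
    """Return max indices, or max values if out_max_val is set.

    Hypothetical helper for illustration (top_k = 1 only); the reduced
    axis is kept with size 1.
    """
    if out_max_val:
        top = np.max(bottom, axis=axis)
    else:
        top = np.argmax(bottom, axis=axis)
    return np.expand_dims(top, axis)

# Per-pixel class scores: 1 image, 3 classes, 2 x 2 pixels.
scores = np.array([[[[0.1, 0.7], [0.2, 0.3]],
                    [[0.8, 0.2], [0.5, 0.3]],
                    [[0.1, 0.1], [0.3, 0.4]]]])

labels = argmax_along_axis(scores, axis=1)                      # class per pixel
max_vals = argmax_along_axis(scores, axis=1, out_max_val=True)  # score per pixel
```

Each call yields either indices or values, never both, which mirrors the restriction stated in the description above.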