@timmeinhardt
Contributor

This PR adds an axis option to the ArgMaxLayer that lets the user specify the axis along which the layer takes the argmax. The layer now works similarly to e.g. numpy.argmax and might be useful for deploying segmentation networks.
The default is axis = 0, which computes the argmax of the flattened bottom blob per image, so existing deployments of the layer work as before. Negative indexing is also possible, but the resolved axis value must lie between 0 and 3.
If axis != 0 and out_max_val is set to true, the layer outputs max_val instead of max_ind. Outputting max_val and max_ind at the same time for a specified axis was not possible due to the general architecture of a blob.
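
To make the difference between the two modes concrete, here is a minimal sketch in plain Python (not the layer's C++ code). The helper names `argmax_flat` and `argmax_channels` are made up for this illustration, and the blob is modelled as a nested list for a single image.

```python
def argmax_flat(values):
    """Index of the largest entry in a flat list (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def argmax_channels(image):
    """Per-pixel argmax over the channel axis of a C x H x W nested list."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    return [
        [argmax_flat([image[c][y][x] for c in range(channels)]) for x in range(width)]
        for y in range(height)
    ]


# One image with 3 class-score channels, each of size 2 x 2.
scores = [
    [[0.1, 0.7], [0.2, 0.3]],  # class 0
    [[0.8, 0.2], [0.1, 0.3]],  # class 1
    [[0.1, 0.1], [0.7, 0.4]],  # class 2
]

# Argmax along the channel axis: one class label per pixel.
label_map = argmax_channels(scores)

# Argmax of the flattened image: a single index into C * H * W values.
flat = [v for channel in scores for row in channel for v in row]
flat_index = argmax_flat(flat)

print(label_map)   # [[1, 0], [2, 2]]
print(flat_index)  # 4
```

The per-axis result keeps the spatial layout, which is what a segmentation network needs, while the flattened result collapses the whole image to one index.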

@seanbell

Wouldn't axis = 1 make more sense as a default? By convention in Caffe, axis = 0 is the dimension that indexes across images in a batch. Typically, axis = 1 is the "channel" axis that separates classes.

@timmeinhardt
Contributor Author

To argmax a classification output of shape (10, 1000, 1, 1), with 10 images and 1000 classes, you could set either axis = 0 or axis = 1. The result would be the same, but in the first case the bottom blob is treated as a flattened array per image, and in the second the width and height of the blob are taken into account.
I went for axis = 0 as the default because that is how the layer worked before, and processing the flattened input is usually the default for argmax implementations.
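
A small plain-Python sketch of this equivalence, using a toy (2, 4, 1, 1) blob instead of (10, 1000, 1, 1). The `argmax` helper and the nested-list layout are assumptions made for illustration only.

```python
def argmax(values):
    """Index of the largest entry in a flat list."""
    return max(range(len(values)), key=lambda i: values[i])


# Blob of shape (N=2, C=4, H=1, W=1) as nested lists.
blob = [
    [[[0.1]], [[0.6]], [[0.2]], [[0.1]]],  # image 0
    [[[0.3]], [[0.1]], [[0.1]], [[0.5]]],  # image 1
]

# Flatten each image to C * H * W values, then take the argmax.
per_image_flat = [
    argmax([v for channel in image for row in channel for v in row])
    for image in blob
]

# Take the argmax over the channel axis at the single spatial position.
along_axis_1 = [argmax([channel[0][0] for channel in image]) for image in blob]

print(per_image_flat)  # [1, 3]
print(along_axis_1)    # [1, 3]
```

The two only agree because H = W = 1; with larger spatial dimensions the flattened index no longer corresponds to a class.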

@timmeinhardt changed the title from 'Add argmax_param axis to maximise output along the specified axis' to 'Add argmax_param "axis" to maximise output along the specified axis' on Sep 15, 2015
@seanbell

OK I see what you've done. This is a great improvement to the layer, but I think the convention you've chosen for the axis parameter is confusing and can be improved.

If axis > 0, then your proposed layer is computing what you expect (argmax along that axis). If axis == 0, then the layer reverts to the old behavior (which is not actually computing argmax along axis 0). I think it's unintuitive and surprising to do it this way. For example, I would expect that for a (N, C) shape input, computing argmax along axis 0 should give an output shape of (top_k, C). I would be surprised to discover that it has actually given me an output of shape (N, top_k) and this would seem like a bug to me.

The current ArgMaxLayer didn't get updated during the 4D --> ND conversion in #1970, and has extra unnecessary dimensions of size 1 in the output. Also, we have easy, efficient reshaping as of #2217. So the automatic reshaping that you get with this layer isn't really worth trying to preserve.

I think a better way to preserve backwards compatibility would be: if axis is not specified (check with has_axis), perform the old behavior. Otherwise, if it is set, compute the argmax along that actual axis, rather than making axis: 0 do something special that's not actually argmax along axis 0.

Or, you could have a flag (e.g. flatten with default true) that enables automatic flattening of the blob, and make the default axis: 1. This would also preserve the legacy behavior of this layer.
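
The has_axis suggestion can be sketched in plain Python for a 2-D (N, C) input. This is only an illustration of the proposed semantics: `argmax_2d` and `top_k_indices` are hypothetical names, `axis=None` stands in for has_axis being false, and the legacy output is simplified to (N, top_k) without the extra size-1 dimensions.

```python
def top_k_indices(values, k):
    """Indices of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


def argmax_2d(matrix, top_k=1, axis=None):
    """Top-k argmax of an N x C nested list.

    axis=None models "axis not specified" (legacy per-image behaviour).
    """
    if axis is None or axis == 1:
        # For 2-D input, flattening each image equals argmax along axis 1.
        return [top_k_indices(row, top_k) for row in matrix]  # shape (N, top_k)
    if axis == 0:
        columns = [list(col) for col in zip(*matrix)]
        per_column = [top_k_indices(col, top_k) for col in columns]
        return [list(row) for row in zip(*per_column)]  # shape (top_k, C)
    raise ValueError("axis must be None, 0 or 1 for 2-D input")


matrix = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.4, 0.6],
]

print(argmax_2d(matrix))                   # [[0], [1], [1]]
print(argmax_2d(matrix, top_k=2, axis=0))  # [[0, 1], [2, 2]]
```

With this convention, an explicit axis: 0 really produces a (top_k, C) result, and only an unset axis falls back to the old per-image output.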

@timmeinhardt
Contributor Author

I see your point. I was already treating axis = 0 as if no axis was specified, so I changed the layer to check has_axis instead. Now one can even argmax across multiple images using axis = 0. Perhaps this is useful for someone, and it is definitely more intuitive. Thank you! Sometimes you cannot see the wood for the trees when you are in the middle of something...
So now, if no axis is specified, the argmax is computed over the flattened bottom blob per image, and if axis is set, it is computed along the specified axis.

Member

This should check bottom[0]->num_axes() for generality now that blobs are N-D.

@shelhamer
Member

Thanks for contributing this @timmeinhardt and thanks for guiding the development @seanbell! This looks good in general but I'd like to see it done in N-D instead of the fixed 4 dimensions since it's nearly there. You could look at ConcatLayer or other layers with an axis arg.
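
As a rough plain-Python sketch of what N-D handling of the axis argument involves: negative indices are resolved against the actual number of axes rather than a fixed 4, and only that axis of the output shape changes. The function names here are hypothetical; the range check is loosely modelled on the canonical-axis helper that Caffe blobs provide.

```python
def canonical_axis(axis, num_axes):
    """Map a possibly negative axis to the range [0, num_axes)."""
    if not -num_axes <= axis < num_axes:
        raise ValueError(f"axis {axis} out of range for {num_axes} axes")
    return axis + num_axes if axis < 0 else axis


def argmax_output_shape(bottom_shape, axis, top_k=1):
    """Output shape when taking the top-k argmax along one axis."""
    shape = list(bottom_shape)
    shape[canonical_axis(axis, len(shape))] = top_k
    return shape


# A 5-D bottom blob: the same code works for any number of axes.
print(argmax_output_shape([10, 21, 5, 64, 64], axis=1))   # [10, 1, 5, 64, 64]
print(argmax_output_shape([10, 21, 5, 64, 64], axis=-4))  # [10, 1, 5, 64, 64]
print(argmax_output_shape([10, 1000], axis=-1, top_k=5))  # [10, 5]
```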

@timmeinhardt
Contributor Author

I adjusted my commits, and it should now work for N-D blobs. I also updated the documentation. I did not know that you had made this update to the blob architecture 👍

shelhamer added a commit that referenced this pull request Oct 1, 2015
Add argmax_param "axis" to maximise output along the specified axis
@shelhamer merged commit 01e15d0 into BVLC:master on Oct 1, 2015
@shelhamer
Member

Thanks @timmeinhardt !

@timmeinhardt deleted the argmax branch on October 1, 2015 16:47
@AndKo1201

@timmeinhardt could you please check ArgMaxLayer::Reshape?

I'm getting an assertion failure when I work with debug lvl == 2 while executing the Caffe unit tests.
It crashes in NetTest, TestSkipPropagateDown.

In this test, the num_axes of bottom[0] is 2 and the value of has_axis_ is false.
But within the else branch you are trying to change the value of shape[2], which is an out-of-bounds access.
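
The failure mode can be reproduced with a short plain-Python sketch. This is a reconstruction from the description above, not the actual C++ code: `legacy_output_shape` is a hypothetical stand-in for the no-axis branch, and `guarded_output_shape` shows one possible guard (padding the shape to at least 3 axes), which is an assumption and not necessarily the fix applied upstream.

```python
def legacy_output_shape(bottom_shape, top_k=1):
    """No-axis branch as described: shape sized by num_axes, index 2 written."""
    shape = [1] * len(bottom_shape)
    shape[0] = bottom_shape[0]
    shape[2] = top_k  # out of bounds when the bottom blob has only 2 axes
    return shape


def guarded_output_shape(bottom_shape, top_k=1):
    """Same idea, but the shape always has at least 3 axes (assumed guard)."""
    shape = [1] * max(len(bottom_shape), 3)
    shape[0] = bottom_shape[0]
    shape[2] = top_k
    return shape


try:
    legacy_output_shape([4, 10])  # 2-axis bottom blob, as in the failing test
    failed = False
except IndexError:
    failed = True

print(failed)                          # True
print(guarded_output_shape([4, 10]))   # [4, 1, 1]
```

In C++, the same write to a 2-element vector is undefined behaviour rather than an exception, which is why it may only surface under a checked debug build.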
