Conversation

@jeffdonahue
Contributor

This PR is built on top of the refactoring in #495 -- only the final commit contains the changes relevant to this PR.

This improves the Backward method's interface. Previously, the Backward{,_cpu,_gpu} methods had the signature Backward(const vector<Blob<Dtype>*>& top, const bool propagate_down, vector<Blob<Dtype>*>* bottom). There were a couple of problems with this:

(1) The propagate_down argument was never actually set to anything but true.

(2) A scalar bool propagate_down did not quite make sense for layers with multiple bottom blobs (bottom->size() > 1): some of those bottom blobs may need backward computation while others don't. The best example is loss layers, which typically take a prediction and a target/label. In any useful network the prediction input will need backpropagation, but the target/label input probably won't.

I changed the interface to Backward(const vector<Blob<Dtype>*>& top, vector<bool> propagate_down, vector<Blob<Dtype>*>* bottom), so such layers can now get a propagate_down input of [true, false], which tells them to propagate down to the prediction but not to the target (and to crash if there are weights below the label input, as Caffe doesn't know how to handle that case). One exception may be the Euclidean loss layer, which in dev relies on the assumption that its second input is a target that doesn't need backprop. However, you might want to train a network with a Euclidean loss where both inputs have weights below them and therefore both need backward. The new vector<bool> propagate_down input lets us handle this case correctly, as Net::Init now infers for each bottom blob (rather than just for each layer) whether it needs backprop.
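To make the new contract concrete, here is a minimal sketch of a Backward_cpu under the per-blob interface, loosely modeled on a Euclidean loss layer whose two bottoms differ only in the sign of their gradient. The diff_ member (assumed to hold bottom[0] - bottom[1] from the forward pass) and the normalization are illustrative assumptions, not necessarily the exact code in this PR:

```cpp
// Sketch: per-blob propagate_down in a Euclidean-loss-style Backward_cpu.
// Assumes diff_ was filled with (bottom[0] - bottom[1]) during Forward.
template <typename Dtype>
void EuclideanLossLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top, vector<bool> propagate_down,
    vector<Blob<Dtype>*>* bottom) {
  for (int i = 0; i < 2; ++i) {
    if (!propagate_down[i]) { continue; }  // this bottom needs no diff
    const Dtype sign = (i == 0) ? 1 : -1;  // gradient flips sign for targets
    const Dtype alpha = sign / (*bottom)[i]->num();
    caffe_cpu_axpby((*bottom)[i]->count(),  // number of elements
                    alpha,                  // scale on the stored difference
                    diff_.cpu_data(),       // (bottom[0] - bottom[1])
                    Dtype(0),               // overwrite any existing diff
                    (*bottom)[i]->mutable_cpu_diff());
  }
}
```

With propagate_down = [true, false] only the prediction's diff is written; with [true, true] (both inputs sitting above weights) both diffs come out correctly.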

@kloudkl
Contributor

kloudkl commented Jun 13, 2014

There are two ways to store multiple complex objects: the so-called array of structures (AoS) and structure of arrays (SoA). vector<bool> propagate_down, vector<Blob<Dtype>*>* bottom is an example of SoA. AoS is sometimes simpler, since tightly related properties belong to the same object and are always passed around together. If propagate_down became a property of Blob, the two vectors could be boiled down to a single vector<Blob<Dtype>*>* bottom.
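To make the contrast concrete, here is a toy sketch of the two layouts; needs_backward() is a hypothetical accessor, not an existing Blob method:

```cpp
#include <vector>
using std::vector;

template <typename Dtype> class Blob;  // Caffe's N-D array type

// SoA, as in this PR: a vector<bool> kept parallel to the bottom vector,
// so propagate_down[i] describes (*bottom)[i].
template <typename Dtype>
void BackwardSoA(const vector<bool>& propagate_down,
                 vector<Blob<Dtype>*>* bottom) {
  for (size_t i = 0; i < bottom->size(); ++i) {
    if (propagate_down[i]) { /* compute (*bottom)[i]'s diff */ }
  }
}

// AoS alternative: fold the flag into the Blob itself via the hypothetical
// needs_backward() accessor, so only one vector is passed around.
template <typename Dtype>
void BackwardAoS(vector<Blob<Dtype>*>* bottom) {
  for (size_t i = 0; i < bottom->size(); ++i) {
    if ((*bottom)[i]->needs_backward()) { /* compute its diff */ }
  }
}
```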

@jeffdonahue
Contributor Author

True, one could store a bool in the blobs themselves to indicate whether they need their diffs computed. I don't have a very strong preference. I'd guess this interface makes it less likely that one will simply ignore propagate_down in a Backward implementation, and less likely that one will forget to correctly set propagate_down before calling the function. Open to arguments for changing it, though.

@Yangqing
Member

Kudos for implementing a long-desired feature :)

I would vote for the current approach. propagate_down, as a property of a Blob, could be highly confusing (note that Blob is just a multi-dimensional array -- what would it mean to say "an array needs gradients propagated down"?). It is only in the context of a neural network that we need this variable. Also, the propagate_down value is computed by the network, not by the blob, so it makes more sense for the network to bookkeep it, not the blob.


@jeffdonahue
Contributor Author

Thanks for the input @Yangqing! I'll leave the interface as is then.

Don't merge this yet; I realized there is a bit more work to do. This probably breaks setting force_backward: true (untested -- I never use it, so I didn't think about it), since that setting would cause propagate_down to be set to true for label inputs (e.g., to SoftmaxLossLayer), and with this PR Caffe will crash if asked to propagate down any input it doesn't know how to handle.

@shelhamer
Member

@jeffdonahue agreed on the new interface. I could test force_backward once you make your fix with the gradient map code.

p.s. Although this PR shows the two initial commits shared with #495, they are already merged and will be gracefully dropped when this PR is merged (for the curious...).

@jeffdonahue
Contributor Author

Just rebased as such. Thanks for merging the refactoring!

jeffdonahue mentioned this pull request on Jun 13, 2014
@sguada
Contributor

sguada commented Jun 14, 2014

I don't know if force_backward is actually needed, especially since there are layers and blobs for which it doesn't make sense to compute backward, or for which it even cannot be computed.
I think it was only used to compute gradients back to the image. However, in the new MatCaffe #501 I included an example of computing the gradients and didn't need to use it.

@shelhamer
Member

Why the NOT_IMPLEMENTED; // Cannot backprop to targets. in loss layers? I'd prefer a LOG(FATAL) << "Cannot back-propagate to targets."; or the like.

I read the "yet" as meaning backpropagation to targets could make some sense and might be implemented someday, but except for distances like Euclidean loss (and perhaps sigmoid cross-entropy loss) it isn't right. No?
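For the record, the Euclidean case works because the loss is symmetric in its two inputs up to sign, so the target gradient is perfectly well defined (notation mine, not from the PR):

$$E = \frac{1}{2N} \sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert_2^2, \qquad \frac{\partial E}{\partial \hat{y}_n} = \frac{1}{N}\,(\hat{y}_n - y_n), \qquad \frac{\partial E}{\partial y_n} = -\frac{1}{N}\,(\hat{y}_n - y_n)$$

For softmax loss the target is a discrete label, so there is no gradient to take.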

@jeffdonahue
Contributor Author

Thanks for taking a look at this Evan! I changed to a more appropriate LOG(FATAL) as suggested, and fixed the force_backward bug by adding another layer property, AllowForceBackward(const int bottom_index) { return true; }, which loss layers override to return bottom_index != 1, disallowing backprop to labels.
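For readers following the thread, the pattern might look roughly like this (a sketch with class bodies abridged, not necessarily the exact code):

```cpp
template <typename Dtype>
class Layer {
 public:
  // Default: force_backward may drive backprop to any bottom blob.
  virtual bool AllowForceBackward(const int bottom_index) const {
    return true;
  }
  // ... rest of the Layer interface ...
};

template <typename Dtype>
class SoftmaxLossLayer : public Layer<Dtype> {
 public:
  // Refuse forced backprop to the label input (bottom index 1);
  // the prediction input (bottom index 0) may still be forced.
  virtual bool AllowForceBackward(const int bottom_index) const {
    return bottom_index != 1;
  }
  // ... loss layer implementation ...
};
```

Net::Init can then consult AllowForceBackward before honoring force_backward on a given bottom blob, instead of crashing on label inputs.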

@Yangqing
Member

As far as I can recall, force_backward has an earlier history in decaf, where the propagate_down functionality was actually implemented. As a result, if I wanted the gradients w.r.t. the images, I had to force_backward so that conv1 would compute them (otherwise, since the data layer never requires propagate_down, conv1 would only compute gradients w.r.t. the filters and not the input).

This is not that useful now, I guess, and I agree that removing this (kind of flaky) functionality is better if supporting it is too much of a hassle.

@shelhamer
Member

I'm going to rebase and merge to move on to weight sharing.

Re: the necessity of force_backward, we can drop it if we like in a future PR (since potentially prototxts will have to change). Sergio can test with matcaffe and I'll test pycaffe once all the related PRs are in.

shelhamer added a commit that referenced this pull request on Jun 26, 2014
shelhamer merged commit dadf8cd into BVLC:dev on Jun 26, 2014
shelhamer mentioned this pull request on Aug 7, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014