Conversation

@jeffdonahue
Contributor

This PR is built on top of the refactoring in #495 -- only the final commit contains the changes relevant to this PR.

This improves the Backward method's interface. Previously, the Backward{,_cpu,_gpu} methods had the signature Backward(const vector<Blob<Dtype>*>& top, const bool propagate_down, vector<Blob<Dtype>*>* bottom). There were a couple of problems with this:

(1) The propagate_down argument was never actually set to anything but true.

(2) A scalar bool propagate_down did not quite make sense for layers with multiple bottom blobs (bottom->size() > 1): some of those bottom blobs may need backward computation while others don't. The best example is loss layers, which typically take a prediction and a target/label. In any useful network the prediction input will need backpropagation, but the target/label input probably won't.

I changed the interface to Backward(const vector<Blob<Dtype>*>& top, vector<bool> propagate_down, vector<Blob<Dtype>*>* bottom), so such layers can now get a propagate_down input of [true, false], which tells them to propagate down to the prediction but not to the target (and to crash if there are weights below the label input, as Caffe doesn't know how to handle that case). One exception may be the Euclidean loss layer, which in dev relies on the assumption that its second input is a target that doesn't need backprop. However, you might want to train a network with a Euclidean loss where both inputs have weights below them and therefore both need backward. The new vector<bool> propagate_down input lets us handle this case correctly, as Net::Init now infers for each bottom blob (rather than just for each layer) whether it needs backprop.
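To make the new contract concrete, here is a minimal sketch of a Backward_cpu under the per-blob interface, loosely modeled on a Euclidean loss layer whose two bottoms differ only in the sign of their gradient. The diff_ member (assumed to hold bottom[0] - bottom[1] from the forward pass) and the normalization are illustrative assumptions, not necessarily the exact code in this PR:

```cpp
// Sketch: per-blob propagate_down in a Euclidean-loss-style Backward_cpu.
// Assumes diff_ was filled with (bottom[0] - bottom[1]) during Forward.
template <typename Dtype>
void EuclideanLossLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top, vector<bool> propagate_down,
    vector<Blob<Dtype>*>* bottom) {
  for (int i = 0; i < 2; ++i) {
    if (!propagate_down[i]) { continue; }  // this bottom needs no diff
    const Dtype sign = (i == 0) ? 1 : -1;  // gradient flips sign for targets
    const Dtype alpha = sign / (*bottom)[i]->num();
    caffe_cpu_axpby((*bottom)[i]->count(),  // number of elements
                    alpha,                  // scale on the stored difference
                    diff_.cpu_data(),       // (bottom[0] - bottom[1])
                    Dtype(0),               // overwrite any existing diff
                    (*bottom)[i]->mutable_cpu_diff());
  }
}
```

With propagate_down = [true, false] only the prediction's diff is written; with [true, true] (both inputs sitting above weights) both diffs come out correctly.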

@kloudkl
Contributor

kloudkl commented Jun 13, 2014

There are two ways to store multiple complex objects: the so-called array of structures (AoS) and structure of arrays (SoA). vector<bool> propagate_down, vector<Blob<Dtype>*>* bottom is an example of SoA. AoS is sometimes simpler, since tightly related properties belong to the same object and are always passed around together. If propagate_down became a property of Blob, the two vectors could be boiled down to a single vector<Blob<Dtype>*>* bottom.
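To make the contrast concrete, here is a toy sketch of the two layouts; needs_backward() is a hypothetical accessor, not an existing Blob method:

```cpp
#include <vector>
using std::vector;

template <typename Dtype> class Blob;  // Caffe's N-D array type

// SoA, as in this PR: a vector<bool> kept parallel to the bottom vector,
// so propagate_down[i] describes (*bottom)[i].
template <typename Dtype>
void BackwardSoA(const vector<bool>& propagate_down,
                 vector<Blob<Dtype>*>* bottom) {
  for (size_t i = 0; i < bottom->size(); ++i) {
    if (propagate_down[i]) { /* compute (*bottom)[i]'s diff */ }
  }
}

// AoS alternative: fold the flag into the Blob itself via the hypothetical
// needs_backward() accessor, so only one vector is passed around.
template <typename Dtype>
void BackwardAoS(vector<Blob<Dtype>*>* bottom) {
  for (size_t i = 0; i < bottom->size(); ++i) {
    if ((*bottom)[i]->needs_backward()) { /* compute its diff */ }
  }
}
```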

@jeffdonahue
Contributor Author

True, one could store a bool in the blobs themselves to indicate whether they need their diffs computed. I don't have a very strong preference. I'd guess this interface makes it less likely that one will simply ignore propagate_down in a Backward implementation, and less likely that one will forget to correctly set propagate_down before calling the function. Open to arguments for changing it, though.

@Yangqing
Member

Kudos for implementing a long-desired feature :)

I would vote for the current approach. propagate_down, as a property of a Blob, could be highly confusing (note that Blob is just a multi-dimensional array -- what would it mean to say "an array needs gradients propagated down"?). It is only in the context of a neural network that we need this variable. Also, the propagate_down value is computed by the network, not by the blob, so it makes more sense for the network to bookkeep it, not the blob.


@jeffdonahue
Contributor Author

Thanks for the input @Yangqing! I'll leave the interface as is then.

Don't merge this yet; I realized there is a bit more work to do. This probably breaks setting force_backward: true (untested -- I never use it, so I didn't think about it), since that setting would cause propagate_down to be set to true for label inputs (e.g., to SoftmaxLossLayer), and with this PR Caffe will crash if asked to propagate down any input it doesn't know how to handle.

@shelhamer
Member

@jeffdonahue agreed on the new interface. I could test force_backward once you make your fix with the gradient map code.

p.s. Although this PR shows the two initial commits shared with #495, they are already merged and will be gracefully dropped when this PR is merged (for the curious...).

@jeffdonahue
Contributor Author

Just rebased as such. Thanks for merging the refactoring!

jeffdonahue mentioned this pull request on Jun 13, 2014
@sguada
Contributor

sguada commented Jun 14, 2014

I don't know if force_backward is actually needed, especially since there are layers and blobs for which it doesn't make sense to compute backward, or for which it even cannot be computed.
I think it was only used to compute gradients back to the image. However, in the new MatCaffe #501 I included an example of computing the gradients and didn't need to use it.

@shelhamer
Member

Why the NOT_IMPLEMENTED; // Cannot backprop to targets. in loss layers? I'd prefer a LOG(FATAL) << "Cannot back-propagate to targets."; or the like.

I read the "yet" as meaning backpropagation to targets could make some sense and might be implemented someday, but except for distances like Euclidean loss (and perhaps sigmoid cross-entropy loss) it isn't right. No?
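For the record, the Euclidean case works because the loss is symmetric in its two inputs up to sign, so the target gradient is perfectly well defined (notation mine, not from the PR):

$$E = \frac{1}{2N} \sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert_2^2, \qquad \frac{\partial E}{\partial \hat{y}_n} = \frac{1}{N}\,(\hat{y}_n - y_n), \qquad \frac{\partial E}{\partial y_n} = -\frac{1}{N}\,(\hat{y}_n - y_n)$$

For softmax loss the target is a discrete label, so there is no gradient to take.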

@jeffdonahue
Contributor Author

Thanks for taking a look at this Evan! I changed to a more appropriate LOG(FATAL) as suggested, and fixed the force_backward bug by adding another layer property, AllowForceBackward(const int bottom_index) { return true; }, which loss layers override to return bottom_index != 1, disallowing backprop to labels.
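For readers following the thread, the pattern might look roughly like this (a sketch with class bodies abridged, not necessarily the exact code):

```cpp
template <typename Dtype>
class Layer {
 public:
  // Default: force_backward may drive backprop to any bottom blob.
  virtual bool AllowForceBackward(const int bottom_index) const {
    return true;
  }
  // ... rest of the Layer interface ...
};

template <typename Dtype>
class SoftmaxLossLayer : public Layer<Dtype> {
 public:
  // Refuse forced backprop to the label input (bottom index 1);
  // the prediction input (bottom index 0) may still be forced.
  virtual bool AllowForceBackward(const int bottom_index) const {
    return bottom_index != 1;
  }
  // ... loss layer implementation ...
};
```

Net::Init can then consult AllowForceBackward before honoring force_backward on a given bottom blob, instead of crashing on label inputs.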

@Yangqing
Member

As far as I can recall, force_backward has an earlier history in decaf, where the propagate_down functionality was actually implemented. As a result, if I wanted the gradients w.r.t. the images, I had to force_backward so that conv1 would compute them (otherwise, since the data layer never requires propagate_down, conv1 would only compute gradients w.r.t. the filters and not the input).

This is not that useful now, I guess, and I agree that removing this (kind of flaky) functionality is better if supporting it is too much of a hassle.

@shelhamer
Member

I'm going to rebase and merge to move on to weight sharing.

Re: the necessity of force_backward, we can drop it if we like in a future PR (since potentially prototxts will have to change). Sergio can test with matcaffe and I'll test pycaffe once all the related PRs are in.

shelhamer added a commit that referenced this pull request on Jun 26, 2014
shelhamer merged commit dadf8cd into BVLC:dev on Jun 26, 2014
shelhamer mentioned this pull request on Aug 7, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014