Improve Backward method interface #497
Conversation
There are two ways, the so-called array of structures (AoS) and structure of arrays (SoA), to store multiple complex objects.
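For readers following along, a minimal C++ sketch (hypothetical struct and field names, not Caffe's actual types) of the two layouts being contrasted:

```cpp
#include <vector>

// Array of structures (AoS): each bottom blob's properties live together
// in one struct, and you keep a single vector of those structs.
struct BlobInfoAoS {
  float* data;
  float* diff;
  bool needs_backward;  // per-object propagate_down flag
};
// std::vector<BlobInfoAoS> bottom;

// Structure of arrays (SoA): each property is kept in its own parallel array.
struct BottomSoA {
  std::vector<float*> data;
  std::vector<float*> diff;
  std::vector<bool> needs_backward;  // propagate_down kept in its own vector
};
```

The new `vector<bool> propagate_down` argument corresponds to the SoA style: the flags travel in a parallel vector alongside the bottom blobs rather than inside each blob object.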
True, one could store a
Kudos for implementing a long-desired feature :) I would vote for the current approach. Propagate_down, as a property of a

Yangqing
Thanks for the input @Yangqing! I'll leave the interface as is then. Don't merge this yet; I realized there is a bit more work to do -- this (probably, haven't tested) breaks setting
@jeffdonahue agreed on the new interface. I could test

p.s. Although this PR shows the two initial commits shared with #495, they are already merged and will be gracefully dropped when this PR is merged (for the curious...).
Just rebased as such. Thanks for merging the refactoring!
I don't know if
Why the

I read the "yet" as meaning backpropagation to targets could have some sense and might be implemented, but except for distances like Euclidean loss (and perhaps sigmoid cross entropy loss) it isn't right. No?
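For what it's worth, a worked form of the Euclidean case (assuming the common 1/(2N) scaling, which may differ from the exact normalization in the code): the loss is differentiable with respect to both inputs, and the two gradients differ only in sign, which is why backprop to the "target" can make sense for this layer in particular.

```latex
L = \frac{1}{2N} \sum_{i=1}^{N} \left\lVert b^{(1)}_i - b^{(2)}_i \right\rVert_2^2,
\qquad
\frac{\partial L}{\partial b^{(1)}_i} = \frac{1}{N}\left(b^{(1)}_i - b^{(2)}_i\right),
\qquad
\frac{\partial L}{\partial b^{(2)}_i} = -\frac{1}{N}\left(b^{(1)}_i - b^{(2)}_i\right).
```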
Thanks for taking a look at this, Evan! I changed to a more appropriate LOG(FATAL) as suggested, and fixed the force_backward bug by adding another layer property
As far as I can recall, force_backward had an earlier history in decaf, where the propagate_down functionality was actually implemented; as a result, if I wanted to get the gradients w.r.t. the images, I had to force_backward so that conv1 computes the gradients (otherwise, since the data layer never requires propagate down, conv1 will only compute gradients w.r.t. the filters and not the input). This is not that useful for now I guess, and I agree that removing this (kind of flaky) functionality is better if supporting it is too much of a hassle.
I'm going to rebase and merge to move on to weight sharing. Re: the necessity of
long-standing issue with how this is handled in loss layers (esp. EuclideanLossLayer)
Improve Backward method interface
This PR is built on top of the refactoring in #495 -- all actually relevant changes are in the final commit of the PR.
This improves the Backward method's interface: previously, the `Backward{,_cpu,_gpu}` methods had an interface `Backward(const vector<Blob<Dtype>*>& top, const bool propagate_down, vector<Blob<Dtype>*>* bottom)`. There were a couple problems with this: (1) The `propagate_down` variable was never actually set to anything but `true`. (2) The scalar `bool propagate_down` input did not quite make sense for some layers, namely those with multiple bottom blobs (`bottom->size() > 1`) -- it's possible some of these bottom blobs need backward computation, but others don't. The best example is loss layers, which typically take a prediction and a target/label. In any useful network, the prediction input will need backpropagation, but the target/label input probably won't.

I changed the interface to `Backward(const vector<Blob<Dtype>*>& top, vector<bool> propagate_down, vector<Blob<Dtype>*>* bottom)`, so these layers can now get a `propagate_down` input of `[true, false]`, which tells them to propagate down to the prediction but not to the target (and crash if there are weights below the label input, as Caffe doesn't know how to handle this case). One exception may be a Euclidean loss layer, which in `dev` relies on the assumption that its second input is a target which doesn't need backprop. However, it's possible you might want to train a network with a Euclidean loss where both inputs have weights below them and therefore both need backward. The new `vector<bool> propagate_down` input allows us to handle this case correctly, as `Net::Init` will infer for each bottom blob (rather than just for each layer) whether it needs backprop.
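For illustration, a self-contained sketch of the idea (simplified types and a hypothetical function name, not the actual Caffe implementation): a Euclidean-style loss receives one propagate_down flag per bottom blob and only fills the diffs that are requested.

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch of a per-bottom backward pass. propagate_down has one entry
// per bottom blob, e.g. {true, false} to backprop into the prediction but
// not into the target.
void EuclideanLossBackwardSketch(const std::vector<float>& prediction,
                                 const std::vector<float>& target,
                                 const std::vector<bool>& propagate_down,
                                 std::vector<float>* prediction_diff,
                                 std::vector<float>* target_diff) {
  const std::size_t n = prediction.size();
  for (int b = 0; b < 2; ++b) {
    if (!propagate_down[b]) continue;            // this bottom needs no gradient
    const float sign = (b == 0) ? 1.0f : -1.0f;  // gradient flips sign for the second input
    std::vector<float>* diff = (b == 0) ? prediction_diff : target_diff;
    for (std::size_t i = 0; i < n; ++i) {
      (*diff)[i] = sign * (prediction[i] - target[i]) / static_cast<float>(n);
    }
  }
}
```

Called with `propagate_down = {true, false}`, only `prediction_diff` is written; with `{true, true}`, both inputs receive gradients, which is exactly the case the per-blob vector makes representable.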