
Conversation

@mtamburrano
Contributor

Removed the checks in the loss layers, as discussed in #1448 with @longjon (point 7).
The loss layers no longer check whether backpropagation on the labels is requested; this can now happen because of #1482.

@jeffdonahue
Contributor

I don't understand this: how are these layers able to backprop the error to integer labels now? It doesn't look like this adds anything that accesses bottom[1]->mutable_cpu_diff() in these layers...

@mtamburrano
Contributor Author

These layers are not able to backprop to the labels; they didn't do it before and they don't do it now.
The checks I removed were, I think, there to avoid attaching non-input data to loss layers (roughly the guard sketched below). But they also prevent a layer (like the one proposed in #1482) from requiring backpropagation on some blobs and not on others.
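For reference, the guards being discussed are roughly of this shape; this is an illustrative sketch (a hypothetical helper), not the exact lines touched by this PR:

```cpp
#include <vector>
#include "caffe/common.hpp"  // brings in the glog LOG/CHECK macros

// Illustrative sketch of the kind of guard under discussion: abort if the
// net requests a gradient w.r.t. the label bottom, which the loss layer
// never actually computes (it never writes bottom[1]->mutable_cpu_diff()).
inline void CheckNoLabelBackprop(const std::vector<bool>& propagate_down) {
  if (propagate_down.size() > 1 && propagate_down[1]) {
    LOG(FATAL) << "Loss layer cannot backpropagate to label inputs.";
  }
}
```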

@longjon
Contributor

longjon commented Nov 26, 2014

So, to be clear, what I meant to say in #1448 was, "figure out why this check exists, why you are running into it, and what should be done about it". So let's think about that...

Backpropagation to labels doesn't make sense, of course, because labels lack topological, let alone differentiable structure. So why does the propagate_down mechanism think we need to backprop the labels? Without checking the code, I believe it's because the computation path for the labels passes through layers with (free) parameters, so without any extra knowledge, the computation graph seems to require that backward step. What could be done about this?

  • Some extra mechanism could be added so that the "needs backward" computation knows that loss layers are not differentiable wrt labels. That might ultimately be the right thing (what happens if there's a whole chain of layers between loss and labels?), but is probably too heavy for now.
  • The loss could just ignore propagate_down for labels (I guess that's this PR as it stands). Seems to do the right thing, but are the diffs guaranteed to be zero?
  • The loss could explicitly fill the diff with zeros (see the sketch after this comment). Seems to do the right thing, but isn't this extra work?

And what about these errors catching mistakes where differentiable things are hooked up to the loss?

I'll let y'all mull that over for now.
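A minimal sketch of the zero-filling option, assuming it would sit inside a loss layer's Backward_cpu; the free function and its name are hypothetical, but caffe_set is the existing math routine:

```cpp
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/util/math_functions.hpp"  // caffe_set

// Minimal sketch of the "fill the diff with zeros" option: if the net
// insists on a gradient for the label bottom, hand back an all-zero diff
// instead of erroring out.
template <typename Dtype>
void ZeroLabelDiff(const std::vector<bool>& propagate_down,
                   const std::vector<caffe::Blob<Dtype>*>& bottom) {
  if (propagate_down.size() > 1 && propagate_down[1]) {
    caffe::caffe_set(bottom[1]->count(), Dtype(0),
                     bottom[1]->mutable_cpu_diff());
  }
}
```

Whether the diffs are already zero without this (the question raised about the second option) depends on how the diff memory is initialized and reused, which is exactly the guarantee in doubt.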

@shelhamer
Member

As a small motivational aside, I had to disable this check in a recent branch to make use of a loss-mangling layer that computed labels on-the-fly for sampling and class balancing. Since this layer was on the computation path for both data and labels, propagate_down was triggered, making the check fail.


@bhack
Contributor

bhack commented Nov 26, 2014

I think it is because the propagate_down flag reasons in an "or" fashion, so it is not differentiated at the level of individual blobs (see the sketch below). Backprop in the loss layers was engineered to work directly with labels coming from input layers (data layers, etc.), so the logic was never designed to support intermediate layers that handle labels and need backpropagation.
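A simplified sketch of that "or" behaviour; the names and structure are illustrative, not Caffe's actual Net::Init code:

```cpp
#include <vector>

// Simplified sketch of the "or"-style decision: the layer gets a single
// needs-backward flag, true if *any* bottom lies on a path that needs
// gradients, and that one flag then requests backward for every bottom,
// labels included.
bool LayerNeedsBackward(const std::vector<bool>& bottom_needs_backward) {
  bool needs_backward = false;
  for (size_t i = 0; i < bottom_needs_backward.size(); ++i) {
    needs_backward = needs_backward || bottom_needs_backward[i];
  }
  return needs_backward;  // one flag for the whole layer, not per bottom
}
```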

@sguada
Contributor

sguada commented Nov 26, 2014

Maybe we should allow each layer to have a vector of propagate_down flags defined in the prototxt, or a default behaviour: for most layers, propagate down to all bottoms; for loss layers, propagate down only to the data and not to the label; and data layers don't propagate down at all. The default behaviour could then be overridden in the layer's prototxt params (a sketch of what this could look like follows below).

Sergio
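A hypothetical sketch of how a net could honor such a per-bottom list, assuming LayerParameter gained a repeated bool propagate_down field (no such field is assumed to exist in the code under discussion; the function and field names here are illustrative):

```cpp
#include <algorithm>
#include <vector>
#include "caffe/proto/caffe.pb.h"

// Hypothetical sketch of the proposal: values given in the prototxt
// override, bottom by bottom, whatever default the net derived from the
// computation graph.
std::vector<bool> ResolvePropagateDown(
    const caffe::LayerParameter& param,
    const std::vector<bool>& graph_default) {
  std::vector<bool> propagate_down(graph_default);
  const int n = std::min<int>(param.propagate_down_size(),
                              static_cast<int>(propagate_down.size()));
  for (int i = 0; i < n; ++i) {
    propagate_down[i] = param.propagate_down(i);
  }
  return propagate_down;
}
```

In the prototxt this would amount to listing one boolean per bottom on the loss layer, e.g. true for the prediction bottom and false for the label bottom.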


@bhack
Contributor

bhack commented Dec 22, 2014

@shelhamer This needs a unified decision from the BVLC members. It has been waiting for one for 26 days.

@mtamburrano
Contributor Author

Closed; the new PR is #2052.
