
Conversation

@mtamburrano
Contributor

Removed the checks in the loss layers, as discussed in #1448 with @longjon (point 7).
The loss layers no longer check whether backpropagation on the labels is requested; this can now happen because of #1482.

@jeffdonahue
Contributor

I don't understand this: how are these layers able to backprop the error to integer labels now? It doesn't look like this adds anything that accesses bottom[1]->mutable_cpu_diff() in these layers...

@mtamburrano
Contributor Author

These layers are not able to backprop to the labels; they didn't do it before and they don't do it now.
The checks I removed were, I think, there to avoid attaching non-input data to loss layers (roughly the guard sketched below). But they also prevent a layer (like the one proposed in #1482) from requiring backpropagation on some blobs and not on others.
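For reference, the guards being discussed are roughly of this shape; this is an illustrative sketch (a hypothetical helper), not the exact lines touched by this PR:

```cpp
#include <vector>
#include "caffe/common.hpp"  // brings in the glog LOG/CHECK macros

// Illustrative sketch of the kind of guard under discussion: abort if the
// net requests a gradient w.r.t. the label bottom, which the loss layer
// never actually computes (it never writes bottom[1]->mutable_cpu_diff()).
inline void CheckNoLabelBackprop(const std::vector<bool>& propagate_down) {
  if (propagate_down.size() > 1 && propagate_down[1]) {
    LOG(FATAL) << "Loss layer cannot backpropagate to label inputs.";
  }
}
```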

@longjon
Contributor

longjon commented Nov 26, 2014

So, to be clear, what I meant to say in #1448 was, "figure out why this check exists, why you are running into it, and what should be done about it". So let's think about that...

Backpropagation to labels doesn't make sense, of course, because labels lack topological, let alone differentiable structure. So why does the propagate_down mechanism think we need to backprop the labels? Without checking the code, I believe it's because the computation path for the labels passes through layers with (free) parameters, so without any extra knowledge, the computation graph seems to require that backward step. What could be done about this?

  • Some extra mechanism could be added so that the "needs backward" computation knows that loss layers are not differentiable wrt labels. That might ultimately be the right thing (what happens if there's a whole chain of layers between loss and labels?), but is probably too heavy for now.
  • The loss could just ignore propagate_down for labels (I guess that's this PR as it stands). Seems to do the right thing, but are the diffs guaranteed to be zero?
  • The loss could explicitly fill the diff with zeros (see the sketch after this comment). Seems to do the right thing, but isn't this extra work?

And what about these errors catching mistakes where differentiable things are hooked up to the loss?

I'll let y'all mull that over for now.
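A minimal sketch of the zero-filling option, assuming it would sit inside a loss layer's Backward_cpu; the free function and its name are hypothetical, but caffe_set is the existing math routine:

```cpp
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/util/math_functions.hpp"  // caffe_set

// Minimal sketch of the "fill the diff with zeros" option: if the net
// insists on a gradient for the label bottom, hand back an all-zero diff
// instead of erroring out.
template <typename Dtype>
void ZeroLabelDiff(const std::vector<bool>& propagate_down,
                   const std::vector<caffe::Blob<Dtype>*>& bottom) {
  if (propagate_down.size() > 1 && propagate_down[1]) {
    caffe::caffe_set(bottom[1]->count(), Dtype(0),
                     bottom[1]->mutable_cpu_diff());
  }
}
```

Whether the diffs are already zero without this (the question raised about the second option) depends on how the diff memory is initialized and reused, which is exactly the guarantee in doubt.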

@shelhamer
Member

As a small motivational aside, I had to disable this check in a recent branch to make use of a loss-mangling layer that computed labels on-the-fly for sampling and class balancing. Since this layer was on the computation path for both data and labels, propagate_down was triggered, making the check fail.


@bhack
Contributor

bhack commented Nov 26, 2014

I think it is because the propagate_down flag reasons in an "or" fashion, so it is not differentiated at the level of individual blobs (see the sketch below). Backprop in the loss layers was engineered to work directly with labels coming from input layers (data layers, etc.), so the logic was never designed to support intermediate layers that handle labels and need backpropagation.
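A simplified sketch of that "or" behaviour; the names and structure are illustrative, not Caffe's actual Net::Init code:

```cpp
#include <vector>

// Simplified sketch of the "or"-style decision: the layer gets a single
// needs-backward flag, true if *any* bottom lies on a path that needs
// gradients, and that one flag then requests backward for every bottom,
// labels included.
bool LayerNeedsBackward(const std::vector<bool>& bottom_needs_backward) {
  bool needs_backward = false;
  for (size_t i = 0; i < bottom_needs_backward.size(); ++i) {
    needs_backward = needs_backward || bottom_needs_backward[i];
  }
  return needs_backward;  // one flag for the whole layer, not per bottom
}
```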

@sguada
Contributor

sguada commented Nov 26, 2014

Maybe we should allow each layer to have a vector of propagate_down flags defined in the prototxt, or a default behaviour: for most layers, propagate down to all bottoms; for loss layers, propagate down only to the data and not to the label; and data layers don't propagate down at all. The default behaviour could then be overridden in the layer's prototxt params (a sketch of what this could look like follows below).

Sergio
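A hypothetical sketch of how a net could honor such a per-bottom list, assuming LayerParameter gained a repeated bool propagate_down field (no such field is assumed to exist in the code under discussion; the function and field names here are illustrative):

```cpp
#include <algorithm>
#include <vector>
#include "caffe/proto/caffe.pb.h"

// Hypothetical sketch of the proposal: values given in the prototxt
// override, bottom by bottom, whatever default the net derived from the
// computation graph.
std::vector<bool> ResolvePropagateDown(
    const caffe::LayerParameter& param,
    const std::vector<bool>& graph_default) {
  std::vector<bool> propagate_down(graph_default);
  const int n = std::min<int>(param.propagate_down_size(),
                              static_cast<int>(propagate_down.size()));
  for (int i = 0; i < n; ++i) {
    propagate_down[i] = param.propagate_down(i);
  }
  return propagate_down;
}
```

In the prototxt this would amount to listing one boolean per bottom on the loss layer, e.g. true for the prediction bottom and false for the label bottom.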


@bhack
Contributor

bhack commented Dec 22, 2014

@shelhamer This needs a unified decision from the BVLC members. It has been waiting for one for 26 days.

@mtamburrano
Contributor Author

Closed; the new PR is #2052.
