Fix MVNLayer #2964
Conversation
There was another thing I changed in MVNLayer in #1979. In the case where the variance is exactly zero (such as an image with a solid color, and across_channels=false), the floating-point value computed for the variance wasn't always exactly zero; I sometimes got ever-so-slightly negative values, presumably due to limited floating-point precision. When those negative variances are then square-rooted by the subsequent caffe_powx() call, it produces NaNs. So I checked for and zeroed out any negative variances before the square root. If we put that change in, perhaps I should first write a test that reproduces the behaviour. Would you want that in this PR too? Or should we keep this PR small and focused?

[Edit]

```cpp
TYPED_TEST(MVNLayerTest, TestForward_ZeroVariance_NaN) {
  typedef typename TypeParam::Dtype Dtype;
  const int BLOB_SIZE = 100;
  Blob<Dtype> bottom_blob(1, 1, BLOB_SIZE, BLOB_SIZE);
  std::vector<Blob<Dtype>*> bottom_vec;
  bottom_vec.push_back(&bottom_blob);
  FillerParameter filler_param;
  filler_param.set_value(10.0);
  ConstantFiller<Dtype> filler(filler_param);
  filler.Fill(&bottom_blob);
  LayerParameter layer_param;
  MVNLayer<Dtype> layer(layer_param);
  layer.SetUp(bottom_vec, this->blob_top_vec_);
  layer.Forward(bottom_vec, this->blob_top_vec_);
  int num = bottom_blob.num();
  int channels = bottom_blob.channels();
  int height = bottom_blob.height();
  int width = bottom_blob.width();
  for (int i = 0; i < num; ++i) {
    for (int j = 0; j < channels; ++j) {
      for (int k = 0; k < height; ++k) {
        for (int l = 0; l < width; ++l) {
          Dtype data = this->blob_top_->data_at(i, j, k, l);
          // Check for NaN: NaN never compares equal to itself.
          ASSERT_EQ(data, data);
        }
      }
    }
  }
}
```

This only fails in the GPU case. If you set BLOB_SIZE to 10000 it will fail on the CPU too, but takes a long time to run.
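A minimal sketch of the guard being described, assuming a flat array of per-sample variances — this is illustrative only, not the actual MVNLayer change; the function name and the eps parameter are made up for the example:

```cpp
#include <cmath>
#include <cstdio>

// Sketch (not the actual Caffe source): zero out slightly negative
// variances produced by floating-point rounding before the square
// root that caffe_powx(..., 0.5, ...) would otherwise turn into NaN.
template <typename Dtype>
void ClampVarianceThenSqrt(Dtype* variance, int n, Dtype eps) {
  for (int i = 0; i < n; ++i) {
    if (variance[i] < Dtype(0)) variance[i] = Dtype(0);
    variance[i] = std::sqrt(variance[i] + eps);  // std-dev denominator
  }
}

int main() {
  float var[2] = {-1e-8f, 4.0f};  // a rounding artifact, then a normal value
  ClampVarianceThenSqrt(var, 2, 1e-9f);
  std::printf("%g %g\n", var[0], var[1]);  // first slot is ~0, not NaN
}
```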
src/caffe/layers/mvn_layer.cu
Is there a reason to use the mean_ and temp_ Blobs' diffs rather than their data here, as in the CPU implementation? If not, data should be used to save memory.
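To make the memory point concrete, a simplified sketch — ToyBlob below is hypothetical, not caffe::Blob, though as I understand it Caffe's blobs likewise allocate their data and diff buffers lazily on first access:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Simplified illustration: each blob carries two equally sized
// buffers, data and diff, allocated on first use. Borrowing a helper
// blob's diff as scratch space forces a second allocation that using
// data alone would avoid.
struct ToyBlob {
  explicit ToyBlob(size_t n) : n_(n) {}
  float* mutable_data() { if (data_.empty()) data_.resize(n_); return data_.data(); }
  float* mutable_diff() { if (diff_.empty()) diff_.resize(n_); return diff_.data(); }
  size_t bytes() const { return (data_.size() + diff_.size()) * sizeof(float); }
  size_t n_;
  std::vector<float> data_, diff_;
};

int main() {
  ToyBlob mean(1 << 20);  // stand-in for MVNLayer's mean_ helper blob
  mean.mutable_data();    // scratch via the data buffer only
  std::printf("data only: %zu bytes\n", mean.bytes());
  mean.mutable_diff();    // touching diff as well doubles the footprint
  std::printf("data+diff: %zu bytes\n", mean.bytes());
}
```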
See above, otherwise LGTM -- thanks for separating out the bug fix @jyegerlehner. I think the fix for the NaN issue can be a separate PR. Switching to computing the variance by E[(X - E[X])^2] rather than E[X^2] - (E[X])^2 would also avoid the negative values.
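A self-contained illustration of the difference between the two formulations (my example, not from the PR): with float accumulation, the one-pass result for a constant input is generally a tiny nonzero value whose sign depends on rounding, and a negative result becomes NaN under sqrt(); the two-pass result is a mean of squares, so it can never be negative.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<float> x(10000, 0.1f);  // constant input, true variance is 0
  float sum = 0.f, sum_sq = 0.f;
  for (float v : x) { sum += v; sum_sq += v * v; }
  const float mean = sum / x.size();
  // One-pass: E[X^2] - (E[X])^2. Rounding can push this below zero.
  const float one_pass = sum_sq / x.size() - mean * mean;
  // Two-pass: E[(X - E[X])^2]. Non-negative by construction.
  float two_pass = 0.f;
  for (float v : x) two_pass += (v - mean) * (v - mean);
  two_pass /= x.size();
  std::printf("one-pass: %g -> sqrt: %g\n", one_pass, std::sqrt(one_pass));
  std::printf("two-pass: %g -> sqrt: %g\n", two_pass, std::sqrt(two_pass));
}
```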
Commits:
- Fix the MVNLayer tests so they actually test what they claim.
- MVNLayer fixes: sum_multiplier_ sized correctly; backward gradient calculation. Gradient calculation per analysis of @seanbell, found here: BVLC#1938.
- Fixes according to review comments.
Force-pushed from 3a8b7f4 to 1f3f952.
OK.
Yes, but that has the disadvantage of requiring me to think.
Thanks for the changes @jyegerlehner, LGTM.
You're welcome @jeffdonahue.
This PR repairs some broken MVNLayer behavior. The first commit just fixes the tests, so that they test what they purport to. On my machine I see 12 of 24 tests failing.
[Edit]
So fixing only the tests did show failing CPU tests on the build server. It looks like the GPU tests didn't run; I'm guessing that's because the build server doesn't have a GPU to run them.
So next I'm committing the fixes: 1) shape the sum_multiplier_ member blob correctly, depending on whether across_channels is true; 2) fix the gradient calculation for the mean-only case. The correct calculation follows the analysis by @seanbell in issue 1938.
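For reference, my restatement of the standard derivation for the mean-only case (see #1938 for the full analysis, including the variance case): since y_i = x_i - (1/N) * sum_j x_j, the backward pass is dL/dx_i = dL/dy_i - (1/N) * sum_j dL/dy_j, i.e. the top gradient minus its own mean. A quick finite-difference check of that formula:

```cpp
// Sketch of the mean-only MVN backward pass with a numeric check.
// This is not the Caffe implementation, just the formula above.
#include <cstdio>
#include <vector>

int main() {
  const int n = 4;
  std::vector<double> x = {1.0, -2.0, 0.5, 3.0};
  std::vector<double> top_diff = {0.3, -0.1, 0.7, 0.2};  // dL/dy

  // Analytic gradient: dL/dx = dL/dy - mean(dL/dy).
  double mean_diff = 0.0;
  for (double d : top_diff) mean_diff += d;
  mean_diff /= n;

  // Numeric check against L = sum_i top_diff[i] * y_i(x).
  auto loss = [&](const std::vector<double>& v) {
    double m = 0.0;
    for (double u : v) m += u;
    m /= n;
    double L = 0.0;
    for (int i = 0; i < n; ++i) L += top_diff[i] * (v[i] - m);
    return L;
  };
  const double h = 1e-6;
  for (int i = 0; i < n; ++i) {
    std::vector<double> xp = x, xm = x;
    xp[i] += h; xm[i] -= h;
    const double numeric = (loss(xp) - loss(xm)) / (2 * h);
    const double analytic = top_diff[i] - mean_diff;
    std::printf("i=%d analytic=%.6f numeric=%.6f\n", i, analytic, numeric);
  }
}
```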