Skip to content

Conversation

@jyegerlehner
Copy link
Contributor

This PR will repair some broken MVNLayer behavior. This first commit just fixes the tests, so that they test what they purport to. On my machine I see 12 of 24 tests failing.

[Edit]
So fixing only the tests did show failing CPU tests on the build server. Looks like the the GPU tests didn't run. I'm guessing this is because the build server doesn't have a GPU to run them.

So next I'm committing the fixes. 1) shape sum_multiplier_ member blob correctly, depending on whether across_channels is true. 2) Fix the gradient calculation for the mean-only case. The correct calculation is from analysis by @seanbell in issue 1938.

@jyegerlehner jyegerlehner changed the title Fix MVNLayer [WIP] Fix MVNLayer Aug 24, 2015
@longjon longjon added the bug label Aug 24, 2015
@jyegerlehner
Copy link
Contributor Author

There was another thing I changed in MVNLayer in 1979. I think in the case where the variance is exactly zero (such as image with a solid color, and across_channels=false), the floating point value for the variance wasn't always exactly zero; I got ever-so-slightly negative values sometimes presumably due to floating point resolution. When the negative variances then are square-rooted by the subsequent caffe_powx() call, it would produce NaNs. So I checked for and zeroed out any negative variances before the square root.

I was thinking if we put that change in, perhaps I should write a test to try to reproduce that behaviour first. Would you want that in this PR too? Or should we keep this PR small and focused.

[Edit]
OK I managed to produce a test that reproduces the problem. You can paste this into test_mvn_layer.cpp:

TYPED_TEST(MVNLayerTest, TestForward_ZeroVariance_NaN) {
  typedef typename TypeParam::Dtype Dtype;
  const int BLOB_SIZE = 100;
  Blob<Dtype> bottom_blob(1,1,BLOB_SIZE, BLOB_SIZE);
  std::vector<Blob<Dtype>*> bottom_vec;
  bottom_vec.push_back(&bottom_blob);

  FillerParameter filler_param;
  filler_param.set_value(10.0);
  ConstantFiller<Dtype> filler(filler_param);
  filler.Fill(&bottom_blob);

  LayerParameter layer_param;
  MVNLayer<Dtype> layer(layer_param);
  layer.SetUp(bottom_vec, this->blob_top_vec_);
  layer.Forward(bottom_vec, this->blob_top_vec_);

  int num = bottom_blob.num();
  int channels = bottom_blob.channels();
  int height = bottom_blob.height();
  int width = bottom_blob.width();

  for (int i = 0; i < num; ++i) {
    for (int j = 0; j < channels; ++j) {
      for (int k = 0; k < height; ++k) {
        for (int l = 0; l < width; ++l) {
          Dtype data = this->blob_top_->data_at(i, j, k, l);
          // Check for NaN. NaN never equals itself.
          ASSERT_EQ(data, data);
        }
      }
    }
  }
}

Which only fails in the GPU case:

Note: Google Test filter = *TestForward_ZeroVariance*
[==========] Running 4 tests from 4 test cases.
[----------] Global test environment set-up.
[----------] 1 test from MVNLayerTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] MVNLayerTest/0.TestForward_ZeroVariance_NaN
[       OK ] MVNLayerTest/0.TestForward_ZeroVariance_NaN (156 ms)
[----------] 1 test from MVNLayerTest/0 (156 ms total)

[----------] 1 test from MVNLayerTest/1, where TypeParam = caffe::CPUDevice<double>
[ RUN      ] MVNLayerTest/1.TestForward_ZeroVariance_NaN
[       OK ] MVNLayerTest/1.TestForward_ZeroVariance_NaN (0 ms)
[----------] 1 test from MVNLayerTest/1 (0 ms total)

[----------] 1 test from MVNLayerTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN      ] MVNLayerTest/2.TestForward_ZeroVariance_NaN
/home/jd/dev/caffe/inversemvn/src/caffe/test/test_mvn_layer.cpp:102: Failure
Value of: data
  Actual: nan
Expected: data
Which is: nan
[  FAILED  ] MVNLayerTest/2.TestForward_ZeroVariance_NaN, where TypeParam = caffe::GPUDevice<float> (6 ms)
[----------] 1 test from MVNLayerTest/2 (7 ms total)

[----------] 1 test from MVNLayerTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN      ] MVNLayerTest/3.TestForward_ZeroVariance_NaN
[       OK ] MVNLayerTest/3.TestForward_ZeroVariance_NaN (2 ms)
[----------] 1 test from MVNLayerTest/3 (2 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 4 test cases ran. (166 ms total)
[  PASSED  ] 3 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MVNLayerTest/2.TestForward_ZeroVariance_NaN, where TypeParam = caffe::GPUDevice<float>

If you set BLOB_SIZE to 10000 it will fail in on the CPU too but takes a long time to run.

@jyegerlehner jyegerlehner changed the title [WIP] Fix MVNLayer Fix MVNLayer Aug 24, 2015
@jyegerlehner jyegerlehner mentioned this pull request Aug 24, 2015
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use the mean_ and temp_ Blobs' diffs rather than their data here, as in the CPU implementation? If not, data should be used to save memory.

@jeffdonahue
Copy link
Contributor

See above, otherwise LGTM -- thanks for separating out the bug fix @jyegerlehner.

I think the fix for the NaN issue can be a separate PR. Switching to computing the variance by E((X - E(X))^2) (rather than E(X^2) - (E(X))^2) would probably fix that?

Fix the MVNLayer tests so they actually test what they claim.

MVNLayer fixes: sum_multiplier_ sized correctly; backward gradient calculation.

Gradient calculation per analysis of seanbell, found here:
BVLC#1938

Fixes according to review comments.
@jyegerlehner
Copy link
Contributor Author

I think the fix for the NaN issue can be a separate PR.

OK

Switching to computing the variance by E((X - E(X))^2) (rather than E(X^2) - (E(X))^2) would probably fix that?

Yes but has disadvantage of requiring me to think.

@jeffdonahue
Copy link
Contributor

Thanks for the changes @jyegerlehner, LGTM.

jeffdonahue added a commit that referenced this pull request Aug 26, 2015
@jeffdonahue jeffdonahue merged commit 215bea0 into BVLC:master Aug 26, 2015
@jyegerlehner
Copy link
Contributor Author

You're welcome @jeffdonahue.

@jyegerlehner jyegerlehner deleted the mvn-layer-fixes branch August 26, 2015 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants