Fix MVNLayer #2964
Conversation
There was another thing I changed in MVNLayer in #1979. In the case where the variance is exactly zero (such as an image with a solid color, and across_channels=false), the floating-point value computed for the variance wasn't always exactly zero; I sometimes got ever-so-slightly negative values, presumably due to limited floating-point precision. When those negative variances are then square-rooted by the subsequent caffe_powx() call, it produces NaNs. So I checked for and zeroed out any negative variances before the square root. If we put that change in, perhaps I should first write a test that reproduces the behaviour. Would you want that in this PR too? Or should we keep this PR small and focused?

[Edit]

```cpp
TYPED_TEST(MVNLayerTest, TestForward_ZeroVariance_NaN) {
  typedef typename TypeParam::Dtype Dtype;
  const int BLOB_SIZE = 100;
  Blob<Dtype> bottom_blob(1, 1, BLOB_SIZE, BLOB_SIZE);
  std::vector<Blob<Dtype>*> bottom_vec;
  bottom_vec.push_back(&bottom_blob);
  FillerParameter filler_param;
  filler_param.set_value(10.0);
  ConstantFiller<Dtype> filler(filler_param);
  filler.Fill(&bottom_blob);
  LayerParameter layer_param;
  MVNLayer<Dtype> layer(layer_param);
  layer.SetUp(bottom_vec, this->blob_top_vec_);
  layer.Forward(bottom_vec, this->blob_top_vec_);
  int num = bottom_blob.num();
  int channels = bottom_blob.channels();
  int height = bottom_blob.height();
  int width = bottom_blob.width();
  for (int i = 0; i < num; ++i) {
    for (int j = 0; j < channels; ++j) {
      for (int k = 0; k < height; ++k) {
        for (int l = 0; l < width; ++l) {
          Dtype data = this->blob_top_->data_at(i, j, k, l);
          // Check for NaN: NaN never compares equal to itself.
          ASSERT_EQ(data, data);
        }
      }
    }
  }
}
```

This only fails in the GPU case. If you set BLOB_SIZE to 10000 it will fail on the CPU too, but takes a long time to run.
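A minimal sketch of the guard being described, assuming a flat array of per-sample variances — this is illustrative only, not the actual MVNLayer change; the function name and the eps parameter are made up for the example:

```cpp
#include <cmath>
#include <cstdio>

// Sketch (not the actual Caffe source): zero out slightly negative
// variances produced by floating-point rounding before the square
// root that caffe_powx(..., 0.5, ...) would otherwise turn into NaN.
template <typename Dtype>
void ClampVarianceThenSqrt(Dtype* variance, int n, Dtype eps) {
  for (int i = 0; i < n; ++i) {
    if (variance[i] < Dtype(0)) variance[i] = Dtype(0);
    variance[i] = std::sqrt(variance[i] + eps);  // std-dev denominator
  }
}

int main() {
  float var[2] = {-1e-8f, 4.0f};  // a rounding artifact, then a normal value
  ClampVarianceThenSqrt(var, 2, 1e-9f);
  std::printf("%g %g\n", var[0], var[1]);  // first slot is ~0, not NaN
}
```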
src/caffe/layers/mvn_layer.cu
Is there a reason to use the mean_ and temp_ Blobs' diffs rather than their data here, as in the CPU implementation? If not, data should be used to save memory.
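To make the memory point concrete, a simplified sketch — ToyBlob below is hypothetical, not caffe::Blob, though as I understand it Caffe's blobs likewise allocate their data and diff buffers lazily on first access:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Simplified illustration: each blob carries two equally sized
// buffers, data and diff, allocated on first use. Borrowing a helper
// blob's diff as scratch space forces a second allocation that using
// data alone would avoid.
struct ToyBlob {
  explicit ToyBlob(size_t n) : n_(n) {}
  float* mutable_data() { if (data_.empty()) data_.resize(n_); return data_.data(); }
  float* mutable_diff() { if (diff_.empty()) diff_.resize(n_); return diff_.data(); }
  size_t bytes() const { return (data_.size() + diff_.size()) * sizeof(float); }
  size_t n_;
  std::vector<float> data_, diff_;
};

int main() {
  ToyBlob mean(1 << 20);  // stand-in for MVNLayer's mean_ helper blob
  mean.mutable_data();    // scratch via the data buffer only
  std::printf("data only: %zu bytes\n", mean.bytes());
  mean.mutable_diff();    // touching diff as well doubles the footprint
  std::printf("data+diff: %zu bytes\n", mean.bytes());
}
```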
See above, otherwise LGTM -- thanks for separating out the bug fix @jyegerlehner. I think the fix for the NaN issue can be a separate PR. Switching to computing the variance by E[(X - E[X])^2] rather than E[X^2] - (E[X])^2 would also avoid the negative values.
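A self-contained illustration of the difference between the two formulations (my example, not from the PR): with float accumulation, the one-pass result for a constant input is generally a tiny nonzero value whose sign depends on rounding, and a negative result becomes NaN under sqrt(); the two-pass result is a mean of squares, so it can never be negative.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<float> x(10000, 0.1f);  // constant input, true variance is 0
  float sum = 0.f, sum_sq = 0.f;
  for (float v : x) { sum += v; sum_sq += v * v; }
  const float mean = sum / x.size();
  // One-pass: E[X^2] - (E[X])^2. Rounding can push this below zero.
  const float one_pass = sum_sq / x.size() - mean * mean;
  // Two-pass: E[(X - E[X])^2]. Non-negative by construction.
  float two_pass = 0.f;
  for (float v : x) two_pass += (v - mean) * (v - mean);
  two_pass /= x.size();
  std::printf("one-pass: %g -> sqrt: %g\n", one_pass, std::sqrt(one_pass));
  std::printf("two-pass: %g -> sqrt: %g\n", two_pass, std::sqrt(two_pass));
}
```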
Commits:
- Fix the MVNLayer tests so they actually test what they claim.
- MVNLayer fixes: sum_multiplier_ sized correctly; backward gradient calculation. Gradient calculation per analysis of @seanbell, found here: BVLC#1938.
- Fixes according to review comments.
Force-pushed from 3a8b7f4 to 1f3f952.
OK.
Yes, but that has the disadvantage of requiring me to think.
Thanks for the changes @jyegerlehner, LGTM.
You're welcome @jeffdonahue.
This PR repairs some broken MVNLayer behavior. The first commit just fixes the tests, so that they test what they purport to. On my machine I see 12 of 24 tests failing.
[Edit]
So fixing only the tests did show failing CPU tests on the build server. It looks like the GPU tests didn't run; I'm guessing that's because the build server doesn't have a GPU to run them.
So next I'm committing the fixes: 1) shape the sum_multiplier_ member blob correctly, depending on whether across_channels is true; 2) fix the gradient calculation for the mean-only case. The correct calculation follows the analysis by @seanbell in issue 1938.
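For reference, my restatement of the standard derivation for the mean-only case (see #1938 for the full analysis, including the variance case): since y_i = x_i - (1/N) * sum_j x_j, the backward pass is dL/dx_i = dL/dy_i - (1/N) * sum_j dL/dy_j, i.e. the top gradient minus its own mean. A quick finite-difference check of that formula:

```cpp
// Sketch of the mean-only MVN backward pass with a numeric check.
// This is not the Caffe implementation, just the formula above.
#include <cstdio>
#include <vector>

int main() {
  const int n = 4;
  std::vector<double> x = {1.0, -2.0, 0.5, 3.0};
  std::vector<double> top_diff = {0.3, -0.1, 0.7, 0.2};  // dL/dy

  // Analytic gradient: dL/dx = dL/dy - mean(dL/dy).
  double mean_diff = 0.0;
  for (double d : top_diff) mean_diff += d;
  mean_diff /= n;

  // Numeric check against L = sum_i top_diff[i] * y_i(x).
  auto loss = [&](const std::vector<double>& v) {
    double m = 0.0;
    for (double u : v) m += u;
    m /= n;
    double L = 0.0;
    for (int i = 0; i < n; ++i) L += top_diff[i] * (v[i] - m);
    return L;
  };
  const double h = 1e-6;
  for (int i = 0; i < n; ++i) {
    std::vector<double> xp = x, xm = x;
    xp[i] += h; xm[i] -= h;
    const double numeric = (loss(xp) - loss(xm)) / (2 * h);
    const double analytic = top_diff[i] - mean_diff;
    std::printf("i=%d analytic=%.6f numeric=%.6f\n", i, analytic, numeric);
  }
}
```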