CuDNNConvolutionLayer accumulate gradients #3254
Conversation

@ronghanghu (original issue):

After #1977, all layers should accumulate gradients for parameters. These two lines introduced in #3160 seem to be a bug.

@shelhamer @slayton58 can you take a look?
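For context, a minimal sketch of the accumulation contract, assuming the suspect lines zero the parameter diffs inside Backward_gpu (variable names here are illustrative, not the exact CuDNNConvolutionLayer members, and the call uses the cudnnConvolutionBackwardFilter signature from cuDNN v4+):

```cpp
// Since #1977 the solver zeroes parameter diffs once per iteration, so a
// layer's Backward must *add* its gradient into the existing diff. This
// is what makes weight sharing and gradient accumulation work.
//
// WRONG -- zeroing the diff inside Backward discards gradients already
// accumulated there by shared weights or earlier backward passes:
//   caffe_gpu_set(this->blobs_[0]->count(), Dtype(0), weight_diff);
//
// RIGHT -- let cuDNN accumulate by passing beta = 1:
const Dtype alpha = 1.0, beta = 1.0;  // beta = 1 => dW += gradient
CUDNN_CHECK(cudnnConvolutionBackwardFilter(
    handle,
    &alpha,
    bottom_desc, bottom_data,    // input x
    top_desc, top_diff,          // dL/dy from the layer above
    conv_desc, bwd_filter_algo,
    workspace, workspace_size,
    &beta,                       // 1.0 accumulates; 0.0 would overwrite
    filter_desc, weight_diff));  // dL/dW added into the existing diff
```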
Good catch @ronghanghu -- this definitely looks wrong. I'll leave open for the moment since I haven't read this code before, but we should take care of this immediately. We should really have better testing for this situation... if this is bugged as it appears, this breaks weight sharing and gradient accumulation.
Yep, this does look like a bug -- when I wrote the initial support on our fork and merged it across, it looks like this got dragged in (it's still present in our code).
Good catch @ronghanghu and sorry for missing this in my review. While there …
We need to adapt the gradient checker to gradient accumulation in order to prevent future mistakes like this one and #2532.
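One way to do that, sketched below: run Backward twice and assert the parameter diff doubles. This assumes Caffe's Layer/Blob API and gtest; the helper name CheckDiffAccumulation is hypothetical, and this is not @tnarihi's actual patch (linked in the next comment):

```cpp
#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/util/math_functions.hpp"
#include "gtest/gtest.h"

namespace caffe {

// Hypothetical helper: verify that a layer accumulates into its first
// parameter's diff instead of overwriting it.
template <typename Dtype>
void CheckDiffAccumulation(Layer<Dtype>* layer,
    const std::vector<Blob<Dtype>*>& bottom,
    const std::vector<Blob<Dtype>*>& top) {
  Blob<Dtype>* param = layer->blobs()[0].get();
  std::vector<bool> propagate_down(bottom.size(), true);

  // Pass 1: zero the diff, run backward once, snapshot the gradient.
  caffe_set(param->count(), Dtype(0), param->mutable_cpu_diff());
  layer->Forward(bottom, top);
  layer->Backward(top, propagate_down, bottom);
  std::vector<Dtype> grad_once(param->cpu_diff(),
                               param->cpu_diff() + param->count());

  // Pass 2: backward again *without* zeroing. An accumulating layer now
  // holds 2 * grad_once; a layer that overwrites still holds grad_once.
  layer->Backward(top, propagate_down, bottom);
  for (int i = 0; i < param->count(); ++i) {
    EXPECT_NEAR(param->cpu_diff()[i], Dtype(2) * grad_once[i], 1e-4)
        << "param diff was overwritten, not accumulated, at index " << i;
  }
}

}  // namespace caffe
```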
👍 I agree that the gradient checker needs to be updated. A simple fix was proposed earlier by @tnarihi: tnarihi@7d45526. See also #3036.