Specify default initialization schemes for modules in docs #9038
vishwakftw wants to merge 7 commits into pytorch:master
Conversation
Please fix the lint check: https://travis-ci.org/pytorch/pytorch/builds/398354286?utm_source=github_status&utm_medium=notification
This is great, thanks!
facebook-github-bot left a comment:
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@fmassa Once this PR is merged, I will modify the initialization schemes in the modules to use `nn.init`.
@pytorchbot retest this please
Would be great to have similar things for BatchNorm and InstanceNorm too :)
```diff
  the mini-batches and :math:`\gamma` and :math:`\beta` are learnable parameter vectors
- of size `C` (where `C` is the input size).
+ of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are sampled
+ from :math:`\mathcal{U}(0, 1)` and the elements of :math:`\beta` are set to 0.
```
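For reference, a minimal sketch of what these documented defaults amount to, written as a standalone function over the parameter tensors rather than quoting the module's actual `reset_parameters` (the tensor names and the channel count are made up):

```python
import torch
from torch.nn import init

def reset_norm_params(gamma, beta):
    # gamma sampled from U(0, 1); beta set to 0, per the docstring above
    init.uniform_(gamma)        # init.uniform_ defaults to the range [0, 1)
    init.constant_(beta, 0.)

gamma = torch.empty(64)         # one scale/shift per channel, C = 64 here
beta = torch.empty(64)
reset_norm_params(gamma, beta)
```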
```diff
  momentum: the value used for the running_mean and running_var computation. Default: 0.1
  affine: a boolean value that when set to ``True``, this module has
- learnable affine parameters. Default: ``False``
+ learnable affine parameters, initialized the same way as done for batch normalization.
```
@fmassa I have added the initialization from `nn.init`.
facebook-github-bot left a comment:
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/nn/modules/conv.py (Outdated)
```diff
      n *= k
      stdv = 1. / math.sqrt(n)
-     self.weight.data.uniform_(-stdv, stdv)
+     init.uniform_(self.weight, -stdv, stdv)
```
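The `n` built up in this hunk is just the fan-in of a conv weight: input channels times the number of kernel elements. A quick sketch of that equivalence for a vanilla (groups=1) convolution, with made-up shapes; `_calculate_fan_in_fan_out` is the private helper from `torch.nn.init` that later revisions of this PR lean on:

```python
import math
import torch
from torch.nn import init

# A Conv2d weight has shape (out_channels, in_channels, kH, kW).
weight = torch.empty(32, 16, 3, 3)

# Legacy computation: n = in_channels * prod(kernel_size).
n = weight.size(1) * weight.size(2) * weight.size(3)
stdv = 1. / math.sqrt(n)                 # 1 / sqrt(144) = 1/12

# The same quantity straight from the weight's shape:
fan_in, _ = init._calculate_fan_in_fan_out(weight)
assert fan_in == n                       # both 144, so the bounds match
```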
torch/nn/modules/linear.py (Outdated)
```diff
  def reset_parameters(self):
      stdv = 1. / math.sqrt(self.weight.size(1))
-     self.weight.data.uniform_(-stdv, stdv)
+     init.uniform_(self.weight, -stdv, stdv)
```
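For a 2-D `Linear` weight, `self.weight.size(1)` is exactly the fan-in, which is why the shape-based helper can replace the hand-written `stdv` without changing behavior. A quick check with made-up dimensions:

```python
import torch
from torch.nn import init

weight = torch.empty(20, 10)        # (out_features, in_features)
fan_in, fan_out = init._calculate_fan_in_fan_out(weight)
assert fan_in == weight.size(1)     # 10, so stdv = 1/sqrt(10) either way
assert fan_out == weight.size(0)    # 20
```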
Is this good to go?
Hold on a bit, I'll get the initialization schemes to you today, so that we can simplify things (forgot to do it yesterday).
fmassa left a comment:
I've added the equivalent initialization methods (which rely on kaiming_uniform_ using fan_in). Please double-check and then make the changes so that we can finally remove the dependency on the hand-tuned (and potentially buggy) initializations.
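For anyone double-checking that equivalence: `kaiming_uniform_` in fan_in mode samples from U(-b, b) with b = gain * sqrt(3 / fan_in), and its default leaky_relu gain is sqrt(2 / (1 + a^2)). Choosing a = sqrt(5) therefore gives b = sqrt(2/6) * sqrt(3/fan_in) = 1/sqrt(fan_in), the legacy bound. The value a = sqrt(5) below is implied by that algebra, not quoted from the diff:

```python
import math
import torch
from torch.nn import init

weight = torch.empty(20, 10)            # fan_in = 10

# bound = sqrt(2 / (1 + 5)) * sqrt(3 / 10) = 1 / sqrt(10)
init.kaiming_uniform_(weight, a=math.sqrt(5), mode='fan_in')

legacy_bound = 1. / math.sqrt(10)       # the old U(-stdv, stdv) scheme
assert weight.abs().max().item() <= legacy_bound + 1e-6  # same distribution
```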
```diff
-     self.bias.data.uniform_(-stdv, stdv)
+     fan_in, _ = init._calculate_fan_in_fan_out(self.weight)
+     bound = 1 / math.sqrt(fan_in)
+     init.uniform_(self.bias, -bound, bound)
```
```diff
-     self.weight.data.uniform_(-stdv, stdv)
+     fan_in, _ = init._calculate_fan_in_fan_out(self.weight)
+     bound = 1 / math.sqrt(fan_in)
+     init.uniform_(self.weight, -bound, bound)
```
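Putting the two hunks together, the refactored initializer comes out roughly as below, written as a standalone sketch over raw tensors rather than the module method; whether the weight line is spelled as an explicit `uniform_` or as `kaiming_uniform_(..., a=math.sqrt(5))` makes no numerical difference, per the equivalence above:

```python
import math
import torch
from torch.nn import init

def reset_linear_params(weight, bias=None):
    # Weight: U(-1/sqrt(fan_in), 1/sqrt(fan_in)) via kaiming_uniform_.
    init.kaiming_uniform_(weight, a=math.sqrt(5))
    if bias is not None:
        # Bias: same bound, derived from the weight's fan-in.
        fan_in, _ = init._calculate_fan_in_fan_out(weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(bias, -bound, bound)

w, b = torch.empty(20, 10), torch.empty(20)
reset_linear_params(w, b)
```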
fmassa left a comment:
There are a few things that still look a bit weird, but changing them might require backward-incompatible changes, so I'm ok with how it looks now. Thanks!
@pytorchbot test this please
I have one minor concern: would this make the initialization slower in the case of
Computing
Is this good to go?
Sadly, it seems to be failing tests now.
facebook-github-bot left a comment:
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Oh, that was my bad. I have fixed them now.
Is this good to go?
facebook-github-bot left a comment:
@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@vishwakftw should be good to go, it looks like @weiyangfb is working on merging it.
@weiyangfb gentle reminder. Sorry.
Is this good to go?
Summary: This closes #6906. Reviewed By: ezyang. Differential Revision: D8698632. Pulled By: weiyangfb. fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
Why Kaiming over Xavier?
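One plausible answer (my reading, not a reply from the thread): the PR's goal is to document and reproduce the long-standing U(-1/sqrt(fan_in), 1/sqrt(fan_in)) default, and only a fan_in-only rule like Kaiming's can match it exactly; Xavier's bound also depends on fan_out, so adopting it would silently change every model's starting point. A sketch comparing the bounds:

```python
import math

fan_in, fan_out = 10, 20            # made-up Linear dimensions

# Xavier/Glorot uniform bound (gain = 1) mixes in fan_out:
xavier_bound = math.sqrt(6.0 / (fan_in + fan_out))

# Kaiming uniform bound with a = sqrt(5) depends on fan_in only:
gain = math.sqrt(2.0 / (1 + 5))     # leaky_relu gain at a = sqrt(5)
kaiming_bound = gain * math.sqrt(3.0 / fan_in)

legacy_bound = 1.0 / math.sqrt(fan_in)
assert abs(kaiming_bound - legacy_bound) < 1e-12    # exact match
assert abs(xavier_bound - legacy_bound) > 1e-2      # would change defaults
```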
This closes #6906.
cc: @ssnl @zou3519