Specify default initialization schemes for modules in docs #9038
vishwakftw wants to merge 7 commits into pytorch:master
Conversation
Please fix the lint check: https://travis-ci.org/pytorch/pytorch/builds/398354286?utm_source=github_status&utm_medium=notification
This is great, thanks!
facebook-github-bot left a comment:
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@fmassa Once this PR is merged, I will modify the initialization schemes in the modules to use `nn.init`.
@pytorchbot retest this please
Would be great to have similar things for BatchNorm and InstanceNorm too :)
```diff
  the mini-batches and :math:`\gamma` and :math:`\beta` are learnable parameter vectors
- of size `C` (where `C` is the input size).
+ of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are sampled
+ from :math:`\mathcal{U}(0, 1)` and the elements of :math:`\beta` are set to 0.
```
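For reference, a minimal sketch of what these documented defaults amount to, written as a standalone function over the parameter tensors rather than quoting the module's actual `reset_parameters` (the tensor names and the channel count are made up):

```python
import torch
from torch.nn import init

def reset_norm_params(gamma, beta):
    # gamma sampled from U(0, 1); beta set to 0, per the docstring above
    init.uniform_(gamma)        # init.uniform_ defaults to the range [0, 1)
    init.constant_(beta, 0.)

gamma = torch.empty(64)         # one scale/shift per channel, C = 64 here
beta = torch.empty(64)
reset_norm_params(gamma, beta)
```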
```diff
  momentum: the value used for the running_mean and running_var computation. Default: 0.1
  affine: a boolean value that when set to ``True``, this module has
- learnable affine parameters. Default: ``False``
+ learnable affine parameters, initialized the same way as done for batch normalization.
```
@fmassa I have added the initialization from `nn.init`.
facebook-github-bot left a comment:
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/nn/modules/conv.py (Outdated)
```diff
      n *= k
      stdv = 1. / math.sqrt(n)
-     self.weight.data.uniform_(-stdv, stdv)
+     init.uniform_(self.weight, -stdv, stdv)
```
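The `n` built up in this hunk is just the fan-in of a conv weight: input channels times the number of kernel elements. A quick sketch of that equivalence for a vanilla (groups=1) convolution, with made-up shapes; `_calculate_fan_in_fan_out` is the private helper from `torch.nn.init` that later revisions of this PR lean on:

```python
import math
import torch
from torch.nn import init

# A Conv2d weight has shape (out_channels, in_channels, kH, kW).
weight = torch.empty(32, 16, 3, 3)

# Legacy computation: n = in_channels * prod(kernel_size).
n = weight.size(1) * weight.size(2) * weight.size(3)
stdv = 1. / math.sqrt(n)                 # 1 / sqrt(144) = 1/12

# The same quantity straight from the weight's shape:
fan_in, _ = init._calculate_fan_in_fan_out(weight)
assert fan_in == n                       # both 144, so the bounds match
```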
torch/nn/modules/linear.py (Outdated)
```diff
  def reset_parameters(self):
      stdv = 1. / math.sqrt(self.weight.size(1))
-     self.weight.data.uniform_(-stdv, stdv)
+     init.uniform_(self.weight, -stdv, stdv)
```
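For a 2-D `Linear` weight, `self.weight.size(1)` is exactly the fan-in, which is why the shape-based helper can replace the hand-written `stdv` without changing behavior. A quick check with made-up dimensions:

```python
import torch
from torch.nn import init

weight = torch.empty(20, 10)        # (out_features, in_features)
fan_in, fan_out = init._calculate_fan_in_fan_out(weight)
assert fan_in == weight.size(1)     # 10, so stdv = 1/sqrt(10) either way
assert fan_out == weight.size(0)    # 20
```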
Is this good to go?
Hold on a bit, I'll get the initialization schemes to you today, so that we can simplify things (forgot to do it yesterday).
fmassa left a comment:
I've added the equivalent initialization methods (which rely on kaiming_uniform_ using fan_in). Please double-check and then make the changes so that we can finally remove the dependency on the hand-tuned (and potentially buggy) initializations.
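For anyone double-checking that equivalence: `kaiming_uniform_` in fan_in mode samples from U(-b, b) with b = gain * sqrt(3 / fan_in), and its default leaky_relu gain is sqrt(2 / (1 + a^2)). Choosing a = sqrt(5) therefore gives b = sqrt(2/6) * sqrt(3/fan_in) = 1/sqrt(fan_in), the legacy bound. The value a = sqrt(5) below is implied by that algebra, not quoted from the diff:

```python
import math
import torch
from torch.nn import init

weight = torch.empty(20, 10)            # fan_in = 10

# bound = sqrt(2 / (1 + 5)) * sqrt(3 / 10) = 1 / sqrt(10)
init.kaiming_uniform_(weight, a=math.sqrt(5), mode='fan_in')

legacy_bound = 1. / math.sqrt(10)       # the old U(-stdv, stdv) scheme
assert weight.abs().max().item() <= legacy_bound + 1e-6  # same distribution
```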
```diff
-     self.bias.data.uniform_(-stdv, stdv)
+     fan_in, _ = init._calculate_fan_in_fan_out(self.weight)
+     bound = 1 / math.sqrt(fan_in)
+     init.uniform_(self.bias, -bound, bound)
```
```diff
-     self.weight.data.uniform_(-stdv, stdv)
+     fan_in, _ = init._calculate_fan_in_fan_out(self.weight)
+     bound = 1 / math.sqrt(fan_in)
+     init.uniform_(self.weight, -bound, bound)
```
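Putting the two hunks together, the refactored initializer comes out roughly as below, written as a standalone sketch over raw tensors rather than the module method; whether the weight line is spelled as an explicit `uniform_` or as `kaiming_uniform_(..., a=math.sqrt(5))` makes no numerical difference, per the equivalence above:

```python
import math
import torch
from torch.nn import init

def reset_linear_params(weight, bias=None):
    # Weight: U(-1/sqrt(fan_in), 1/sqrt(fan_in)) via kaiming_uniform_.
    init.kaiming_uniform_(weight, a=math.sqrt(5))
    if bias is not None:
        # Bias: same bound, derived from the weight's fan-in.
        fan_in, _ = init._calculate_fan_in_fan_out(weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(bias, -bound, bound)

w, b = torch.empty(20, 10), torch.empty(20)
reset_linear_params(w, b)
```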
fmassa left a comment:
There are a few things that still look a bit weird, but changing them might require backward-incompatible changes, so I'm ok with how it looks now. Thanks!
@pytorchbot test this please
I have one minor concern: would this make the initialization slower in the case of
Computing
Is this good to go?
Sadly, it seems to be failing tests now.
facebook-github-bot left a comment:
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Oh, that was my bad. I have fixed them now.
Is this good to go?
facebook-github-bot left a comment:
@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@vishwakftw should be good to go, it looks like @weiyangfb is working on merging it.
@weiyangfb gentle reminder. Sorry.
Is this good to go?
Summary: This closes #6906. Reviewed By: ezyang. Differential Revision: D8698632. Pulled By: weiyangfb. fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
Why Kaiming over Xavier?
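One plausible answer (my reading, not a reply from the thread): the PR's goal is to document and reproduce the long-standing U(-1/sqrt(fan_in), 1/sqrt(fan_in)) default, and only a fan_in-only rule like Kaiming's can match it exactly; Xavier's bound also depends on fan_out, so adopting it would silently change every model's starting point. A sketch comparing the bounds:

```python
import math

fan_in, fan_out = 10, 20            # made-up Linear dimensions

# Xavier/Glorot uniform bound (gain = 1) mixes in fan_out:
xavier_bound = math.sqrt(6.0 / (fan_in + fan_out))

# Kaiming uniform bound with a = sqrt(5) depends on fan_in only:
gain = math.sqrt(2.0 / (1 + 5))     # leaky_relu gain at a = sqrt(5)
kaiming_bound = gain * math.sqrt(3.0 / fan_in)

legacy_bound = 1.0 / math.sqrt(fan_in)
assert abs(kaiming_bound - legacy_bound) < 1e-12    # exact match
assert abs(xavier_bound - legacy_bound) > 1e-2      # would change defaults
```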
This closes #6906.
cc: @ssnl @zou3519