Small optimization for adam #12107

Closed

jma127 wants to merge 1 commit into pytorch:master from jma127:master


Conversation

@jma127 (Contributor) commented Sep 26, 2018

Apply weight decay for Adam in-place instead of via copy.

Synced offline with @soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g.:

d_p.add_(weight_decay, p.data)
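For readers comparing the two variants, here is a minimal, hypothetical sketch of the difference (the helper name is mine, and it uses the keyword `alpha=` spelling of the same `add`/`add_` call quoted above; it is not the actual diff):

import torch

def apply_weight_decay(p: torch.nn.Parameter, weight_decay: float, inplace: bool) -> torch.Tensor:
    """Illustrative helper, not part of torch.optim: fold L2 weight decay
    into the gradient either via a copy (pre-PR Adam) or in-place (this PR)."""
    grad = p.grad
    if weight_decay != 0:
        if inplace:
            # This PR, mirroring the SGD line quoted above: reuse the gradient buffer.
            grad.add_(p.data, alpha=weight_decay)
        else:
            # Pre-PR Adam: allocate a new tensor; p.grad is left untouched.
            grad = grad.add(p.data, alpha=weight_decay)
    return grad

The in-place variant saves one temporary tensor the size of the parameter per step, at the cost of mutating p.grad, which is what the discussion below is about.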

@facebook-github-bot (Contributor) left a comment


soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ssnl (Collaborator) commented Sep 27, 2018

well... I would certainly expect .grad to not change after optimizer step.
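For concreteness, a small check of that invariant; whether it prints True or False depends on whether the installed build applies Adam's weight decay via a copy or in place (as this PR does):

import torch

p = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.Adam([p], lr=1e-1, weight_decay=1e-2)

p.sum().backward()              # raw autograd gradient: all ones
grad_before = p.grad.clone()
opt.step()

# True if Adam applied weight decay via a copy (p.grad untouched),
# False if it folded weight_decay * p into p.grad in place, as this PR does.
print(torch.equal(p.grad, grad_before))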

@jma127 (Contributor, Author) commented Sep 27, 2018

Hmm, then the SGD implementation should be fixed to satisfy that invariant.

I'll leave it to you guys to determine whether or not this is a necessary invariant -- feel free to revert as you see fit.
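For reference, the fix being suggested for SGD would be the mirror image of this PR, swapping the in-place call for a copying one; a sketch only, not an actual patch, reusing the variable names from the SGD snippet quoted earlier:

# Current SGD (mutates p.grad):
#     d_p.add_(weight_decay, p.data)
# Invariant-preserving alternative (allocates a new tensor, p.grad untouched):
d_p = d_p.add(weight_decay, p.data)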

@ezyang added the merged label Jun 26, 2019