
[WIP] Support weight decay in SGD optimizer when gradient is sparse#1305

Closed
ezyang wants to merge 7 commits into pytorch:master from ezyang:sparse-optimizers

Conversation

@ezyang
Contributor

@ezyang ezyang commented Apr 19, 2017

Towards fixing #1285.

This is a WIP commit to get the ball rolling on code review
(I am sure I have done great violence to the various coding
standards of your project.)

Things to be done:

- Continue adding sparse support for other parameters
  and optimizers

- Add some more tests, including a unit test ensuring
  that a single step is what we expect

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC @adamlerer @soumith
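A minimal sketch of what "lazy" weight decay for sparse gradients could look like, in pure Python rather than PyTorch: the sparse gradient is modeled as a `{row_index: [values]}` dict standing in for a COO gradient from an embedding lookup, and all names are illustrative, not the PR's actual code.

```python
# Hypothetical sketch of lazy weight decay for sparse gradients.
# A sparse gradient (e.g. from an embedding lookup) only carries a few
# rows, so the decay term is folded into the gradient for those rows
# only, and the update stays sparse. Gradient modeled as a dict.

def sgd_step_sparse(param, sparse_grad, lr, weight_decay):
    """One in-place SGD step; param is a list of row lists."""
    for row, grad_row in sparse_grad.items():
        for j, g in enumerate(grad_row):
            # decay is applied only to rows present in the gradient
            d = g + weight_decay * param[row][j]
            param[row][j] -= lr * d
```

The catch, which motivates the discussion in this thread: rows absent from a given step's gradient receive no decay at all on that step.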

@apaszke
Contributor

apaszke commented Apr 19, 2017

I wonder if it wouldn't be simpler and less magical to offer that deferred sparse weight decay as an alternative kind of optimizer. The user would create the regular one with weight_decay=0 and then another one, wd = optim.DeferredWD(model.parameters()). It seems more explicit; otherwise it might be easy to forget about flush(), because right now the user doesn't really need to be aware of which gradients are sparse.
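The deferred-decay optimizer being suggested might be sketched roughly like this, again in plain Python with illustrative names (`DeferredWD`, `step`, `flush` are assumptions from the comment, not an actual API): decay for each row is tracked lazily and settled either when a sparse update touches that row or when flush() is called.

```python
class DeferredWD:
    """Hypothetical deferred weight-decay pass: the main optimizer runs
    with weight_decay=0, and this object applies decay separately.
    Rows touched by a sparse update are settled lazily; flush()
    settles every row. Parameters are lists of row lists."""

    def __init__(self, param, lr, weight_decay):
        self.param = param
        self.decay = lr * weight_decay
        self.step_count = 0
        self.last = [0] * len(param)  # last step each row was settled

    def step(self, touched_rows):
        self.step_count += 1
        for r in touched_rows:
            self._settle(r)

    def _settle(self, r):
        pending = self.step_count - self.last[r]
        if pending:
            # applying additive decay once per step, p -= lr*wd*p,
            # equals multiplying by (1 - lr*wd) each step, so k
            # pending steps compound to (1 - lr*wd)**k exactly
            factor = (1.0 - self.decay) ** pending
            self.param[r] = [x * factor for x in self.param[r]]
            self.last[r] = self.step_count

    def flush(self):
        for r in range(len(self.param)):
            self._settle(r)
```

The design choice here is that deferral is exact for pure decay: skipping a row for k steps and then multiplying by (1 - lr*wd)^k yields the same value as decaying it every step, as long as no gradient hit the row in between.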

@ezyang
Contributor Author

ezyang commented Apr 19, 2017

That's OK by me. Would you suggest just erroring (with a suggestion) if they try to use weight decay while optimizing something sparse? (@adamlerer, what do you think?)
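The "error with a suggestion" option could be as small as a guard like this (a sketch with assumed names, not the actual PyTorch check):

```python
# Hypothetical guard: refuse weight_decay when a parameter's gradient
# is sparse, and point the user at the deferred alternative.

def check_weight_decay(weight_decay, grad_is_sparse):
    if weight_decay != 0 and grad_is_sparse:
        raise ValueError(
            "weight_decay is not supported with sparse gradients; "
            "apply decay separately (e.g. via a deferred weight-decay pass)"
        )
```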

ezyang added 7 commits May 8, 2017 14:00
This is a work-in-progress commit for adding support for
sparse_select/sparse_copy, towards fixing #1462

Things to do:

1. Delete sparse_mask

2. sparse_copy implementation

3. CUDA versions

4. Tests

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang ezyang force-pushed the sparse-optimizers branch from 712b6dc to 4e800a8 Compare May 8, 2017 18:01
@ezyang ezyang closed this Jun 7, 2017
@ezyang ezyang deleted the sparse-optimizers branch September 7, 2017 20:23
@graytowne

Hi @ezyang,
Can I ask whether weight decay for sparse gradients is currently supported? If not, is there any alternative way to perform it?

I'm working on recommendation, and weight decay for sparse embeddings is crucial, as these models easily overfit.

@ezyang
Contributor Author

ezyang commented May 3, 2018

No, it hasn't been merged yet. But it shouldn't be too hard to adapt this patch set to get it going for what you need. Give it a shot, maybe?

@graytowne

OK, I will give it a try, thanks so much! One simple question: why does flush() need to be called every time here? I think dense tensors don't need this.
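The flush() question above can be illustrated with a tiny torch-free demo (all values and the per-step decay factor are made up): with a dense gradient every row is touched every step, so decay is always applied on time; with a sparse gradient, untouched rows accumulate "owed" decay that only an explicit flush settles.

```python
# Why flush() matters with sparse gradients (illustrative sketch).
param = [1.0, 1.0]              # two parameter rows
owed = [0, 0]                   # decay steps not yet applied, per row
decay = 0.1                     # assumed lr * weight_decay

for step in range(3):
    for r in range(len(param)):
        owed[r] += 1            # every row owes one more decay step
    for r in [0]:               # sparse grad only ever touches row 0
        param[r] *= (1 - decay) ** owed[r]
        owed[r] = 0

# Without a flush, row 1 still holds its fully undecayed value here.
for r in range(len(param)):     # flush(): settle every row
    param[r] *= (1 - decay) ** owed[r]
    owed[r] = 0
```

After the flush, both rows end up at the same value they would have had under dense per-step decay, 0.9³.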

@csarofeen csarofeen mentioned this pull request Dec 14, 2021
eqy pushed a commit to eqy/pytorch that referenced this pull request Jan 20, 2022
Once a pair of domains is determined to be invalid to map, keep that
information during the traversal in ComputeAtRootDomainMapBuilder. This
is to avoid indirectly causing invalid mappings. See issue pytorch#1305 for an
example.

Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
hubertlu-tw pushed a commit to hubertlu-tw/pytorch that referenced this pull request Nov 1, 2022
scotts added a commit to scotts/pytorch that referenced this pull request Mar 18, 2026
New commits included:

- 0c52fa6 Fix compilation of XPU part of kineto (pytorch#1292)
- f882254 Add MTIA_COUNTERS ActivityType and counter event output support (pytorch#1303)
- c6c84d0 Use whole data from PTI activity record (pytorch#1278)
- bb1e194 Add additionalLoggerCollector mechanism to ActivityProfilerController (pytorch#1290)
- 041c3e1 Disable test_record_function_fast (pytorch#1309)
- e8956c4 Integrate PyTorch's disabled tests mechanism into CI (pytorch#1311)
- 7d860f2 Fix unit test (pytorch#1305)
- 058386f Add Mac CPU workflow (pytorch#1304)
- c12ddc2 refactor CuptiCbidRegistry member function names (pytorch#1301)
- 3b5cdca Add comms Id to trace output JSON (pytorch#1300)

Authored with Claude.
pytorchmergebot pushed a commit that referenced this pull request Mar 19, 2026
Pull Request resolved: #177753
Approved by: https://github.com/ryanzhang22