[WIP] Support weight decay in SGD optimizer when gradient is sparse #1305
ezyang wants to merge 7 commits into pytorch:master
Conversation
I wonder if it wouldn't be simpler and less magical to offer that deferred sparse weight decay as an alternative kind of optimizer. The user would create the regular one with weight_decay=0 and then another one

That's OK by me. Would you suggest just erroring (with a suggestion) if they try to use weight decay while optimizing something sparse? (@adamlerer, what do you think?)
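The "separate optimizer" idea discussed above could look roughly like the following. This is a hypothetical numpy sketch (the class name, signature, and bookkeeping are invented for illustration, not taken from this patch set): decay owed to an embedding row is deferred, and the accumulated multiplicative shrinkage is applied only when that row is next touched by a sparse gradient.

```python
import numpy as np

class LazySparseWeightDecay:
    """Hypothetical sketch of deferred weight decay as its own optimizer.

    Each global step, every row conceptually owes a shrink by
    (1 - lr * weight_decay). Instead of touching the whole (possibly
    huge) embedding table, we record when each row was last decayed and
    apply the accumulated factor lazily when the row is next updated.
    """

    def __init__(self, param, lr=0.1, weight_decay=0.01):
        self.param = param          # 2-D array: rows are embeddings
        self.lr = lr
        self.wd = weight_decay
        self.step_count = 0
        # step at which each row last had its decay applied
        self.last_applied = np.zeros(param.shape[0], dtype=int)

    def step(self, touched_rows):
        """Apply all pending decay to the rows touched this step."""
        self.step_count += 1
        skipped = self.step_count - self.last_applied[touched_rows]
        # each skipped step would have shrunk the row by (1 - lr*wd)
        factor = (1.0 - self.lr * self.wd) ** skipped
        self.param[touched_rows] *= factor[:, None]
        self.last_applied[touched_rows] = self.step_count
```

Under this scheme the user would configure the main SGD optimizer with weight_decay=0 and run this one alongside it, so the sparse update path never needs to densify.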
Towards fixing #1285.
This is a WIP commit to get the ball rolling on code review
(I am sure I have done great violence to the various coding
standards of your project.)
Things to be done:
- Continue adding sparse support for other parameters
and optimizers
- Add some more tests, including a unit test ensuring
that a single step is what we expect
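For reference, the core idea here — applying weight decay only to the rows that actually appear in a sparse gradient — can be sketched outside PyTorch with plain numpy (the function name and signature below are invented for illustration; this is not the code in this patch):

```python
import numpy as np

def sgd_step_sparse(param, grad_indices, grad_values, lr=0.1, weight_decay=0.01):
    """One SGD step where the gradient is sparse (row indices + row values).

    Weight decay is folded into the sparse update, so only the rows that
    received a gradient are decayed; all other rows are left untouched.
    """
    rows = param[grad_indices]                 # fancy indexing copies the rows
    update = grad_values + weight_decay * rows # decay only on touched rows
    param[grad_indices] = rows - lr * update   # scatter the update back
    return param
```

Note the trade-off this implies: rows that never receive a gradient are never decayed, which is exactly why the discussion below considers deferring decay instead.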
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This is a work-in-progress commit for adding support for sparse_select/sparse_copy, towards fixing #1462.
Things to do:
1. Delete sparse_mask
2. sparse_copy implementation
3. CUDA versions
4. Tests
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Hi @ezyang, I'm working on recommendation models, and weight decay for sparse embeddings is crucial since these models easily overfit.
No, it hasn't been merged yet. But it shouldn't be too hard to adapt this patch set to get it going for what you need. Give it a shot, maybe?
OK, I will give it a try, thanks so much! One simple question: why does it need to call flush() every time here? I think the dense tensor doesn't need this.
CC @adamlerer @soumith