[WIP] Support weight decay in SGD optimizer when gradient is sparse #1305
ezyang wants to merge 7 commits into pytorch:master
Conversation
I wonder if it wouldn't be simpler and less magical to offer that deferred sparse weight decay as an alternative kind of optimizer. The user would create the regular one with weight_decay=0 and then another one

That's OK by me. Would you suggest just erroring (with a suggestion) if they try to use weight decay while optimizing something sparse? (@adamlerer, what do you think?)
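The "separate optimizer" idea discussed above could look roughly like the following. This is a hypothetical numpy sketch (the class name, signature, and bookkeeping are invented for illustration, not taken from this patch set): decay owed to an embedding row is deferred, and the accumulated multiplicative shrinkage is applied only when that row is next touched by a sparse gradient.

```python
import numpy as np

class LazySparseWeightDecay:
    """Hypothetical sketch of deferred weight decay as its own optimizer.

    Each global step, every row conceptually owes a shrink by
    (1 - lr * weight_decay). Instead of touching the whole (possibly
    huge) embedding table, we record when each row was last decayed and
    apply the accumulated factor lazily when the row is next updated.
    """

    def __init__(self, param, lr=0.1, weight_decay=0.01):
        self.param = param          # 2-D array: rows are embeddings
        self.lr = lr
        self.wd = weight_decay
        self.step_count = 0
        # step at which each row last had its decay applied
        self.last_applied = np.zeros(param.shape[0], dtype=int)

    def step(self, touched_rows):
        """Apply all pending decay to the rows touched this step."""
        self.step_count += 1
        skipped = self.step_count - self.last_applied[touched_rows]
        # each skipped step would have shrunk the row by (1 - lr*wd)
        factor = (1.0 - self.lr * self.wd) ** skipped
        self.param[touched_rows] *= factor[:, None]
        self.last_applied[touched_rows] = self.step_count
```

Under this scheme the user would configure the main SGD optimizer with weight_decay=0 and run this one alongside it, so the sparse update path never needs to densify.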
Towards fixing #1285.
This is a WIP commit to get the ball rolling on code review
(I am sure I have done great violence to the various coding
standards of your project.)
Things to be done:
- Continue adding sparse support for other parameters
and optimizers
- Add some more tests, including a unit test ensuring
that a single step is what we expect
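For reference, the core idea here — applying weight decay only to the rows that actually appear in a sparse gradient — can be sketched outside PyTorch with plain numpy (the function name and signature below are invented for illustration; this is not the code in this patch):

```python
import numpy as np

def sgd_step_sparse(param, grad_indices, grad_values, lr=0.1, weight_decay=0.01):
    """One SGD step where the gradient is sparse (row indices + row values).

    Weight decay is folded into the sparse update, so only the rows that
    received a gradient are decayed; all other rows are left untouched.
    """
    rows = param[grad_indices]                 # fancy indexing copies the rows
    update = grad_values + weight_decay * rows # decay only on touched rows
    param[grad_indices] = rows - lr * update   # scatter the update back
    return param
```

Note the trade-off this implies: rows that never receive a gradient are never decayed, which is exactly why the discussion below considers deferring decay instead.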
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This is a work-in-progress commit for adding support for sparse_select/sparse_copy, towards fixing #1462.
Things to do:
1. Delete sparse_mask
2. sparse_copy implementation
3. CUDA versions
4. Tests
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Hi @ezyang, I'm working on recommendation models, and weight decay for sparse embeddings is crucial since these models easily overfit.
No, it hasn't been merged yet. But it shouldn't be too hard to adapt this patch set to get it going for what you need. Give it a shot, maybe?
OK, I will give it a try, thanks so much! One simple question: why does it need to call flush() every time here? I think the dense tensor doesn't need this.
CC @adamlerer @soumith