[feat] Add AdaScaleWrapper by min-xu-ai · Pull Request #347 · facebookresearch/fairscale

min-xu-ai · 2021-02-01T05:29:27Z

This enables a different API for wrapping an optimizer with AdaScale.
This also enables AdaScale to be wrapped by OSS.
However, OSS wrapping AdaScale results in different optimization,
and future research will be needed to study its effects.

testing: add unit tests.

Before submitting

Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
Did you read the contributor guideline?
Did you make sure to update the docs?
Did you write any new necessary tests?

What does this PR do?

Fixes #302.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


min-xu-ai · 2021-02-01T21:32:22Z

@blefaudeux, what do you think about this? I will add a test with shard_ddp next if this looks OK. As a next step, a small shard_ddp change might be needed to detect that AdaScale wrapping OSS is also allowed in shard_ddp.

min-xu-ai · 2021-02-02T19:54:28Z

ping reviewers

msbaines

Why is a separate wrapper class necessary? Why can't we just change AdaScale directly to work as a wrapper?

min-xu-ai · 2021-02-03T02:02:54Z

> Why is a separate wrapper class necessary? Why can't we just change AdaScale directly to work as a wrapper?

Very good question. The current AdaScale API takes an instantiated optimizer object. That won't work with OSS, which expects to wrap an optimizer that takes a list of parameters. The wrapper makes both ways of initialization possible, which allows AdaScale to be wrapped by OSS. However, numerically and in terms of the ML algorithm, OSS wrapping AdaScale differs from AdaScale's original idea, so it requires more research. This wrapper allows such research (i.e., studying the effect of OSS wrapping AdaScale) to be done in the future.
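To make the two construction styles described above concrete, here is a minimal sketch in plain Python. All `Toy*` classes are hypothetical stand-ins invented for illustration; they are not the fairscale or PyTorch APIs, and they only model who instantiates the inner optimizer. The assumption is that the outer wrapper calls `optim(params, **defaults)`, so only a params-first class can be passed to it.

```python
from typing import Any, Iterable, List, Type


class ToySGD:
    """Hypothetical stand-in for an optimizer built from a list of parameters."""

    def __init__(self, params: Iterable[float], lr: float = 0.1) -> None:
        self.params: List[float] = list(params)
        self.lr = lr


class ToyAdaScale:
    """Object-wrapping style: expects an already-instantiated optimizer."""

    def __init__(self, optimizer: ToySGD, scale: float = 1.0) -> None:
        if not isinstance(optimizer, ToySGD):
            raise TypeError("expected an optimizer object, not raw parameters")
        self.optimizer = optimizer
        self.scale = scale


class ToyAdaScaleWrapper(ToyAdaScale):
    """Params-first style: builds the inner optimizer itself, then delegates."""

    def __init__(self, params: Iterable[float], scale: float = 1.0,
                 optim_cls: Type[ToySGD] = ToySGD, **kwargs: Any) -> None:
        super().__init__(optim_cls(params, **kwargs), scale=scale)


class ToyOSS:
    """Hypothetical outer wrapper that constructs optim(params, **defaults)."""

    def __init__(self, params: Iterable[float], optim: Type[Any] = ToySGD,
                 **defaults: Any) -> None:
        self.optim = optim(list(params), **defaults)


params = [1.0, 2.0, 3.0]

# Object-wrapping style works when the caller builds the optimizer first.
inner_first = ToyAdaScale(ToySGD(params, lr=0.5), scale=2.0)

# The params-first wrapper can be handed to the outer wrapper as a class.
outer = ToyOSS(params, optim=ToyAdaScaleWrapper, scale=2.0, lr=0.5)

# The object-wrapping class cannot: it receives raw parameters and rejects them.
try:
    ToyOSS(params, optim=ToyAdaScale)
    rejected = False
except TypeError:
    rejected = True
```

In this sketch the subclass only changes how the inner optimizer is created; everything else is inherited, which is one plausible reason to keep both entry points rather than replace the original one.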

min-xu-ai · 2021-02-03T02:06:25Z

> Why is a separate wrapper class necessary? Why can't we just change AdaScale directly to work as a wrapper?

I should add that AdaScale(OSS) already works. OSS(AdaScale) is what this is trying to address.


mikerabbat · 2021-02-10T14:38:19Z

> > Why is a separate wrapper class necessary? Why can't we just change AdaScale directly to work as a wrapper?
>
> I should add that AdaScale(OSS) already works. OSS(AdaScale) is what this is trying to address.

Just curious, if AdaScale(OSS) already works, do we also need to support OSS(AdaScale)?

min-xu-ai · 2021-02-10T17:36:56Z

> > > Why is a separate wrapper class necessary? Why can't we just change AdaScale directly to work as a wrapper?
> >
> > I should add that AdaScale(OSS) already works. OSS(AdaScale) is what this is trying to address.
>
> Just curious, if AdaScale(OSS) already works, do we also need to support OSS(AdaScale)?

We don't have to. It is marked as experimental, in case we need to do some research in that direction in the future.

The wrapper allows some flexibility in how an underlying optimizer is wrapped, just in case some trainer loop needs that form. Does that sound OK to you?

mikerabbat · 2021-02-10T18:21:40Z

Yes, of course, sounds good! Just wasn't sure if there was already a compelling use case :-)

min-xu-ai · 2021-02-10T19:00:55Z

> Yes, of course, sounds good! Just wasn't sure if there was already a compelling use case :-)

Totally, a very good question! Thanks for reviewing.

* [chore] Fix lint errors that broke master (#348) authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
* [fix] ShardedDDP - cpu testfix - remove Gloo/CPU (#350)
  * no idea about the root issue, but it proved to be fairly narrowed (gloo+cpu+python3.8+no cuda installed) so I guess that's out of scope for fairscale
* [feat][OSS] elastic and pytorch compatible checkpoints (#310)
  * adding a test to prove the inter operability with upstream pytorch
  * updating the changelog
  * eager state pruning
  * pytorch 1.5 compat
* [fix] ShardedDDP - properly handle post device change (#353)
  * adding the .to(device) support + unit testing
  * doc update
* [feat] Add AdaScaleWrapper (#347)
  * [feat] Add AdaScaleWrapper - This enables a different API for wrapping an optimizer with AdaScale. - This also enables AdaScale to be wrapped by OSS. - However, OSS wrapping AdaScale results in different optimization, which future research will be needed to study its effects. testing: add unit tests.
  * addressed comment: typo
* [refactor] Refactor and enable multiprocess nn.Pipe benchmarks. (#319)
  * mp cleanup
  * round of multiprocess refactoring
  * test golden run
  * print cuda stats
  * fix lint errors
  * enable multiprocess pipe benchmarks
  * set world size to be available gpus
  * more changes
  * use synthetic loaders for intermediate pipeline stages
  * merged master
  * fix for the devices property
  * dataloader fix
  * modify rank check
  * print wps stats
  * enable verification
  * fix logging
  * fix flag name
  * fix flag name
  * check for rank
  * fix indent
  * pass args
  * pass args
  * modify golden data
  * remove unused print messsage
  * fix lint errors
  * add comments
  * fix benchmarks

  Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
* [refactor] pipe: simplify balance and module checks (#346)
* [chore] v0.1.5 (#355)
* [chore] disheartening switch off of a OSS cpu test (#356)
  * precise skip, only if agent has only cpu
* [feat][minor] OSS Benchmark - regression test + background testing new optims (#352)
  * restoring the regression test, adding a test of the for_each optims
  * fix the regression test on circleci
  * removing unused flags
* [refactor] multiprocess_pipe: cleanup __init__ (#357)
* [refactor] multiprocess_pipe: remove retain_graph __init__ param (#358) It is not currently being used so we can simplify the interface by removing it.
* [refactor] multiprocess_pipe: focus on LazyModule usage (#360)
* [feat] ShardedDDP : Adding a proper DDP parity / AMP unit test, overdue (#361)
  * Adding a proper ddp parity / AMP unit test, overdue
  * catch non-AMP pytorch
* [perf][OSS] Clip grad norm : minor obvious speedup (#363) cache this iterator, easy speed up
* [refactor] multiprocess_pipe: remove pipelined_backward (#362)
* [perf] ShardedDDP - small memory use reduction - minor speedup (#366)
  * minor
  * minor
* [fix] repro+fix (#365) fix a broken earlier commit, only worked for the first step
* [refactor] OSS only use flat buffers (#371)
  * flat params all along, way simpler
  * updating the docstring
* [refactor] AsyncPipe: do not sub-class MultiProcessPipe (#370)
* [refactor] remove multiprocess dependency on async (#373)
* [fix] Workaround need for pip --no-build-isolation (#375)
* Add fairscale.nn.misc.checkpoint_activations (#376)
  * Add fairscale.utils.containers

    Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
  * Add fairscale.nn.misc.checkpoint_activations

    Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
  Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* [chore] v0.1.6 (#377)
  * v0.1.6

Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com>
Co-authored-by: Benjamin Lefaudeux <blefaudeux@users.noreply.github.com>
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Co-authored-by: Myle Ott <myleott@fb.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

facebook-github-bot added the CLA Signed label Feb 1, 2021

min-xu-ai requested review from anj-s, blefaudeux, mikerabbat and msbaines February 1, 2021 05:29

Merge remote-tracking branch 'origin/master' into min/adaoss

8901f32

msbaines reviewed Feb 3, 2021

anj-s reviewed Feb 3, 2021

fairscale/optim/adascale.py (resolved)

anj-s reviewed Feb 3, 2021

tests/optim/test_oss_adascale.py (outdated, resolved)

anj-s reviewed Feb 3, 2021

fairscale/optim/adascale.py (resolved)

anj-s approved these changes Feb 3, 2021

Min Xu added 2 commits February 3, 2021 08:53

addressed comment: typo

fd1508e

Merge remote-tracking branch 'origin/master' into min/adaoss

46490f0

min-xu-ai merged commit a2408eb into master Feb 3, 2021

min-xu-ai deleted the min/adaoss branch February 19, 2021 17:23


min-xu-ai merged 4 commits into master from min/adaoss


5 participants
