forked from deepspeedai/DeepSpeed
Pulling upstream #1
Merged
Conversation
* adding BingSquad e2e test
* updating the draft test; bring final step under try section
* finalizing test for base deepspeed and deepspeed with ZeRO
* applying the comment (thanks Jeff); fixed formatting
Updates for ZeRO stage 2 + ZeRO stage 1 w. RS
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: yuxionghe <yuxhe@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
* BERT title
* updates to support fp32 grad clipping and disable max_grad_norm
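A small sketch of the clipping pattern the commit above describes; `clip_fp32_grads` and its arguments are hypothetical names, but the idea (clip the fp32 master gradients, and treat a non-positive max_grad_norm as "clipping disabled") follows the commit message rather than the actual DeepSpeed internals.

```python
import torch

def clip_fp32_grads(fp32_params, max_grad_norm):
    """Hypothetical helper: clip gradients on the fp32 master copies.

    A missing or non-positive max_grad_norm disables clipping, matching the
    'disable max_grad_norm' behaviour mentioned in the commit above.
    """
    if max_grad_norm is None or max_grad_norm <= 0:
        return
    torch.nn.utils.clip_grad_norm_(fp32_params, max_grad_norm)
```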
* Fix for CPU memory bloating issue caused by PyTorch backward graph creation in all_gather. Fixed by calling detach on tensors before calling all_gather
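A minimal sketch of the fix described above, assuming a plain torch.distributed setup rather than the actual DeepSpeed code path: detaching before the all_gather keeps the gathered tensors out of the autograd graph, so the backward graph is not retained.

```python
import torch
import torch.distributed as dist

def detached_all_gather(tensor):
    # Detach first: gathering a tensor that still requires grad would keep
    # the backward graph alive and bloat memory, as described above.
    flat = tensor.detach()
    gathered = [torch.empty_like(flat) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, flat)
    return gathered
```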
Contiguous Gradients should be set to false by default. It's not useful unless the model is very large
* add support for predivide as a flag
* add predivide JSON config; remove allgather_disable (as it is no longer used)
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
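A minimal config sketch tying the two settings above together. The key names ("contiguous_gradients" under "zero_optimization" and "prescale_gradients") are taken from current DeepSpeed documentation and may differ from the exact names these commits introduced; the batch size is a placeholder.

```python
import json

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Default off; mainly helps very large models, per the note above.
        "contiguous_gradients": False,
    },
    # Pre-divide gradients before the reduction instead of post-dividing.
    "prescale_gradients": True,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```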
* fix: typo in code docs
* more pythonic code
* Transformer kernels release
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
* fix nv_peer_mem version in dockerfile
* fix security issue, remove pillow dependency (this is only needed for the cifar example, which has its own requirements.txt)
The mpu object is bound to the class instance: the if statement checks `self.mpu`, but the bare name `mpu` is used in the following lines, which raises a NameError.
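A schematic reproduction of the bug class; the class and method names here are hypothetical, not the actual DeepSpeed code.

```python
class Engine:
    def __init__(self, mpu=None):
        self.mpu = mpu  # model-parallel utility object bound to the instance

    def data_parallel_group_broken(self):
        if self.mpu is not None:
            # Bug: `mpu` is neither a local nor a global name here, so this
            # raises NameError as soon as the branch is taken.
            return mpu.get_data_parallel_group()

    def data_parallel_group_fixed(self):
        if self.mpu is not None:
            # Fix: use the bound attribute consistently.
            return self.mpu.get_data_parallel_group()
```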
The parentheses alter the evaluation of the assert: a parenthesized condition-and-message pair becomes a tuple, which always evaluates to True.
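This is the standard Python pitfall the fix removes: a non-empty tuple is always truthy, so the assertion can never fire.

```python
value = -1

# Buggy: the parentheses create a two-element tuple, which is always truthy,
# so this assert never fails (Python 3 emits a SyntaxWarning about it).
assert (value >= 0, "value must be non-negative")

# Correct: condition and message are separate operands of the assert statement.
assert value >= 0, "value must be non-negative"
```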
Add webinar on-demand links and update readme
* add fix and tests for get_lr from lr_scheduler before training starts
* update fan out flag for pdsh
* turn off multi-node launch if only 1 node
* Create CODEOWNERS
* Update deepspeed_checkpointing.py
* formatting
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
* Gradient Accumulation support for Stage 2. Model tests added to test the feature
* formatting
* Update deepspeed_light.py: removing comment
* Update ds_config_func_bs8_zero1.json: reverting this file back. It's not needed for this PR
* defining baseline prefix
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
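A minimal sketch of how gradient accumulation looks from the user side with a DeepSpeed engine; "gradient_accumulation_steps" and "zero_optimization" are real config keys, the `config=` keyword to deepspeed.initialize exists in recent versions, and the toy model, data, and batch sizes are placeholders sized for a single GPU (train_batch_size must equal micro-batch x accumulation steps x world size).

```python
import torch
import deepspeed

model = torch.nn.Linear(16, 1)  # toy stand-in for a real model

ds_config = {
    "train_batch_size": 32,                 # 4 (micro) * 8 (accum) * 1 (GPU)
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": False},
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for _ in range(80):                          # 80 micro-batches -> 10 optimizer steps
    x = torch.randn(4, 16, device=engine.device)
    y = torch.randn(4, 1, device=engine.device)
    loss = torch.nn.functional.mse_loss(engine(x), y)
    engine.backward(loss)                    # gradients accumulate across micro-batches
    engine.step()                            # optimizer steps only at accumulation boundaries
```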
Renaming config files to gas3
* Sparse attn + ops/runtime refactor + v0.3.0
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Remove llvm/cmake install for now, causing pyyaml issues