Update deepspeed_checkpointing.py #336

samyam · 2020-08-27T22:28:39Z

DeepSpeedConfig in DeepSpeed Checkpointing was missing mpu during initialization, resulting in incorrect world_size, micro_batch, train_batch_size, and gradient accumulation step calculations when used with model parallelism and activation checkpointing.
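To illustrate why a missing mpu skews these values, here is a minimal sketch of the batch-size arithmetic. It assumes the relation train_batch_size = micro_batch_per_gpu × gradient_accumulation_steps × data-parallel world size, where the data-parallel world size is the total world size divided by the model-parallel size. The helper `resolve_batch_params` is hypothetical and is not DeepSpeed's actual code; it only shows how the numbers change when model-parallel groups are not accounted for.

```python
def resolve_batch_params(train_batch_size, micro_batch_per_gpu, world_size, mp_world_size=1):
    """Hypothetical helper: derive (dp_world_size, gradient_accumulation_steps).

    Assumes train_batch_size = micro_batch_per_gpu * grad_acc_steps * dp_world_size,
    where only data-parallel replicas contribute distinct samples.
    """
    if world_size % mp_world_size != 0:
        raise ValueError("world_size must be divisible by mp_world_size")
    # Ranks inside one model-parallel group share the same batch,
    # so only world_size / mp_world_size ranks are data-parallel replicas.
    dp_world_size = world_size // mp_world_size
    samples_per_step = micro_batch_per_gpu * dp_world_size
    if train_batch_size % samples_per_step != 0:
        raise ValueError("train_batch_size must be divisible by micro_batch * dp_world_size")
    return dp_world_size, train_batch_size // samples_per_step


# 16 GPUs split into 4-way model-parallel groups -> 4 data-parallel replicas.
with_mpu = resolve_batch_params(64, 4, world_size=16, mp_world_size=4)
# Without model-parallel info, every GPU is counted as a data-parallel replica.
without_mpu = resolve_batch_params(64, 4, world_size=16)
print(with_mpu)     # (4, 4)
print(without_mpu)  # (16, 1)
```

With the model-parallel size known, 4 replicas each accumulate 4 micro-batches of 4 samples to reach 64; without it, the same config silently resolves to 16 replicas and a single accumulation step.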

jeffra · 2020-08-27T22:39:55Z

Approved (after formatting fixes and tests pass)

Update deepspeed_checkpointing.py

fef49f5

samyam added the bug Something isn't working label Aug 27, 2020

samyam requested a review from jeffra August 27, 2020 22:28

samyam requested review from RezaYazdaniAminabadi, ShadenSmith, arashashari, awan-10, cli99, conglongli, eltonzheng, minjiaz, niumanar and tjruwase as code owners August 27, 2020 22:28

jeffra approved these changes Aug 27, 2020

tjruwase approved these changes Aug 27, 2020

jeffra added 2 commits August 29, 2020 06:15

formatting

8118587

Merge branch 'master' into samyam-DeepSpeedConfig-initialization-fix

fb6540d

samyam merged commit 458c0d9 into master Aug 31, 2020

jeffra deleted the samyam-DeepSpeedConfig-initialization-fix branch November 19, 2020 23:27

bobisapotato mentioned this pull request Jan 24, 2021

Another thing to merge. (MY EYES HURT) bobisai/DeepSpeed#1

Merged

4 participants