@samyam
Contributor

@samyam samyam commented Aug 27, 2020

DeepSpeedConfig in DeepSpeed Checkpointing was initialized without mpu, resulting in buggy world_size, micro_batch, train_batch_size, and gradient accumulation step calculations when used with model parallelism and activation checkpointing.
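
To illustrate why a missing mpu skews these numbers, here is a minimal sketch, not the actual DeepSpeed code. It assumes the documented batch relation `train_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * world_size`, and it assumes that `world_size` should be the data-parallel size when model parallelism is in use. The helper names `effective_world_size` and `gradient_accumulation_steps` and the example sizes are hypothetical.

```python
from typing import Optional


def effective_world_size(total_ranks: int,
                         model_parallel_size: Optional[int] = None) -> int:
    """Return the number of data-parallel replicas used in batch arithmetic.

    Hypothetical helper: when no model-parallel information is supplied
    (the "missing mpu" case), every rank is counted as its own replica.
    """
    if model_parallel_size is None:
        return total_ranks
    if total_ranks % model_parallel_size != 0:
        raise ValueError("total_ranks must be divisible by model_parallel_size")
    return total_ranks // model_parallel_size


def gradient_accumulation_steps(train_batch_size: int,
                                micro_batch_per_gpu: int,
                                world_size: int) -> int:
    """Solve train_batch = micro_batch * grad_accum * world_size for grad_accum."""
    samples_per_step = micro_batch_per_gpu * world_size
    if train_batch_size % samples_per_step != 0:
        raise ValueError("train_batch_size is not divisible by micro_batch * world_size")
    return train_batch_size // samples_per_step


# Assumed example setup: 16 ranks, model-parallel groups of 4 ranks each.
TOTAL_RANKS = 16
MODEL_PARALLEL_SIZE = 4
TRAIN_BATCH_SIZE = 64
MICRO_BATCH_PER_GPU = 4

# Without model-parallel information, all 16 ranks count as replicas.
steps_without_mpu = gradient_accumulation_steps(
    TRAIN_BATCH_SIZE, MICRO_BATCH_PER_GPU, effective_world_size(TOTAL_RANKS))

# With model-parallel information, only 16 / 4 = 4 replicas exist.
steps_with_mpu = gradient_accumulation_steps(
    TRAIN_BATCH_SIZE, MICRO_BATCH_PER_GPU,
    effective_world_size(TOTAL_RANKS, MODEL_PARALLEL_SIZE))

print(steps_without_mpu, steps_with_mpu)
```

Under these assumed numbers, the two calculations disagree on the gradient accumulation steps for the same requested batch size, which is the kind of mismatch described above.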

@jeffra
Collaborator

jeffra commented Aug 27, 2020

Approved (after formatting fixes and tests pass)

@samyam samyam merged commit 458c0d9 into master Aug 31, 2020
@jeffra jeffra deleted the samyam-DeepSpeedConfig-initialization-fix branch November 19, 2020 23:27

Labels

bug Something isn't working

4 participants