
Conversation


@jeffra jeffra commented Feb 8, 2020

No description provided.

@jeffra jeffra linked an issue Feb 8, 2020 that may be closed by this pull request
@jeffra jeffra requested a review from ShadenSmith February 9, 2020 05:57
@jeffra jeffra merged commit 20ff66a into master Feb 9, 2020
@jeffra jeffra deleted the jeffra/azure_updates branch February 9, 2020 06:00
See the [Megatron tutorial](tutorials/MegatronGPT2Tutorial.md) for more details.
* To fully train GPT2 with DeepSpeed and ZeRO, we recommend using 8 instances of
Azure's Standard_ND40rs_v2 SKU, for a total of 64 NVIDIA V100 GPUs. With this setup you
should be able to train on 153.6 million samples in less than 2 weeks (see the sketch below).
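
For readers following the doc snippet above, here is a minimal, illustrative sketch of what driving such a run with DeepSpeed might look like. It is not part of this PR: the script name, the `ds_config.json` filename, and the `build_gpt2_model` / `build_data_loader` helpers are assumptions for illustration; the tutorial linked above is the authoritative recipe.

```python
# Illustrative sketch (not from this PR): wrapping a GPT2-style model with
# DeepSpeed so ZeRO, fp16, and the effective batch size come from ds_config.json.
# Launched across 8 ND40rs_v2 nodes (8 V100s each, 64 GPUs total), e.g.:
#   deepspeed --hostfile=hostfile train.py --deepspeed --deepspeed_config ds_config.json
import argparse
import deepspeed

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=-1)  # injected by the launcher
    parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed, --deepspeed_config
    args = parser.parse_args()

    model = build_gpt2_model()          # hypothetical helper returning the model
    data_loader = build_data_loader()   # hypothetical helper yielding training batches

    # DeepSpeed sets up distributed training, the optimizer, and ZeRO partitioning.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        args=args, model=model, model_parameters=model.parameters())

    for batch in data_loader:
        loss = model_engine(batch)      # assumes the model's forward returns the loss
        model_engine.backward(loss)
        model_engine.step()

if __name__ == "__main__":
    main()
```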
Contributor


I think it's important to mention the batch size since performance depends so much on it. Maybe say: with this setup you should be able to train for 100K steps at batch size 1536 (153.6 million samples) in less than 2 weeks?
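
For reference, the figures in this suggestion are internally consistent; a quick check (assuming the batch size counts samples per step, not tokens):

```python
# Sanity check of the numbers in the suggestion above.
steps = 100_000
batch_size = 1536
samples = steps * batch_size
assert samples == 153_600_000  # i.e. 153.6 million samples
print(f"{steps:,} steps x batch size {batch_size:,} = {samples:,} samples")
```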

kouml pushed a commit to kouml/DeepSpeed that referenced this pull request Apr 3, 2020
jeffra pushed a commit that referenced this pull request May 19, 2020
rraminen pushed a commit to rraminen/DeepSpeed that referenced this pull request Nov 18, 2021
delock referenced this pull request in delock/DeepSpeedSYCLSupport Sep 21, 2022
Liangliang-Ma added a commit to Liangliang-Ma/DeepSpeed that referenced this pull request Aug 2, 2024


Development

Successfully merging this pull request may close these issues.

Installation documentation needed

4 participants