-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
When I try to run the Megatron deepspeed example(https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples) with latest master branch of deepspeed, I met following error:

To Reproduce
Steps to reproduce the behavior:
- Go to 'Megatron Example'
- Install latest DeepSpeed(0.7.0+9dcfb93a) & Megatron(0.5.1)
- Prepare the data:
- bash dataset/download_books.sh
- bash dataset/download_vocab.sh
- Start the training:
bash examples/azure/run-benchmark-model.sh
The env I used is:
1 node/ 8GPUs, A100.
Expected behavior
A clear and concise description of what you expected to happen.
ds_report output
Please run ds_report to give us details about your setup.
System info (please complete the following information):
- ubuntu2004-cu115-py38-torch1110
- 1 machines with x8 A100s each
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working

