@siddharth9820

This PR adds ZeRO-Offload support to Megatron-DeepSpeed.
Below I compare the loss curves for ZeRO stage 0 and ZeRO stage 2 with CPU offload, using the DeepSpeedCPUAdam optimizer.

Base Model - 1.3B
Number of Experts - 8
Batch Size - 256
Machine - Azure A100 40GB
Number of GPUs - 8
Dataset - BookCorpus
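
For readers who want to reproduce the comparison, the two ZeRO settings might be expressed as DeepSpeed configs along the lines of the sketch below. This is not the config used for the runs in this PR, which is not shown here. The helper `make_ds_config` is hypothetical, and the micro-batch size of 4 per GPU (hence 8 gradient accumulation steps) is an assumption chosen only so that the global batch size comes out to 256 on 8 GPUs. The key names follow the DeepSpeed config JSON format; precision and optimizer settings are omitted.

```python
import json


def make_ds_config(zero_stage, cpu_offload, train_batch_size=256,
                   num_gpus=8, micro_batch_per_gpu=4):
    """Build a DeepSpeed-style config dict (hypothetical helper).

    micro_batch_per_gpu=4 is an assumption; the PR only states the
    global batch size (256) and the number of GPUs (8).
    """
    per_step = micro_batch_per_gpu * num_gpus
    if train_batch_size % per_step != 0:
        raise ValueError("train_batch_size must be divisible by "
                         "micro_batch_per_gpu * num_gpus")
    config = {
        # DeepSpeed requires:
        # train_batch_size == micro batch * grad accumulation * num GPUs
        "train_batch_size": train_batch_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": train_batch_size // per_step,
        "zero_optimization": {"stage": zero_stage},
    }
    if cpu_offload:
        # Keep optimizer states (and the optimizer step) on the CPU.
        config["zero_optimization"]["offload_optimizer"] = {
            "device": "cpu",
            "pin_memory": True,
        }
    return config


baseline = make_ds_config(zero_stage=0, cpu_offload=False)
offload = make_ds_config(zero_stage=2, cpu_offload=True)

if __name__ == "__main__":
    print(json.dumps(offload, indent=2))
```

The two dicts differ only in the `zero_optimization` section, so any difference between the loss curves can be attributed to the ZeRO stage and the offload setting rather than to the batch configuration.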

[Image: loss curves for ZeRO stage 0 vs. ZeRO stage 2 with CPU offload]
