Default MoE

This repository implements the Default MoE from the paper "Dense Backpropagation Improves Training for Sparse Mixture-of-Experts".

Code

This repository builds on the GPT-NeoX library, specifically a pull request implementing Dropless MoE. The main changes are:

  • Default MoE implementation with an EMA update and filling in of missing expert outputs (see the sketch after this list).
  • New "default_vector" MoE config type that adds a buffer containing the default expert outputs.
  • Additional arguments in the MoE forward pass for computing the default vector update.
  • Minor changes to integrate the load-balancing loss and to keep the first layer dense (not MoE).
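
The following is a minimal, hypothetical sketch of the Default MoE combine step, assuming PyTorch tensors shaped (tokens, experts, hidden); the function name, tensor shapes, and the ema_decay value are illustrative assumptions, not the repository's actual code.

import torch

def default_moe_combine(router_probs, expert_outputs, expert_mask,
                        default_vectors, ema_decay=0.99, training=True):
    # router_probs:    (tokens, num_experts)          softmax router probabilities
    # expert_outputs:  (tokens, num_experts, hidden)  expert outputs, zero where not routed
    # expert_mask:     (tokens, num_experts)          1 where a token was routed to an expert
    # default_vectors: (num_experts, hidden)          buffer holding the EMA of each expert's mean output
    if training:
        with torch.no_grad():
            # EMA update: move each default vector toward the mean output its
            # expert actually produced on this batch (decay value is an assumption).
            counts = expert_mask.sum(dim=0).clamp(min=1).unsqueeze(-1)
            batch_mean = (expert_outputs * expert_mask.unsqueeze(-1)).sum(dim=0) / counts
            default_vectors.mul_(ema_decay).add_(batch_mean, alpha=1 - ema_decay)

    # Fill in missing expert outputs with the default vectors so that every router
    # probability contributes to the output, giving the router a dense gradient.
    filled = torch.where(expert_mask.unsqueeze(-1).bool(),
                         expert_outputs,
                         default_vectors.unsqueeze(0))
    return torch.einsum('te,teh->th', router_probs, filled)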

Installation

Install PyTorch, DeepSpeed, and the GPT-NeoX requirements:

pip install --no-cache-dir --no-build-isolation torch==2.4.1 --index-url https://download.pytorch.org/whl/cu124
pip install --no-cache-dir --no-build-isolation -r gpt-neox/requirements/requirements.txt
pip install --no-cache-dir --no-build-isolation deepspeed==0.14.4

Adjust the PyTorch --index-url to match your CUDA version (cu124 above).

The implementation we build on uses MegaBlocks for dropless MoEs:

pip install --no-cache-dir --no-build-isolation megablocks==0.5.1
pip install --no-cache-dir --no-build-isolation grouped_gemm==0.1.6

Training

See an example config in configs/default-moe-2B.yml. You will need to edit the train data paths to match your own dataset.
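
GPT-NeoX configs take lists of Megatron-style preprocessed data prefixes (the path to the .bin/.idx files without the extension). The paths below are hypothetical placeholders for illustration:

  "train-data-paths": ["data/mydataset/mydataset_text_document"],
  "valid-data-paths": ["data/mydataset/mydataset_text_document"],
  "test-data-paths": ["data/mydataset/mydataset_text_document"],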

Launch a distributed training run with the following command:

python deepy.py train.py configs/default-moe-2B.yml

Inference

See eval.py and generate.py from the original GPT-NeoX library for examples of evaluation and generation with trained model checkpoints.
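
Both scripts are launched the same way as training. The commands below are a sketch with placeholder task names and config paths; check the GPT-NeoX documentation for the exact flags your version supports:

python deepy.py eval.py configs/default-moe-2B.yml --eval_tasks lambada_openai hellaswag
python deepy.py generate.py configs/default-moe-2B.yml configs/text_generation.yml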
