This repository implements the Default MoE from the paper "Dense Backpropagation Improves Training for Sparse Mixture-of-Experts".
It builds on the GPT-NeoX library, specifically on a pull request implementing Dropless MoE. The main changes include:
- Default MoE implementation with an EMA update and filling in of missing expert outputs (see the sketch after this list).
- New "default_vector" MoE config type that adds a buffer containing default expert outputs.
- Additional arguments in the MoE forward pass for computing the default vector update.
- Minor changes to integrate the load-balancing loss and to keep the first layer dense (not using MoE).
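The core idea, per the paper, is that experts a token is not routed to still contribute a "default" output (an EMA of that expert's recent outputs), so the router receives a dense gradient. Below is a minimal, self-contained PyTorch sketch of that idea; the module name, buffer name, and per-expert loop are illustrative assumptions, not the repository's MegaBlocks-based implementation.

```python
# Minimal sketch of the Default MoE idea (illustration only, not the repo's code):
# each expert keeps a "default vector" (an EMA of its recent outputs) in a buffer;
# tokens not routed to an expert use that default vector in place of the missing
# expert output, so the router's softmax receives a dense gradient.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DefaultMoESketch(nn.Module):
    def __init__(self, hidden_size, num_experts, top_k=2, ema_decay=0.99):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.ema_decay = ema_decay
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Toy per-expert MLPs; the real implementation uses MegaBlocks grouped GEMMs.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        )
        # One default output vector per expert, updated by EMA rather than by gradients.
        self.register_buffer("default_vector", torch.zeros(num_experts, hidden_size))

    def forward(self, x):  # x: [tokens, hidden]
        logits = self.router(x)                      # [tokens, experts]
        probs = F.softmax(logits, dim=-1)
        _, top_idx = probs.topk(self.top_k, dim=-1)

        routed = torch.zeros_like(probs, dtype=torch.bool)
        routed.scatter_(1, top_idx, True)

        # Compute real outputs only for routed (token, expert) pairs.
        expert_out = torch.zeros(x.size(0), self.num_experts, x.size(1),
                                 device=x.device, dtype=x.dtype)
        for e, expert in enumerate(self.experts):
            mask = routed[:, e]
            if mask.any():
                out_e = expert(x[mask])
                expert_out[mask, e] = out_e
                # EMA update of this expert's default vector (no gradient flows here).
                with torch.no_grad():
                    self.default_vector[e].mul_(self.ema_decay).add_(
                        out_e.detach().mean(0), alpha=1 - self.ema_decay)

        # Fill in missing expert outputs with the default vectors, then combine with
        # the full softmax so every expert contributes to the router gradient.
        filled = torch.where(routed.unsqueeze(-1), expert_out,
                             self.default_vector.unsqueeze(0).to(x.dtype))
        return torch.einsum("te,teh->th", probs, filled)
```

Because the default vectors are updated under `torch.no_grad()`, they add essentially no backward cost; the only change to backpropagation is that the router's softmax over all experts now receives a dense gradient.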
Install PyTorch, DeepSpeed, and the GPT-NeoX requirements:
```bash
pip install --no-cache-dir --no-build-isolation torch==2.4.1 --index-url https://download.pytorch.org/whl/cu124
pip install --no-cache-dir --no-build-isolation -r gpt-neox/requirements/requirements.txt
pip install --no-cache-dir --no-build-isolation deepspeed==0.14.4
```
Adjust the PyTorch --index-url (cu124 above) to match the CUDA version for your GPUs.
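As an optional sanity check (this snippet is a suggestion, not part of the repository), confirm that the installed torch build sees your GPUs before continuing:

```python
# Quick environment check: verifies the torch build and GPU visibility.
import torch

print(torch.__version__)            # e.g. 2.4.1+cu124 for the CUDA 12.4 wheel
print(torch.cuda.is_available())    # should print True on a GPU node
print(torch.cuda.device_count())    # number of visible GPUs
```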
The implementation we build on uses MegaBlocks for dropless MoEs:
```bash
pip install --no-cache-dir --no-build-isolation megablocks==0.5.1
pip install --no-cache-dir --no-build-isolation grouped_gemm==0.1.6
```
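Likewise, a quick import check (again a suggestion, not part of the repository) confirms that MegaBlocks and grouped_gemm built correctly against your torch/CUDA install:

```python
# Import check for the MoE kernel dependencies; failures here usually indicate
# a torch/CUDA version mismatch with the compiled extensions.
import torch
import megablocks
import grouped_gemm

print("CUDA available:", torch.cuda.is_available())
print("megablocks and grouped_gemm imported successfully")
```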
See an example config in configs/default-moe-2B.yml. You will need to edit the train data paths to match your own dataset.
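For reference, the data-path entries typically look like the hedged excerpt below; treat it as an illustration in the usual GPT-NeoX config style with placeholder paths, and take the exact key names and values from configs/default-moe-2B.yml itself.

```yaml
# Illustrative excerpt only; key names follow the common GPT-NeoX convention and
# the placeholder paths must point to your own preprocessed dataset prefixes.
{
  "train-data-paths": ["/path/to/your/train_text_document"],
  "valid-data-paths": ["/path/to/your/valid_text_document"],
  "test-data-paths": ["/path/to/your/test_text_document"]
}
```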
Launch a distributed training run with the following command:
```bash
python deepy.py train.py configs/default-moe-2B.yml
```
See eval.py and generate.py from the original GPT-NeoX library for examples of evaluation and generation with trained model checkpoints.
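For instance (a hedged illustration; the exact arguments and any additional configs, such as a text-generation config, follow upstream GPT-NeoX and may differ), both scripts use the same launcher pattern as training:

```bash
python deepy.py eval.py configs/default-moe-2B.yml
python deepy.py generate.py configs/default-moe-2B.yml
```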