
Conversation

@yetiansh
Contributor

Previously, enabling both the Tutel optimization and top-2 gating in MoE model training would fail: MoELayer would try to unpack the top-2 gate's output here, which fails because a top-2 gate does not produce that number of outputs.

Fixed by checking the gate's top-k setting when constructing MoELayer.
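For reference, a minimal sketch of the kind of guard described above (simplified and illustrative, not the exact DeepSpeed code; the TUTEL_INSTALLED flag and the gate.k attribute mirror what the diff below references, everything else is an assumption):

import torch

try:
    import tutel  # noqa: F401
    TUTEL_INSTALLED = True
except ImportError:
    TUTEL_INSTALLED = False


class MOELayer(torch.nn.Module):
    """Simplified stand-in for the real MoE layer."""

    def __init__(self, gate, experts, use_tutel=False):
        super().__init__()
        self.gate = gate
        self.experts = experts
        # Only take the Tutel fast path when Tutel is installed AND the gate
        # is top-1: the Tutel path unpacks extra dispatch tensors that a
        # top-2 gate does not return.
        self.use_tutel = use_tutel and TUTEL_INSTALLED and gate.k == 1

    def forward(self, hidden_states):
        if self.use_tutel:
            # Tutel path: the gate returns additional metadata for fast dispatch.
            ...
        else:
            # Generic path: works for both top-1 and top-2 gating.
            ...

With a guard like this, requesting Tutel together with a top-2 gate simply falls back to the generic path instead of crashing during unpacking.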

@awan-10
Contributor

awan-10 commented Jun 29, 2022

Thank you for the PR @yetiansh :) It looks good to me. Alex added the Tutel support, so let me tag him and ask for a quick review.

@alexandremuzio - can you please review this real quick?

@awan-10
Contributor

awan-10 commented Jun 29, 2022

@yetiansh - can you please follow the guide here and update your PR? I see it's failing the format checks.

https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md

@alexandremuzio
Contributor

Looks good to me. Thanks!

@yetiansh
Contributor Author

Thanks @alexandremuzio @awan-10. I've run pre-commit, and it looks like the format-checking workflow needs your approval to run.

@yetiansh
Contributor Author

Hi, is this PR still active? @awan-10 @alexandremuzio

@awan-10 requested a review from samadejacobs as a code owner, July 22, 2022 22:40
@awan-10
Contributor

awan-10 commented Jul 22, 2022

Sorry for the delay in getting back to you, @yetiansh. I approved this PR so the tests can run. Will merge it as soon as the tests pass. Thank you!

logger.warning("Tutel optimization requested but not installed. "
"Proceeding without Tutel.")
elif use_tutel and TUTEL_INSTALLED and gate.k != 1:
logger.warning(
Collaborator

Can we wrap this in an if torch.distributed.get_rank() == 0: check?

Contributor Author

Yes, that's possible. But I wonder whether we should also wrap the other warnings and info messages, for example the ones at L480 and L482-483?
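For illustration, the rank-0 guard being discussed could be factored into a small helper roughly like the sketch below (not the merged code; the is_initialized() fallback is an added assumption so single-process runs still log):

import torch.distributed as dist


def warn_rank0(logger, msg):
    # Emit the warning only on rank 0 so multi-GPU runs do not print one
    # copy per process; if torch.distributed has not been initialized
    # (e.g. single-process debugging), log unconditionally.
    if not dist.is_initialized() or dist.get_rank() == 0:
        logger.warning(msg)

The same helper could then cover the other warning and info calls mentioned above (L480, L482-483).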

@awan-10 enabled auto-merge (squash) July 26, 2022 18:05
@awan-10 merged commit 31582d7 into deepspeedai:master Jul 26, 2022