[Add Mixtral] Adds support for the Mixtral MoE #27942
Conversation
…ace/new-model-addition into add-mixtral-alternative
LysandreJik left a comment
Looks awesome! Thanks for the integration @ArthurZucker @younesbelkada, merge once you're ready and CI is green.
A few documentation things that we can improve on, but let's do that after it has landed in the lib.
> Tips:
>
> - The model needs to be converted using the [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py).
Maybe not relevant once the weights are pushed to the Hub
Co-authored-by: Lysandre Debut <hi@lysand.re>
younesbelkada left a comment
A small nit on the conversion script!
Are there any other aux losses apart from the LM loss?

The auxiliary loss can be computed with:

```diff
@@ -1246,7 +1241,7 @@ def forward(
     aux_loss = None
     if output_router_logits:
```
Setting `output_router_logits=True` should automatically add the `aux_loss`.
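For context on what this aux loss computes: Mixtral uses a Switch-Transformers-style load-balancing loss over the router logits. This is a simplified NumPy sketch, not the actual `transformers` implementation — the function name and shapes here are illustrative only:

```python
import numpy as np

def load_balancing_loss(router_logits, num_experts, top_k=2):
    """Switch-Transformers-style auxiliary load-balancing loss (sketch).

    router_logits: array of shape (num_tokens, num_experts).
    Returns num_experts * sum_e(f_e * P_e), where f_e is the fraction of
    top-k routing slots assigned to expert e and P_e is the mean router
    probability for expert e. Balanced routing minimizes this product.
    """
    # softmax over experts for each token
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs = probs / probs.sum(axis=-1, keepdims=True)
    # indices of the top-k experts chosen for each token
    topk = np.argsort(probs, axis=-1)[:, -top_k:]
    # one-hot mask of the chosen experts, shape (num_tokens, num_experts)
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk, 1.0, axis=-1)
    tokens_per_expert = mask.mean(axis=0)        # f_e, sums to top_k
    router_prob_per_expert = probs.mean(axis=0)  # P_e, sums to 1
    return num_experts * np.sum(tokens_per_expert * router_prob_per_expert)
```

With perfectly uniform routing the loss equals `top_k`; concentrating all tokens on a few experts pushes it toward `num_experts`, which is why adding it to the LM loss encourages balanced expert utilization.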
What does this PR do?
Adds the latest MoE model from Mistral AI.
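For readers unfamiliar with the architecture: Mixtral is a sparse mixture-of-experts model in which each token is routed to its top-2 experts, with outputs combined using a softmax over the selected router logits. The following is a minimal NumPy sketch of that routing idea — names, shapes, and the dense loop are illustrative and do not reflect the actual `transformers` implementation:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Sketch of sparse top-k MoE routing.

    x: (tokens, hidden); gate_w: (hidden, num_experts);
    experts: list of callables mapping a (hidden,) vector to a (hidden,) vector.
    """
    logits = x @ gate_w                             # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]  # per-token expert ids
    sel = np.take_along_axis(logits, topk, axis=-1)
    # softmax over only the selected logits, so weights sum to 1 per token
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(top_k):
            out[t] += w[t, j] * experts[topk[t, j]](x[t])
    return out
```

Because only `top_k` of the experts run per token, the model's active parameter count per forward pass is much smaller than its total parameter count, which is the main appeal of the MoE design this PR integrates.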