Fix MoE for V5 (#42456)
Conversation
run-slow: flex_olmo, gpt_oss, minimax, mixtral, olmoe, qwen2_moe, qwen3_moe, qwen3_next, qwen3_omni_moe, qwen3_vl_moe
This comment contains models: ["models/flex_olmo", "models/gpt_oss", "models/minimax", "models/mixtral", "models/olmoe", "models/qwen2_moe", "models/qwen3_moe", "models/qwen3_next", "models/qwen3_omni_moe", "models/qwen3_vl_moe"]
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed fa02933 to 326eb75 (compare).
CI Results · Model CI Report: ❌ Failed tests

run-slow: deepseek_v2, deepseek_v3, dots1, flex_olmo, glm4_moe, glm4v_moe, gpt_oss, hunyuan_v1_moe, jamba, lfm2_moe, minimax, mixtral, nanochat, olmoe, phimoe, qwen2_moe

This comment contains models: ["models/deepseek_v2", "models/deepseek_v3", "models/dots1", "models/flex_olmo", "models/glm4_moe", "models/glm4v_moe", "models/gpt_oss", "models/hunyuan_v1_moe", "models/jamba", "models/lfm2_moe", "models/minimax", "models/mixtral", "models/nanochat", "models/olmoe", "models/phimoe", "models/qwen2_moe"]

CI Results: ✅ No failing test specific to this PR 🎉!
```diff
  _supports_attention_backend = True
  _can_record_outputs = {
-     "router_logits": OutputRecorder(Qwen3VLMoeTextTopKRouter, layer_name="mlp.router", index=0),
+     "router_logits": OutputRecorder(Qwen3VLMoeTextTopKRouter, layer_name="mlp.gate", index=0),
```
Okay, this looked weird at first, but it should be alright.
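The thread above hinges on which sub-layer name the recorder hooks (`mlp.router` vs `mlp.gate`). As a generic, hypothetical sketch of the record-an-intermediate-output pattern — not transformers' actual `OutputRecorder` implementation — the key point is that the recorded key only fills in if the wrapped name resolves to the module that actually emits the router logits:

```python
# Hypothetical stand-in for the "record a sub-layer's output" pattern
# behind `_can_record_outputs`. The real OutputRecorder hooks an
# nn.Module resolved by `layer_name`; here a plain callable is wrapped.

class Recorder:
    def __init__(self, layer, store, key):
        self.layer = layer  # callable sub-layer to wrap
        self.store = store  # dict collecting recorded outputs
        self.key = key      # e.g. "router_logits"

    def __call__(self, *args):
        out = self.layer(*args)
        self.store.setdefault(self.key, []).append(out)
        return out

records = {}
# toy "gate" layer producing per-expert scores
gate = Recorder(lambda hidden: [h * 2.0 for h in hidden], records, "router_logits")
gate([1.0, 2.0])
```

If the name pointed at a different module, the forward pass would still run but `records["router_logits"]` would stay empty — which is why the rename matters.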
```python
mapping["flex_olmo"] = mapping["qwen2_moe"].copy()
mapping["olmoe"] = mapping["qwen2_moe"].copy()
```
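A note on why the `.copy()` in those mapping lines matters: without it, the three model keys would alias a single dict, and a later per-model tweak would silently change the shared expectations. A minimal sketch with a made-up key name (the real mapping lives in the transformers test suite):

```python
# Illustrative mapping; the "atol" key is invented for the sketch --
# only the .copy() aliasing behaviour is the point.
mapping = {"qwen2_moe": {"atol": 1e-4}}

mapping["flex_olmo"] = mapping["qwen2_moe"].copy()
mapping["olmoe"] = mapping["qwen2_moe"].copy()

# a model-specific override leaves the source entry untouched
mapping["flex_olmo"]["atol"] = 1e-3
```

Had the copies been plain assignments, the override would have changed all three entries at once.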
[For maintainers] Suggested jobs to run (before merge) run-slow: deepseek_v2, deepseek_v3, dots1, flex_olmo, glm4_moe, glm4v_moe, gpt_oss, hunyuan_v1_moe, jamba, lfm2_moe, minimax, mixtral, olmoe, phimoe, qwen2_moe, qwen3_moe
* remove zero_like + scatter * fix mixtral moe * fix other moe models as well * fix ci * fix modular mixtral * fix qwen2_moe + qwen3_next * fix device mismatch for qwen3_vl_moe to pass tests * fix modular mixtral * fix other models * rm slow tokenizers (huggingface#40936) * fixes missed * gemma test fix * refactor * rm legacy from llama * added renaming * add _model * update legacy * update legacy * fix docstring * always load blank, then set _tokenizer if we have it * new toks * update all berttokenizer based models * apply feedback - delete bert duplicates * more models --> fast only * more convert_slow models * fix common test refs * updating fast only tokenizers * openai and pegasus * enable sentencepiecebackend * more models * code gen * t5 * code gen tests * speecht5 * mbart * mbart50 * more models * more models * layouglmv2 * update tests * update tests * update tests * pretrainedtokenizer * whisper * whisper * layoutxlm and storing backends * refactor sentencepiecebackend and additional_special_tokens * renaming tokenization_utils --> tokenization_python * udpate tests * bert test * blenderbot * clip * codegen * code_llama * cohere * deberata, deberat v2, funnel * gpt2 * batch update tests * pegasus qwen2 roberta * more models * layout tests * some renaming * fix references to utils_fast * fix refs * fix refs * fix refs * fix refs * fix refs * fix refs * fix refs * fix some tests * regression * fix refs * fix refs * missed the most crucial file in my last commit * fix refs * fix refs * fix refs * batch encode fix * fix some tests * BC for batch_decode bc too many refs * more tests * fix more tests * fix for processors * fixing more models * deleted mbart50 by accident * seamless m4t * albert fix * whisper * layout3 * attempt to fix cached tokenizers on CI * trying another fix on CI * again try to work around CI * bertweet * tapas * mbart50 * luke * mluke * markuplm * markuplm * fix some more auto tests * some random model failures * mistralcommontestser * 
more fixes * ref fix * siglip * marian * plbart * update utils toks * seamless m4t * roc bert * udpate byt5 test * xlm * esm * roformer * code llama * biogpt * m2m100 * dpr and flaubert * xlm and speech to text * tok backend pass object * tokenizer object pass * wav2vec2 * wav2vec2 * cpmant * update utils tokenizers * cpmant * bartpho * test apply chat template assistant mask * apply chat template video * apply chat template assistant mask * test torch * update from slow in base and fix donut processor errors * auto to point to tokenizers backend, fix kosmos2 * some non model fixes for old slow models that no longer have their own tokenizer file as they are the same as bert * missed file from last commit * idefics2 * fixup * fixup * pretrained tokenizer fast test update * stash * bad merged * cherry pick more stuff that did not merge well * fix gptsw3 * nit warn for now * update error raising * just ran fixup * bring back bert legacy * fix * nit * fix 56 errors on blenderbotsmall? * 18 for blenderbotsmall * tok auto * missed clip * fix tests * something missed * token healing * tok common tests update - nonmodel * try to fix non-model test in test_tokenization_utils * fix hub tests * try to fix hub tests * custom vocab related fixed * bert jap * BERT JAP * rename bert legacy to bert legacy * Wav2vec2 * fix in tok python to update total vocab size - fixes speech t5 * blender bot small * forgot test file * test failures * marian * gpt2 tiktoken * big bird / marian * udop * forgot couple changes * test_serve fix * missing import * a couple processors fixes * style partly * fix to fetch tests ci * Revert branch back to commit f5bc69e state * revert branch to styling * update mistral after merge * fixes for non model tests * some processor test fixes * more processor test fixes * more processor fixes * hub tests * python tok utils * fix hub test * make style for now * remove problemattic fic copies * python utils/check_copies.py --fix_and_overwrite * more styling * 
fixup * silence docstirng * fix import? * fix imports * add the local test as well * throw spm error * llamas * fix a couple tests * broke ci * broke ci * broke ci * broke ci * add logs to debug gemma on ci * gemma and llama * gemma * revert las commit * gemma debug * gemma debug * gemma * safely import spiece backend * tok tests * check none * setup and qual * ruff * del dev files * tok auto * fill docstrings * update auto * blenderbot small nit * add migration guide * move mixtral patch to `TokenizersBackend`, move `TokenizerExtractor` * rename MistralCommonTokenizer to MistralCommonB ackend * nit * fix failures * fixup * remoove one old test * mark the slow one as slow * very small fixes * update auto mapping for missing ones * fixup lorsd * fixup doc and stuff * should be the final fixe * processing update * update * FIX or brute AI fix the llava test * style * slow? * fix is offline mode? * fix mt5 * One tok utils (huggingface#42462) * consolidate python and utils tokenization files, they are copies * ruff and ref * Format * fix cohere * ? * up * am I dumbb? * grumble --------- Co-authored-by: Arthur <arthur.zucker@gmail.com> * [loading/saving] Reverse all loading operations when saving (huggingface#42396) * first shot * default to reversing * oupso * oupsi 2 * oupsi 3 * fix renamed kwargs * fix timm_wrapper * remove fix_state_dict methods * can do it all the time, with __init__ as well * doc * oupsi * fix * create helper * fix annotation annoying isue * small fix * small fixes * alright commit all that already * oupsi * the fix * update quantizers * this works * the hardcoded regex got me hard.... 
* style * the final one * cleanup a bit * better * style * oupsi readded it * do it inside the ops instead - no need for full names anymore * reverse quantizers and simplify signatures * small thingy * add no_grad decorator * utils to rename keys * oupssii again * add test * simplify nicely * Fix T5 tests: use generation_config for generation parameters (huggingface#42419) * pass the generation parameters to generate() * fix use_task_specific_params to separate model.config and model.generation_config params * fix style * some fixes * remove redundant check * update expectation for llama_7b_bf16 on rocm * Update tests/models/llama/test_modeling_llama.py Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com> --------- Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com> * linting * more fix to pass the CI tests * fix lfm2 moe * fix docstring * fix docstring * fix qwen like model * fix flex olmo * revert lfm2 moe config * make fixup * fix docstring * fix conversion mapping * fix inference of gpt-oss * add some fixes to gpt-oss (but still not good) * fix modular * we need errors I think * fix config issue * this was fixed --------- Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> Co-authored-by: Arthur <arthur.zucker@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: BADAOUI Abdennacer <106801897+Abdennacer-Badaoui@users.noreply.github.com> Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
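The first bullet of the squashed message above, "remove zero_like + scatter", refers to the MoE dispatch path. A minimal pure-Python sketch of the general idea, assuming nothing about any specific model: rather than scattering top-k routing weights into a zero-initialized dense matrix, keep only the selected (expert, weight) pairs per token and accumulate expert outputs directly. The real fix operates on torch tensors and the exact change differs per model.

```python
# Pure-Python sketch of top-k MoE dispatch without a dense zero matrix.
# `experts` are plain callables standing in for expert MLPs.

def topk_indices(scores, k):
    """Indices of the k largest scores, descending."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def moe_forward(tokens, router_scores, experts, k=2):
    outputs = []
    for tok, scores in zip(tokens, router_scores):
        idx = topk_indices(scores, k)
        norm = sum(scores[i] for i in idx)  # renormalize over the top-k
        # weighted sum over only the selected experts: no zeros_like
        # buffer, no scatter of routing weights
        outputs.append(sum(scores[i] / norm * experts[i](tok) for i in idx))
    return outputs
```

With three toy experts, a token routed to experts 1 and 2 with scores 0.6 and 0.3 yields (0.6·expert1 + 0.3·expert2) / 0.9, and expert 0 is never evaluated.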
No description provided.