[MoE] Qwen3MoE, Qwen3VLMoE, GPT OSS, Glm 4.7, DeepseekV3 MoE kernels 🚀 by Datta0 · Pull Request #450 · unslothai/unsloth-zoo

Datta0 · 2026-01-29T05:41:22Z

This PR substantially improves fine-tuning performance for the MoE models listed in the title, but it relies on changes that are integral to transformers v5.

PS: To use the Triton kernels or grouped_mm as mentioned here, we also need the corresponding changes in unsloth.

Note that along with the speed improvements, I also observed lower memory usage: with grouped_mm, 16-bit LoRA fine-tuning at a sequence length of 8192 ran on an H100, whereas the pure PyTorch code hit OOMs.
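For readers unfamiliar with grouped matmuls, here is a rough sketch of the token layout such a kernel typically works on: tokens are sorted by their assigned expert and each expert gets one contiguous slice described by cumulative offsets. This is a pure-Python stand-in, not the code from this PR; all function names below are hypothetical, and the sketch says nothing about why memory usage differs.

```python
# Illustrative only: a pure-Python stand-in for the "sort by expert + offsets"
# layout that a grouped matmul consumes. All names are hypothetical and are
# not taken from this PR.

def group_tokens_by_expert(expert_ids, num_experts):
    """Return (order, offsets).

    order   : token indices sorted by expert id (stable sort)
    offsets : cumulative end index of each expert's contiguous slice
    """
    order = sorted(range(len(expert_ids)), key=lambda i: expert_ids[i])
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1
    offsets, total = [], 0
    for c in counts:
        total += c
        offsets.append(total)
    return order, offsets


def matvec(weight, x):
    """Multiply a (rows x cols) list-of-lists weight by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def moe_reference(tokens, expert_ids, weights):
    """Baseline: handle each token independently with its own expert."""
    return [matvec(weights[e], t) for t, e in zip(tokens, expert_ids)]


def moe_grouped(tokens, expert_ids, weights):
    """Process one contiguous slice of sorted tokens per expert."""
    order, offsets = group_tokens_by_expert(expert_ids, len(weights))
    out = [None] * len(tokens)
    start = 0
    for expert, end in enumerate(offsets):
        for idx in order[start:end]:
            out[idx] = matvec(weights[expert], tokens[idx])  # scatter back
        start = end
    return out


# Tiny demo: 4 tokens, 2 experts (identity and 2x scaling).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
expert_ids = [1, 0, 1, 0]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
order, offsets = group_tokens_by_expert(expert_ids, num_experts=2)
grouped_out = moe_grouped(tokens, expert_ids, weights)
```

In a real kernel the inner loop over a slice would be a single matrix multiply per expert, fused into one launch; the sketch only shows the bookkeeping.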

Extensive benchmarks and release blog

Transformers v4 unsloth latest release

Transformers v5 + pure pytorch

Transformers v5 + grouped_mm

Transformers v5 + unsloth triton kernels

Previous PRs: #396 #447


The file already has an AGPLv3 license header at the top, so inline comments are unnecessary for trivial 1-line functions:
- native_moe_grouped_mm()
- _should_use_separated_lora()
- register_weight_preprocessor()
- get_weight_preprocessor()


Datta0 added 30 commits December 12, 2025 06:17

[WIP] fix for qwen3 moe torch compile issue

730c598

faster forward passes

f7967d3

Cleanup

8e7dc5c

[WIP] Use unsloth triton kernels

b38a20e

Perf go brrr sad memory

5a8adbb

clear cache after autotune

9cb5c88

Efficient

f2cda49

Fix tensor is none check

e93c67d

torch.grouped_mm

32ffa85

Adapt to qwen3-vl-moe

32bf9a4

fix qwen3-moe bugs

2fee30c

cleanup

44f4eda

cleanup

820be05

cleanup

3d7cc2c

Fix issues with triton

7dc9e17

Merge branch 'nightly' into qwen3_moe_kernels

3383613

refactor lora request handling

699945f

Merge remote-tracking branch 'origin/main' into qwen3_moe_kernels

2d30faa

Merge remote-tracking branch 'datta0/vllm_lora_req' into qwen3_moe_kernels

1ef4d0e

fixup qwen3_vl_moe training

225b285

grouped_mm for H100 or higher

bbc8072

contiguous for triton

b502a48

cleanup

ded4bf2

rework triton import logic

dcd5a14

indentation fix

ed37079

Explicit tensor handling

9382b6f

rework operations to suit newer transformers v5

5225caa

GRPO fixes

4aa9bd6

grouped_mm forward check :)

169b1ea

Revert "grouped_mm forward check :)"

ee88018

This reverts commit 169b1ea.

Datta0 added 3 commits February 2, 2026 08:58

AGPL license

08f935b

Minor qwen3_vl_moe derp

44b4c9e

Merge remote-tracking branch 'origin/main' into glm47_moe_kernels

98034c8

Datta0 changed the title ~~[MoE] Glm 4.7 moe kernels~~ [MoE] Qwen3MoE, Qwen3VLMoE, GPT OSS, Glm 4.7, DeepseekV3 moe kernels 🚀 Feb 3, 2026

Datta0 changed the title ~~[MoE] Qwen3MoE, Qwen3VLMoE, GPT OSS, Glm 4.7, DeepseekV3 moe kernels 🚀~~ [MoE] Qwen3MoE, Qwen3VLMoE, GPT OSS, Glm 4.7, DeepseekV3 MoE kernels 🚀 Feb 3, 2026

This was referenced Feb 3, 2026

[WIP][MoE] Gpt oss moe kernels #447

Closed

Qwen3 moe optimisations #396

Closed

danielhanchen and others added 18 commits February 3, 2026 11:12

Fix MXFP4 routing bug when triton_kernels unavailable

3803908

Two fixes:
1. Early return when HAS_TRITON_KERNELS=False to skip MXFP4 patches gracefully
2. Move mlp_forward inside if HAS_TRITON_KERNELS block since it uses routing variable
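The shape of these two fixes can be sketched roughly as follows. This is a minimal illustration and not the actual patch: build_mxfp4_patches and the placeholder routing logic are hypothetical, and only the HAS_TRITON_KERNELS, routing, and mlp_forward names come from the commit message.

```python
import importlib.util

# Assumption: availability is detected by probing for the optional
# `triton_kernels` package; the real detection logic may differ.
HAS_TRITON_KERNELS = importlib.util.find_spec("triton_kernels") is not None


def build_mxfp4_patches(has_triton_kernels=HAS_TRITON_KERNELS):
    """Return a dict of patch functions; empty when kernels are unavailable."""
    patches = {}

    if not has_triton_kernels:
        # Fix 1: return early so nothing below references missing symbols.
        return patches

    def routing(scores):
        # Placeholder for the kernel-backed routing: pick the top-scoring expert.
        return max(range(len(scores)), key=scores.__getitem__)

    # Fix 2: mlp_forward lives inside the guarded path because it depends on
    # `routing`, which only exists when the kernels are available.
    def mlp_forward(scores):
        return routing(scores)

    patches["mlp_forward"] = mlp_forward
    return patches
```

Calling build_mxfp4_patches(False) yields an empty dict instead of failing on an undefined routing name, which is the graceful skip the commit describes.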

Proper AGPL license for zoo

83b2912

[WIP] fix weight extraction

ec2ffa1

Revert "[WIP] fix weight extraction"

6f71b72

This reverts commit 690f25ede162777ace69f08dbf7fe83bbc3a4db5.

Reapply "[WIP] fix weight extraction"

117b255

This reverts commit e9dddc3597b2dd333b10278951591c07aa811fa5.

Simplify grouped_mm patch

4beff5f

Use precomputed indices

990821d

[WIP] Fix saving moe lora

6b8abf5

Update moe_utils.py

c15d773

Removed random AI mat muls and lora extractions that slowed down entire MoE forward pass

Update moe_utils.py

b8b8728

Fix MoE

67b7fc0

Fix saving for GLM and rename functions

d29c634

Save merge fix for GPT OSS

39c68f1

Revert "Save merge fix for GPT OSS"

44f50c2

This reverts commit 39c68f1.

Reapply "Save merge fix for GPT OSS"

7056e23

This reverts commit 44f50c2.

Fix saving for Qwen3, GLM, gpt oss, deepseek

f6d8ed9

Merge remote-tracking branch 'origin/main' into glm47_moe_kernels

ef8f21f

danielhanchen merged commit fa5b2bf into unslothai:main Feb 5, 2026

This was referenced Feb 6, 2026

[FIX] Qwen3 moe torch compile issue #381

Closed

[Fix][MoE] Gpt OSS Fixes #471

Merged

Datta0 mentioned this pull request Feb 16, 2026

Fix MoE wrapper handling for PR 450 #453

Closed

Datta0 mentioned this pull request Feb 24, 2026

Fix MoE target_parameters module_count alignment (#3405, #3701) #499

Open

danielhanchen merged 98 commits into unslothai:main from Datta0:glm47_moe_kernels


4 participants
