[mxfp8 moe training][docs] add tutorial for training with MXFP8 expert parallel by danielvegamyhre · Pull Request #3752 · pytorch/ao

danielvegamyhre · 2026-01-29T01:42:33Z

Stacked PRs:

[mxfp8 moe training][docs] add tutorial for training with MXFP8 expert parallel

pytorch-bot · 2026-01-29T01:42:37Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3752

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 49737db with merge base 30fcb15:

NEW FAILURE - The following job has failed:

Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
test/test_low_bit_optim.py::TestQuantize::test_bf16_stochastic_round_dtensor_device_cuda_compile_True

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo

stamp for be week, didn't read

you should rebase on #3743 and make sure you put this in the new docs subfolder

vkuzo · 2026-01-30T01:16:47Z

will need a rebase on #3769

svekars · 2026-01-30T19:00:31Z

+^^^^^^^^^^^^^
+
+1. (Recommended) Create a new virtual environment with conda or venv.
+2. `Install torchao <https://github.com/pytorch/ao/tree/main?tab=readme-ov-file#installation>`__ nightly build (required for CUDA 12.8+ support).


Do you need torch too? Which versions of all these dependencies? For example, 2.10 or later?
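The thread does not state the minimum versions, so none are asserted here. As a minimal sketch of how a reader could record which `torch` and `torchao` builds are installed before following the tutorial, the snippet below uses only the standard-library `importlib.metadata`. The helper names `installed_version` and `report_versions` are hypothetical, introduced only for illustration.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_versions(
    packages: Sequence[str] = ("torch", "torchao"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into a bug report.
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Including this output alongside benchmark numbers makes it easier to tell whether a discrepancy comes from a different nightly build.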

svekars · 2026-01-30T19:01:18Z

+    |   131072 |  2048 |         1408 |             8 |         3.278 |          2.913 | 1.13x         |         4.934 |          3.881 | 1.27x         | 1.21x           |
+    +----------+-------+--------------+---------------+---------------+----------------+---------------+---------------+----------------+---------------+-----------------+
+
+As shown, using MXFP8 for all-to-all communications achieves **1.14-1.25x total speedup** versus only quantizing directly before the grouped GEMMs.


Can you add a conclusion? Like what did the user learn in the tutorial and where else to find more info?

good point, done!
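For readers checking the benchmark row quoted above, here is a small sketch of how its speedup columns can be reproduced. The table header is not shown in this thread, so the column meanings are an assumption: the two timing triples are taken to be forward and backward times in milliseconds (baseline first, MXFP8 all-to-all second), and the last column is taken to be the speedup of the summed forward and backward time.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of the candidate over the baseline (values above 1.0 are faster)."""
    return baseline_ms / candidate_ms


# Values copied from the quoted row; the labels are assumptions (see above).
fwd_baseline_ms, fwd_mxfp8_ms = 3.278, 2.913
bwd_baseline_ms, bwd_mxfp8_ms = 4.934, 3.881

fwd = speedup(fwd_baseline_ms, fwd_mxfp8_ms)
bwd = speedup(bwd_baseline_ms, bwd_mxfp8_ms)
# The total speedup is computed on summed times, not by averaging the ratios.
total = speedup(
    fwd_baseline_ms + bwd_baseline_ms,
    fwd_mxfp8_ms + bwd_mxfp8_ms,
)

print(f"forward:  {fwd:.2f}x")    # forward:  1.13x
print(f"backward: {bwd:.2f}x")    # backward: 1.27x
print(f"total:    {total:.2f}x")  # total:    1.21x
```

Under that reading, the computed values match the 1.13x, 1.27x, and 1.21x entries in the row.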

ezyang · 2026-01-30T19:44:53Z

+            # 🔥 happens in BF16 as well.
+            # 🔥 In the backward pass, the incoming upstream gradients will be the MXTensor outputs of the
+            # 🔥 MXFP8 all-to-all combine backward pass, so this unpermute autograd func accepts MXTensor
+            # 🔥 inputs and performs the reordering in MXFP8.


ok haha i was trying to draw the reader's eye to the most important parts... will de-emojify

ezyang · 2026-01-30T19:45:22Z

How is the code tested?

danielvegamyhre · 2026-01-30T20:32:46Z

> How is the code tested?

@ezyang we have unit tests for this pattern of chaining the autograd functions (with eager and compile), and we also integrated into Torchtitan and did large scale convergence + performance testing on an external cloud partner's B200 cluster (joint blog post on this coming soon!)

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 806eb90 to a54d592 Compare January 29, 2026 01:42

danielvegamyhre force-pushed the danielvegamyhre/stack/126 branch from 302d0b4 to 4270ba0 Compare January 29, 2026 01:42

This was referenced Jan 29, 2026

[mxfp8 moe training] remove outdated example #3750

Merged

[mxfp8 moe training] add MXFP8 expert parallel example #3751

Merged

[mxfp8 moe training] fix bug in bench_ep_pipeline script #3753

Merged

meta-cla Bot added the CLA Signed label Jan 29, 2026

danielvegamyhre added the mx, topic: bug fix, moe, and topic: documentation labels and removed the topic: bug fix label Jan 29, 2026

danielvegamyhre marked this pull request as draft January 29, 2026 02:14

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 02:14

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from a54d592 to 8282223 Compare January 29, 2026 02:14

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 02:14

danielvegamyhre marked this pull request as ready for review January 29, 2026 02:14

danielvegamyhre marked this pull request as draft January 29, 2026 02:26

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 02:26

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 8282223 to 3f8c54a Compare January 29, 2026 02:26

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 02:26

danielvegamyhre marked this pull request as ready for review January 29, 2026 02:26

danielvegamyhre marked this pull request as draft January 29, 2026 04:54

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 04:54

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 3f8c54a to 761c09d Compare January 29, 2026 04:54

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 04:54

danielvegamyhre marked this pull request as ready for review January 29, 2026 04:54

danielvegamyhre marked this pull request as draft January 29, 2026 16:57

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 16:57

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 761c09d to cbe10e7 Compare January 29, 2026 16:57

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 19:03

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 021de8c to 9d79704 Compare January 29, 2026 19:03

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 19:03

danielvegamyhre marked this pull request as ready for review January 29, 2026 19:03

danielvegamyhre marked this pull request as draft January 29, 2026 19:13

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 19:13

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 9d79704 to 67b3b96 Compare January 29, 2026 19:13

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 19:13

danielvegamyhre marked this pull request as ready for review January 29, 2026 19:13

danielvegamyhre requested a review from andrewor14 January 29, 2026 19:32

vkuzo approved these changes Jan 29, 2026

danielvegamyhre marked this pull request as draft January 29, 2026 22:14

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 22:14

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 67b3b96 to 0c0e066 Compare January 29, 2026 22:14

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 22:14

danielvegamyhre marked this pull request as ready for review January 29, 2026 22:14

danielvegamyhre force-pushed the danielvegamyhre/stack/126 branch from c449c6d to cc3891f Compare January 30, 2026 01:36

danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 30, 2026 02:16

danielvegamyhre marked this pull request as draft January 30, 2026 02:23

danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 0c0e066 to 4c607e4 Compare January 30, 2026 02:24

danielvegamyhre marked this pull request as ready for review January 30, 2026 02:24

danielvegamyhre marked this pull request as draft January 30, 2026 03:02

svekars reviewed Jan 30, 2026

ezyang reviewed Jan 30, 2026

[mxfp8 moe training][docs] add tutorial for training with MXFP8 exper…

49737db

…t parallel stack-info: PR: #3752, branch: danielvegamyhre/stack/127

danielvegamyhre mentioned this pull request Feb 10, 2026

MXFP8 MoE training prototype graduation tracker #3599

Open

9 tasks

danielvegamyhre merged 1 commit into main from danielvegamyhre/stack/127

5 participants
