[mxfp8 moe training][docs] add tutorial for training with MXFP8 expert parallel #3752

Merged: danielvegamyhre merged 1 commit into main from danielvegamyhre/stack/127 on Feb 9, 2026


@danielvegamyhre danielvegamyhre commented Jan 29, 2026

Contributor


pytorch-bot Bot commented Jan 29, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3752

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 49737db with merge base 30fcb15:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 806eb90 to a54d592 Compare January 29, 2026 01:42
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/126 branch from 302d0b4 to 4270ba0 Compare January 29, 2026 01:42
@meta-cla meta-cla Bot added the CLA Signed label Jan 29, 2026
@danielvegamyhre danielvegamyhre added mx, topic: bug fix, moe, topic: documentation and removed topic: bug fix labels Jan 29, 2026
@danielvegamyhre danielvegamyhre marked this pull request as draft January 29, 2026 02:14
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 02:14
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from a54d592 to 8282223 Compare January 29, 2026 02:14
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 02:14
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 29, 2026 02:14
@danielvegamyhre danielvegamyhre marked this pull request as draft January 29, 2026 02:26
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 02:26
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 8282223 to 3f8c54a Compare January 29, 2026 02:26
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 02:26
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 29, 2026 02:26
@danielvegamyhre danielvegamyhre marked this pull request as draft January 29, 2026 04:54
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 04:54
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 3f8c54a to 761c09d Compare January 29, 2026 04:54
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 04:54
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 29, 2026 04:54
@danielvegamyhre danielvegamyhre marked this pull request as draft January 29, 2026 16:57
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 16:57
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 761c09d to cbe10e7 Compare January 29, 2026 16:57
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 19:03
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 021de8c to 9d79704 Compare January 29, 2026 19:03
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 19:03
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 29, 2026 19:03
@danielvegamyhre danielvegamyhre marked this pull request as draft January 29, 2026 19:13
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 19:13
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 9d79704 to 67b3b96 Compare January 29, 2026 19:13
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 19:13
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 29, 2026 19:13

@vkuzo vkuzo left a comment

Contributor


stamp for be week, didn't read

you should rebase on #3743 and make sure you put this in the new docs subfolder

@danielvegamyhre danielvegamyhre marked this pull request as draft January 29, 2026 22:14
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 29, 2026 22:14
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 67b3b96 to 0c0e066 Compare January 29, 2026 22:14
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/126 January 29, 2026 22:14
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 29, 2026 22:14

vkuzo commented Jan 30, 2026

Contributor

will need a rebase on #3769

@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/126 branch from c449c6d to cc3891f Compare January 30, 2026 01:36
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/126 to main January 30, 2026 02:16
@danielvegamyhre danielvegamyhre marked this pull request as draft January 30, 2026 02:23
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/127 branch from 0c0e066 to 4c607e4 Compare January 30, 2026 02:24
@danielvegamyhre danielvegamyhre marked this pull request as ready for review January 30, 2026 02:24
@danielvegamyhre danielvegamyhre marked this pull request as draft January 30, 2026 03:02
^^^^^^^^^^^^^

1. (Recommended) Create a new virtual environment with conda or venv.
2. `Install torchao <https://github.com/pytorch/ao/tree/main?tab=readme-ov-file#installation>`__ nightly build (required for CUDA 12.8+ support).

Contributor


Do you need torch too? Which versions of all these dependencies? For example, 2.10 or later?

Contributor Author


Added

| 131072 | 2048 | 1408 | 8 | 3.278 | 2.913 | 1.13x | 4.934 | 3.881 | 1.27x | 1.21x |
+----------+-------+--------------+---------------+---------------+----------------+---------------+---------------+----------------+---------------+-----------------+

As shown, using MXFP8 for all-to-all communications achieves **1.14-1.25x total speedup** versus only quantizing directly before the grouped GEMMs.
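The speedup entries in the table row quoted above can be reproduced from its timing values. The sketch below is only a sanity check under an assumption: since the table header is not shown in this excerpt, it treats the timing columns as (baseline, MXFP8) pairs for two phases and the final column as the combined speedup. The variable names are illustrative and do not come from the tutorial.

```python
# Values copied from the quoted table row. The column meanings are an
# assumption: (baseline, mxfp8) time pairs for two phases.
phase_a_baseline, phase_a_mxfp8 = 3.278, 2.913
phase_b_baseline, phase_b_mxfp8 = 4.934, 3.881

# Per-phase speedup is baseline time divided by MXFP8 time.
speedup_a = phase_a_baseline / phase_a_mxfp8
speedup_b = phase_b_baseline / phase_b_mxfp8

# Combined speedup compares the summed times, not the mean of the ratios.
total_speedup = (phase_a_baseline + phase_b_baseline) / (
    phase_a_mxfp8 + phase_b_mxfp8
)

print(f"{speedup_a:.2f}x {speedup_b:.2f}x {total_speedup:.2f}x")
```

Under that reading of the columns, the computed values agree with the 1.13x, 1.27x, and 1.21x entries in the row.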

Contributor


Can you add a conclusion? Like what did the user learn in the tutorial and where else to find more info?

Contributor Author


good point, done!

# 🔥 happens in BF16 as well.
# 🔥 In the backward pass, the incoming upstream gradients will be the MXTensor outputs of the
# 🔥 MXFP8 all-to-all combine backward pass, so this unpermute autograd func accepts MXTensor
# 🔥 inputs and performs the reordering in MXFP8.
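The reordering described in the quoted comments can be illustrated without any quantization. The sketch below is a plain-Python stand-in, not the tutorial's autograd function: it ignores MXTensor and MXFP8 entirely, and the helper names `permute` and `unpermute` and the `order` list are hypothetical. It only shows why the backward of an unpermute is a permute applied to the upstream gradients.

```python
def permute(rows, order):
    """Gather: output[i] = rows[order[i]]."""
    return [rows[j] for j in order]


def unpermute(rows, order):
    """Scatter: output[order[i]] = rows[i], the inverse of permute."""
    out = [None] * len(rows)
    for i, j in enumerate(order):
        out[j] = rows[i]
    return out


tokens = ["t0", "t1", "t2", "t3"]
order = [2, 0, 3, 1]  # hypothetical expert-sorted token order

permuted = permute(tokens, order)      # tokens grouped for the experts
restored = unpermute(permuted, order)  # original token order again

# Backward of unpermute: since y[order[i]] = x[i], the gradient for x[i]
# is the upstream gradient at position order[i], which is a permute.
upstream_grads = ["g0", "g1", "g2", "g3"]
grad_input = permute(upstream_grads, order)
```

Because the backward is a pure reordering of rows, it can in principle operate on whatever representation the upstream gradients arrive in, which is consistent with the comment that the reordering is performed on MXFP8 inputs.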


de-emojify

@danielvegamyhre danielvegamyhre Jan 30, 2026

Contributor Author


ok haha i was trying to draw the reader's eye to the most important parts... will de-emojify


ezyang commented Jan 30, 2026


How is the code tested?


danielvegamyhre commented Jan 30, 2026

Contributor Author

> How is the code tested?

@ezyang we have unit tests for this pattern of chaining the autograd functions (with eager and compile), and we also integrated into Torchtitan and did large scale convergence + performance testing on an external cloud partner's B200 cluster (joint blog post on this coming soon!)

…t parallel

stack-info: PR: #3752, branch: danielvegamyhre/stack/127

Labels

CLA Signed, module: training, moe, mx, topic: documentation


5 participants