meta registration for torch._scaled_mm with mxfp8 by vkuzo · Pull Request #148461 · pytorch/pytorch

vkuzo · 2025-03-04T18:20:09Z

Stack from ghstack (oldest at bottom):

-> meta registration for torch._scaled_mm with mxfp8 #148461

Summary:

Adds the meta registration logic for torch.compile to work with
torch._scaled_mm with mxfp8. Thanks to @eellison for the pointer to make inductor work with this.
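
For context, a meta registration lets the compiler derive an op's output shape and dtype from its input metadata alone, without running the kernel. The following is a minimal pure-Python sketch of that idea, not the code added in this PR. `TensorSpec` and `scaled_mm_meta_sketch` are hypothetical names, and the scale-count check assumes one e8m0 scale per 32-element block along K with no padding or swizzling, which the real kernel may require.

```python
import math
from dataclasses import dataclass

# Assumption: MX formats share one e8m0 scale across 32 elements along K.
MX_BLOCK_SIZE = 32


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a meta tensor: shape and dtype, no storage."""
    shape: tuple
    dtype: str


def scaled_mm_meta_sketch(a, b, scale_a, scale_b, out_dtype="bfloat16"):
    """Return the output spec of an (M, K) @ (K, N) scaled matmul without computing it."""
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")
    if k % MX_BLOCK_SIZE != 0:
        raise ValueError(f"K={k} must be a multiple of {MX_BLOCK_SIZE}")

    # Assumption: unpadded scale count, one scale per block per row.
    blocks_per_row = k // MX_BLOCK_SIZE
    for name, scale, rows in (("scale_a", scale_a, m), ("scale_b", scale_b, n)):
        if scale.dtype != "float8_e8m0fnu":
            raise TypeError(f"{name} must be float8_e8m0fnu, got {scale.dtype}")
        expected = rows * blocks_per_row
        if math.prod(scale.shape) != expected:
            raise ValueError(f"{name} needs {expected} elements")

    # Only metadata is produced; no matmul is executed.
    return TensorSpec((m, n), out_dtype)


a = TensorSpec((128, 64), "float8_e4m3fn")
b = TensorSpec((64, 256), "float8_e4m3fn")
scale_a = TensorSpec((128, 2), "float8_e8m0fnu")
scale_b = TensorSpec((256, 2), "float8_e8m0fnu")
out = scaled_mm_meta_sketch(a, b, scale_a, scale_b)
```

Here `out` describes a `(128, 256)` bfloat16 result even though no data was ever allocated, which is the property torch.compile relies on when tracing.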

Test Plan:

pytest test/test_matmul_cuda.py -k test_blockwise_mxfp8_compile -s

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot · 2025-03-04T18:20:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148461

✅ You can merge normally! (4 Unrelated Failures)

As of commit 5e38d0b with merge base 23183fe:

BROKEN TRUNK - The following job failed, but the failure was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-jammy-xpu-2025.0-py3.9 / build (gh) (trunk failure)
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/sstream:152:52: error: expected value in expression

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.8xlarge.amx) (gh) (#149979)
detectron2_fcos_r_50_fpn
pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu) (gh) (#149370)
Process completed with exit code 1.
pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge) (gh) (#144480)
backends/xnnpack/test/passes/test_convert_to_linear.py::TestConvertToLinear::test_fp32_convert_to_linear

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:

Adds the meta registration logic for torch.compile to work with `torch._scaled_mm` with mxfp8, with `aot_eager` backend. Note that we need #147873 for inductor to work.

Test Plan:

```
pytest test/test_matmul_cuda.py -k test_blockwise_mxfp8_compile -s
```

ghstack-source-id: 248830a
Pull Request resolved: #148461

Summary:

This PR enables `MXLinear` with `mxfp8_cublas` recipe to use torch.compile.

The current approach is a short term workaround until pytorch/pytorch#148461 is done. Since we can't use e8m0 in torchinductor or triton yet, we create a custom op wrapper around `torch._scaled_mm` which takes `uint8` scales and does the cast to e8m0 inside the wrapper, where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

ghstack-source-id: 033d817
ghstack-comment-id: 2701679811
Pull Request resolved: #1841
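
The `uint8`-to-e8m0 cast described in this workaround reinterprets the same byte rather than converting a value. A minimal pure-Python sketch of what one such byte means, assuming the E8M0 encoding from the OCP MX formats (exponent bias 127, `0xFF` reserved for NaN). The helper names are hypothetical and are not torchao or PyTorch APIs.

```python
import math

E8M0_BIAS = 127   # exponent bias of the E8M0 encoding
E8M0_NAN = 0xFF   # the only non-finite E8M0 encoding


def e8m0_byte_to_float(byte: int) -> float:
    """Decode one E8M0 scale byte into the power of two it represents."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"expected a uint8 value, got {byte}")
    if byte == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, byte - E8M0_BIAS)  # 2 ** (byte - 127)


def float_to_e8m0_byte(scale: float) -> int:
    """Encode an exact positive power-of-two scale as an E8M0 byte."""
    if math.isnan(scale):
        return E8M0_NAN
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2 ** exponent
    if mantissa != 0.5:
        raise ValueError(f"{scale} is not a positive power of two")
    byte = exponent - 1 + E8M0_BIAS
    if not 0 <= byte < E8M0_NAN:
        raise ValueError(f"{scale} is outside the E8M0 range")
    return byte
```

Because the format has no sign or mantissa bits, every finite byte maps to exactly one power of two, so storing scales as `uint8` loses no information.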

vkuzo · 2025-03-11T14:35:41Z

+        C_ref = A_ref @ B_ref.t()
+
+        # TODO(#147873): switch to inductor backend after e8m0 is supported there
+        compiled_scaled_mm = torch.compile(torch._scaled_mm, backend="aot_eager")


When I rebase past https://github.com/pytorch/pytorch/pull/148722/files and then change the backend in this code to `inductor`, I see https://www.internalfb.com/phabricator/paste/view/P1753483593 (cc @eellison).

Summary:

After pytorch/pytorch#148461 lands, we can use `torch.float8_e8m0fnu` throughout our codebase and compile will still work, removing the workarounds.

Test Plan:

```
pytest test/prototype/mx_formats/ -s -x
```

ghstack-source-id: 278117b
ghstack-comment-id: 2755728114
Pull Request resolved: #1966

vkuzo · 2025-03-26T23:35:05Z

@pytorchbot merge

pytorchmergebot · 2025-03-26T23:37:37Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Summary:

Adds the meta registration logic for torch.compile to work with `torch._scaled_mm` with mxfp8. Thanks to @eellison for the pointer to make inductor work with this.

Test Plan:

```
pytest test/test_matmul_cuda.py -k test_blockwise_mxfp8_compile -s
```

Pull Request resolved: pytorch#148461
Approved by: https://github.com/drisspg, https://github.com/eellison

Update c08f45a

vkuzo added the "topic: not user facing" label Mar 4, 2025

vkuzo mentioned this pull request Mar 5, 2025: enable torch.compile for mxfp8_cublas recipe pytorch/ao#1841 (Merged)

Update ceed023

pytorch-bot added the "ciflow/inductor" and "module: inductor" labels Mar 13, 2025

vkuzo requested review from drisspg and eellison March 13, 2025 14:19

Update d837ecc

drisspg approved these changes Mar 13, 2025

eellison approved these changes Mar 13, 2025

vkuzo mentioned this pull request Mar 17, 2025: use torch.float8_e8m0fnu in mx_formats pytorch/ao#1882 (Closed)

Update d38798f

This was referenced Mar 26, 2025:
delete mxfp8 torch._scaled_mm wrapper pytorch/ao#1965 (Merged)
use torch.float8_e8m0fnu in mx_formats pytorch/ao#1966 (Merged)

Update 5e38d0b

pytorch-bot added the "ciflow/trunk" (Trigger trunk jobs on your pull request) label Mar 26, 2025

pytorchmergebot added the "merging" label Mar 26, 2025

pytorchmergebot added the "Merged" label Mar 27, 2025

pytorchmergebot closed this in dad0854 Mar 27, 2025

pytorchmergebot removed the "merging" label Mar 27, 2025

github-actions deleted the gh/vkuzo/7/head branch May 2, 2025 02:16

vkuzo wants to merge 5 commits into gh/vkuzo/7/base from gh/vkuzo/7/head

4 participants