enable torch.compile for mxfp8_cublas recipe by vkuzo · Pull Request #1841 · pytorch/ao

vkuzo · 2025-03-05T17:58:20Z

Summary:

This PR enables MXLinear with the mxfp8_cublas recipe to use
torch.compile.

The current approach is a short-term workaround until
pytorch/pytorch#148461 is done. Since we can't yet
use e8m0 in torchinductor or triton, we create a custom op wrapper
around torch._scaled_mm that takes uint8 scales and does the cast to
e8m0 inside the wrapper, where torchinductor can't see it.
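
To make the uint8-to-e8m0 step concrete, here is a minimal pure-Python sketch of how an e8m0 scale byte is interpreted. It assumes the OCP MX convention for e8m0 (8 exponent bits, no sign or mantissa bits, bias 127, 0xFF reserved for NaN). The helper names `e8m0_byte_to_scale` and `scale_to_e8m0_byte` are hypothetical and are not part of torchao or PyTorch; the wrapper in this PR works on tensors and passes the result to `torch._scaled_mm`, which this sketch does not attempt to reproduce.

```python
import math

E8M0_BIAS = 127   # exponent bias assumed from the OCP MX e8m0 format
E8M0_NAN = 0xFF   # the all-ones encoding is reserved for NaN


def e8m0_byte_to_scale(b: int) -> float:
    """Interpret a uint8 value as an e8m0 scale, i.e. 2 ** (b - 127)."""
    if not 0 <= b <= 0xFF:
        raise ValueError("e8m0 scales are stored in a single byte")
    if b == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, b - E8M0_BIAS)


def scale_to_e8m0_byte(scale: float) -> int:
    """Encode a positive power-of-two scale as its e8m0 byte (hypothetical helper)."""
    mantissa, exponent = math.frexp(scale)  # scale == mantissa * 2 ** exponent
    if scale <= 0 or mantissa != 0.5:
        raise ValueError("e8m0 can only represent positive powers of two")
    b = exponent - 1 + E8M0_BIAS
    if not 0 <= b < E8M0_NAN:
        raise ValueError("scale is outside the e8m0 range")
    return b
```

Under this reading, the uint8 tensor and the e8m0 tensor carry the same bytes, so the cast inside the wrapper changes how the scales are interpreted rather than their stored values.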

Test Plan:

// this now works (although performance is not ideal due to https://github.com/pytorch/ao/issues/1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile



pytorch-bot · 2025-03-05T17:58:24Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1841

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:

This PR enables `MXLinear` with `mxfp8_cublas` recipe to use torch.compile.

The current approach is a short term workaround until pytorch/pytorch#148461 is done. Since we can't use e8m0 in torchinductor or triton yet, we create a custom op wrapper around `torch._scaled_mm` which takes `uint8` scales and does the cast to e8m0 inside the wrapper, where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

ghstack-source-id: 033d817
ghstack-comment-id: 2701679811
Pull Request resolved: #1841

Summary:

This PR enables `MXLinear` with `mxfp8_cublas` recipe to use torch.compile.

The current approach is a short term workaround until pytorch/pytorch#147873 is done. Since we can't use e8m0 in torchinductor or triton yet, we create a custom op wrapper around `torch._scaled_mm` which takes `uint8` scales and does the cast to e8m0 inside the wrapper, where torchinductor can't see it.

Test Plan:

```
// this now works (although performance is not ideal due to #1788)
python benchmarks/float8/profile_lowp_training.py ~/local/tmp/20250305_test --mx_recipe_name mxfp8_cublas

// we can also uncomment the hardware check and run the unit test
pytest test/prototype/mx_formats -s -k test_linear_compile
```

ghstack-source-id: f3ebd12
ghstack-comment-id: 2701679811
Pull Request resolved: #1841


danielvegamyhre · 2025-03-06T15:56:03Z

+    is_sm_at_least_100(),
+    reason="triton does not work yet on CUDA capability 10.0",
+)
+@pytest.mark.skipif(
+    not is_sm_at_least_100(),
+    reason="MX gemms require CUDA capability 10.0",
+)


Combining a skip on `is_sm_at_least_100()` with a skip on `not is_sm_at_least_100()` will prevent the test from ever running, so I just want to confirm: is this test intentionally being skipped until the new release of pytorch (with triton that supports compute capability 10.0) is part of CI?

vkuzo · 2025-03-11

Yes, that's correct. It's skipped in CI because we don't have B200s in CI, and it's skipped locally because it requires building triton from source. For now, I uncomment these tests if I need to run them.
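
The reviewer's observation can be checked with a small stand-alone sketch. It uses the standard library's `unittest.skipIf` as a stand-in for `pytest.mark.skipif`, and a plain boolean `has_sm100` in place of `is_sm_at_least_100()`. Both substitutions are assumptions made so the example runs without pytest or a GPU, and the helper `is_skipped` is hypothetical.

```python
import unittest


def is_skipped(has_sm100: bool) -> bool:
    """Return True if a test guarded by both skip conditions would be skipped."""

    @unittest.skipIf(has_sm100, "triton does not work yet on CUDA capability 10.0")
    @unittest.skipIf(not has_sm100, "MX gemms require CUDA capability 10.0")
    def test_linear_compile():
        pass

    # unittest marks skipped callables with the __unittest_skip__ attribute
    # (an implementation detail, used here only for illustration).
    return getattr(test_linear_compile, "__unittest_skip__", False)


for has_sm100 in (False, True):
    print(f"has_sm100={has_sm100}: skipped={is_skipped(has_sm100)}")
```

Exactly one of the two conditions holds for any value of the hardware flag, so the test is skipped on every machine until one of the decorators is removed or commented out.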


vkuzo added 17 commits March 4, 2025 12:53

535a51a Update
600dc5a Update
cf4d538 Update
1c3f6af Update
2290767 Update
be3430f Update
cacff4e Update
f7b8e37 Update
9c36ad5 Update
04621a1 Update
65a997b Update
bd70141 Update
f7099cd Update
d80d0e2 Update
eb89bb2 Update
ee76770 Update
f9cfbaf Update

vkuzo mentioned this pull request Mar 5, 2025

enable compile with mxfp8 and mxfp4 cutlass gemm #1838

Merged

facebook-github-bot added the CLA Signed label Mar 5, 2025

vkuzo added the topic: performance label Mar 5, 2025

4a3bb73 Update

vkuzo requested review from danielvegamyhre and drisspg March 5, 2025 18:06

vkuzo added 2 commits March 5, 2025 10:23

8f82ae7 Update
cc0ae0e Update

vkuzo mentioned this pull request Mar 5, 2025

enable mxfp8_cublas recipe in roofline script #1843

Merged

vkuzo added 4 commits March 5, 2025 10:39

ef4975f Update
e2383bb Update
55df7fc Update
cb824b2 Update

vkuzo changed the base branch from gh/vkuzo/52/head to main March 5, 2025 19:49

729e471 Update

danielvegamyhre approved these changes Mar 6, 2025

042da70 Update

vkuzo merged commit 2ca3016 into main Mar 11, 2025

vkuzo merged 26 commits into main from gh/vkuzo/53/head

3 participants
