
[ROCm] port CK rowwise F8 from fbgemm #140856

Closed
jeffdaily wants to merge 10 commits into pytorch:main from ROCm:ck_rowwise_f8_fbgemm

Conversation

@jeffdaily
Collaborator

@jeffdaily jeffdaily commented Nov 15, 2024

@jeffdaily jeffdaily added module: rocm AMD GPU support for Pytorch rocm This tag is for PRs from ROCm team ciflow/rocm Trigger "default" config CI on ROCm labels Nov 15, 2024
@jeffdaily jeffdaily requested a review from jwfromm November 15, 2024 23:35
@pytorch-bot

pytorch-bot bot commented Nov 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140856

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 64cef29 with merge base a7509e9:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jeffdaily jeffdaily changed the title Ck rowwise f8 fbgemm [ROCm] port CK rowwise F8 from fbgemm Nov 15, 2024
@cpuhrsch cpuhrsch requested a review from drisspg November 16, 2024 00:38
@drisspg drisspg added release notes: nn release notes category module: floatx (formerly float8) For torch.float8_e5m2 and torch.float8_e4m3 and other sub 8-bit float types skip-pr-sanity-checks labels Nov 16, 2024
@drisspg
Contributor

drisspg commented Nov 16, 2024

Left a few comments; I think it looks good. It would be good to also note the increase in binary size from this PR.


x_fp8 = x.to(torch.float8_e4m3fn)
y_fp8 = y.to(torch.float8_e4m3fn).t()
x_fp8 = x.to(e4m3_type)
Contributor

I saw that the code to automatically set this was added in #117822 - IMO in a separate PR we should change these dtypes to be set explicitly by the user / testing framework, to follow the convention used elsewhere in similar files and make it crystal clear which dtypes are being tested where.

Collaborator Author

I disagree but am open to being convinced otherwise. For F8 types there are effectively two sets (e4m3fnuz/e5m2fnuz and e4m3fn/e5m2) and the abstraction introduced in #117822 is useful to avoid much copy/paste code, or decorating many unit tests with allowed types. I like the current solution because it is compact. But developer education was missing; new unit tests that work on CUDA and do not use the abstracted types will not work on ROCm.

@jeffdaily
Collaborator Author

Left a few comments; I think it looks good. It would be good to also note the increase in binary size from this PR.

Binary size increased by 5.4MB.

@drisspg drisspg added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 18, 2024
@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 19, 2024
@jeffdaily
Collaborator Author

Before landing, we need to verify this still builds okay for gfx1100 etc.

@jeffdaily jeffdaily requested a review from drisspg November 22, 2024 20:45
@atalman
Contributor

atalman commented Dec 5, 2024

@pytorchmergebot revert -c ghfirst -m "Failing internal build"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Dec 5, 2024
This reverts commit 291626f.

Reverted #140856 on behalf of https://github.com/atalman due to Failing internal build
@pytorchmergebot
Collaborator

@jeffdaily your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Dec 5, 2024
@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
This reverts commit 291626f.

Reverted pytorch#140856 on behalf of https://github.com/atalman due to Failing internal build
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
@drisspg
Contributor

drisspg commented Dec 10, 2024

So I think the problem with the PR as stands is this line: https://github.com/pytorch/pytorch/pull/140856/files#diff-19b256efe989af74ad429ef2a1eb6e075784aa18aea04c7d36bb0e790e9a8170R19

Including torch.h is typically done for external C++ extensions, and it messes with some internal build systems. The proper fix would be to refine which headers are actually needed for building.

cc @jeffdaily

@jeffdaily
Collaborator Author

@drisspg removing the #include of torch.h had no effect on the CMake build. Would you be able to check the internal build?

@drisspg
Contributor

drisspg commented Dec 11, 2024

@jeffdaily Yeah will do

@drisspg
Contributor

drisspg commented Dec 12, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/140856/head returned non-zero exit code 1

Rebasing (1/7)
Auto-merging aten/src/ATen/CMakeLists.txt
CONFLICT (content): Merge conflict in aten/src/ATen/CMakeLists.txt
Auto-merging aten/src/ATen/native/cuda/Blas.cpp
Auto-merging test/test_matmul_cuda.py
CONFLICT (content): Merge conflict in test/test_matmul_cuda.py
error: could not apply 1276c3dce54... [ROCm] CK f8 rowwise gemm
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 1276c3dce54... [ROCm] CK f8 rowwise gemm

Raised by https://github.com/pytorch/pytorch/actions/runs/12304294940

@drisspg
Contributor

drisspg commented Dec 12, 2024

@jeffdaily could you help rebase and then I can import and land internally

@jeffdaily
Collaborator Author

@jeffdaily could you help rebase and then I can import and land internally

Done.

@facebook-github-bot
Contributor

@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jeffdaily
Collaborator Author

@jeffdaily could you help rebase and then I can import and land internally

Any status update?

@drisspg
Contributor

drisspg commented Dec 16, 2024

@jeffdaily Trying to get an AMD GPU to test; there are some issues, but I want to see if I can patch them internally.

drisspg pushed a commit to drisspg/pytorch that referenced this pull request Dec 17, 2024
Summary:
This ports (copies) FBGEMM's implementation from jwfromm.

https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise

cc sunway513 jithunnair-amd pruthvistony ROCmSupport dllehr-amd jataylo hongxiayang naromero77amd yanbing-j vkuzo albanD kadeng penguinwu

Pull Request resolved: pytorch#140856

Reviewed By: atalman

Differential Revision: D66797096

Pulled By: drisspg
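For readers unfamiliar with the technique being ported: rowwise fp8 GEMM keeps one scale per row of each operand so that each row's maximum magnitude maps onto the fp8 range, and the output is rescaled by the outer product of the two scale vectors. A minimal NumPy sketch of the scaling math (a hedged illustration only; it ignores the actual fp8 rounding and the fused epilogue that the CK kernels implement):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3fn

def rowwise_scale(t):
    # One scale per row: row_amax / fp8_max, so scaled values fit the fp8 range.
    amax = np.abs(t).max(axis=1, keepdims=True)
    return amax / E4M3_MAX

x = np.random.randn(4, 8)   # activations, shape (M, K)
w = np.random.randn(5, 8)   # weights, shape (N, K)

sx, sw = rowwise_scale(x), rowwise_scale(w)   # shapes (M, 1) and (N, 1)
x_q = x / sx                # would be cast to fp8 in the real kernel
w_q = w / sw
# The fp8 GEMM computes x_q @ w_q.T; the epilogue applies the outer
# product of the row scales to recover the full-precision result.
y = (x_q @ w_q.T) * (sx * sw.T)
```

Up to floating-point rounding, `y` matches `x @ w.T`; the accuracy benefit over a single per-tensor scale is that one outlier row no longer shrinks the quantization range of every other row.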
@drisspg
Contributor

drisspg commented Dec 17, 2024

Had to unlink and re-export: #143416

@jeffdaily
Collaborator Author

Closing in favor of #143416.

@jeffdaily jeffdaily closed this Jan 2, 2025

Labels

ci-no-td Do not run TD on this PR
ciflow/rocm Trigger "default" config CI on ROCm
ciflow/trunk Trigger trunk jobs on your pull request
Merged
module: floatx (formerly float8) For torch.float8_e5m2 and torch.float8_e4m3 and other sub 8-bit float types
module: rocm AMD GPU support for Pytorch
open source
release notes: nn release notes category
Reverted
rocm priority high priority ROCm PRs from performance or other aspects
rocm This tag is for PRs from ROCm team
skip-pr-sanity-checks
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

9 participants