[ROCm] port CK rowwise F8 from fbgemm #140856
jeffdaily wants to merge 10 commits into pytorch:main
Conversation
This ports (copies) FBGEMM's implementation from @jwfromm. https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140856
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures)
As of commit 64cef29 with merge base a7509e9:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
...ernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip
Left a few comments; I think it looks good. It would also be good to note the increase in binary size from this PR.
```python
x_fp8 = x.to(torch.float8_e4m3fn)
y_fp8 = y.to(torch.float8_e4m3fn).t()
x_fp8 = x.to(e4m3_type)
```
I saw that the code to automatically set this was added in #117822 - IMO in a separate PR we should change these dtypes to be set explicitly by the user / testing framework, to follow the convention used elsewhere in similar files and make it crystal clear which dtypes are being tested where.
I disagree but am open to being convinced otherwise. For F8 types there are effectively two sets (e4m3fnuz/e5m2fnuz and e4m3fn/e5m2), and the abstraction introduced in #117822 is useful to avoid a lot of copy/paste code, or decorating many unit tests with allowed types. I like the current solution because it is compact. But developer education was missing: new unit tests that work on CUDA and do not use the abstracted types will not work on ROCm.
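To make the two-set distinction above concrete, here is a minimal illustrative sketch. The helper name `fp8_dtype_names` is hypothetical (it is not the abstraction from #117822); it simply maps the platform to the matching F8 pair described in the comment above.

```python
# Hypothetical helper illustrating the two F8 "sets" discussed above:
# ROCm uses the fnuz variants, while CUDA uses e4m3fn/e5m2.
def fp8_dtype_names(is_hip: bool) -> tuple:
    """Return the (e4m3, e5m2) torch dtype names for the platform."""
    if is_hip:
        return ("float8_e4m3fnuz", "float8_e5m2fnuz")  # ROCm set
    return ("float8_e4m3fn", "float8_e5m2")            # CUDA set
```

A test that hard-codes `torch.float8_e4m3fn` works on CUDA but fails on ROCm; routing through an abstraction like this is what keeps one test body valid on both platforms.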
Binary size increased by 5.4 MB.

Before landing, we need to verify this still builds okay for gfx1100, etc.

@pytorchmergebot revert -c ghfirst -m "Failing internal build"

@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 291626f. Reverted #140856 on behalf of https://github.com/atalman due to Failing internal build ([comment](#140856 (comment)))
@jeffdaily your PR has been successfully reverted.

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This ports (copies) FBGEMM's implementation from @jwfromm. https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise Pull Request resolved: pytorch#140856 Approved by: https://github.com/drisspg, https://github.com/atalman
So I think the problem with the PR as it stands is this line: https://github.com/pytorch/pytorch/pull/140856/files#diff-19b256efe989af74ad429ef2a1eb6e075784aa18aea04c7d36bb0e790e9a8170R19 Including torch.h is typically done for external C++ extensions and messes with some internal build systems. The proper fix would be to refine which headers are needed for building. cc @jeffdaily
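A sketch of the kind of header refinement being suggested, assuming the kernel translation unit only needs tensor types and error checking (the exact header set is an assumption here and would depend on what the .hip file actually uses):

```cpp
// Before: the monolithic extension header, intended for external C++
// extensions, which can break internal build systems.
// #include <torch/torch.h>

// After: pull in only the pieces this translation unit actually uses.
#include <ATen/core/Tensor.h>    // at::Tensor
#include <c10/util/Exception.h>  // TORCH_CHECK
```

Narrow includes also tend to reduce compile time for large generated kernel sets like the fp8_rowwise instances in this PR.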
@drisspg removing the #include of torch.h had no effect on the cmake build. Would you be able to check the internal build?

@jeffdaily Yeah, will do.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed. Raised by https://github.com/pytorch/pytorch/actions/runs/12304294940
@jeffdaily could you help rebase, and then I can import and land internally?

Done.

@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Any status update?

@jeffdaily Trying to get an AMD GPU to test; there are some issues, but I want to see if I can patch them internally.
Summary: This ports (copies) FBGEMM's implementation from jwfromm. https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise cc sunway513 jithunnair-amd pruthvistony ROCmSupport dllehr-amd jataylo hongxiayang naromero77amd yanbing-j vkuzo albanD kadeng penguinwu Pull Request resolved: pytorch#140856 Reviewed By: atalman Differential Revision: D66797096 Pulled By: drisspg
Had to unlink and re-export: #143416

Closing in favor of #143416.