[ROCm] port CK rowwise F8 from fbgemm (#140856) #143416
drisspg wants to merge 1 commit into pytorch:main
Conversation
Summary: This ports (copies) FBGEMM's implementation from jwfromm: https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise

cc sunway513 jithunnair-amd pruthvistony ROCmSupport dllehr-amd jataylo hongxiayang naromero77amd yanbing-j vkuzo albanD kadeng penguinwu

Pull Request resolved: pytorch#140856
Reviewed By: atalman
Differential Revision: D66797096
Pulled By: drisspg
Why? We already have fbgemm as a submodule and a dependency? |
The code we want is not compiled by default in fbgemm's experimental sources. Though we could change how we build the fbgemm submodule, PyTorch today has a CUTLASS implementation of rowwise FP8 GEMM but no CK (ROCm) implementation. The 'experimental' CK one in fbgemm was stable enough to port into PyTorch so that we can have parity. |
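For context on what "rowwise FP8 GEMM" means here: A is quantized with one scale per row and B with one scale per column, and the GEMM epilogue multiplies the accumulator by both scales to recover the original range. The sketch below emulates that numerics in NumPy; there is no real float8 cast (the clamp to the e4m3 max stands in for it), and the function names and epsilon guard are illustrative, not the CK kernel's actual API.

```python
import numpy as np

F8_E4M3_MAX = 448.0  # largest finite value of float8 e4m3 (ROCm's e4m3fnuz tops out at 240)

def quantize_rowwise(x, axis):
    """Absmax-scale x along `axis` into the fp8 e4m3 range.

    Returns the scaled values (a stand-in for the fp8 cast) and the
    per-row / per-column scale needed to undo the scaling later.
    """
    # epsilon guard so an all-zero row/column does not divide by zero
    amax = np.maximum(np.abs(x).max(axis=axis, keepdims=True), 1e-12)
    scale = amax / F8_E4M3_MAX
    xq = np.clip(x / scale, -F8_E4M3_MAX, F8_E4M3_MAX)
    return xq, scale

def rowwise_f8_gemm(a, b):
    """Rowwise-scaled GEMM: A quantized per row, B per column.

    The epilogue multiplies the accumulator by scale_a * scale_b,
    which is what the fused CK / CUTLASS kernels do on-chip.
    """
    aq, scale_a = quantize_rowwise(a, axis=1)  # scale_a: (M, 1)
    bq, scale_b = quantize_rowwise(b, axis=0)  # scale_b: (1, N)
    return (aq @ bq) * scale_a * scale_b
```

In PyTorch itself this path is reached through `torch._scaled_mm` with fp8 inputs and per-row/per-column scale tensors; the NumPy version above only illustrates the scaling math, not the kernel dispatch.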
|
We have to be more and more careful with dependency management to avoid the major issues we have around release and packaging.
Having both at the same time sounds like a recipe for disaster down the road, both for maintenance and binary-conflict reasons. |
Thanks for the comment. We will evaluate this further. |
|
We should either:
Third option: why don't we instead treat this as a migration of the CK implementation from fbgemm's experimental portion into pytorch, and deprecate the fbgemm experimental one? Consider the feature graduated. |
|
@albanD any opinion on my third option above? |
|
Sure, but then from a quick look at the code I would have quite a few questions from the point of view of it living in core: |
|
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Summary:
author @jeffdaily
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @albanD