
[ROCm] port CK rowwise F8 from fbgemm (#140856)#143416

Closed
drisspg wants to merge 1 commit into pytorch:main from drisspg:export-D66797096

Conversation


@drisspg drisspg commented Dec 17, 2024

Summary:

Author: @jeffdaily
This ports (copies) FBGEMM's FP8 rowwise implementation from jwfromm.

https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise

cc sunway513 jithunnair-amd pruthvistony ROCmSupport dllehr-amd jataylo hongxiayang naromero77amd yanbing-j vkuzo albanD kadeng penguinwu

Reviewed By: atalman

Differential Revision: D66797096

Pulled By: drisspg

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @albanD
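To make the "rowwise FP8" scheme being ported concrete, here is a hypothetical pure-Python sketch (not the CK kernel itself) of the numerics: each row of A and each column of B gets its own scale so values fit the float8 e4m3 range, the quantized operands are multiplied, and the scales are applied to the accumulator. Real kernels round to actual 8-bit floats; this sketch only clamps to the e4m3 dynamic range to show the scaling scheme.

```python
# Hypothetical illustration of rowwise FP8 scaling, not the ported CK kernel.
E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def rowwise_scales(mat):
    """One scale per row: the row's max |value| mapped onto the FP8 max."""
    return [max(abs(v) for v in row) / E4M3_MAX or 1.0 for row in mat]

def quantize(mat, scales):
    """Divide each row by its scale and clamp into the e4m3 range."""
    return [[max(-E4M3_MAX, min(E4M3_MAX, v / s)) for v in row]
            for row, s in zip(mat, scales)]

def scaled_mm(a, b_t, scale_a, scale_b):
    """C[i][j] = (sum_k qa[i][k] * qb[j][k]) * scale_a[i] * scale_b[j].
    b_t holds B transposed, so its rows are B's columns."""
    qa = quantize(a, scale_a)
    qb = quantize(b_t, scale_b)
    return [[sum(x * y for x, y in zip(ra, rb)) * sa * sb
             for rb, sb in zip(qb, scale_b)]
            for ra, sa in zip(qa, scale_a)]

a = [[1.0, 2.0], [3.0, 4.0]]
b_t = [[5.0, 6.0], [7.0, 8.0]]  # B^T: columns of B stored as rows
sa, sb = rowwise_scales(a), rowwise_scales(b_t)
print(scaled_mm(a, b_t, sa, sb))  # close to the exact product [[17, 23], [39, 53]]
```

Because the scales are per-row rather than per-tensor, one outlier row cannot crush the dynamic range of every other row, which is the motivation for rowwise over tensorwise FP8 scaling.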

Pull Request resolved: pytorch#140856

pytorch-bot bot commented Dec 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143416

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 3 Cancelled Jobs

As of commit 93ea684 with merge base b16f020:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm Trigger "default" config CI on ROCm module: rocm AMD GPU support for Pytorch labels Dec 17, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D66797096

@drisspg drisspg added the topic: not user facing topic category label Dec 17, 2024
@drisspg drisspg requested a review from atalman December 17, 2024 19:50
@atalman atalman added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Dec 17, 2024
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 18, 2024

albanD commented Dec 18, 2024

Why? We already have fbgemm as a submodule and a dependency.
Why would we copy code from it as well?

@jeffdaily

Why? We already have fbgemm as a submodule and a dependency. Why would we copy code from it as well?

The code we want is not compiled by default from fbgemm; it lives under its experimental sources. We could change how we build fbgemm as a submodule, but today PyTorch has a CUTLASS implementation of rowwise FP8 GEMM and no CK (ROCm) counterpart. The 'experimental' CK implementation in fbgemm is stable enough to port into PyTorch so the two backends can have parity.
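The parity point above amounts to a backend dispatch: one rowwise FP8 GEMM entry point that routes to a CUTLASS kernel on CUDA and to the ported CK kernel on ROCm. A hypothetical sketch of that idea follows; the function names are illustrative stand-ins, not PyTorch's actual internal symbols.

```python
# Hypothetical dispatch sketch; names are illustrative, not PyTorch internals.
def cutlass_f8f8bf16_rowwise(a, b, sa, sb):  # stand-in for the CUDA path
    return "cutlass"

def ck_f8f8bf16_rowwise(a, b, sa, sb):       # stand-in for the ported ROCm path
    return "ck"

BACKENDS = {"cuda": cutlass_f8f8bf16_rowwise, "rocm": ck_f8f8bf16_rowwise}

def scaled_mm_rowwise(a, b, sa, sb, platform):
    """Route the same rowwise FP8 GEMM call to a platform-specific kernel."""
    try:
        kernel = BACKENDS[platform]
    except KeyError:
        raise NotImplementedError(f"no rowwise FP8 GEMM backend for {platform}")
    return kernel(a, b, sa, sb)

print(scaled_mm_rowwise(None, None, None, None, "rocm"))  # -> ck
```

Before this PR, the "rocm" entry in such a table simply did not exist in core, which is the gap the port fills.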

@hongxiayang

Hi, @drisspg @atalman : Any plan to land this PR? We have other work items depending on it.


albanD commented Dec 20, 2024

We have to be more and more careful with dependency management to prevent the major issues we have around release and packaging.
While I completely understand that copy-pasting the piece of code you want and manually modifying it is the most straightforward way to get this done, I don't think it is a good solution for PyTorch mid term.
We should either:

  • Update the fbgemm repo to be able to pull in what we need properly as a submodule.
  • Remove the fbgemm submodule altogether if it doesn't work and copy paste the few kernels we care about.

Having both at the same time sounds like a recipe for disaster down the road both for maintenance and binary conflict reasons.

@hongxiayang

We have to be more and more careful with dependency management to prevent the major issues we have around release and packaging. While I completely understand that copy-pasting the piece of code you want and manually modifying it is the most straightforward way to get this done, I don't think it is a good solution for PyTorch mid term. We should either:

  • Update the fbgemm repo to be able to pull in what we need properly as a submodule.
  • Remove the fbgemm submodule altogether if it doesn't work and copy paste the few kernels we care about.

Having both at the same time sounds like a recipe for disaster down the road both for maintenance and binary conflict reasons.

Thanks for the comment. We will evaluate this further.

@jeffdaily

We should either:

  • Update the fbgemm repo to be able to pull in what we need properly as a submodule.
  • Remove the fbgemm submodule altogether if it doesn't work and copy paste the few kernels we care about.

Having both at the same time sounds like a recipe for disaster down the road both for maintenance and binary conflict reasons.

Third option: why don't we instead treat this as a migration of the CK implementation from fbgemm's experimental area into PyTorch, and deprecate the fbgemm experimental version? Consider the feature graduated.

@jeffdaily

@albanD any opinion on my third option above?


albanD commented Jan 8, 2025

Sure but then I would have quite a few questions from a quick look at the code from the point of view of it living in core:

  • Is it actually used in this PR?
  • Is this code tested?
  • Is this really just about doing template instantiation for every single shape used in a specific Llama version? If so, what is the impact on binary size, what is the plan for future model releases, and why Llama and not others?
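For readers unfamiliar with the per-shape instantiation concern raised above: the fbgemm CK sources compile a separately tuned kernel instance for each (M, N, K) problem size seen in a target model, plus a generic fallback, and each instance adds to the binary. A hypothetical sketch of that selection scheme, with illustrative shapes and names:

```python
# Hypothetical shape-keyed kernel selection; shapes and instance names are
# illustrative. Each table entry stands in for a separately compiled C++
# template instantiation, which is why binary size grows with the shape list.
TUNED = {
    (1, 8192, 8192): "fp8_rowwise_1x8192x8192_instance",
    (1, 8192, 28672): "fp8_rowwise_1x8192x28672_instance",
}

def pick_instance(m, n, k):
    """Return the tuned instance for an exact shape match, else a fallback."""
    return TUNED.get((m, n, k), "fp8_rowwise_generic_fallback")

print(pick_instance(1, 8192, 8192))  # a tuned instance
print(pick_instance(7, 123, 456))    # the generic fallback
```

The trade-off albanD is probing is exactly the size of this table: shapes outside it still run, but only on the slower generic path.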


github-actions bot commented Mar 9, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Mar 9, 2025
@github-actions github-actions bot closed this Apr 8, 2025

Labels

  • ciflow/binaries Trigger all binary build and upload jobs on the PR
  • ciflow/rocm Trigger "default" config CI on ROCm
  • ciflow/trunk Trigger trunk jobs on your pull request
  • fb-exported
  • module: rocm AMD GPU support for Pytorch
  • skip-pr-sanity-checks
  • Stale
  • topic: not user facing topic category

6 participants