
[ROCm] port CK rowwise F8 from fbgemm (#140856)#143416

Closed
drisspg wants to merge 1 commit into pytorch:main from drisspg:export-D66797096

Conversation


@drisspg drisspg commented Dec 17, 2024

Summary:

Author: @jeffdaily
This ports (copies) FBGEMM's FP8 rowwise implementation from jwfromm.

https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise

cc sunway513 jithunnair-amd pruthvistony ROCmSupport dllehr-amd jataylo hongxiayang naromero77amd yanbing-j vkuzo albanD kadeng penguinwu

Reviewed By: atalman

Differential Revision: D66797096

Pulled By: drisspg

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @albanD
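To make the "rowwise FP8" scheme being ported concrete, here is a hypothetical pure-Python sketch (not the CK kernel itself) of the numerics: each row of A and each column of B gets its own scale so values fit the float8 e4m3 range, the quantized operands are multiplied, and the scales are applied to the accumulator. Real kernels round to actual 8-bit floats; this sketch only clamps to the e4m3 dynamic range to show the scaling scheme.

```python
# Hypothetical illustration of rowwise FP8 scaling, not the ported CK kernel.
E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def rowwise_scales(mat):
    """One scale per row: the row's max |value| mapped onto the FP8 max."""
    return [max(abs(v) for v in row) / E4M3_MAX or 1.0 for row in mat]

def quantize(mat, scales):
    """Divide each row by its scale and clamp into the e4m3 range."""
    return [[max(-E4M3_MAX, min(E4M3_MAX, v / s)) for v in row]
            for row, s in zip(mat, scales)]

def scaled_mm(a, b_t, scale_a, scale_b):
    """C[i][j] = (sum_k qa[i][k] * qb[j][k]) * scale_a[i] * scale_b[j].
    b_t holds B transposed, so its rows are B's columns."""
    qa = quantize(a, scale_a)
    qb = quantize(b_t, scale_b)
    return [[sum(x * y for x, y in zip(ra, rb)) * sa * sb
             for rb, sb in zip(qb, scale_b)]
            for ra, sa in zip(qa, scale_a)]

a = [[1.0, 2.0], [3.0, 4.0]]
b_t = [[5.0, 6.0], [7.0, 8.0]]  # B^T: columns of B stored as rows
sa, sb = rowwise_scales(a), rowwise_scales(b_t)
print(scaled_mm(a, b_t, sa, sb))  # close to the exact product [[17, 23], [39, 53]]
```

Because the scales are per-row rather than per-tensor, one outlier row cannot crush the dynamic range of every other row, which is the motivation for rowwise over tensorwise FP8 scaling.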

Pull Request resolved: pytorch#140856

pytorch-bot bot commented Dec 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143416

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 3 Cancelled Jobs

As of commit 93ea684 with merge base b16f020:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm Trigger "default" config CI on ROCm module: rocm AMD GPU support for Pytorch labels Dec 17, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D66797096

@drisspg drisspg added the topic: not user facing topic category label Dec 17, 2024
@drisspg drisspg requested a review from atalman December 17, 2024 19:50
@atalman atalman added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Dec 17, 2024
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 18, 2024

albanD commented Dec 18, 2024

Why? We already have fbgemm as a submodule and a dependency.
Why would we copy code from it as well?

@jeffdaily

Why? We already have fbgemm as a submodule and a dependency. Why would we copy code from it as well?

The code we want is not compiled by default from fbgemm; it lives under its experimental sources. We could change how we build fbgemm as a submodule, but today PyTorch has a CUTLASS implementation of rowwise FP8 GEMM and no CK (ROCm) counterpart. The 'experimental' CK implementation in fbgemm is stable enough to port into PyTorch so the two backends can have parity.
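The parity point above amounts to a backend dispatch: one rowwise FP8 GEMM entry point that routes to a CUTLASS kernel on CUDA and to the ported CK kernel on ROCm. A hypothetical sketch of that idea follows; the function names are illustrative stand-ins, not PyTorch's actual internal symbols.

```python
# Hypothetical dispatch sketch; names are illustrative, not PyTorch internals.
def cutlass_f8f8bf16_rowwise(a, b, sa, sb):  # stand-in for the CUDA path
    return "cutlass"

def ck_f8f8bf16_rowwise(a, b, sa, sb):       # stand-in for the ported ROCm path
    return "ck"

BACKENDS = {"cuda": cutlass_f8f8bf16_rowwise, "rocm": ck_f8f8bf16_rowwise}

def scaled_mm_rowwise(a, b, sa, sb, platform):
    """Route the same rowwise FP8 GEMM call to a platform-specific kernel."""
    try:
        kernel = BACKENDS[platform]
    except KeyError:
        raise NotImplementedError(f"no rowwise FP8 GEMM backend for {platform}")
    return kernel(a, b, sa, sb)

print(scaled_mm_rowwise(None, None, None, None, "rocm"))  # -> ck
```

Before this PR, the "rocm" entry in such a table simply did not exist in core, which is the gap the port fills.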

@hongxiayang

Hi, @drisspg @atalman : Any plan to land this PR? We have other work items depending on it.


albanD commented Dec 20, 2024

We have to be more and more careful with dependency management to prevent the major issues we have around release and packaging.
While I completely understand that copy-pasting the piece of code you want and manually modifying it is the most straightforward way to get this done, I don't think it is a good solution for PyTorch mid term.
We should either:

  • Update the fbgemm repo to be able to pull in what we need properly as a submodule.
  • Remove the fbgemm submodule altogether if it doesn't work and copy paste the few kernels we care about.

Having both at the same time sounds like a recipe for disaster down the road both for maintenance and binary conflict reasons.

@hongxiayang

We have to be more and more careful with dependency management to prevent the major issues we have around release and packaging. While I completely understand that copy-pasting the piece of code you want and manually modifying it is the most straightforward way to get this done, I don't think it is a good solution for PyTorch mid term. We should either:

  • Update the fbgemm repo to be able to pull in what we need properly as a submodule.
  • Remove the fbgemm submodule altogether if it doesn't work and copy paste the few kernels we care about.

Having both at the same time sounds like a recipe for disaster down the road both for maintenance and binary conflict reasons.

Thanks for the comment. We will evaluate this further.

@jeffdaily

We should either:

  • Update the fbgemm repo to be able to pull in what we need properly as a submodule.
  • Remove the fbgemm submodule altogether if it doesn't work and copy paste the few kernels we care about.

Having both at the same time sounds like a recipe for disaster down the road both for maintenance and binary conflict reasons.

Third option: why don't we instead treat this as a migration of the CK implementation from fbgemm's experimental area into PyTorch, and deprecate the fbgemm experimental version? Consider the feature graduated.

@jeffdaily

@albanD any opinion on my third option above?


albanD commented Jan 8, 2025

Sure but then I would have quite a few questions from a quick look at the code from the point of view of it living in core:

  • Is it actually used in this PR?
  • Is this code tested?
  • Is this really just about doing template instantiation for every single shape used in a specific Llama version? If so, what is the impact on binary size, what is the plan for future model releases, and why Llama and not others?
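For readers unfamiliar with the per-shape instantiation concern raised above: the fbgemm CK sources compile a separately tuned kernel instance for each (M, N, K) problem size seen in a target model, plus a generic fallback, and each instance adds to the binary. A hypothetical sketch of that selection scheme, with illustrative shapes and names:

```python
# Hypothetical shape-keyed kernel selection; shapes and instance names are
# illustrative. Each table entry stands in for a separately compiled C++
# template instantiation, which is why binary size grows with the shape list.
TUNED = {
    (1, 8192, 8192): "fp8_rowwise_1x8192x8192_instance",
    (1, 8192, 28672): "fp8_rowwise_1x8192x28672_instance",
}

def pick_instance(m, n, k):
    """Return the tuned instance for an exact shape match, else a fallback."""
    return TUNED.get((m, n, k), "fp8_rowwise_generic_fallback")

print(pick_instance(1, 8192, 8192))  # a tuned instance
print(pick_instance(7, 123, 456))    # the generic fallback
```

The trade-off albanD is probing is exactly the size of this table: shapes outside it still run, but only on the slower generic path.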


github-actions bot commented Mar 9, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Mar 9, 2025
@github-actions github-actions bot closed this Apr 8, 2025

Labels

  • ciflow/binaries Trigger all binary build and upload jobs on the PR
  • ciflow/rocm Trigger "default" config CI on ROCm
  • ciflow/trunk Trigger trunk jobs on your pull request
  • fb-exported
  • module: rocm AMD GPU support for Pytorch
  • skip-pr-sanity-checks
  • Stale
  • topic: not user facing topic category

6 participants