
[reland2][ROCm] preshuffled weight mm #2207

Merged
mxz297 merged 7 commits into pytorch:main from jeffdaily:rocm_swizzle_reland2 on May 28, 2025

Conversation

@jeffdaily

Contributor

No description provided.

@pytorch-bot

pytorch-bot Bot commented May 13, 2025


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2207

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit aca48ed with merge base 1017c7e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label May 13, 2025
@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot

pytorch-bot Bot commented May 14, 2025


To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot Bot removed the ciflow/rocm label May 14, 2025
@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 14, 2025


@jeffdaily I am having issues importing this PR. Can you first try to resolve the build errors?

@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 15, 2025


@jeffdaily there is a linter failure

@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 15, 2025


@jeffdaily there is also a failure in the ROCm test:

module = Linear(in_features=32, out_features=128, bias=False)
config = MXFPInferenceConfig(block_size=32, activation_dtype=torch.float4_e2m1fn_x2, weight_dtype=torch.float4_e2m1fn_x2, gemm_kernel_choice=<MXGemmKernelChoice.CUTLASS: 'cutlass'>, set_inductor_config=False)

    @register_quantize_module_handler(MXFPInferenceConfig)
    def _mx_inference_linear_transform(
        module: torch.nn.Module, config: MXFPInferenceConfig
    ):
        # TODO Sm120 has slightly more restrictive reqs
        # TODO handle AMD
>       assert is_sm_at_least_100(), "MXFP is only supported on sm100 machiens for now"
E       AssertionError: MXFP is only supported on sm100 machiens for now

but it looks like this test should not even run on AMD?

cc @drisspg @atalman @jerryzh168
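The actual skip logic is whatever #2209 introduced, which is not shown in this thread. The following is only a hypothetical sketch of how a test module could gate MXFP inference tests so they never reach the sm100 assertion on ROCm: the helper names mxfp_supported and _probe_device and the test class name are assumptions, while torch.version.hip, torch.cuda.is_available() and torch.cuda.get_device_capability() are existing PyTorch calls.

```python
import unittest


def mxfp_supported(hip_version, capability):
    """Hypothetical gate for MXFP inference tests.

    hip_version: torch.version.hip (a version string on ROCm builds, None otherwise).
    capability: (major, minor) from torch.cuda.get_device_capability(), or None
    when no GPU is visible.
    """
    if hip_version is not None:
        return False  # ROCm: the MXFP path currently asserts sm100, so skip
    if capability is None:
        return False  # no visible GPU at all
    return tuple(capability) >= (10, 0)  # sm100 or newer


def _probe_device():
    """Read the gate inputs from torch if it is installed, else report no GPU."""
    try:
        import torch
    except ImportError:
        return None, None
    cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    return torch.version.hip, cap


_HIP, _CAP = _probe_device()


@unittest.skipIf(not mxfp_supported(_HIP, _CAP), "MXFP requires a CUDA sm100+ device")
class MXFPInferenceTest(unittest.TestCase):
    def test_gate_is_open(self):
        # Only reached when the skip condition above is False.
        self.assertTrue(mxfp_supported(_HIP, _CAP))
```

Keeping the decision in a pure function makes the gate testable on any machine; only _probe_device touches the hardware.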

@pytorch-bot pytorch-bot Bot removed the ciflow/rocm label May 15, 2025
@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@drisspg drisspg added the topic: improvement label May 16, 2025
@drisspg

drisspg commented May 16, 2025

Contributor

@mxz297 Yeah, this should be skipped. Can you rebase past #2209?

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 16, 2025


@pytorchbot run all

@pytorch-bot

pytorch-bot Bot commented May 16, 2025


❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'run' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@mxz297

mxz297 commented May 16, 2025


@pytorchbot drci

@mxz297

mxz297 commented May 16, 2025


@drisspg @atalman @jerryzh168

There seem to be some CUDA test failures caused by an issue in CUDA arch-list parsing. It feels unlikely that this PR caused them, but I want to double-check with you folks:

Processing /pytorch/ao
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      W0516 16:40:07.414810 215 site-packages/torch/utils/cpp_extension.py:118] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12.6'
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/pytorch/ao/setup.py", line 544, in <module>
          ext_modules=get_extensions(),
        File "/pytorch/ao/setup.py", line 432, in get_extensions
          cuda_arch_flags = _get_cuda_arch_flags()
        File "/opt/conda/envs/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2434, in _get_cuda_arch_flags
          arch_list[-1] += '+PTX'
      IndexError: list index out of range
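The warning lines in this log already point at the usual workaround: set TORCH_CUDA_ARCH_LIST explicitly. A minimal sketch of a guard a build script could run before computing the arch flags, assuming the build container may have no visible GPU (as the "No CUDA runtime is found" line suggests). The helper name ensure_cuda_arch_list and the default "8.0;9.0" are hypothetical, not what torchao's setup.py actually does.

```python
import os


def ensure_cuda_arch_list(visible_device_count, default_archs="8.0;9.0", env=None):
    """Hypothetical helper: pin TORCH_CUDA_ARCH_LIST when no GPU is visible.

    If the variable is already set, it is left alone. If it is unset and no
    device is visible, auto-detection has nothing to work with, so an
    explicit (assumed) default list is written instead.
    """
    env = os.environ if env is None else env
    current = env.get("TORCH_CUDA_ARCH_LIST")
    if current:
        return current  # respect an explicit choice from the CI job or user
    if visible_device_count == 0:
        env["TORCH_CUDA_ARCH_LIST"] = default_archs
        return default_archs
    return None  # GPUs are visible; let auto-detection run as before
```

In a setup.py this might be called as ensure_cuda_arch_list(torch.cuda.device_count()) before the arch flags are computed; a CI job could equally export TORCH_CUDA_ARCH_LIST in its build step.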

@mxz297

mxz297 commented May 16, 2025


Also a noob question: how should I restart CI, or is CI always restarted automatically after a new commit is pushed?

@drisspg

drisspg commented May 16, 2025

Contributor

@mxz297 So if you are a Meta employee, it will restart automatically on commit push, but unfortunately everyone else will need to kick it off manually.

@mxz297

mxz297 commented May 19, 2025


@drisspg @atalman @jerryzh168

Any insight on the following error?

Processing /pytorch/ao
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      W0516 16:40:07.414810 215 site-packages/torch/utils/cpp_extension.py:118] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12.6'
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/pytorch/ao/setup.py", line 544, in <module>
          ext_modules=get_extensions(),
        File "/pytorch/ao/setup.py", line 432, in get_extensions
          cuda_arch_flags = _get_cuda_arch_flags()
        File "/opt/conda/envs/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2434, in _get_cuda_arch_flags
          arch_list[-1] += '+PTX'
      IndexError: list index out of range

@drisspg

drisspg commented May 19, 2025

Contributor

Taking a look

@drisspg

drisspg commented May 19, 2025

Contributor

Okay, so this is coming from this line:

>>> from torch.utils.cpp_extension import _get_cuda_arch_flags
>>> _get_cuda_arch_flags()
/Users/drisspg/.conda/envs/nightly/lib/python3.13/site-packages/torch/utils/cpp_extension.py:2410: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    _get_cuda_arch_flags()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/drisspg/.conda/envs/nightly/lib/python3.13/site-packages/torch/utils/cpp_extension.py", line 2430, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
    ~~~~~~~~~^^^^
IndexError: list index out of range

It fails when you call get_arch_list with no args and the default system arch is not picked up by this logic (arch_list stays empty, so arch_list[-1] raises the IndexError):

https://github.com/pytorch/pytorch/blob/6487ea30b3fb3fe550d0e8e7feaf25bc3cffb626/torch/utils/cpp_extension.py#L2360
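To make the failure mode concrete, here is a simplified, hypothetical model of the arch-list selection step linked above, written in plain Python so it runs without a GPU. It is not the torch source and skips several details of the real function; the name model_arch_flags is made up. It keeps the final arch_list[-1] += '+PTX' step that raises in both tracebacks.

```python
def model_arch_flags(env_arch_list, visible_capabilities):
    """Simplified, hypothetical model of arch-list selection (not the torch source).

    env_arch_list: value of TORCH_CUDA_ARCH_LIST, or None/"" when unset.
    visible_capabilities: one (major, minor) tuple per visible GPU.
    """
    if env_arch_list:
        # An explicit list is used as given; entries may be ';'- or space-separated.
        return [a for a in env_arch_list.replace(" ", ";").split(";") if a]
    # Unset: derive the list from the visible cards, deduplicated and sorted.
    arch_list = [f"{major}.{minor}" for major, minor in sorted(set(visible_capabilities))]
    # Same step as in the tracebacks: with zero visible cards the list is empty,
    # so indexing [-1] raises IndexError instead of producing flags.
    arch_list[-1] += "+PTX"
    return arch_list
```

Calling model_arch_flags(None, []) reproduces the IndexError seen in CI, while passing an explicit list such as "8.0;9.0" bypasses auto-detection entirely, which matches the hint in the warning text.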

@drisspg

drisspg commented May 22, 2025

Contributor

@jeffdaily Can you rebase? I am still a little confused by this CI.

@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

Contributor

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297 mxz297 merged commit 63f2e51 into pytorch:main May 28, 2025
36 of 37 checks passed
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
* [reland2][ROCm] preshuffled weight mm

* remove debug print statements

* remove duplicate registrations caused by patch fuzzing

* lint

* ruff

Labels

ci-no-td
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
device: rocm
topic: improvement: Use this tag if this PR is an improvement (doesn't fit into any of the other categories)

Projects

None yet


5 participants