[ATen][CUDA] Add sm_121a flag for RowwiseScaledMM by Aidyn-A · Pull Request #167734 · pytorch/pytorch

Aidyn-A · 2025-11-13T16:40:06Z

This PR adds an sm_121a flag for row-wise scaled matmuls on DGX Spark.

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv

pytorch-bot · 2025-11-13T16:40:11Z

The label module: cuda is only applicable to issues and has been removed. Please only use this label on issues.

pytorch-bot · 2025-11-13T16:40:12Z

The label module: floatx (formerly float8) is only applicable to issues and has been removed. Please only use this label on issues.

pytorch-bot · 2025-11-13T16:40:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167734

❌ 7 New Failures, 10 Unrelated Failures

As of commit c162c57 with merge base 2c846bb:

NEW FAILURES - The following jobs have failed:

linux-binary-libtorch / libtorch-cpu-shared-with-deps-release-test / test (gh)
/tmp/libtorch/include/ATen/core/TensorBase.h:12:2: error: #error "TensorBase.h should not be included when TORCH_STABLE_ONLY compile flag is passed"
linux-binary-libtorch / libtorch-cuda12_6-shared-with-deps-release-test / test (gh)
/tmp/libtorch/include/ATen/core/TensorBase.h:12:2: error: #error "TensorBase.h should not be included when TORCH_STABLE_ONLY compile flag is passed"
linux-binary-libtorch / libtorch-cuda12_8-shared-with-deps-release-test / test (gh)
/tmp/libtorch/include/ATen/core/TensorBase.h:12:2: error: #error "TensorBase.h should not be included when TORCH_STABLE_ONLY compile flag is passed"
linux-binary-libtorch / libtorch-cuda12_9-shared-with-deps-release-test / test (gh)
/tmp/libtorch/include/ATen/core/TensorBase.h:12:2: error: #error "TensorBase.h should not be included when TORCH_STABLE_ONLY compile flag is passed"
linux-binary-libtorch / libtorch-cuda13_0-shared-with-deps-release-test / test (gh)
/tmp/libtorch/include/ATen/core/TensorBase.h:12:2: error: #error "TensorBase.h should not be included when TORCH_STABLE_ONLY compile flag is passed"
linux-binary-libtorch / libtorch-rocm7_0-shared-with-deps-release-test (gh)
/tmp/libtorch/include/ATen/core/TensorBase.h:12:2: error: #error "TensorBase.h should not be included when TORCH_STABLE_ONLY compile flag is passed"
linux-binary-libtorch / libtorch-rocm7_1-shared-with-deps-release-build / build (gh)
ninja: build stopped: subcommand failed

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

linux-binary-manywheel / manywheel-py3_10-rocm7_0-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_11-rocm7_1-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_13t-rocm7_0-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_13t-rocm7_1-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_14-rocm7_0-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_14-rocm7_1-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_14t-rocm7_0-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
linux-binary-manywheel / manywheel-py3_14t-rocm7_1-test (gh) (similar failure)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/var/home/pytorchci/actions-runner/_work/pytorch/pytorch/.github/actions/setup-rocm'. Did you forget to run actions/checkout before running your local action?
trunk / win-vs2022-cpu-py3 / test (default, 1, 4, windows.4xlarge.nonephemeral) (gh) (similar failure)
[ FAILED ] PyTorchStreamWriterAndReader.LoadWithMultiThreads

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge, unstable) (gh) (#166072)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Skylion007 · 2025-11-13T18:55:09Z

They need their own SM arch? ;-;

Aidyn-A · 2025-11-14T08:35:35Z

> They need their own SM arch? ;-;

Yes, the arch-specific instructions are not compatible between compute capabilities 12.0 and 12.1.
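
For readers unfamiliar with the `a` suffix, here is a minimal sketch of the compatibility rule behind this answer. It assumes NVIDIA's documented behavior: binaries for a plain target such as `sm_120` run on devices with the same major compute capability and an equal or higher minor, while binaries for an arch-specific target such as `sm_120a` run only on exactly that compute capability. The helper `can_run` is hypothetical and is not part of PyTorch or the CUDA toolkit.

```python
def can_run(target: str, device_cc: tuple[int, int]) -> bool:
    """Return True if a binary built for `target` can run on `device_cc`.

    Hypothetical helper for illustration only. `target` looks like
    "sm_120" or "sm_121a"; `device_cc` is (major, minor).
    """
    name = target.removeprefix("sm_")
    arch_specific = name.endswith("a")
    digits = name.rstrip("a")
    # The last digit is the minor version, the rest is the major version.
    major, minor = int(digits[:-1]), int(digits[-1])
    if arch_specific:
        # Arch-specific code is tied to one exact compute capability.
        return (major, minor) == device_cc
    # Plain binaries are compatible with later minors of the same major.
    return major == device_cc[0] and minor <= device_cc[1]


# A device with compute capability 12.1:
print(can_run("sm_120", (12, 1)))   # plain 12.0 binary
print(can_run("sm_120a", (12, 1)))  # arch-specific 12.0 binary
print(can_run("sm_121a", (12, 1)))  # arch-specific 12.1 binary
```

Under this rule an `sm_120a` build of a kernel does not cover a 12.1 device, which is why a separate `sm_121a` flag is needed.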

Aidyn-A · 2025-11-14T08:36:22Z

The executorch test failures are unrelated.

@pytorchbot merge -i

pytorchmergebot · 2025-11-14T08:38:21Z

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Aidyn-A · 2025-11-17T07:53:49Z

@pytorchbot revert -m "fails on CUDA 12.8"

pytorch-bot · 2025-11-17T07:53:52Z

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}

Try @pytorchbot --help for more info.

Aidyn-A · 2025-11-17T07:54:34Z

@pytorchbot revert -m "fails on CUDA 12.8" -c nosignal

pytorchmergebot · 2025-11-17T07:56:26Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

This reverts commit 226850c. Reverted #167734 on behalf of https://github.com/Aidyn-A due to fails on CUDA 12.8 ([comment](#167734 (comment)))

pytorchmergebot · 2025-11-17T07:56:31Z

@Aidyn-A your PR has been successfully reverted.

tinglvv · 2025-11-17T08:42:01Z

Thanks for reverting. This should restore the nightly wheel builds, which were failing with the error below. We should add the ciflow/binaries label next time.
nvcc fatal : Unsupported gpu architecture 'compute_121a'
cc @atalman, could you help restart https://github.com/pytorch/pytorch/actions/runs/19421948119/job/55560842460 when you get a chance? I was not able to restart the 12.8 build.
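
The error above comes from handing an architecture to a toolkit that does not know it. Gating the flag on the CUDA version avoids that, and the idea can be sketched roughly as follows. This is not the code from this PR: the helper `select_arch_flags` and the candidate table are hypothetical, and the minimum version of (12, 9) for `sm_121a` is an assumption. This thread only establishes that CUDA 12.8 rejects it.

```python
def select_arch_flags(cuda_version, candidates):
    """Keep only the architectures that the given toolkit can compile.

    Hypothetical helper for illustration only.
    cuda_version: (major, minor) of the CUDA toolkit in use.
    candidates: list of (arch_name, minimum_cuda_version) pairs.
    """
    return [arch for arch, min_cuda in candidates if cuda_version >= min_cuda]


# Assumed threshold: the thread only shows that 12.8 is too old.
CANDIDATES = [("sm_121a", (12, 9))]

print(select_arch_flags((12, 8), CANDIDATES))  # flag is dropped on 12.8
print(select_arch_flags((13, 0), CANDIDATES))  # flag is kept on 13.0
```

Tuples compare element by element in Python, so `(12, 8) >= (12, 9)` is false and the flag never reaches nvcc on the older toolkit.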

atalman · 2025-11-17T14:19:48Z

Hi @tinglvv and @Aidyn-A, thank you for the revert. We will wait for tomorrow's build to confirm.

It looks like it caused all CUDA 12.8 builds to fail: Linux x86, aarch64, and Windows x86.

Aidyn-A · 2025-11-18T08:06:06Z

Yeah, those failures are certainly not related to the sm_121a flag.

@pytorchbot merge -i

pytorchmergebot · 2025-11-18T08:08:23Z

Merge started

Your change will be merged while ignoring the following 17 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge, unstable), trunk / win-vs2022-cpu-py3 / test (default, 1, 4, windows.4xlarge.nonephemeral), linux-binary-manywheel / manywheel-py3_10-rocm7_0-test, linux-binary-manywheel / manywheel-py3_13t-rocm7_0-test, linux-binary-manywheel / manywheel-py3_14-rocm7_0-test, linux-binary-manywheel / manywheel-py3_14t-rocm7_0-test, linux-binary-manywheel / manywheel-py3_11-rocm7_1-test, linux-binary-manywheel / manywheel-py3_14-rocm7_1-test, linux-binary-manywheel / manywheel-py3_13t-rocm7_1-test, linux-binary-manywheel / manywheel-py3_14t-rocm7_1-test, linux-binary-libtorch / libtorch-rocm7_1-shared-with-deps-release-build / build, linux-binary-libtorch / libtorch-cpu-shared-with-deps-release-test / test, linux-binary-libtorch / libtorch-cuda12_6-shared-with-deps-release-test / test, linux-binary-libtorch / libtorch-cuda13_0-shared-with-deps-release-test / test, linux-binary-libtorch / libtorch-cuda12_8-shared-with-deps-release-test / test, linux-binary-libtorch / libtorch-cuda12_9-shared-with-deps-release-test / test, linux-binary-libtorch / libtorch-rocm7_0-shared-with-deps-release-test

This PR adds an sm_121a flag for row-wise scaled matmuls on DGX Spark.
Pull Request resolved: pytorch#167734
Approved by: https://github.com/eqy, https://github.com/cyyever

…7734)" This reverts commit 226850c. Reverted pytorch#167734 on behalf of https://github.com/Aidyn-A due to fails on CUDA 12.8 ([comment](pytorch#167734 (comment)))

add sm_121a for RowwiseScaledMM

da69173

Aidyn-A requested review from cyyever and eqy November 13, 2025 16:40

Aidyn-A self-assigned this Nov 13, 2025

Aidyn-A added the module: cuda Related to torch.cuda, and CUDA support in general label Nov 13, 2025

Aidyn-A added this to PyTorch + CUDA Nov 13, 2025

Aidyn-A added topic: not user facing topic category module: floatx (formerly float8) For torch.float8_e5m2 and torch.float8_e4m3 and other sub 8-bit float types labels Nov 13, 2025

pytorch-bot bot removed module: cuda Related to torch.cuda, and CUDA support in general module: floatx (formerly float8) For torch.float8_e5m2 and torch.float8_e4m3 and other sub 8-bit float types labels Nov 13, 2025

pytorchbot added the open source label Nov 13, 2025

eqy approved these changes Nov 13, 2025

Aidyn-A added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 13, 2025

cyyever approved these changes Nov 14, 2025

pytorchmergebot added the merging label Nov 14, 2025

pytorchmergebot closed this in 226850c Nov 14, 2025

pytorchmergebot added the Merged label Nov 14, 2025

github-project-automation bot moved this to Done in PyTorch + CUDA Nov 14, 2025

pytorchmergebot removed the merging label Nov 14, 2025

tinglvv mentioned this pull request Nov 14, 2025

[CD] Add libopenblas to dep list for AArch64+CPU whl #167841

Closed

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Nov 17, 2025

pytorchmergebot reopened this Nov 17, 2025

gate against CUDA version

c162c57

tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Nov 17, 2025

pytorchmergebot added the merging label Nov 18, 2025

pytorchmergebot closed this in 2f023bf Nov 18, 2025

pytorchmergebot removed the merging label Nov 18, 2025

8 participants