[BLAS] Avoid downcasts for fp16×fp16→fp32 BLAS #161999

malfet wants to merge 1 commit into gh/malfet/504/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161999

Note: links to docs will display an error until the docs builds have completed.

⏳ No Failures, 3 Pending as of commit c3df264 with merge base 6737e2c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Starting merge as part of PR stack under #162001
Follow-up after #154012. Since the introduction of `gemm_no_downcast_stub`, it is no longer necessary to allocate a temporary array and then manually implement the `beta` logic in the codebase.

Pull Request resolved: #162001
Approved by: https://github.com/drisspg
ghstack dependencies: #161999
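The distinction can be illustrated with a minimal NumPy sketch (the function names and shapes here are illustrative, not the PyTorch internals): the old path materialized the fp16×fp16 product in a temporary fp16 array before applying the BLAS `beta` update `C := alpha*A@B + beta*C` by hand, so sums that exceed fp16's maximum (~65504) overflow to inf; accumulating in fp32 and applying `beta` in one pass avoids that.

```python
import numpy as np

def gemm_with_downcast(alpha, a, b, beta, c):
    # Old path (sketch): the fp16*fp16 product is stored in a temporary
    # fp16 array, and the beta logic is applied manually afterwards.
    tmp = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float16)
    return alpha * tmp.astype(np.float32) + beta * c

def gemm_no_downcast(alpha, a, b, beta, c):
    # New path (sketch): accumulate in fp32 and apply beta in one pass,
    # with no fp16 intermediate.
    acc = a.astype(np.float32) @ b.astype(np.float32)
    return alpha * acc + beta * c

rng = np.random.default_rng(0)
# Each dot product sums 256 terms of roughly 2500-3600, i.e. well above
# fp16's maximum finite value (~65504), so the fp16 temporary overflows.
a = rng.uniform(50, 60, (4, 256)).astype(np.float16)
b = rng.uniform(50, 60, (256, 4)).astype(np.float16)
c = np.zeros((4, 4), dtype=np.float32)

print(np.isinf(gemm_with_downcast(1.0, a, b, 1.0, c)).any())  # True
print(np.isinf(gemm_no_downcast(1.0, a, b, 1.0, c)).any())    # False
```

With `gemm_no_downcast` the output is fp32 end to end, so no temporary array is needed and `beta * C` folds into the same expression.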
@pytorchbot revert -m "break a few internal tests" -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit b40d943. Reverted #162001 on behalf of https://github.com/jeanschmidt due to "break a few internal tests" (comment on #161999).

This reverts commit 02c83f1. Reverted #161999 on behalf of https://github.com/jeanschmidt due to "break a few internal tests" (comment on #161999).
@malfet your PR has been successfully reverted.
@pytorchbot merge -f "Not sure why it was reverted in the 1st place..."
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Follow-up after pytorch#154012. Fixes the CPU part of pytorch#160841.

Pull Request resolved: pytorch#161999
Approved by: https://github.com/drisspg
Discovered while debugging pytorch#160841, where sdpa returned NaNs because intermediate values were cast back to fp16 before normalization (fixed by pytorch#161999).

Pull Request resolved: pytorch#162401
Approved by: https://github.com/Skylion007, https://github.com/drisspg
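The NaN mechanism can be reproduced with a small NumPy sketch (illustrative only, not the sdpa implementation): if the attention scores are rounded to fp16 before the softmax normalization, scores beyond fp16 range become inf, and the stable-softmax shift `x - max(x)` then computes `inf - inf = NaN`; keeping the scores in fp32 until after normalization stays finite.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    shifted = x - x.max(axis=-1, keepdims=True)  # inf - inf -> NaN
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

d = 256
rng = np.random.default_rng(1)
# Every q.k dot product is at least 70*70*256/16 = 78400 > 65504,
# so an fp16 intermediate is guaranteed to overflow to inf.
q = rng.uniform(70, 80, (1, d)).astype(np.float16)
k = rng.uniform(70, 80, (8, d)).astype(np.float16)

scores = q.astype(np.float32) @ k.T.astype(np.float32) / np.sqrt(d)

# Old behaviour (sketch): scores round-tripped through fp16 overflow,
# and the softmax shift turns inf into NaN.
scores_fp16 = scores.astype(np.float16)
print(np.isnan(softmax(scores_fp16.astype(np.float32))).any())  # True

# Fixed behaviour: scores kept in fp32 until after normalization.
scores_fp32 = scores
print(np.isnan(softmax(scores_fp32)).any())                     # False
```

This is the same root cause as the GEMM fix above the fold: the downcast itself is what destroys the information the normalization needs.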
Stack from ghstack (oldest at bottom):
- gemm #162001

Follow-up after #154012

Fixes CPU part of #160841
cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @lezcano