Add gemm for fp32_a_int8_b matmul kernel by kimishpatel · Pull Request #2039 · pytorch/ao

kimishpatel · 2025-04-10T19:48:40Z

Summary: Using gemmv for prefill is extremely slow. As it turns out (and as shown later in the stack), dequantizing the V matrix is still the better option, because at 3k-context prefill we are heavily compute-bound.

Reviewed By: metascroy

Differential Revision: D71833070
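To make the gemmv-versus-gemm distinction in the summary concrete, here is a minimal pure-Python sketch of an fp32-activation by int8-weight matmul. It is not the kernel added in this PR: the function names, the per-row scale and zero-point layout, and the dequantization formula `scale * (q - zero_point)` are all assumptions chosen for illustration. It only shows that a prefill built from one GEMV call per activation row repeats the dequantization of B for every row, while a GEMM can dequantize each row of B once and reuse it.

```python
# Hypothetical sketch: fp32 activations A (M x K) times int8-quantized B (K x N).
# The per-row scales/zero_points layout is an assumption, not taken from the PR.

def dequantize_row(q_row, scale, zero_point):
    """Map int8 values back to floats: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_row]

def gemv_fp32_a_int8_b(a_row, b_q, scales, zero_points):
    """One output row: a_row (length K) times quantized B (K x N)."""
    n = len(b_q[0])
    out = [0.0] * n
    for k, a in enumerate(a_row):
        b_row = dequantize_row(b_q[k], scales[k], zero_points[k])
        for j in range(n):
            out[j] += a * b_row[j]
    return out

def gemm_via_gemv(a, b_q, scales, zero_points):
    """Prefill as M independent GEMV calls; B is dequantized M times."""
    return [gemv_fp32_a_int8_b(row, b_q, scales, zero_points) for row in a]

def gemm_fp32_a_int8_b(a, b_q, scales, zero_points):
    """GEMM: dequantize each row of B once and reuse it for every row of A."""
    m, n = len(a), len(b_q[0])
    out = [[0.0] * n for _ in range(m)]
    for k in range(len(b_q)):
        b_row = dequantize_row(b_q[k], scales[k], zero_points[k])
        for i in range(m):
            a_ik = a[i][k]
            for j in range(n):
                out[i][j] += a_ik * b_row[j]
    return out

# Tiny example with values that are exactly representable in floating point.
a = [[1.0, 2.0], [3.0, 4.0]]
b_q = [[1, -2, 3], [4, 5, -6]]
scales = [0.5, 0.25]
zero_points = [0, 1]
result = gemm_fp32_a_int8_b(a, b_q, scales, zero_points)
```

Both paths produce the same output here. A real kernel would also tile the loops and use SIMD, which this sketch does not attempt.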


pytorch-bot · 2025-04-10T19:48:44Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2039

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 2 Pending

As of commit 2e476fa with merge base a3b857f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-04-10T19:48:52Z

This pull request was exported from Phabricator. Differential Revision: D71833070

Differential Revision: D71833070 Pull Request resolved: #2039

facebook-github-bot added the CLA Signed label Apr 10, 2025

facebook-github-bot added the fb-exported label Apr 10, 2025

kimishpatel requested a review from metascroy April 10, 2025 19:49

kimishpatel added the topic: new feature and topic: performance labels Apr 10, 2025

guangy10 approved these changes Apr 10, 2025


facebook-github-bot merged commit 5cb1fa1 into pytorch:main Apr 10, 2025

liangel-02 pushed a commit that referenced this pull request Aug 25, 2025

Add gemm for fp32_a_int8_b matmul kernel

fe9b051

Differential Revision: D71833070 Pull Request resolved: #2039

facebook-github-bot merged 1 commit into pytorch:main from kimishpatel:export-D71833070
