[Windows][ROCm] Fix missing native header includes causing DLL export failures by stsokolo · Pull Request #179138 · pytorch/pytorch

stsokolo · 2026-04-02T11:50:23Z

Three operations — torch._int_mm, torch._grouped_mm, and torch._scaled_mm_v2 — crash immediately on Windows ROCm builds with an access violation.

The problem comes down to how Windows DLLs handle function visibility. Windows requires functions to be explicitly marked for export. In PyTorch, this happens through a somewhat indirect mechanism: when a source file includes a function's `_native.h` header, the compiler sees the `TORCH_API` declaration before the function's local definition, so the definition inherits the export marking and the function is exported automatically.
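A minimal, self-contained sketch of that mechanism, under stated assumptions: `MY_API` and `my_int_mm` are hypothetical stand-ins (not PyTorch's real `TORCH_API` macro or its `_int_mm` kernel), the file names in the comments are illustrative, and the non-Windows branch uses the GCC/Clang `visibility` attribute only so the sketch also builds elsewhere. The point it shows is that the export attribute lives on the header declaration, while the definition in the `.cpp` file carries no annotation and picks the attribute up only when that declaration is visible in the same translation unit.

```cpp
// Minimal sketch of "export attribute on the header declaration, plain
// definition in the .cpp file". MY_API and my_int_mm are hypothetical
// stand-ins, not PyTorch's TORCH_API macro or its _int_mm kernel.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Library-side expansion of a hypothetical export macro.
#if defined(_WIN32)
#define MY_API __declspec(dllexport)
#elif defined(__GNUC__)
#define MY_API __attribute__((visibility("default")))
#else
#define MY_API
#endif

// ---- my_int_mm_native.h (hypothetical generated header) ----
// This declaration is the only place the export attribute appears.
MY_API std::vector<std::int32_t> my_int_mm(const std::vector<std::int8_t>& a,
                                           const std::vector<std::int8_t>& b,
                                           int m, int k, int n);

// ---- Blas.cpp (hypothetical kernel source) ----
// The definition has no MY_API. Because the declaration above is visible in
// this translation unit, the definition inherits the export attribute
// (__declspec(dllexport) on Windows). Remove that declaration, the equivalent
// of a missing #include, and this still compiles, but with MSVC-style DLLs the
// function is no longer exported.
std::vector<std::int32_t> my_int_mm(const std::vector<std::int8_t>& a,
                                    const std::vector<std::int8_t>& b,
                                    int m, int k, int n) {
  // Row-major int8 x int8 -> int32 matmul: (m x k) * (k x n) = (m x n).
  assert(a.size() == static_cast<std::size_t>(m) * k);
  assert(b.size() == static_cast<std::size_t>(k) * n);
  std::vector<std::int32_t> c(static_cast<std::size_t>(m) * n, 0);
  for (int i = 0; i < m; ++i)
    for (int p = 0; p < k; ++p)
      for (int j = 0; j < n; ++j)
        c[i * n + j] += static_cast<std::int32_t>(a[i * k + p]) * b[p * n + j];
  return c;
}
```

For example, `my_int_mm({1, 2, 3, 4}, {5, 6, 7, 8}, 2, 2, 2)` returns `{19, 22, 43, 50}`. Because the definition compiles either way, this class of mistake does not necessarily break the build; in this PR's case it only showed up as a crash when the op was called.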

Three source files were missing their corresponding headers:

- `Blas.cpp` was missing `_int_mm_native.h`
- `GroupedBlas.cpp` was missing `_grouped_mm_native.h`
- `ScaledBlas.cpp` was missing `_scaled_mm_v2_native.h`
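The three omissions listed above share one shape, so in principle they can be caught mechanically. Below is a hypothetical lint helper (not part of PyTorch's tooling) that scans a source file's text for `#include` lines naming each op's native header. The `ATen/ops/<op>_native.h` spelling is an assumption about where the per-operator headers live, and the line-based matching is deliberately naive: it skips `//`-commented lines but does not understand `/* */` comments or `#if` blocks.

```cpp
// Hypothetical lint helper, not part of PyTorch's tooling: report which ops
// lack an #include of their per-operator native header in a source file.
// Assumes headers are spelled ATen/ops/<op>_native.h; matching is line-based
// and does not understand /* */ comments or #if blocks.
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// True if some line of `source` is an #include of ATen/ops/<op>_native.h,
// in either the <...> or "..." form.
bool has_native_include(const std::string& source, const std::string& op) {
  assert(!op.empty() && "op name must be non-empty");
  const std::string header = "ATen/ops/" + op + "_native.h";
  std::istringstream lines(source);
  std::string line;
  while (std::getline(lines, line)) {
    const auto first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line.compare(first, 8, "#include") != 0)
      continue;  // blank line, comment, or some other directive
    // Require the delimiters so a longer path that merely contains the
    // header name is not counted.
    if (line.find("<" + header + ">") != std::string::npos ||
        line.find("\"" + header + "\"") != std::string::npos)
      return true;
  }
  return false;
}

// Returns the subset of `ops` whose native header `source` does not include.
std::vector<std::string> missing_native_includes(
    const std::string& source, const std::vector<std::string>& ops) {
  std::vector<std::string> missing;
  for (const auto& op : ops)
    if (!has_native_include(source, op)) missing.push_back(op);
  return missing;
}
```

Matching on the full `<op>_native.h` file name matters here: a file that includes `_scaled_mm_native.h` is not counted as covering `_scaled_mm_v2`.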

Without these includes, the functions were compiled as internal symbols and never exported. Meanwhile, the auto-generated dispatch code tried to call them through import thunks that couldn't resolve — resulting in a jump to address zero and an immediate crash.

Fixes:

- ROCm/TheRock#4086
- ROCm/rocm-libraries#5205
- ROCm/TheRock#4079

The fix was tested on all reproduction examples mentioned in the issues above, and all of them are passing now.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

pytorch-bot · 2026-04-02T11:50:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179138

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f6b4087 with merge base 02521a0 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-04-02T11:50:32Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

slayton58 · 2026-04-02T13:37:25Z

Good catch!

jeffdaily · 2026-04-02T13:47:31Z

@pytorchbot merge

pytorchmergebot · 2026-04-02T13:49:37Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

malfet · 2026-04-03T22:30:07Z

LGTM, though PR title is a bit misleading

#179138) Three operations — `torch._int_mm`, `torch._grouped_mm`, and `torch._scaled_mm_v2` — crash immediately on Windows ROCm builds with an access violation. The problem comes down to how Windows DLLs handle function visibility. Windows requires functions to be explicitly marked for export. In PyTorch, this happens through a somewhat indirect mechanism: when a source file includes a function's _native.h header, the compiler sees the TORCH_API declaration, recognizes it's defining that function locally, and automatically exports it. Three source files were missing their corresponding headers: - Blas.cpp was missing _int_mm_native.h - GroupedBlas.cpp was missing _grouped_mm_native.h - ScaledBlas.cpp was missing _scaled_mm_v2_native.h Without these includes, the functions were compiled as internal symbols and never exported. Meanwhile, the auto-generated dispatch code tried to call them through import thunks that couldn't resolve — resulting in a jump to address zero and an immediate crash. Fixes: - ROCm/TheRock#4086 - ROCm/rocm-libraries#5205 - ROCm/TheRock#4079 The fix was tested on all reproduction examples mentioned in the issues above, and all of them are passing now. Pull Request resolved: #179138 Approved by: https://github.com/jithunnair-amd, https://github.com/slayton58, https://github.com/jeffdaily


…3160) Cherry pick of pytorch#179138 Fixes: ROCm/TheRock#4086 ROCm/rocm-libraries#5205 ROCm/TheRock#4079 Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

[Windows][ROCm] Fix missing native header includes causing DLL export failures

f6b4087

stsokolo requested review from drisspg and slayton58 as code owners April 2, 2026 11:50

pytorch-bot added the `module: rocm` label (AMD GPU support for PyTorch) Apr 2, 2026

pytorchbot added the open source label Apr 2, 2026

stsokolo mentioned this pull request Apr 2, 2026

[Issue]: [hipblaslt] gfx1200: Segfault in hipblasLtMatmul instead of HIPBLAS_STATUS_NOT_SUPPORTED ROCm/rocm-libraries#5205

Closed

tvukovic-amd mentioned this pull request Apr 2, 2026

[ROCm][Windows] Fix FP8 _scaled_mm_v2 crash due to ArrayRef ABI mismatch #178764

Closed

jithunnair-amd added the `release notes: rocm` label (mandatory label) Apr 2, 2026

jithunnair-amd approved these changes Apr 2, 2026

jithunnair-amd added the `ciflow/trunk` label (Trigger trunk jobs on your pull request) Apr 2, 2026

tvukovic-amd mentioned this pull request Apr 2, 2026

Fatal access violation (0xC0000005) in _scaled_mm_v2 on ROCm Windows Comfy-Org/comfy-kitchen#32

Closed

0xDELUXA mentioned this pull request Apr 2, 2026

[Issue]: [Windows] Fatal access violation (0xC0000005) in torch.nn.functional.scaled_mm (_scaled_mm_v2) on gfx1200 ROCm/TheRock#4079

Closed

slayton58 added the `topic: not user facing` label (topic category) Apr 2, 2026

slayton58 approved these changes Apr 2, 2026

tvukovic-amd mentioned this pull request Apr 2, 2026

[Issue]: [Windows] torch._grouped_mm access violation (0xC0000005) in torch_hip.dll ROCm/TheRock#4086

Closed

jeffdaily approved these changes Apr 2, 2026

pytorchmergebot added the merging label Apr 2, 2026

pytorchmergebot added the Merged label Apr 2, 2026

pytorchmergebot closed this in 382011c Apr 2, 2026

pytorchmergebot removed the merging label Apr 2, 2026

Apophis3158 mentioned this pull request Apr 6, 2026

[Windows] merge upstream fix access violation (0xC0000005) for _scaled_mm_v2 _grouped_mm _int_mm ROCm/pytorch#3132

Open

tvukovic-amd mentioned this pull request Apr 15, 2026

[DRAFT] Barebones ROCM support Comfy-Org/comfy-aimdo#2

Closed

tvukovic-amd mentioned this pull request Apr 16, 2026

[release/2.11] Fix missing native header includes causing DLL export ROCm/pytorch#3160

Merged

stsokolo wants to merge 1 commit into pytorch:main from stsokolo:windows-rocm-missing-native-header-exports