
[Windows][ROCm] Fix missing native header includes causing DLL export…#179138

Closed
stsokolo wants to merge 1 commit into pytorch:main from stsokolo:windows-rocm-missing-native-header-exports

Conversation

@stsokolo
Contributor

@stsokolo stsokolo commented Apr 2, 2026

Three operations — torch._int_mm, torch._grouped_mm, and torch._scaled_mm_v2 — crash immediately on Windows ROCm builds with an access violation.
The problem comes down to how Windows DLLs handle function visibility. Windows requires functions to be explicitly marked for export. In PyTorch, this happens through a somewhat indirect mechanism: when a source file includes a function's _native.h header, the compiler sees the TORCH_API declaration, recognizes it's defining that function locally, and automatically exports it.

Three source files were missing their corresponding headers:

  • Blas.cpp was missing _int_mm_native.h
  • GroupedBlas.cpp was missing _grouped_mm_native.h
  • ScaledBlas.cpp was missing _scaled_mm_v2_native.h

Without these includes, the functions were compiled as internal symbols and never exported. Meanwhile, the auto-generated dispatch code tried to call them through import thunks that couldn't resolve — resulting in a jump to address zero and an immediate crash.
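
The export mechanism described above can be sketched in a few lines of C++. This is a minimal illustration, not PyTorch's actual code: `SKETCH_API` is a hypothetical stand-in for `TORCH_API`, and the namespace and function names are invented for the example. It assumes the usual MSVC-compatible rule that a definition picks up `dllexport` from an earlier declaration seen in the same translation unit.

```cpp
#include <cassert>

// Hypothetical stand-in for TORCH_API (assumption: the real macro is
// defined in PyTorch's export headers and is not reproduced here).
#if defined(_WIN32)
#  define SKETCH_API __declspec(dllexport)
#else
#  define SKETCH_API __attribute__((visibility("default")))
#endif

// What a generated "*_native.h" header would contribute:
// a declaration that carries the export attribute.
namespace sketch_native {
SKETCH_API int exported_mm(int a, int b);
}  // namespace sketch_native

// What the .cpp file contributes: plain definitions.
namespace sketch_native {

// The header declaration above was seen first, so on Windows this
// definition is exported even though it repeats no attribute.
int exported_mm(int a, int b) { return a * b; }

// No annotated declaration was seen for this one. It still compiles and
// links inside the same binary, but on Windows it gets no dllexport and
// does not appear in the DLL's export table.
int not_exported_mm(int a, int b) { return a * b; }

}  // namespace sketch_native

// Both functions work when called from inside the same binary, so in this
// sketch the difference is only visible to callers in another module.
inline void self_check() {
    assert(sketch_native::exported_mm(3, 4) == 12);
    assert(sketch_native::not_exported_mm(3, 4) == 12);
}
```

If this were built as a DLL on Windows, running `dumpbin /exports` on the result would be expected to list `exported_mm` but not `not_exported_mm`, which mirrors the state of the three functions before the missing includes were added.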

Fixes:

  • ROCm/TheRock#4086
  • ROCm/rocm-libraries#5205
  • ROCm/TheRock#4079

The fix was tested on all reproduction examples mentioned in the issues above, and all of them are passing now.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

@pytorch-bot

pytorch-bot Bot commented Apr 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179138

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f6b4087 with merge base 02521a0 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the module: rocm AMD GPU support for Pytorch label Apr 2, 2026
@pytorch-bot

pytorch-bot Bot commented Apr 2, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@slayton58
Contributor

Good catch!

@jeffdaily
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@malfet
Contributor

malfet commented Apr 3, 2026

LGTM, though PR title is a bit misleading

weifengpy pushed a commit that referenced this pull request Apr 7, 2026
#179138)

Three operations (`torch._int_mm`, `torch._grouped_mm`, and `torch._scaled_mm_v2`) crash immediately on Windows ROCm builds with an access violation.
The problem comes down to how Windows DLLs handle function visibility. Windows requires functions to be explicitly marked for export. In PyTorch, this happens through a somewhat indirect mechanism: when a source file includes a function's _native.h header, the compiler sees the TORCH_API declaration, recognizes it's defining that function locally, and automatically exports it.

Three source files were missing their corresponding headers:
- Blas.cpp was missing _int_mm_native.h
- GroupedBlas.cpp was missing _grouped_mm_native.h
- ScaledBlas.cpp was missing _scaled_mm_v2_native.h

Without these includes, the functions were compiled as internal symbols and never exported. Meanwhile, the auto-generated dispatch code tried to call them through import thunks that couldn't resolve — resulting in a jump to address zero and an immediate crash.

Fixes:
- ROCm/TheRock#4086
- ROCm/rocm-libraries#5205
- ROCm/TheRock#4079

The fix was tested on all reproduction examples mentioned in the issues above, and all of them are passing now.

Pull Request resolved: #179138
Approved by: https://github.com/jithunnair-amd, https://github.com/slayton58, https://github.com/jeffdaily
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
bobrenjc93 pushed a commit to bobrenjc93/pytorch that referenced this pull request Apr 9, 2026
pytorch-bot Bot pushed a commit that referenced this pull request Apr 10, 2026
tvukovic-amd pushed a commit to ROCm/pytorch that referenced this pull request Apr 16, 2026
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Apr 27, 2026

Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: rocm (AMD GPU support for Pytorch)
  • open source
  • release notes: rocm (mandatorylabel)
  • topic: not user facing (topic category)

Projects

None yet

7 participants