[Windows][ROCm] Fix missing native header includes causing DLL export…#179138
stsokolo wants to merge 1 commit into pytorch:main
Good catch!

@pytorchbot merge

LGTM, though PR title is a bit misleading
#179138)

Three operations (`torch._int_mm`, `torch._grouped_mm`, and `torch._scaled_mm_v2`) crash immediately on Windows ROCm builds with an access violation.

The problem comes down to how Windows DLLs handle function visibility. Windows requires functions to be explicitly marked for export. In PyTorch, this happens through a somewhat indirect mechanism: when a source file includes a function's `_native.h` header, the compiler sees the `TORCH_API` declaration, recognizes it's defining that function locally, and automatically exports it.

Three source files were missing their corresponding headers:

- `Blas.cpp` was missing `_int_mm_native.h`
- `GroupedBlas.cpp` was missing `_grouped_mm_native.h`
- `ScaledBlas.cpp` was missing `_scaled_mm_v2_native.h`

Without these includes, the functions were compiled as internal symbols and never exported. Meanwhile, the auto-generated dispatch code tried to call them through import thunks that couldn't resolve, resulting in a jump to address zero and an immediate crash.

Fixes:

- ROCm/TheRock#4086
- ROCm/rocm-libraries#5205
- ROCm/TheRock#4079

The fix was tested on all reproduction examples mentioned in the issues above, and all of them are passing now.

Pull Request resolved: #179138
Approved by: https://github.com/jithunnair-amd, https://github.com/slayton58, https://github.com/jeffdaily
…3160)

Cherry pick of pytorch#179138

Fixes:

- ROCm/TheRock#4086
- ROCm/rocm-libraries#5205
- ROCm/TheRock#4079

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Three operations (`torch._int_mm`, `torch._grouped_mm`, and `torch._scaled_mm_v2`) crash immediately on Windows ROCm builds with an access violation.

The problem comes down to how Windows DLLs handle function visibility. Windows requires functions to be explicitly marked for export. In PyTorch, this happens through a somewhat indirect mechanism: when a source file includes a function's `_native.h` header, the compiler sees the `TORCH_API` declaration, recognizes it's defining that function locally, and automatically exports it.

Three source files were missing their corresponding headers:

- `Blas.cpp` was missing `_int_mm_native.h`
- `GroupedBlas.cpp` was missing `_grouped_mm_native.h`
- `ScaledBlas.cpp` was missing `_scaled_mm_v2_native.h`

Without these includes, the functions were compiled as internal symbols and never exported. Meanwhile, the auto-generated dispatch code tried to call them through import thunks that couldn't resolve, resulting in a jump to address zero and an immediate crash.

Fixes:

- `torch._grouped_mm` access violation (`0xC0000005`) in `torch_hip.dll` (ROCm/TheRock#4086)
- `HIPBLAS_STATUS_NOT_SUPPORTED` (ROCm/rocm-libraries#5205)
- … `0xC0000005`) in `torch.nn.functional.scaled_mm` (`_scaled_mm_v2`) on `gfx1200` (ROCm/TheRock#4079)

The fix was tested on all reproduction examples mentioned in the issues above, and all of them are passing now.
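The export mechanism described above can be illustrated with a minimal sketch. This is not PyTorch code: `DEMO_API`, `DEMO_BUILDING_LIBRARY`, and `demo_native::int_mm_demo` are hypothetical names, and the macro only assumes that an export macro such as `TORCH_API` expands to `__declspec(dllexport)` while the DLL itself is being compiled, as is common for such macros on Windows.

```cpp
// Illustrative only: DEMO_API stands in for an export macro such as TORCH_API,
// and demo_native::int_mm_demo is a hypothetical function, not a PyTorch symbol.
#define DEMO_BUILDING_LIBRARY 1  // normally set by the build system for the DLL's own sources

#if defined(_WIN32)
#  if defined(DEMO_BUILDING_LIBRARY)
#    define DEMO_API __declspec(dllexport)  // compiling the DLL itself
#  else
#    define DEMO_API __declspec(dllimport)  // compiling a consumer of the DLL
#  endif
#else
#  define DEMO_API __attribute__((visibility("default")))
#endif

// ---- What a "_native.h"-style header would contribute ----
namespace demo_native {
DEMO_API int int_mm_demo(int a, int b);
}  // namespace demo_native

// ---- The .cpp file that defines the function ----
// The definition carries no export annotation of its own. On Windows it is
// exported only because the annotated declaration above is visible in this
// translation unit. Without that declaration, the same definition would be
// compiled as an ordinary non-exported function, and a caller in another DLL
// could not resolve it.
namespace demo_native {
int int_mm_demo(int a, int b) {
  return a * b;
}
}  // namespace demo_native
```

In this sketch, deleting the header portion corresponds to the missing `#include` lines in `Blas.cpp`, `GroupedBlas.cpp`, and `ScaledBlas.cpp`: the definition still compiles, so nothing fails until another module tries to call the function across the DLL boundary.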
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang