Problem Description
Calling hipblasLtMatmul on a gfx1200 GPU (AMD Radeon RX 9060 XT) hard-crashes the process with an access violation (Exception Code: 0xC0000005) inside torch_hip.dll at at::cuda::_int_mm. The library does not return HIPBLAS_STATUS_NOT_SUPPORTED or any other error status the caller could handle; it segfaults outright, bypassing PyTorch's cublas fallback path at HIPBlas.cpp:602.
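For completeness, the gfx1200 target can be confirmed from Python. This is a minimal check, assuming gcnArchName is exposed by this ROCm build of PyTorch (it is an attribute of ROCm device properties, not present on CUDA builds):

import torch

props = torch.cuda.get_device_properties(0)
print(props.name)         # AMD Radeon RX 9060 XT
print(props.gcnArchName)  # expected to report gfx1200 on this card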
Operating System
Windows 11
CPU
Intel Core i5
GPU
AMD Radeon RX 9060 XT
ROCm Version
7.12.0a20260304
ROCm Component
hipblaslt
PyTorch Version
2.12.0a0+rocm7.12.0a20260304
Steps to Reproduce
import torch
print(torch.cuda.get_device_name(0)) # AMD Radeon RX 9060 XT
print(torch.__version__)
a = torch.randint(-128, 127, (1024, 1024), device="cuda", dtype=torch.int8)
b = torch.randint(-128, 127, (1024, 1024), device="cuda", dtype=torch.int8)
torch._int_mm(a, b) # segfaults with 0xC0000005
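Until hipblasLt handles gfx1200, a possible caller-side guard is sketched below. This is an assumption, not an official workaround; safe_int_mm is a hypothetical helper name, and the arch check relies on gcnArchName from ROCm builds of PyTorch:

import torch

def safe_int_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # gcnArchName is the architecture string on ROCm builds; fall back to ""
    # elsewhere so the getattr never raises.
    arch = getattr(torch.cuda.get_device_properties(a.device), "gcnArchName", "")
    if "gfx1200" in arch:
        # Avoid the hipblasLtMatmul segfault: widen to int32 and multiply on
        # the CPU (torch.matmul supports integer dtypes on CPU), then move
        # the int32 result back to the original device.
        return (a.cpu().to(torch.int32) @ b.cpu().to(torch.int32)).to(a.device)
    return torch._int_mm(a, b)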
Additional Information
Crash output:
Exception Code: 0xC0000005
#1 at::cuda::_int_mm() (torch_hip.dll+0x2646a6d)
#3 torch::autograd::autogradNotImplementedFallback (torch_cpu.dll+0x4d42d55)
#5 at::_ops::_int_mm::call() (torch_cpu.dll+0x125c9f6)
Expected Behavior
hipblasLt should return HIPBLAS_STATUS_NOT_SUPPORTED for unsupported architectures, allowing the caller to fall back gracefully to cublas.
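If hipblasLt surfaced HIPBLAS_STATUS_NOT_SUPPORTED, PyTorch would either take its own fallback at HIPBlas.cpp:602 or raise a catchable RuntimeError instead of killing the process. A sketch of what caller-side recovery would then look like (hypothetical: today the segfault cannot be caught from Python at all):

import torch

a = torch.randint(-128, 127, (1024, 1024), device="cuda", dtype=torch.int8)
b = torch.randint(-128, 127, (1024, 1024), device="cuda", dtype=torch.int8)

try:
    c = torch._int_mm(a, b)  # would raise instead of segfaulting
except RuntimeError:
    # Recoverable path: int32 matmul on the CPU, result moved back to GPU.
    c = (a.cpu().to(torch.int32) @ b.cpu().to(torch.int32)).to(a.device)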