torch._scaled_mm (works):
OK: torch.Size([16, 16])
torch.nn.functional.scaled_mm (segfaults):
Exception Code: 0xC0000005
#0 0x00007fffa3e70000 (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_hip.dll+0x0)
#1 0x00007fffa64af287 at::cuda::_scaled_mm_v2(class at::Tensor const &, class at::Tensor const &, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class std::optional<class at::Tensor> const &, class std::optional<enum c10::ScalarType>, class c10::ArrayRef<__int64>, bool) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_hip.dll+0x263f287)
#2 0x00007fffa663ad21 at::cuda::_fused_adagrad_(class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<class at::Tensor>, class at::Tensor const &, double, double, double, bool, class std::optional<class at::Tensor> const &, class std::optional<class at::Tensor> const &) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_hip.dll+0x27cad21)
#3 0x00007fffb00f39a5 torch::autograd::autogradNotImplementedFallback(void) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_cpu.dll+0x4d839a5)
#4 0x00007fffac62dcc1 at::_ops::_scaled_mm_v2::call(class at::Tensor const &, class at::Tensor const &, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class std::optional<class at::Tensor> const &, class std::optional<enum c10::ScalarType>, class c10::ArrayRef<__int64>, bool) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_cpu.dll+0x12bdcc1)
#5 0x00007fffac62c63b at::_ops::_scaled_mm_v2::call(class at::Tensor const &, class at::Tensor const &, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class std::optional<class at::Tensor> const &, class std::optional<enum c10::ScalarType>, class c10::ArrayRef<__int64>, bool) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_cpu.dll+0x12bc63b)
#6 0x00007fffa285717a THPPointer<struct _object>::release(void) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_python.dll+0x1f717a)
#7 0x00007ff801cc79e4 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x379e4)
#8 0x00007ff801d12018 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x82018)
#9 0x00007ff801d119c5 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x819c5)
#10 0x00007ff801d12ea5 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x82ea5)
#11 0x00007ff801cc81e4 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x381e4)
#12 0x00007ff801cc6b66 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x36b66)
#13 0x00007ff801d7e838 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0xee838)
#14 0x00007ff801d7ea70 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0xeea70)
#15 0x00007ff801d7ebdf (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0xeebdf)
#16 0x00007ff801e18871 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x188871)
#17 0x00007ff801e187ff (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x1887ff)
#18 0x00007ff801cb8ee2 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x28ee2)
#19 0x00007ff801cb8700 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x28700)
#20 0x00007ff801cb86e3 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x286e3)
#21 0x00007ff69c321230 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python.exe+0x1230)
#22 0x00007ff8c40de8d7 (C:\Windows\System32\KERNEL32.DLL+0x2e8d7)
#23 0x00007ff8c592c48c (C:\Windows\SYSTEM32\ntdll.dll+0x8c48c)
Problem Description
torch.nn.functional.scaled_mm crashes with a fatal access violation (0xC0000005) in torch_hip.dll at at::cuda::_scaled_mm_v2 on gfx1200. The crash terminates the Python process and cannot be caught with try/except.
Reproducer
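Because the access violation kills the interpreter, a try/except around the call cannot catch it. One way to keep a test harness alive is to run each call in a child interpreter and inspect its exit status. The sketch below is not the reporter's script. It assumes 16x16 float8_e4m3fn inputs, tensor-wise float32 scales, and a `ScalingType`-based signature for `torch.nn.functional.scaled_mm` (flagged in the comments as an assumption). `run_isolated`, `describe`, `SETUP` and the `PROBE_*` strings are hypothetical names used only for illustration.

```python
import subprocess
import sys

# Windows NTSTATUS for an access violation, as reported in the log above.
STATUS_ACCESS_VIOLATION = 0xC0000005

# Shared setup: 16x16 FP8 operands on the GPU. ROCm builds of PyTorch expose
# HIP devices under the "cuda" device type. float8_e4m3fn (OCP FP8) is assumed
# to be the right format for gfx12; MI300-class GPUs use the *fnuz variants.
SETUP = """
import torch
a = torch.randn(16, 16, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(16, 16, device="cuda").to(torch.float8_e4m3fn).t()  # column-major
one = torch.tensor(1.0, device="cuda")  # float32 tensor-wise scale
"""

# Legacy private op (reported to work on gfx1200).
PROBE_OLD = SETUP + """
out = torch._scaled_mm(a, b, scale_a=one, scale_b=one, out_dtype=torch.bfloat16)
print("OK:", out.shape)
"""

# Public API that dispatches to _scaled_mm_v2 (reported to crash).
# ASSUMPTION: ScalingType lives in torch.nn.functional and the positional
# order is (mat_a, mat_b, scale_a, recipe_a, scale_b, recipe_b); adjust if
# your build differs.
PROBE_NEW = SETUP + """
import torch.nn.functional as F
from torch.nn.functional import ScalingType
out = F.scaled_mm(a, b, one, ScalingType.TensorWise, one, ScalingType.TensorWise,
                  output_dtype=torch.bfloat16)
print("OK:", out.shape)
"""


def run_isolated(code: str, timeout: float = 120.0) -> tuple[int, str]:
    """Run a Python snippet in a fresh interpreter so a native crash kills only the child."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def describe(returncode: int) -> str:
    """Translate a child exit status into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    # Mask to 32 bits so the NTSTATUS matches however the code is represented.
    if returncode & 0xFFFFFFFF == STATUS_ACCESS_VIOLATION:
        return "access violation (0xC0000005)"
    if returncode < 0:  # POSIX: child was terminated by signal -returncode
        return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


if __name__ == "__main__":
    for name, probe in [("torch._scaled_mm", PROBE_OLD),
                        ("torch.nn.functional.scaled_mm", PROBE_NEW)]:
        rc, output = run_isolated(probe)
        print(f"{name}: {describe(rc)}")
        print(output.strip())
```

On a setup like the one described here, one would expect the first probe to print the output shape and the second to be reported as an access violation, while the parent process keeps running. If the assumed `scaled_mm` signature does not match a given build, the child exits with a Python traceback (exit code 1) instead, which is also surfaced in the captured output.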
Output
Full log
Notes
torch._scaled_mm works correctly on this GPU; only _scaled_mm_v2 (the op that torch.nn.functional.scaled_mm dispatches to, per the stack trace) is broken.
This was discovered while debugging a ComfyUI crash with FP8 models: Comfy-Org/comfy-kitchen#32
Environment details
Operating System
Windows 11
CPU
Intel Core i5
GPU
AMD Radeon RX 9060 XT
ROCm Version
7.13.0a20260318
PyTorch Version
2.12.0a0+rocm7.13.0a20260318
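The version and GPU fields above can also be gathered programmatically. This is a minimal sketch, assuming a ROCm build of PyTorch where `torch.version.hip` and the `gcnArchName` device property (which reports strings such as gfx1200) are populated; both are read defensively so the script still runs on CPU-only or CUDA builds. `collect_env` is a hypothetical helper, not part of PyTorch.

```python
import platform


def collect_env() -> dict:
    """Gather issue-template fields; torch/GPU entries stay None if unavailable."""
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "torch": None,
        "hip": None,
        "gpu": None,
        "arch": None,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = str(torch.__version__)
    # torch.version.hip is None on CUDA and CPU-only builds.
    info["hip"] = getattr(torch.version, "hip", None)
    if torch.cuda.is_available():  # also True for HIP devices on ROCm builds
        props = torch.cuda.get_device_properties(0)
        info["gpu"] = props.name
        # gcnArchName is only meaningful on ROCm builds.
        info["arch"] = getattr(props, "gcnArchName", None)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

For a more complete report, PyTorch also ships `python -m torch.utils.collect_env`, which prints build and driver details in a standard format.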