[Issue]: [Windows] Fatal access violation (0xC0000005) in torch.nn.functional.scaled_mm (_scaled_mm_v2) on gfx1200 #4079

@0xDELUXA

Problem Description

torch.nn.functional.scaled_mm crashes with a fatal access violation (0xC0000005) inside at::cuda::_scaled_mm_v2 in torch_hip.dll on gfx1200. The crash terminates the interpreter, so it cannot be caught with try/except in Python.
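Because the failure kills the process instead of raising, one way to detect it from Python is to run the call in a child interpreter and inspect the exit code. The following is a minimal sketch, not part of the original report: `run_isolated`, `is_access_violation`, and `PROBE` are hypothetical names, and it assumes the child's exit code carries the exception code 0xC0000005, which is typical on Windows but not guaranteed.

```python
import subprocess
import sys

ACCESS_VIOLATION = 0xC0000005  # exception code taken from the log in this report

# Hypothetical probe: the failing call from the reproducer, run in a child process.
PROBE = """
import torch
from torch.nn.functional import ScalingType, SwizzleType
a = torch.ones(16, 16, dtype=torch.float8_e4m3fn, device='cuda')
b = torch.ones(16, 16, dtype=torch.float8_e4m3fn, device='cuda').t()
scale = torch.ones((), device='cuda', dtype=torch.float32)
torch.nn.functional.scaled_mm(a, b, scale_a=scale, scale_recipe_a=ScalingType.TensorWise,
    scale_b=scale, scale_recipe_b=ScalingType.TensorWise,
    swizzle_a=SwizzleType.NO_SWIZZLE, swizzle_b=SwizzleType.NO_SWIZZLE,
    output_dtype=torch.bfloat16)
"""


def run_isolated(source: str, timeout: float = 120.0) -> int:
    """Run `source` in a fresh interpreter and return the child's exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def is_access_violation(returncode: int) -> bool:
    """True if the exit code equals 0xC0000005, in signed or unsigned 32-bit form."""
    return (returncode & 0xFFFFFFFF) == ACCESS_VIOLATION
```

Calling `is_access_violation(run_isolated(PROBE))` on an affected machine should then report the crash while the parent process stays alive. Any other non-zero exit code (for example a missing torch install) is deliberately not treated as this bug.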

Reproducer

import torch
from torch.nn.functional import ScalingType, SwizzleType

a = torch.ones(16, 16, dtype=torch.float8_e4m3fn, device='cuda')
b = torch.ones(16, 16, dtype=torch.float8_e4m3fn, device='cuda').t()
scale = torch.ones((), device='cuda', dtype=torch.float32)

# Works fine
print('torch._scaled_mm (works):')
out = torch._scaled_mm(a, b, scale_a=scale, scale_b=scale, out_dtype=torch.bfloat16)
print('  OK:', out.shape)

# Fatal access violation
print('torch.nn.functional.scaled_mm (segfaults):')
out2 = torch.nn.functional.scaled_mm(a, b, scale_a=scale, scale_recipe_a=ScalingType.TensorWise,
    scale_b=scale, scale_recipe_b=ScalingType.TensorWise,
    swizzle_a=SwizzleType.NO_SWIZZLE, swizzle_b=SwizzleType.NO_SWIZZLE,
    output_dtype=torch.bfloat16)
print('  OK:', out2.shape)

Output

Full log
torch._scaled_mm (works):
  OK: torch.Size([16, 16])
torch.nn.functional.scaled_mm (segfaults):
Exception Code: 0xC0000005
 #0 0x00007fffa3e70000 (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_hip.dll+0x0)
 #1 0x00007fffa64af287 at::cuda::_scaled_mm_v2(class at::Tensor const &, class at::Tensor const &, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class std::optional<class at::Tensor> const &, class std::optional<enum c10::ScalarType>, class c10::ArrayRef<__int64>, bool) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_hip.dll+0x263f287)
 #2 0x00007fffa663ad21 at::cuda::_fused_adagrad_(class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<class at::Tensor>, class at::Tensor const &, double, double, double, bool, class std::optional<class at::Tensor> const &, class std::optional<class at::Tensor> const &) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_hip.dll+0x27cad21)
 #3 0x00007fffb00f39a5 torch::autograd::autogradNotImplementedFallback(void) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_cpu.dll+0x4d839a5)
 #4 0x00007fffac62dcc1 at::_ops::_scaled_mm_v2::call(class at::Tensor const &, class at::Tensor const &, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class std::optional<class at::Tensor> const &, class std::optional<enum c10::ScalarType>, class c10::ArrayRef<__int64>, bool) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_cpu.dll+0x12bdcc1)
 #5 0x00007fffac62c63b at::_ops::_scaled_mm_v2::call(class at::Tensor const &, class at::Tensor const &, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<class at::Tensor>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class std::optional<class at::Tensor> const &, class std::optional<enum c10::ScalarType>, class c10::ArrayRef<__int64>, bool) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_cpu.dll+0x12bc63b)
 #6 0x00007fffa285717a THPPointer<struct _object>::release(void) (C:\ComfyUI\venv\Lib\site-packages\torch\lib\torch_python.dll+0x1f717a)
 #7 0x00007ff801cc79e4 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x379e4)
 #8 0x00007ff801d12018 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x82018)
 #9 0x00007ff801d119c5 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x819c5)
#10 0x00007ff801d12ea5 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x82ea5)
#11 0x00007ff801cc81e4 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x381e4)
#12 0x00007ff801cc6b66 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x36b66)
#13 0x00007ff801d7e838 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0xee838)
#14 0x00007ff801d7ea70 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0xeea70)
#15 0x00007ff801d7ebdf (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0xeebdf)
#16 0x00007ff801e18871 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x188871)
#17 0x00007ff801e187ff (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x1887ff)
#18 0x00007ff801cb8ee2 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x28ee2)
#19 0x00007ff801cb8700 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x28700)
#20 0x00007ff801cb86e3 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python312.dll+0x286e3)
#21 0x00007ff69c321230 (C:\Users\deluxa\AppData\Local\Programs\Python\Python312\python.exe+0x1230)
#22 0x00007ff8c40de8d7 (C:\Windows\System32\KERNEL32.DLL+0x2e8d7)
#23 0x00007ff8c592c48c (C:\Windows\SYSTEM32\ntdll.dll+0x8c48c)
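To read the trace above more easily, each frame can be split into its index, address, symbol, module, and module-relative offset. This is a rough sketch that assumes every frame follows the `#N 0xADDR [symbol] (path+0xOFFSET)` layout seen in this log; `parse_frame` and `FRAME_RE` are hypothetical helper names, not part of any PyTorch or ROCm tooling.

```python
import ntpath
import re

# Matches lines such as:
#   " #0 0x00007fffa3e70000 (C:\...\torch_hip.dll+0x0)"
# The symbol part is optional and may itself contain parentheses.
FRAME_RE = re.compile(
    r"^\s*#(?P<index>\d+)\s+(?P<addr>0x[0-9a-fA-F]+)\s+"
    r"(?P<symbol>.*?)\s*"
    r"\((?P<path>[^()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)\s*$"
)


def parse_frame(line: str):
    """Return a dict describing one stack frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "index": int(m.group("index")),
        "address": addr,
        "symbol": m.group("symbol"),
        # ntpath handles Windows-style separators regardless of the host OS.
        "module": ntpath.basename(m.group("path")),
        "offset": offset,
        # Load address of the module implied by this frame.
        "module_base": addr - offset,
    }
```

Applied to this log, frames #0 and #1 both imply the same `module_base` for torch_hip.dll, and frame #0 has offset 0x0, meaning its reported address is the load address of the DLL itself.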

Notes

torch._scaled_mm works correctly on this GPU. Only the _scaled_mm_v2 path, which torch.nn.functional.scaled_mm reaches according to the stack trace above, is broken.

This was discovered while debugging a ComfyUI crash with FP8 models: Comfy-Org/comfy-kitchen#32

Environment details

Operating System

Windows 11

CPU

Intel Core i5

GPU

AMD Radeon RX 9060 XT

ROCm Version

7.13.0a20260318

PyTorch Version

2.12.0a0+rocm7.13.0a20260318

Labels

status: triage (indicates an issue has been assigned for investigation)

Projects

Status

Done
