Problem Description
Calling torch._grouped_mm on a CUDA tensor crashes the Python process with a fatal access violation (0xC0000005) inside torch_hip.dll. The observed stack trace implicates JitDecompRegisterer in torch_cpu.dll dispatching into at::cuda::_grouped_mm, which then faults inside torch_hip.dll. _fused_adagrad_ also appears in the crashing stack, though whether it is a co-trigger or merely incidental is unclear.
Theory: JitDecompRegisterer's constructor dispatches _grouped_mm through the CUDA backend before any Python-level code runs, and the HIP kernel behind this op faults on Windows. This is inferred from the stack trace and has not been confirmed against the source.
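One way to narrow the theory down (a minimal diagnostic sketch; isolate.py is a hypothetical file name, not part of the repro): check whether import and plain CUDA allocation succeed on their own, which would place the fault in the op call itself rather than in registration at import time.
isolate.py:
import torch  # step 1: import alone — registration (incl. JitDecompRegisterer) runs here
A = torch.randn(4, 4, device="cuda")  # step 2: plain CUDA allocation
print("import and allocation OK")  # if this prints, the fault is in the op call
torch._grouped_mm(A, A)  # step 3: the faulting call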
The crash is reproducible with torch + ROCm SDK only — no other packages required.
Operating System
Windows 11 Pro for Workstations (10.0.26200)
CPU
AMD Ryzen 9 5950X
GPU
AMD Radeon RX 9070 XT (gfx1201)
ROCm Version
7.13.0a20260318 (nightly; torch.version.hip reports 7.2.0, which is the HIP runtime version and differs from the ROCm SDK version in the wheel filename)
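For reference, the HIP runtime version above can be read directly from the installed wheel:
uv run python -c "import torch; print(torch.version.hip)"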
ROCm Component
No response
Steps to Reproduce
All packages are sourced from the TheRock nightly index at rocm.nightlies.amd.com.
pyproject.toml:
[project]
name = "repro-grouped-mm"
version = "0.1.0"
requires-python = "==3.12.*"
dependencies = [
"torch",
"rocm",
"rocm-sdk-core",
"rocm-sdk-libraries-gfx120x-all",
"typing_extensions", "filelock", "jinja2", "networkx", "sympy", "fsspec",
]
[tool.uv.sources]
torch = { url = "https://rocm.nightlies.amd.com/v2/gfx120X-all/torch-2.10.0%2Brocm7.13.0a20260318-cp312-cp312-win_amd64.whl" }
rocm = { url = "https://rocm.nightlies.amd.com/v2/gfx120X-all/rocm-7.13.0a20260318.tar.gz" }
rocm-sdk-core = { url = "https://rocm.nightlies.amd.com/v2/gfx120X-all/rocm_sdk_core-7.13.0a20260318-py3-none-win_amd64.whl" }
rocm-sdk-libraries-gfx120x-all = { url = "https://rocm.nightlies.amd.com/v2/gfx120X-all/rocm_sdk_libraries_gfx120x_all-7.13.0a20260318-py3-none-win_amd64.whl" }
[[tool.uv.dependency-metadata]]
name = "torch"
version = "2.10.0+rocm7.13.0a20260318"
requires-dist = []
crash_test.py:
import torch
A = torch.randn(4, 4, device="cuda")
B = torch.randn(4, 4, device="cuda")
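# The next call kills the process with 0xC0000005 inside torch_hip.dll; no Python exception is raised.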
torch._grouped_mm(A, B)
uv sync
uv run python crash_test.py
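Because the access violation terminates the interpreter outright, the crash cannot be caught with try/except. To capture the native exit code instead, a small wrapper can be used (a sketch; check_crash.py is a hypothetical file name).
check_crash.py:
import subprocess
import sys

# Run the repro in a child process and inspect its native exit code.
# On Windows the NTSTATUS comes back as an unsigned 32-bit value,
# so 0xC0000005 appears as 3221225477.
result = subprocess.run([sys.executable, "crash_test.py"])
print(f"exit code: {result.returncode & 0xFFFFFFFF:#010x}")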
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
Observed stack trace:
Exception Code: 0xC0000005
torch_hip.dll + 0x25BEB89, ?_grouped_mm@cuda@at@@...
torch_hip.dll + 0x274BA11, ?_fused_adagrad_@cuda@at@@...
torch_cpu.dll + 0x3BEE612, ??0JitDecompRegisterer@impl@autograd@torch@@...
torch_cpu.dll + 0x12B8192, ?call@_grouped_mm@_ops@at@@...
torch._C does not expose the op (hasattr(torch._C, '_grouped_mm') returns False); it is only reachable via torch._grouped_mm.
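This is easy to confirm from a REPL:
import torch
print(hasattr(torch._C, "_grouped_mm"))  # False
print(hasattr(torch, "_grouped_mm"))     # True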
Workaround: avoid calling torch._grouped_mm on a CUDA tensor on Windows.
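Until the kernel is fixed, callers can guard the op at the call site. A minimal sketch, assuming the simple 2D case from crash_test.py (safe_grouped_mm is a hypothetical helper, not a PyTorch API; real grouped inputs with offsets would need a per-group fallback loop instead):
import sys
import torch

def safe_grouped_mm(a, b):
    # Skip the faulting HIP kernel on Windows + ROCm builds and fall back
    # to a plain matmul; elsewhere, use the native op.
    if sys.platform == "win32" and torch.version.hip is not None:
        return a @ b
    return torch._grouped_mm(a, b)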