
[release/2.12] Fix MIOpen CTC loss crash on Windows (#179264) #3180

Merged

jeffdaily merged 1 commit into release/2.12 from release/2.12_cp_f91370d8694c1d2b36592430c3f250e66948c74b on Apr 25, 2026

Conversation

@jeffdaily
Collaborator

Fix MIOpen CTC loss access violation on Windows discrete GPUs

Problem

A failing unit test on Windows started appearing a couple of weeks ago. A missing `#include` was added in pytorch#178284, but CI on TheRock kept failing: that fix had been tested on gfx1151 (an APU), where the test passed, while CI still showed failures on gfx1100.

`test_CTCLoss_no_batch_dim` (and any code path hitting `miopen_ctc_loss`) crashes with a fatal access violation on Windows systems with discrete AMD GPUs:

```
Windows fatal exception: access violation
Exception Code: 0xC0000005
#0 miopen::CTCLossDescriptor::GetCTCLossWorkspaceSize (MIOpen.dll+0x14fde4)
#1 miopenGetCTCLossWorkspaceSize (MIOpen.dll+0x150912)
#2 at::native::miopen_ctc_loss (torch_hip.dll)
```

Root Cause

`miopenGetCTCLossWorkspaceSize` and `miopenCTCLoss` read the `labels`, `label_lengths`, and `input_lengths` arrays **on the host side** to plan the computation and calculate workspace requirements. The existing code copies these arrays to GPU memory and passes device pointers:

```cpp
Tensor labels_gpu = targets_t.to(Device(at::kCUDA), at::kInt);
// ... hipMemcpy to GPU ...
MIOPEN_CHECK(miopenGetCTCLossWorkspaceSize(...,
    labels_gpu.data_ptr<int>(),          // device pointer
    label_lengths_gpu.data_ptr<int>(),   // device pointer
    input_lengths_gpu.data_ptr<int>()    // device pointer
));
```

This works on:

  • Linux — HSA (Heterogeneous System Architecture) maps GPU allocations into the process virtual address space, making device pointers host-readable
  • Windows APUs — CPU and iGPU share system RAM, so device pointers point to host-accessible memory

This crashes on:

  • Windows dGPUs — the GPU has dedicated VRAM across PCIe; device pointers are opaque handles that cannot be dereferenced from host code (see the sketch below)
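
The dGPU failure mode is easy to reproduce outside PyTorch. A minimal sketch, assuming a HIP toolchain (an illustration, not code from the patch), that mirrors the `ctypes` host-read check from the Verification section:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int* dev_ptr = nullptr;
    if (hipMalloc(&dev_ptr, sizeof(int)) != hipSuccess) return 1;
    // On Linux (HSA) or a Windows APU this host-side read may succeed,
    // because the allocation is host-visible. On a Windows discrete GPU
    // the pointer is an opaque device handle and dereferencing it raises
    // an access violation (0xC0000005).
    volatile int value = *dev_ptr;
    std::printf("host read succeeded: %d\n", value);
    hipFree(dev_ptr);
    return 0;
}
```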

Verification

Tested on gfx1201:

| Check | Result |
| --- | --- |
| `hipDeviceAttributeIntegrated` | `0` (discrete GPU) |
| `hipDeviceAttributeCanUseHostPointerForRegisteredMem` | `0` |
| `hipDeviceAttributeManagedMemory` | `0x7FFFFFFF` (unsupported) |
| `hipDeviceAttributeUnifiedAddressing` | `0x7FFFFFFF` (unsupported) |
| Host read of `hipMalloc` pointer via `ctypes` | Access violation |
| CTC loss with CPU pointers | Pass (forward + backward) |
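
For reference, a minimal sketch of how these attributes can be queried through the HIP runtime API (an illustration, not the script used to produce the table):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Query the device attributes from the table above on device 0.
int main() {
    int integrated = 0, host_ptr = 0, managed = 0, unified = 0;
    hipDeviceGetAttribute(&integrated, hipDeviceAttributeIntegrated, 0);
    hipDeviceGetAttribute(&host_ptr,
                          hipDeviceAttributeCanUseHostPointerForRegisteredMem, 0);
    hipDeviceGetAttribute(&managed, hipDeviceAttributeManagedMemory, 0);
    hipDeviceGetAttribute(&unified, hipDeviceAttributeUnifiedAddressing, 0);
    std::printf("integrated=%d canUseHostPointerForRegisteredMem=%d "
                "managedMemory=%d unifiedAddressing=%d\n",
                integrated, host_ptr, managed, unified);
    return 0;
}
```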

Fix

Pass host pointers for these arrays, since that is what MIOpen expects.
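
A minimal sketch of the corrected call, mirroring the snippet in Root Cause (tensor names such as `label_lengths_t` are illustrative, not necessarily those in the patch; see pytorch#179264 for the actual change):

```cpp
// Keep the label/length arrays on the CPU; MIOpen dereferences these
// pointers on the host while computing the workspace size.
Tensor labels_cpu = targets_t.to(at::kCPU, at::kInt).contiguous();
Tensor label_lengths_cpu = label_lengths_t.to(at::kCPU, at::kInt).contiguous();
Tensor input_lengths_cpu = input_lengths_t.to(at::kCPU, at::kInt).contiguous();
MIOPEN_CHECK(miopenGetCTCLossWorkspaceSize(...,
    labels_cpu.data_ptr<int>(),          // host pointer
    label_lengths_cpu.data_ptr<int>(),   // host pointer
    input_lengths_cpu.data_ptr<int>()    // host pointer
));
```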

Testing

Run all existing CTCLoss unit tests.

Pull Request resolved: pytorch#179264
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
jeffdaily merged commit fb07e36 into release/2.12 on Apr 25, 2026
48 checks passed
jeffdaily deleted the release/2.12_cp_f91370d8694c1d2b36592430c3f250e66948c74b branch on April 25, 2026 at 00:53