
[release/2.12] Fix MIOpen CTC loss crash on Windows (#179264) #3180

Merged

jeffdaily merged 1 commit into release/2.12 from release/2.12_cp_f91370d8694c1d2b36592430c3f250e66948c74b on Apr 25, 2026

Conversation

@jeffdaily
Collaborator

Fix MIOpen CTC loss access violation on Windows discrete GPUs

Problem

A failing unit test on Windows started appearing a couple of weeks ago. A missing `#include` was added in pytorch#178284, but CI on TheRock kept failing: that fix had been tested on gfx1151 (an APU), where the test passed, while CI still showed failures on gfx1100.

`test_CTCLoss_no_batch_dim` (and any code path hitting `miopen_ctc_loss`) crashes with a fatal access violation on Windows systems with discrete AMD GPUs:

```
Windows fatal exception: access violation
Exception Code: 0xC0000005
#0 miopen::CTCLossDescriptor::GetCTCLossWorkspaceSize (MIOpen.dll+0x14fde4)
#1 miopenGetCTCLossWorkspaceSize (MIOpen.dll+0x150912)
#2 at::native::miopen_ctc_loss (torch_hip.dll)
```

Root Cause

`miopenGetCTCLossWorkspaceSize` and `miopenCTCLoss` read the `labels`, `label_lengths`, and `input_lengths` arrays **on the host side** to plan the computation and calculate workspace requirements. The existing code copies these arrays to GPU memory and passes device pointers:

```cpp
Tensor labels_gpu = targets_t.to(Device(at::kCUDA), at::kInt);
// ... hipMemcpy to GPU ...
MIOPEN_CHECK(miopenGetCTCLossWorkspaceSize(...,
    labels_gpu.data_ptr<int>(),          // device pointer
    label_lengths_gpu.data_ptr<int>(),   // device pointer
    input_lengths_gpu.data_ptr<int>()    // device pointer
));
```

This works on:

  • Linux — HSA (Heterogeneous System Architecture) maps GPU allocations into the process virtual address space, making device pointers host-readable
  • Windows APUs — CPU and iGPU share system RAM, so device pointers point to host-accessible memory

This crashes on:

  • Windows dGPUs — the GPU has dedicated VRAM across PCIe; device pointers are opaque handles that cannot be dereferenced from host code (see the sketch below)
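
The dGPU failure mode is easy to reproduce outside PyTorch. A minimal sketch, assuming a HIP toolchain (an illustration, not code from the patch), that mirrors the `ctypes` host-read check from the Verification section:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int* dev_ptr = nullptr;
    if (hipMalloc(&dev_ptr, sizeof(int)) != hipSuccess) return 1;
    // On Linux (HSA) or a Windows APU this host-side read may succeed,
    // because the allocation is host-visible. On a Windows discrete GPU
    // the pointer is an opaque device handle and dereferencing it raises
    // an access violation (0xC0000005).
    volatile int value = *dev_ptr;
    std::printf("host read succeeded: %d\n", value);
    hipFree(dev_ptr);
    return 0;
}
```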

Verification

Tested on gfx1201:

| Check | Result |
| --- | --- |
| `hipDeviceAttributeIntegrated` | `0` (discrete GPU) |
| `hipDeviceAttributeCanUseHostPointerForRegisteredMem` | `0` |
| `hipDeviceAttributeManagedMemory` | `0x7FFFFFFF` (unsupported) |
| `hipDeviceAttributeUnifiedAddressing` | `0x7FFFFFFF` (unsupported) |
| Host read of `hipMalloc` pointer via `ctypes` | Access violation |
| CTC loss with CPU pointers | Pass (forward + backward) |
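
For reference, a minimal sketch of how these attributes can be queried through the HIP runtime API (an illustration, not the script used to produce the table):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Query the device attributes from the table above on device 0.
int main() {
    int integrated = 0, host_ptr = 0, managed = 0, unified = 0;
    hipDeviceGetAttribute(&integrated, hipDeviceAttributeIntegrated, 0);
    hipDeviceGetAttribute(&host_ptr,
                          hipDeviceAttributeCanUseHostPointerForRegisteredMem, 0);
    hipDeviceGetAttribute(&managed, hipDeviceAttributeManagedMemory, 0);
    hipDeviceGetAttribute(&unified, hipDeviceAttributeUnifiedAddressing, 0);
    std::printf("integrated=%d canUseHostPointerForRegisteredMem=%d "
                "managedMemory=%d unifiedAddressing=%d\n",
                integrated, host_ptr, managed, unified);
    return 0;
}
```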

Fix

Pass host pointers for these arrays, since that is what MIOpen expects.
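
A minimal sketch of the corrected call, mirroring the snippet in Root Cause (tensor names such as `label_lengths_t` are illustrative, not necessarily those in the patch; see pytorch#179264 for the actual change):

```cpp
// Keep the label/length arrays on the CPU; MIOpen dereferences these
// pointers on the host while computing the workspace size.
Tensor labels_cpu = targets_t.to(at::kCPU, at::kInt).contiguous();
Tensor label_lengths_cpu = label_lengths_t.to(at::kCPU, at::kInt).contiguous();
Tensor input_lengths_cpu = input_lengths_t.to(at::kCPU, at::kInt).contiguous();
MIOPEN_CHECK(miopenGetCTCLossWorkspaceSize(...,
    labels_cpu.data_ptr<int>(),          // host pointer
    label_lengths_cpu.data_ptr<int>(),   // host pointer
    input_lengths_cpu.data_ptr<int>()    // host pointer
));
```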

Testing

Run all existing CTCLoss unit tests.

Pull Request resolved: pytorch#179264
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
jeffdaily merged commit fb07e36 into release/2.12 on Apr 25, 2026
48 checks passed
jeffdaily deleted the release/2.12_cp_f91370d8694c1d2b36592430c3f250e66948c74b branch on April 25, 2026 at 00:53