Note
Also reported at ROCm/rocm-libraries#1860
Problem Description
Using a gfx1200 GPU, the first image generation in Stable Diffusion runs quickly (around 2.5 it/s), but at the end, during the VAE decode stage, it crashes the GPU driver and occasionally prints OOM errors in the console.
I’m using average/default generation parameters - basically every UI’s base values (1024×1024 resolution, 20 steps, Euler a or DPM++ 2M Karras, etc.).
I’ve tried the well-known UIs:
ComfyUI, SD.Next, Stable Diffusion WebUI reForge, etc., and they all behave the same.
Subsequent generations usually work, but if I change the resolution to anything else, the problem repeats.
For example, the krita-ai-diffusion plugin for Krita triggers the issue every single time, because it frequently changes the resolution and other parameters between generations. This doesn’t seem like acceptable behavior.
I’ve tried every flag and workaround I could think of, for example in Comfy:
--use-pytorch-cross-attention, --disable-smart-memory, --reserve-vram 8, --fp16-vae, --bf16-vae, the tiled VAE node, etc., but nothing helps.
Other users and I found that disabling MIOpen entirely, by hard-coding torch.backends.cudnn.enabled = False in the launch script, generally prevents the driver crashes and OOM errors, but it’s just a workaround, not a real solution.
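For clarity, a minimal sketch of that workaround (placed near the top of the UI's launch script, before any model is loaded; on ROCm builds of PyTorch, the torch.backends.cudnn interface is what controls MIOpen):

```python
# Workaround sketch: disable the cuDNN/MIOpen backend so PyTorch falls back
# to its non-MIOpen convolution implementations. This avoids the driver
# crash at VAE decode but costs some performance.
import torch

torch.backends.cudnn.enabled = False
```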
Operating System
Windows 11
CPU
Intel Core i5
GPU
AMD Radeon RX 9060 XT 16 GB
ROCm Version
7.0.0rc20250917
Steps to Reproduce
Install the latest wheels:
python -m pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ torch torchvision torchaudio
Then open any SD UI, and generate an image.
(I'm using the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 environment variable every time to enable AOTriton on the gfx1200.)
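For reference, a sketch of the launch sequence (shown in POSIX shell syntax for illustration; on Windows cmd the equivalent is `set VAR=value`; `main.py` is assumed to be ComfyUI's entry point, adjust for other UIs):

```shell
# Enable the experimental AOTriton path for gfx1200, then launch the UI.
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
python main.py
```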
Additional Information
I’ve seen other AMD users mention this VAE issue in several other places online.