CRE-2025-0162: Stable Diffusion WebUI CUDA Out of Memory Detection#146

Merged
tonymeehan merged 2 commits into prequel-dev:main from piyzard:feature/cre-2025-0162-sd-webui-cuda-oom
Sep 3, 2025

Conversation


@piyzard (Contributor) commented Aug 31, 2025

CRE-2025-0162: Stable Diffusion WebUI CUDA Out of Memory Detection

closes #130
/claim #130

🎯 Overview

This PR introduces a detection rule for Stable Diffusion WebUI CUDA out-of-memory (OOM) errors, addressing GPU memory exhaustion that causes image generation failures and WebUI crashes. The rule identifies VRAM shortages during model loading and image generation that require immediate intervention to prevent service disruption.

CRE Playground Links

CRE-2025-0162 Playground: Test Rule

📊 SD WebUI Issues Covered

| # | Issue Type | Example Error Pattern |
|---|---|---|
| 1 | Model Loading OOM | `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB` |
| 2 | Critical Model Failure | `Stable Diffusion model failed to load: OutOfMemoryError` |
| 3 | Image Generation Failure | `Failed to generate image: CUDA out of memory` |
| 4 | VRAM Allocation Error | `GPU 0 has a total capacity of 6.00 GiB of which 1.20 GiB is free` |
| 5 | Runtime Memory Error | `RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB` |
| 6 | Cache Clear Attempts | `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB` |
| 7 | Generation After Retry | `Image generation failed after retry` |
| 8 | WebUI Shutdown | `WebUI shutting down due to memory error` |
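As a rough illustration of what the rule matches (the actual CRE rule is defined separately; `OOM_PATTERNS` and `is_cuda_oom` below are hypothetical names for this sketch), the table's signatures can be approximated with Python regexes:

```python
import re

# Illustrative patterns approximating the eight signatures in the table
# above; these are a sketch, not the actual CRE rule definitions.
OOM_PATTERNS = [
    re.compile(r"torch\.cuda\.OutOfMemoryError: CUDA out of memory"),
    re.compile(r"Stable Diffusion model failed to load: OutOfMemoryError"),
    re.compile(r"Failed to generate image: CUDA out of memory"),
    re.compile(r"GPU \d+ has a total capacity of [\d.]+ GiB of which [\d.]+ GiB is free"),
    re.compile(r"RuntimeError: CUDA out of memory"),
    re.compile(r"Image generation failed after retry"),
    re.compile(r"WebUI shutting down due to memory error"),
]

def is_cuda_oom(line: str) -> bool:
    """Return True if a WebUI log line matches any known OOM signature."""
    return any(p.search(line) for p in OOM_PATTERNS)

print(is_cuda_oom("torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB"))  # True
print(is_cuda_oom("Loaded model in 4.2s"))  # False
```

Note that rows 1 and 6 share the same `torch.cuda.OutOfMemoryError` signature; the rule distinguishes them by surrounding context (cache-clear retries), which this sketch does not attempt.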

🧪 Testing & Validation

Screenshot from 2025-08-31 16-15-26

🎬 Demo Environment

Repo link: https://github.com/piyzard/cre-2025-0162

Screencast.from.2025-08-31.17-01-45.mp4

💡 Mitigation Strategies

Immediate Actions:

  • Restart Stable Diffusion WebUI
  • Clear GPU memory: nvidia-smi --gpu-reset
  • Add memory optimization flags: --medvram or --lowvram

Configuration Fixes:

  • For 4-6GB VRAM: Add --medvram to webui-user.bat
  • For 2-4GB VRAM: Add --lowvram to webui-user.bat
  • Enable xformers: --xformers for memory efficiency
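For example, on a 4-6 GB card the flags go on the `COMMANDLINE_ARGS` line of `webui-user.bat` (Linux installs use the same variable in `webui-user.sh`):

```bat
rem webui-user.bat -- example flags for a 4-6 GB VRAM GPU
set COMMANDLINE_ARGS=--medvram --xformers
```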

Runtime Adjustments:

  • Reduce image resolution (512x512 instead of 1024x1024)
  • Decrease batch size to 1
  • Set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
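One caveat worth noting: `PYTORCH_CUDA_ALLOC_CONF` must be set before PyTorch initializes its CUDA allocator, so export it in the launching shell (or in `webui-user.bat`/`webui-user.sh`) rather than after the WebUI is running. A minimal sketch of setting it from Python before `import torch`:

```python
import os

# The allocator reads this variable at CUDA initialization, so it must
# be set before `import torch` (or exported in the launcher script).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = (
    "garbage_collection_threshold:0.9,max_split_size_mb:512"
)

# import torch  # must come only after the variable is set

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```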

@tonymeehan tonymeehan merged commit c414539 into prequel-dev:main Sep 3, 2025
2 checks passed

Development

Successfully merging this pull request may close these issues.

Stable Diffusion Web UI: Reproduce A High-Severity Failure & Write a CRE Rule [Multiple Winners] [Submit by August 31 11:59 pm ET]

2 participants