3x perf slow down in nightly build Torch 2.0.0.dev2023xxxx+cu118 #92288
🐛 Describe the bug
This GitHub repo doesn't have a Discussions tab like AUTOMATIC1111 has, so I'll use this. Forgive me if this is the wrong place.
Stable Diffusion A1111 image generation with typical defaults: 20 steps, euler_a, simple prompts, the SD 2.1 512 model.
Using the Linux nightly torch 2.0 on my 4090, I only get about 11 to 13 it/s.
With the Windows nightly torch 2.0 build, a 4090 gives about 35 to 38 it/s.
I have multiple confirmations of this from other folks.
However, if you build PyTorch locally on Linux, you get roughly a 3x perf increase, matching the perf seen on Windows.
Today an ex-CTO of a cloud company with GPU resources contacted me to try this on one of his cloud servers, which he loaned me. It also sped up his 4090, and he will test an A4000 GPU tomorrow.
As a suggestion, you might check whether architecture sm_89 is among the selected architectures listed in the Linux build output.
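One quick way to check this from the installed wheel itself (a sketch, not part of my original testing) is to compare the architectures the build was compiled for against the capability of the local GPU:

```python
# Check which CUDA architectures this PyTorch build ships kernels for.
# On an RTX 4090 (compute capability 8.9), "sm_89" should appear in the
# list; if it is missing, the build falls back to PTX JIT or an older
# arch's binaries, which could explain a large perf gap between builds.
import torch

archs = torch.cuda.get_arch_list()
print("Compiled arch list:", archs)

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Device capability: sm_{major}{minor}")
    print("sm_89 in build:", "sm_89" in archs)
```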
If there were a simple Python inference perf test I'd be willing to run it as a repro, but my repro is the entirety of SD AUTOMATIC1111; I have no simple standalone PyTorch perf test. Let me know how I can help. Good night.
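In lieu of a real SD repro, a minimal standalone timing script along these lines (an assumption on my part, not the A1111 workload) could at least let the two builds be compared on the same box; the matrix size and iteration count are arbitrary:

```python
# Minimal GPU throughput sketch: time repeated large matmuls so the Linux
# nightly, the Windows nightly, and a local build can be compared without
# installing Stable Diffusion. Uses fp16 on CUDA, fp32 on the CPU fallback.
import time
import torch

def bench(n=4096, iters=50):
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if dev == "cuda" else torch.float32
    a = torch.randn(n, n, device=dev, dtype=dtype)
    b = torch.randn(n, n, device=dev, dtype=dtype)
    for _ in range(5):  # warm-up so JIT/cuBLAS heuristics settle
        a @ b
    if dev == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    if dev == "cuda":
        torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    print(f"{dev}: {iters} matmuls of {n}x{n} in {dt:.3f}s ({iters / dt:.1f} it/s)")
    return dt

if __name__ == "__main__":
    bench()
```

Runs on either build; if the slow nightly really lacks sm_89 kernels, the it/s number should differ sharply between the pip wheel and a local build.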
Versions
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: version 3.25.0
Libc version: glibc-2.35
Python version: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.17.0-1019-oem-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA Graphics Device
Nvidia driver version: 520.61.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.3
[pip3] open-clip-torch==2.7.0
[pip3] pytorch-lightning==1.7.6
[pip3] pytorch-triton==2.0.0+0d7e753227
[pip3] torch==2.0.0.dev20230113+cu118
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==0.11.0
[pip3] torchsde==0.2.5
[pip3] torchvision==0.15.0.dev20230116+cu118
[conda] Could not collect
cc @ezyang @gchanan @zou3519 @ngimel @peterjc123 @mszhanyi @skyline75489 @nbcsm @csarofeen @ptrblck @xwang233 @seemethere @malfet