
3x perf slow down in nightly build Torch 2.0.0.dev2023xxxx+cu118 #92288

@aifartist


🐛 Describe the bug

This GitHub repo doesn't have a Discussions tab like AUTOMATIC1111's does, so I'll file this here. Forgive me if this is the wrong place.

Stable Diffusion (A1111) image generation with typical defaults: 20 steps, euler_a, simple prompts, SD 2.1 512 model.
Using the Linux nightly torch 2.0 build on my 4090, I only get about 11 to 13 it/s.
With the Windows nightly torch 2.0 build, a 4090 gives about 35 to 38 it/s.
I have multiple confirmations of this from other folks.

However, if you build PyTorch locally on Linux you get about a 3x perf increase, matching the perf seen on Windows.
Today an ex-CTO of a cloud company with GPU resources contacted me to try this on a cloud server he loaned me. It also sped up his 4090, and he will test an A4000 GPU tomorrow.

As a suggestion, you might check whether architecture sm_89 is among the selected architectures listed in the Linux build output.
If there were a simple standalone Python inference perf test I'd be willing to run it as a repro, but my repro is the entirety of SD AUTOMATIC1111; I have no simple standalone PyTorch perf test. Let me know how I can help. Good night.
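A minimal sketch of such a check (not the actual repro, and assuming a nightly or local wheel is installed): print the CUDA architecture list the wheel was compiled for via `torch.cuda.get_arch_list()` (sm_89 should appear on a build that targets the 4090/Ada), and time a crude fp16 matmul loop as a rough stand-in for an it/s benchmark:

```python
import time

def report_build_and_speed():
    """Return the CUDA arch list this torch wheel was built for, plus a rough
    fp16 matmul timing if a GPU is present (a crude stand-in for an it/s test)."""
    try:
        import torch  # assumes a nightly or locally built wheel is installed
    except ImportError:
        return "torch not installed"
    lines = [f"torch {torch.__version__}",
             # sm_89 should appear here for an Ada-generation GPU like the 4090
             f"CUDA arch list: {torch.cuda.get_arch_list()}"]
    if torch.cuda.is_available():
        x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(50):
            x = (x @ x) / 64.0  # rescale so fp16 values stay bounded
        torch.cuda.synchronize()
        lines.append(f"50 fp16 4096x4096 matmuls: {time.perf_counter() - t0:.3f}s")
    return "\n".join(lines)

if __name__ == "__main__":
    print(report_build_and_speed())
```

If the slow wheel is missing sm_89 from its arch list, kernels would run as PTX JIT-compiled for an older architecture, which could plausibly explain a gap like this; the matmul timing alone won't reproduce the full SD pipeline numbers.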

Versions

CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: version 3.25.0
Libc version: glibc-2.35

Python version: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.17.0-1019-oem-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA Graphics Device
Nvidia driver version: 520.61.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.3
[pip3] open-clip-torch==2.7.0
[pip3] pytorch-lightning==1.7.6
[pip3] pytorch-triton==2.0.0+0d7e753227
[pip3] torch==2.0.0.dev20230113+cu118
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==0.11.0
[pip3] torchsde==0.2.5
[pip3] torchvision==0.15.0.dev20230116+cu118
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @ngimel @peterjc123 @mszhanyi @skyline75489 @nbcsm @csarofeen @ptrblck @xwang233 @seemethere @malfet

Metadata

Assignees: no one assigned

Labels:

    high priority
    module: cuda (Related to torch.cuda, and CUDA support in general)
    module: cudnn (Related to torch.backends.cudnn, and cuDNN support)
    module: performance (Issues related to performance, either of kernel code or framework glue)
    module: windows (Windows support for PyTorch)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Status: Done