
update torch base environment#9191

Merged
AUTOMATIC1111 merged 5 commits into AUTOMATIC1111:dev from vladmandic:torch
Apr 29, 2023

Conversation

@vladmandic (Collaborator) commented Mar 30, 2023

this pr is a single-step update of the pytorch base environment:

  • from torch 1.13.1 with cuda 11.7 and cudnn 8.5.0
  • to torch 2.0.0 with cuda 11.8 and cudnn 8.7.0

this allows usage of the sdp cross-attention optimization and better multi-gpu support with accelerate, and it avoids a large number of performance issues caused by broken cudnn in some environments

it updates all required packages, but avoids any prereleases:

  • torchvision (plus silence future deprecation warning)
  • xformers (update follows torch)
  • accelerate (required to support new torch)
  • numpy (update of numpy is required by new accelerate)

note:

  • since accelerate changed the format of its config file, run accelerate config once to avoid (non-critical) warnings
  • colab updated to torch 2.0, so having webui still use older torch causes issues for users running webui in hosted environments

yes, updating torch is a major step, but it will have to be done sooner or later, as there are more and more reports of issues installing the old torch version
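For context on the sdp optimization mentioned above: torch 2.0 ships scaled dot-product attention natively as torch.nn.functional.scaled_dot_product_attention. The following is a pure-Python sketch of the underlying math only (illustrative, not webui code):

```python
import math

def sdp_attention(q, k, v):
    """Sketch of the math behind scaled dot-product attention:
    softmax(q @ k.T / sqrt(d)) @ v, which torch 2.0 exposes natively
    as torch.nn.functional.scaled_dot_product_attention."""
    d = len(q[0])
    out = []
    for qrow in q:
        # scaled dot products of this query against every key
        scores = [sum(a * b for a, b in zip(qrow, krow)) / math.sqrt(d)
                  for krow in k]
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # weighted sum of the value rows
        out.append([sum(w * vrow[j] for w, vrow in zip(weights, v))
                    for j in range(len(v[0]))])
    return out
```

The optimized torch kernel computes the same result but fused and on-GPU, which is what makes it a practical replacement for xformers on most hardware.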

@@ -1,3 +1,4 @@
astunparse
Collaborator

Any particular reason this is included?

Collaborator Author

it's an indirect dependency that can cause runtime errors if the user has tensorflow installed: new transformers will try to check it and then complain that astunparse is missing, even if tensorflow is not used.

 File "/home/vlado/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 33, in <module>
    import tensorflow as tf
...
    from tensorflow.python.autograph.pyct import parser
  File "/home/vlado/.local/lib/python3.10/site-packages/tensorflow/python/autograph/pyct/parser.py", line 29, in <module>
    import astunparse
ModuleNotFoundError: No module named 'astunparse'

Contributor

The TensorFlow pip packages should automatically install astunparse as a required dependency as listed in their package metadata and same with setup.py when installing from source.

Metadata-Version: 2.1
Name: tensorflow
Version: 2.12.0
...
Requires-Dist: astunparse (>=1.6.0)
...
Metadata-Version: 2.1
Name: tensorflow-intel
Version: 2.12.0
...
Requires-Dist: astunparse (>=1.6.0)
...

So this seems redundant when the webui base itself doesn't even install TensorFlow; only some extensions do. Nonetheless, if you feel this is really needed, it would be wise to pin the same version range as TensorFlow itself.

Collaborator Author

The TensorFlow pip packages should automatically install astunparse as a required dependency as listed in their package metadata and same with setup.py when installing from source.

yup, but there was a buggy installer in tf 2.11, and 2.12 only came out very recently.

it would be wise to pin the same version range as TensorFlow itself.

i hate specifying a version range where it's absolutely not needed - the last version of astunparse is from 2019, so it's unlikely that a brand new breaking version is going to pop up out of the blue.

So this seems redundant when the webui base itself doesn't even install TensorFlow, only some extensions do.

very true. i'm just trying to make it as error-proof as possible for average users, as i've seen this happen on multiple systems.

if you feel this is really needed

i don't - i'm just trying to make it error-proof - but i'm open to suggestions
(and to removing astunparse from the dependencies if desired).

Contributor

there was a buggy installer in tf 2.11

If this is the case, then maybe astunparse could be moved to launch.py and be conditional on tensorflow being installed? Something like:

    if is_installed("tensorflow") or is_installed("tensorflow-intel") or is_installed("tensorflow-gpu"):
        if not is_installed("astunparse"):
            run_pip("install astunparse", "astunparse")

Though astunparse is a tiny package which hasn't been updated since 2019, so it probably isn't a big deal if it gets mistakenly installed when not needed. I'll leave judgement up to you and automatic, since I have no firm opinion about this.
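The is_installed helper referenced in the snippet above lives in webui's launch.py; a minimal stdlib approximation of the check (a sketch, not the actual implementation) could look like:

```python
import importlib.util

def is_installed(package: str) -> bool:
    # find_spec returns None when the top-level package cannot be found,
    # without actually importing it (so it is cheap and side-effect free)
    return importlib.util.find_spec(package) is not None
```

This is also why the conditional-install approach works at startup: the check succeeds or fails before anything tries to import tensorflow.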

@glass-ships

glass-ships commented Mar 30, 2023

Testing these changes out - things seem to work "out of the box", but I still get the "No module 'xformers'. Proceeding without it" message when starting up. Not sure if this can be ignored, as it's seemingly included in torch 2.

@vladmandic
Collaborator Author

vladmandic commented Mar 30, 2023

but I still get the "No module 'xformers'. Proceeding without it" message when starting up

it's not "included"; it's just no longer necessary given that the new sdp attention is available
(depending on the use-case, low-end gpus are still better with xformers).

the remaining message comes from an external repo - repositories/stable-diffusion-stability-ai - and removing the warning would cause that repo to get out-of-sync. and unfortunately, it's posted not with a logger, so it could be filtered out, but with a simple print statement.
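Because the warning is a bare print rather than a logging call, the only way a wrapper could hide it is by capturing stdout. A hypothetical filter sketch (not part of webui; function name and default message are illustrative):

```python
import contextlib
import io

def run_without_message(fn, *args, suppress="No module 'xformers'"):
    # capture everything fn prints, then re-emit all lines except the
    # unwanted one; needed because a bare print bypasses logging filters
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        fn(*args)
    for line in buf.getvalue().splitlines():
        if suppress not in line:
            print(line)
```

With a logger the same effect would be one logging.Filter; stdout capture is the blunt fallback.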

@FurkanGozukara

have you tested these changes on unix? runpod?

@vladmandic
Collaborator Author

vladmandic commented Mar 30, 2023

runpod

linux yes. runpod no. there are thousands of gpu cloud providers, cannot test each one like that.

@drax-xard

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

@FurkanGozukara

FurkanGozukara commented Mar 30, 2023

runpod

linux yes. runpod no. there are thousands of gpu cloud providers, cannot test each one like that.

ok list me 20 :)

anyway, i am just saying that covering as many widely used scenarios as possible is good

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

correct, and i solved this problem by downloading and re-uploading the torch 1 wheel 0.0.18dev489. they are also still compiling them, thankfully. i think automatic1111 can do it the same way; the wheel and such things can be hosted on hugging face, i think. currently they removed all the 0.0.14 and 0.0.17 wheels for torch 1 from pip installation.

@Cyberbeing
Contributor

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

This is true only for wheels posted to pypi. You can find a wide range of pre-built xformers wheels in their Github action artifacts, if you still need a wheel for older torch. Not as simple as keeping up to date via pypi, but useful in a pinch.

[Screenshot: list of available xformers wheel build artifacts]

Just keep in mind you need to be logged into Github to download artifacts.

@vladmandic
Collaborator Author

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

Just keep in mind you need to be logged into Github to download artifacts.

sorry, can we use discussions for this and keep pr comments as pr comments? i'd love to collect/implement anything that's required, but this is not pr related at all.

- cudatoolkit=11.8
- pytorch=2.0
- torchvision=0.15
- numpy=1.23
Contributor

@vladmandic Didn't you say Torch 2.0 requires Numpy 1.24+ instead of 1.23?

Collaborator Author

@vladmandic vladmandic Mar 31, 2023

Beta did, but they relaxed it for GA.
And some other dependencies require 1.23 and are not compatible with 1.24, so it's not that clean to use 1.24 just yet. Thus my recommendation of the latest from 1.23.
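The "latest from 1.23" recommendation corresponds to a requirements pin like numpy>=1.23,<1.24. A tiny hypothetical checker for that kind of major.minor constraint (illustrative only, not webui code):

```python
def matches_minor_pin(version: str, pin: str = "1.23") -> bool:
    # hypothetical helper: accept any patch release within the pinned
    # major.minor series, i.e. "latest from 1.23" but never 1.24+
    parts = version.split(".")
    return ".".join(parts[:2]) == pin
```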

@EfourC

EfourC commented Apr 3, 2023

I'm running into a really strange problem. Any advice on how I should narrow down the root cause?

Edit: Oops, forgot to say my startup arguments:
--xformers --opt-channelslast --no-half-vae

I was just trying this PR out (as-is, plus exporting xformers==0.0.18 in launch.py).

Everything upgraded and ran smoothly for the most part, but when I tried to generate a larger image (e.g. 1024x1024), I realized there is a problem for me -- it seemed to hang at 100% GPU for 4 minutes! The sampler steps had completed, but the image had not been saved to file yet. Then after 4 minutes, the image finally completed and saved.

After this, the problem goes away if I generate more at the same resolution (until the WebUI process is restarted).

However, if I change the image resolution to anything different, e.g. to 1024x1088, the very same delay happens, and again only for the first run at that resolution.

After investigating, I realized there were also delays for smaller images, but the delay grows exponentially as resolution scales up.

Here is a quick table showing times I measured.
Note: These measurements also include the typical 'warm-up' time before steps progress, which was already a little annoying. After warm-up, the gen time for small images is much faster.

First Runs w/ 5 steps (Euler a):

Gen Time = time for all steps completed

| Size | Gen Time | Total Time |
| --- | --- | --- |
| 512x512 | 0:03 | 0:04 |
| 640x640 | 0:01 | 0:06 |
| 704x704 | 0:02 | 0:08 |
| 768x768 | 0:02 | 0:09 |
| 832x832 | 0:03 | 0:11 |
| 896x832 | 0:03 | 0:12 |
| 896x896 | 0:03 | 2:23 |
| 1024x1024 | 0:04 | 4:14 |

[Screenshots of the console output]

System: Win10, RTX 2070S 8GB, Intel 3770k

[Screenshot of version info]

@Sakura-Luna
Collaborator

@EfourC Very strange problem. Since the recent update is not stable, you could check whether this problem reproduces on an older version, for example git checkout a9fed7c.

@EfourC

EfourC commented Apr 3, 2023

With that commit, and the current master, the problem doesn't happen -- I only see it after everything is upgraded for Torch2.

Behavior OK:
[Screenshots]

@EfourC

EfourC commented Apr 3, 2023

I did some more permutations of testing, especially to see if --opt-sdp-attention instead of xformers made a difference with the Torch 2 venv.

What I found out is that the problem is actually --opt-channelslast causing the massive delay for me with Torch 2.

Both of these startup args work ok:
--xformers --no-half-vae
--opt-sdp-attention --no-half-vae

Using --opt-channelslast with either of the above creates the problem delay for me.

I haven't looked at (or previously used) any of the other performance optimization switches, but it's probably worth people trying them out on different types of systems (since I blundered into an issue with this one).
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings
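For context on what --opt-channelslast does: it asks torch to keep 4-D activations in NHWC (channels-last) memory layout, via tensor.to(memory_format=torch.channels_last), instead of the default NCHW. A pure-Python sketch of the index permutation involved (illustrative only, not webui code):

```python
def nchw_to_nhwc(x):
    """Reorder a nested-list "tensor" from NCHW (torch's default
    contiguous layout) to NHWC (the layout --opt-channelslast requests
    via tensor.to(memory_format=torch.channels_last))."""
    n_, c_, h_, w_ = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[[x[n][c][h][w] for c in range(c_)]  # channel becomes innermost
              for w in range(w_)]
             for h in range(h_)]
            for n in range(n_)]
```

In torch this is only a strides change, not a copy like here, but cudnn has to pick NHWC-capable kernels for it to pay off - which may be related to the first-run delays reported above.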

@mariaWitch

Honestly, if there is going to be a move to Torch 2.0.0, it should wait until after Torch 2.0.1 is released, as there is currently a major bug that made it into GA and breaks compatibility with WebUI when using torch.compile.
See: pytorch/pytorch#97862 and pytorch/pytorch#93405

@vladmandic
Collaborator Author

I'm aware of that issue, but WebUI does not use torch.compile on its own, and anyone experienced enough to use it would hand-pick the torch version manually anyhow.

Torch 2.1 has no benefits for the normal WebUI user. And the existing Torch 1.13 is showing its teeth with quite a few install issues lately.

Whole point of the PR is not to enable experimental use, but to make it simpler for normal users.

@vladmandic
Collaborator Author

fyi, i initially updated xformers to 0.0.18, but there are frequent reports of NaN values, especially during hires operations, so i've downgraded to 0.0.17. performance-wise, i don't see any major difference, so this is not a big loss. like i said before, the goal of this PR is to get the cleanest out-of-the-box environment where the fewest users have issues, not just to go with the latest & greatest.

@mariaWitch

mariaWitch commented Apr 4, 2023

Torch 2.1 has no benefits for normal WebUI user. And existing Torch 1.13 is showing its teeth with quite a few install issues lately.

Whole point of the PR is not to enable experimental use, but to make it simpler for normal users.

It isn't 2.1; we aren't waiting for a whole major release - Torch 2.0.1 came out of phase 0 yesterday. I still believe that Torch 2.0 should not be merged until the blocking issue upstream is resolved in the next minor update, as I believe PyTorch botched the initial GA release of 2.0, and we shouldn't be running that version of PyTorch until it is more mature.

@vladmandic
Collaborator Author

Torch 2.1 has no benefits for normal WebUI user. And existing Torch 1.13 is showing its teeth with quite a few install issues lately.
Whole point of the PR is not to enable experimental use, but to make it simpler for normal users.

It isn't 2.1, we aren't waiting a whole major release, Torch 2.0.1 came out of phase 0 yesterday. I still believe that Torch 2.0 should not be merged until the blocking issue upstream is resolved in the next minor update as I believe PyTorch botched the initial GA release of 2.0, and we shouldn't be running that version of Pytorch until it is more mature.

and then we'd have to wait for xformers to publish new wheels, etc...
again, torch.compile is not used by webui, so there is no benefit to a standard user in waiting for torch 2.0.1.
and colab upgraded to torch 2.0, and so did many other hosted environments, so right now running webui there requires additional manual steps - which is far more important to resolve than waiting for the "ideal version".

@mariaWitch

and then we'd have to wait for xformers to publish new wheels, etc...
again, torch.compile is not used by webui, so there is no benefit to a standard user in waiting for torch 2.0.1.
and colab upgraded to torch 2.0, and so did many other hosted environments, so right now running webui there requires additional manual steps - which is far more important to resolve than waiting for the "ideal version".

As it stands right now, the only people you are claiming are affected are people using cloud setups, who most likely have already done the manual work to support PyTorch 2.0.0. There is no reason for PyTorch to be upgraded to 2.0.0 when it is very clearly NOT stable. It is not worth risking adding even more bugs to the code base as it currently stands.

@vladmandic
Collaborator Author

very clearly NOT stable

that is a very strong statement. can you substantiate this? all errors i've seen so far have been related to torch.compile, and yes, that feature is pretty much broken.

on the other hand, there are hundreds of users using torch 2.0 with webui without issues.

@mariaWitch

mariaWitch commented Apr 6, 2023

that is a very strong statement. can you substantiate this?

To name a few:
pytorch/pytorch#97031
pytorch/pytorch#97041
pytorch/pytorch#97226
pytorch/pytorch#97576
pytorch/pytorch#97021

And not only that, I disagree with moving to 2.0.0 on principle, as .0.0 software is generally never stable. Waiting for 2.0.1 has no downsides, whereas 2.0.0 is an unstable mess that they are still trying to get stable. The last thing this repo needs is more instability causing more issues to flood in.

@vladmandic
Collaborator Author

that is a very strong statement. can you substantiate this?

To name a few: pytorch/pytorch#97031 pytorch/pytorch#97041 pytorch/pytorch#97226 pytorch/pytorch#97576 pytorch/pytorch#97021

Bugs relevant to WebUI are what matter - why list random things? this is going in the wrong direction.

  • For example, I don't think anyone is trying to run it on a RaspberryPi, so let's stay on topic?
  • And WebUI uses venv, not conda, so conda on osx-64 also doesn't really apply.
  • Or whether torchvision has debug symbols or not? I'd consider that a cosmetic issue at best.
  • Yes, the libnvrtc packaging bug is relevant, but there is no PR associated with it, so should we wait indefinitely?

And not only that, I disagree with moving to 2.0.0 on principle as .0.0 software is generally never stable. Waiting for 2.0.1 or even 2.0.2 has no downsides whereas 2.0.0 is an unstable mess that they are still trying to get stable. The last thing this repo needs is more instability which causes more issues to flood in.

That is a question of personal preference and risk vs reward. The issue is that Torch 1.13 wheels are getting obsoleted in many packages and/or environments, causing failures to install. So what's the solution? Ignore current issues until some unknown time in the future?

PR stands as-is and I've been using Torch 2.0 on my branch for a while now (and users on my branch are not reporting issues relevant to WebUI). We can agree to disagree here.

@mariaWitch

Convolutions being broken for Cuda 11.8 builds specifically affects users who use Pytorch 2.0 and --opt-channelslast. It basically negates any possible performance benefits from that option.


What is the change?
If there is none, why is this here?
It's incredibly confusing.
(BTW: I am a relatively new user, please go easy on me)

Contributor

Under certain conditions, there could be a cache.json.lock file in addition to cache.json.

So this change (appending an asterisk) will cover both files.
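Assuming the pattern is consumed with glob-style matching, the trailing asterisk in cache.json* covers both files; a quick demonstration using Python's glob module:

```python
import glob
import os
import tempfile

# create both files in a scratch directory, then show that the
# "cache.json*" pattern matches the lock file as well as the cache
with tempfile.TemporaryDirectory() as d:
    for name in ("cache.json", "cache.json.lock"):
        open(os.path.join(d, name), "w").close()
    matched = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(d, "cache.json*")))

print(matched)  # ['cache.json', 'cache.json.lock']
```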

Collaborator Author

Exactly! And you're fast :)

Contributor

@DGdev91 DGdev91 left a comment

The code for macOS uses the --extra-index-url which was meant for nvidia cards

Contributor

Are you sure "--extra-index-url https://download.pytorch.org/whl/cu118" is actually ok for macOS?

I don't know why that code was using version 1.12.1 instead of a newer one (probably it was just never updated), but probably the right way to install it on macOS is by just using "pip install torch torchvision" without the --extra-index-url, as mentioned on the official website https://pytorch.org/get-started/locally/

@DGdev91
Contributor

DGdev91 commented Apr 11, 2023

Recently PyTorch changed its install command; it now uses --index-url instead of --extra-index-url, as mentioned in #9483

Also, i noticed your code doesn't cover AMD cards (in that case TORCH_COMMAND is set in webui.sh).
But it's fine; i had some problems on my 5700XT on pytorch2 (see #8139) and already covered that part in #9404. i feel it's better to stay on 1.13.1 a while longer, at least on AMD.
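The vendor split described here can be sketched as a small dispatch; this is a hypothetical illustration only (the real logic lives in webui.sh / launch.py via the TORCH_COMMAND variable, and the function name is made up), using the install commands quoted elsewhere in this thread:

```python
def torch_command(gpu: str) -> str:
    """Hypothetical sketch of per-vendor TORCH_COMMAND selection."""
    if gpu == "amd":
        # AMD stays on torch 1.13.1 + ROCm 5.2 until a ROCm release
        # supports torch 2.0 (see #9404 / #8139)
        return ("pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 "
                "--index-url https://download.pytorch.org/whl/rocm5.2")
    # NVIDIA (and default) moves to torch 2.0 with CUDA 11.8
    return ("pip install torch==2.0.0 torchvision "
            "--extra-index-url https://download.pytorch.org/whl/cu118")
```

Users can still override the whole thing by exporting TORCH_COMMAND themselves, which is what the comments below do.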

@sjdevries

Been having a lot of issues trying to get things working with my 5700XT. The new torch 2.0 version failed to generate any images.

Using the latest rocm as below failed to generate images or produce any console output:
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2"

Using the torch command from the pr works:
export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"

At least it is working on Commit hash: a9fed7c

@vladmandic
Collaborator Author

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

@PennyFranklin

5.6.0? maybe that will support my 7900xtx

@DGdev91
Contributor

DGdev91 commented Apr 17, 2023

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

Really? Where? I see 5.4.3 as the last release on https://github.com/RadeonOpenCompute/ROCm/releases

@vladmandic
Collaborator Author

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

Really? Where? I see 5.4.3 as the last release on https://github.com/RadeonOpenCompute/ROCm/releases

https://rocmdocs.amd.com/projects/alpha/en/develop/deploy/install.html

@DGdev91
Contributor

DGdev91 commented Apr 18, 2023

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

Really? Where? I see 5.4.3 as the last release on https://github.com/RadeonOpenCompute/ROCm/releases

https://rocmdocs.amd.com/projects/alpha/en/develop/deploy/install.html

uhm... it doesn't seem that's publicly available

5.6.0? maybe that will support my 7900xtx

There was indeed a docker image for rocm 5.6.0 with 7900xtx support around, but it's now offline, so i guess that code was intended for internal testing and not supposed to be released yet. Anyway, there was a discussion here: #9591

I'm not sure if that works on other gpus like the 5700xt too, but i wouldn't be surprised if pytorch 2.0 starts to work when the next rocm version is released.

I guess for 5700xt users the better choice is sticking to the old 1.13.1 version and waiting for an official rocm release

@AUTOMATIC1111 AUTOMATIC1111 changed the base branch from master to dev April 29, 2023 08:56
@AUTOMATIC1111 AUTOMATIC1111 merged commit 9eb49b0 into AUTOMATIC1111:dev Apr 29, 2023
@AUTOMATIC1111
Owner

I'm pretty sure --extra-index-url https://download.pytorch.org/whl/cu118 for OSX is wrong but I don't have a mac to try it on.

@DGdev91
Contributor

DGdev91 commented Apr 29, 2023

I'm pretty sure --extra-index-url https://download.pytorch.org/whl/cu118 for OSX is wrong but I don't have a mac to try it on.

According to PyTorch's website it should be just pip3 install torch torchvision without extra arguments
