[Bug]: AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

### Is there an existing issue for this?

- [X] I have searched the existing issues and checked the recent builds/commits

### What happened?

After following the "Automatic Installation" instructions from the wiki for AMD cards, I get the following error:
```
Traceback (most recent call last):
  File "/home/basil/stable-diffusion-webui/launch.py", line 355, in <module>
    prepare_environment()
  File "/home/basil/stable-diffusion-webui/launch.py", line 260, in prepare_environment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "/home/basil/stable-diffusion-webui/launch.py", line 121, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "/home/basil/stable-diffusion-webui/launch.py", line 97, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/home/basil/stable-diffusion-webui/venv/bin/python3" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 1
stdout: <empty>
stderr: Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
```

If I pass `--skip-torch-cuda-test`, I then get the following error:
```
Error completing request
Arguments: ('task(ry0zf9rah332ozo)', 'caaa', '(low quality, worst quality:1.4),(bad_prompt:0.8), (monochrome:1.1), (greyscale)', [], 28, 16, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 768, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
  File "/home/basil/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/home/basil/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/home/basil/stable-diffusion-webui/modules/processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "/home/basil/stable-diffusion-webui/modules/processing.py", line 642, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/home/basil/stable-diffusion-webui/modules/processing.py", line 587, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/basil/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/basil/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/modules/sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/basil/stable-diffusion-webui/modules/sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/basil/stable-diffusion-webui/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 811, in forward
    return self.text_model(
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 721, in forward
    encoder_outputs = self.encoder(
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 650, in forward
    layer_outputs = encoder_layer(
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 378, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/basil/stable-diffusion-webui/venv/lib64/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
```

If I then pass `--skip-torch-cuda-test --precision full --no-half`, the UI begins using my CPU to generate images instead of using my GPU like it should.

With the Docker method, I get the same error about Torch not being able to use the GPU, add `--skip-torch-cuda`, etc.

### Steps to reproduce the problem

1. Follow the instructions [here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs).

### What should have happened?

The UI should have ran normally and utilized my GPU to generate images.

### Commit where the problem happens

https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/22bcc7be428c94e9408f589966c2040187245d81

### What platforms do you use to access the UI ?

Linux

### What browsers do you use to access the UI ?

Mozilla Firefox

### Command Line Arguments

```Shell
./webui.sh
./webui.sh --skip-torch-cuda-test
./webui.sh --skip-torch-cuda-test --precision full --no-half
```


### List of extensions

No

### Console logs

```Shell
./webui.sh 

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on basil user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.10 (main, Mar 01 2023, 21:10:14) [GCC]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing torch and torchvision
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/rocm5.2
Collecting torch
  Using cached torch-2.0.0-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
Collecting torchvision
  Using cached torchvision-0.15.1-cp310-cp310-manylinux1_x86_64.whl (6.0 MB)
Collecting nvidia-cudnn-cu11==8.5.0.96
  Using cached nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
Collecting nvidia-cufft-cu11==10.9.0.58
  Using cached nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
Collecting nvidia-cusolver-cu11==11.4.0.1
  Using cached nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
Collecting nvidia-nccl-cu11==2.14.3
  Using cached nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting nvidia-nvtx-cu11==11.7.91
  Using cached nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
Collecting networkx
  Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting nvidia-cuda-cupti-cu11==11.7.101
  Using cached nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
Collecting nvidia-curand-cu11==10.2.10.91
  Using cached nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
Collecting nvidia-cusparse-cu11==11.7.4.91
  Using cached nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
Collecting triton==2.0.0
  Using cached https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
Collecting filelock
  Using cached filelock-3.10.7-py3-none-any.whl (10 kB)
Collecting jinja2
  Using cached https://download.pytorch.org/whl/Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting nvidia-cublas-cu11==11.10.3.66
  Using cached nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
Collecting sympy
  Using cached https://download.pytorch.org/whl/sympy-1.11.1-py3-none-any.whl (6.5 MB)
Collecting nvidia-cuda-runtime-cu11==11.7.99
  Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99
  Using cached nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
Requirement already satisfied: wheel in ./venv/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (0.40.0)
Requirement already satisfied: setuptools in ./venv/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (65.5.0)
Collecting cmake
  Using cached cmake-3.26.1-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Collecting lit
  Using cached lit-16.0.0-py3-none-any.whl
Collecting requests
  Using cached requests-2.28.2-py3-none-any.whl (62 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Using cached Pillow-9.5.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.4 MB)
Collecting numpy
  Using cached numpy-1.24.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting MarkupSafe>=2.0
  Using cached https://download.pytorch.org/whl/MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (199 kB)
Collecting idna<4,>=2.5
  Using cached https://download.pytorch.org/whl/idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting certifi>=2017.4.17
  Using cached https://download.pytorch.org/whl/certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting mpmath>=0.19
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, nvidia-cusolver-cu11, nvidia-cudnn-cu11, jinja2, triton, torch, torchvision
Successfully installed MarkupSafe-2.1.2 certifi-2022.12.7 charset-normalizer-3.1.0 cmake-3.26.1 filelock-3.10.7 idna-3.4 jinja2-3.1.2 lit-16.0.0 mpmath-1.3.0 networkx-3.1 numpy-1.24.2 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 pillow-9.5.0 requests-2.28.2 sympy-1.11.1 torch-2.0.0 torchvision-0.15.1 triton-2.0.0 typing-extensions-4.5.0 urllib3-1.26.15
Traceback (most recent call last):
  File "/home/basil/stable-diffusion-webui/launch.py", line 355, in <module>
    prepare_environment()
  File "/home/basil/stable-diffusion-webui/launch.py", line 260, in prepare_environment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "/home/basil/stable-diffusion-webui/launch.py", line 121, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "/home/basil/stable-diffusion-webui/launch.py", line 97, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/home/basil/stable-diffusion-webui/venv/bin/python3" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 1
stdout: <empty>
stderr: Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
```


### Additional information

When I was on Arch Linux (about a month ago), the "Automatic Installation" instructions worked fine; the UI would install pytorch+rocm normally and the UI would generate images normally using my GPU.

I am now on openSUSE Tumbleweed. My GPU is a 6700 XT.

Another thing worth noting: at the bottom of the UI, the PyTorch version is `torch: 2.0.0+cu117`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check #9402

Is there an existing issue for this?

What happened?

Steps to reproduce the problem

What should have happened?

Commit where the problem happens

What platforms do you use to access the UI ?

What browsers do you use to access the UI ?

Command Line Arguments

List of extensions

Console logs

Additional information

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Bug]: AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check #9402

Description

Is there an existing issue for this?

What happened?

Steps to reproduce the problem

What should have happened?

Commit where the problem happens

What platforms do you use to access the UI ?

What browsers do you use to access the UI ?

Command Line Arguments

List of extensions

Console logs

Additional information

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions