[Performance 6/6] Add --precision half option to avoid casting during inference #15820
AUTOMATIC1111 merged 3 commits into AUTOMATIC1111:dev
Conversation
Will force-fp16 mode conflict with the fp8 unet?
I'm not sure if this is related to using dynamic LoRA weights; I wonder if it's related to this.
Enabling
Found the offending line. In one place it is `h = x.type(self.dtype)`, while in the other it is:

```python
# h = x.type(self.dtype)
h = x
```
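The effect of that change can be illustrated with a minimal, hypothetical sketch (numpy standing in for torch; the function names are mine, not the webui's): when the input is already in the model's dtype, the per-call cast is pure overhead, and skipping it changes nothing about the result.

```python
import numpy as np

def forward_with_cast(x, weight):
    # analogous to the original line: h = x.type(self.dtype)
    h = x.astype(weight.dtype)
    return h @ weight

def forward_no_cast(x, weight):
    # analogous to the fix: h = x
    h = x
    return h @ weight

weight = np.ones((4, 4), dtype=np.float16)
x = np.ones((2, 4), dtype=np.float16)  # input already fp16

out_cast = forward_with_cast(x, weight)
out_nocast = forward_no_cast(x, weight)
assert out_cast.dtype == out_nocast.dtype == np.float16
assert (out_cast == out_nocast).all()
```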
I don't know if it's the appropriate place to put it, but setting
Something like this? This does fix the dtype mismatch error.
Thanks for digging out the solution! Verified that the solution works.
I'm still getting the following runtime error with both SDXL and SD15 models. It seems to be related to
Can you share which model you used? I'm not sure whether, when you load a full-precision model, the weights are cast to fp16 before inference. The models I tested are already half precision.
Sure, I tried a few:
Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:
I'll write back if I figure out the cause.
I've tested this on a 6700 XT and there is a performance improvement. However, I think this should not disallow setting
Another report of the fp8 issue.
Using an FP16 VAE, I got almost double the speed compared to `--no-half-vae`. Nice, an FP16 VAE is mandatory.
Description
According to lllyasviel/stable-diffusion-webui-forge#716 (comment), casting during inference is a major source of performance overhead. ComfyUI and Forge by default do fp16 inference without any casting, i.e. all tensors are fp16 before inference. The casting overhead is ~50ms/it.
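The "all tensors are fp16 before inference" approach can be sketched as follows (numpy standing in for torch; the names are illustrative, not the webui's): every weight is converted to half precision once up front, so the sampling loop itself never casts.

```python
import numpy as np

# hypothetical fp32 checkpoint weights
weights = {
    "proj": np.ones((4, 4), dtype=np.float32),
    "out": np.ones((4, 4), dtype=np.float32),
}

def cast_all_to_half(params):
    # one-time conversion before inference starts, so no per-step casting
    return {name: w.astype(np.float16) for name, w in params.items()}

half_weights = cast_all_to_half(weights)
assert all(w.dtype == np.float16 for w in half_weights.values())
```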
This PR adds an option `--precision half` to disable autocasting and use fp16 values for all tensors during inference.

Screenshots/videos:
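A rough, hypothetical sketch of how such a CLI flag could be parsed (the webui's actual option handling differs; the variable names here are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--precision",
    choices=["full", "autocast", "half"],
    default="autocast",
    help="'half' disables autocasting and keeps all tensors in fp16",
)

args = parser.parse_args(["--precision", "half"])
force_fp16 = args.precision == "half"  # downstream code would skip autocast
```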
Checklist: